├── .here ├── results ├── results_index.csv ├── tables │ └── readme.md ├── figures │ └── readme.md ├── other │ └── readme.md └── readme.md ├── data ├── data_index.csv ├── derived │ ├── readme.md │ ├── public │ │ └── readme.md │ ├── sample │ │ └── readme.md │ └── private │ │ └── readme.md ├── raw │ ├── public │ │ └── readme.md │ └── private │ │ └── readme.md ├── scratch │ └── readme.md ├── metadata │ ├── readme.md │ └── metadata_template.md └── readme.md ├── docs ├── manuscript │ └── readme.md ├── report │ ├── readme.md │ └── analysis_plan.md ├── presentation │ └── readme.md └── readme.md ├── procedure ├── code │ ├── readme.md │ ├── 00-Python-environment-setup.ipynb │ ├── 01-R-markdown.Rmd │ └── 01-Jupyter_notebook.ipynb ├── protocols │ └── readme.md ├── procedure_index.csv ├── environment │ └── readme.md └── readme.md ├── .gitattributes ├── template_reference.bib ├── CITATION.cff ├── LICENSE ├── Template_LICENSE ├── .gitignore ├── readme.md └── template_readme.md /.here: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /results/results_index.csv: -------------------------------------------------------------------------------- 1 | path,name,description 2 | -------------------------------------------------------------------------------- /data/data_index.csv: -------------------------------------------------------------------------------- 1 | path,name,metadata,description 2 | -------------------------------------------------------------------------------- /results/tables/readme.md: -------------------------------------------------------------------------------- 1 | # Tables 2 | 3 | Store data table results of the analysis here. 
4 | -------------------------------------------------------------------------------- /results/figures/readme.md: -------------------------------------------------------------------------------- 1 | # Figures 2 | 3 | Store graphic products (figures and maps) of the analysis here. 4 | -------------------------------------------------------------------------------- /docs/manuscript/readme.md: -------------------------------------------------------------------------------- 1 | # Manuscript 2 | 3 | Store compiled manuscript for submission and publication here. 4 | -------------------------------------------------------------------------------- /docs/report/readme.md: -------------------------------------------------------------------------------- 1 | # Preanalysis Plan and Research Reports 2 | 3 | Store the preanalysis plan and research report here. -------------------------------------------------------------------------------- /docs/presentation/readme.md: -------------------------------------------------------------------------------- 1 | # Presentation 2 | 3 | Store compiled presentations here. These may include presentations for conferences, public talks, lectures, etc. 4 | -------------------------------------------------------------------------------- /data/derived/readme.md: -------------------------------------------------------------------------------- 1 | # Derived Data 2 | 3 | Save cleaned, preprocessed data here. Data in this folder should be ready for analysis or contain the final results of analysis. -------------------------------------------------------------------------------- /data/derived/public/readme.md: -------------------------------------------------------------------------------- 1 | # Derived Public Data 2 | 3 | Store pre-processed data here if the data is suitable for public redistribution and if the data files are less than `100mb`. 
4 | -------------------------------------------------------------------------------- /procedure/code/readme.md: -------------------------------------------------------------------------------- 1 | # Code 2 | Store computational code-based research procedures here. 3 | Document an index of files stored here in [procedure_index.csv](../procedure_index.csv) in the root [procedure](../) folder. 4 | -------------------------------------------------------------------------------- /.gitattributes: -------------------------------------------------------------------------------- 1 | # Auto detect text files and perform LF normalization 2 | * text=auto 3 | 4 | # Count .Rmd files 5 | # https://github.com/github-linguist/linguist/blob/master/docs/overrides.md 6 | *.Rmd linguist-detectable 7 | -------------------------------------------------------------------------------- /results/other/readme.md: -------------------------------------------------------------------------------- 1 | # Other Research Outputs 2 | 3 | Store other research outputs here. These may include data tables for publication, non-graphical and non-map images (e.g. photographs), audio, video, animation, or other media. 4 | -------------------------------------------------------------------------------- /procedure/protocols/readme.md: -------------------------------------------------------------------------------- 1 | # Protocols 2 | Store any non-computational protocols and research procedures here. 3 | Document an index of files stored here in [procedure_index.csv](../procedure_index.csv) in the root [procedure](../) folder. 4 | -------------------------------------------------------------------------------- /data/raw/public/readme.md: -------------------------------------------------------------------------------- 1 | # Raw Public Data 2 | 3 | Store raw data as collected or downloaded here if the data is suitable for public redistribution and if the data files are less than `100mb`.
Include any required licenses and citations in the `data/metadata` folder. -------------------------------------------------------------------------------- /template_reference.bib: -------------------------------------------------------------------------------- 1 | @misc{Kedron_Holler_2023, 2 | title={Template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences}, 3 | url={osf.io/w29mq}, 4 | DOI={10.17605/OSF.IO/W29MQ}, 5 | publisher={OSF}, 6 | author={Kedron, Peter and Holler, Joseph}, 7 | year={2023}, 8 | month={Jun} 9 | } 10 | -------------------------------------------------------------------------------- /data/derived/sample/readme.md: -------------------------------------------------------------------------------- 1 | # Sample Data 2 | 3 | Store deidentified and redistributable sample data here, if the data is required for analysis but is in a condition that cannot be publicly redistributed. Data in this folder may be randomized or simulated in order to provide some reproducibility in the cases where preprocessed data must remain private. 4 | -------------------------------------------------------------------------------- /procedure/procedure_index.csv: -------------------------------------------------------------------------------- 1 | path,name,purpose 2 | environment,readme.md,set up computational environment 3 | code,script1.R,download and preprocess data 4 | protocol,survey_irb.pdf,Institutional review board protocol for survey sampling and instrument 5 | protocol,mapworkshop.pdf,participatory mapping workshop protocol 6 | code,script2.R,run analysis 7 | code,script3.R,generate visualizations for results 8 | -------------------------------------------------------------------------------- /results/readme.md: -------------------------------------------------------------------------------- 1 | # Results 2 | 3 | Store final research outputs, e.g. figures, tables, or other media for publications and presentations. 
4 | 5 | Complete the [results_index.csv](results_index.csv) file indexing each results file, including the fields: 6 | 7 | - `path`: the path from the `results` folder, e.g., `figures`, `other`, or `tables` 8 | - `name`: the file name 9 | - `description`: very brief description or figure title 10 | -------------------------------------------------------------------------------- /data/scratch/readme.md: -------------------------------------------------------------------------------- 1 | # Scratch Data 2 | 3 | Store scratch data used in intermediate processing here. This is also a good place to store specialized databases that cannot be version controlled or redistributed. *This folder is ignored by Git*, with the exception of this `readme.md` file, through the following lines in `.gitignore`: 4 | 5 | ```gitignore 6 | # Ignore contents of scratch folder, with the exception of its readme file 7 | data/scratch/** 8 | !readme.md 9 | ``` 10 | -------------------------------------------------------------------------------- /docs/readme.md: -------------------------------------------------------------------------------- 1 | ## Reporting Reproducible Research 2 | Use the resources in this folder to develop a pre-registration plan, document a reproduction or replication, and organize your manuscript. 3 | 4 | Also, use this `docs` folder to store websites related to the research project. 5 | In some cases, this `readme.md` document may interfere with GitHub websites. 6 | If this is the case and you have already stored website files in this directory, this `readme.md` file may be renamed or deleted. 7 | -------------------------------------------------------------------------------- /data/metadata/readme.md: -------------------------------------------------------------------------------- 1 | # Metadata 2 | 3 | Organize and store documentation and metadata in this folder.
4 | Metadata files should be listed for relevant data sources in [data/data_index.csv](../data_index.csv). 5 | 6 | Best practice for geographic metadata is to create XML compliant with the ISO 191\*\* series of standards for geospatial metadata. 7 | A more human- and GitHub-readable markdown form of the ISO 191\*\* geospatial metadata is provided in [metadata_template.md](metadata_template.md). 8 | -------------------------------------------------------------------------------- /data/derived/private/readme.md: -------------------------------------------------------------------------------- 1 | # Derived Private Data 2 | 3 | Store preprocessed data here, if the data is required for analysis but is in a condition that cannot be publicly redistributed. For example, data versioning and sharing may be restricted because of large file sizes, licensing, ethics, privacy, or confidentiality. 4 | 5 | If data for analysis must be guarded in this way, **the authors are encouraged to provide a sample deidentified dataset** in the `data/derived/sample` directory. 6 | 7 | *This folder is ignored by Git*, with the exception of this `readme.md` file, through the following lines in `.gitignore`: 8 | 9 | ```gitignore 10 | # Ignore contents of derived private folder, with the exception of its readme file 11 | data/derived/private/** 12 | !readme.md 13 | ``` 14 | -------------------------------------------------------------------------------- /procedure/environment/readme.md: -------------------------------------------------------------------------------- 1 | # Environment 2 | 3 | Store detailed information about the hardware and software environment requirements for procedures and code here. You may also document a recipe or container of the computational environment here. 4 | 5 | This directory is specifically for hardware and software environments.
6 | Contextual factors or confounds of human subjects research or field research should be communicated in protocol documents and stored in the `protocols` directory. 7 | 8 | For users of R, our template code, at a minimum, saves environment information using the `sessionInfo()` function. 9 | 10 | ## Set up instructions 11 | 12 | Researchers are encouraged to write instructions on setting up or accessing the computational environment for their study here. 13 | -------------------------------------------------------------------------------- /data/readme.md: -------------------------------------------------------------------------------- 1 | # Data 2 | 3 | Store all of your research data in subdirectories here. 4 | 5 | Complete the [data_index.csv](data_index.csv) file indexing each data file, including the fields: 6 | 7 | - `path`: the path to the data folder, likely one of: `raw/private`, `raw/public`, `derived/private` or `derived/public` 8 | - `name`: the file name, including extension 9 | - `metadata`: list of metadata files for this data source, stored in the `data/metadata` folder. These may include ISO-191\*\* or FGDC standard `XML` files, data dictionaries, licenses or attributions, user guides, webpage printouts, etc. 10 | - `description`: *very* brief description of the dataset. If the data is **simulated**, **randomized**, or represents only a limited **sample** of the full research dataset, you should note those limitations here. 11 | -------------------------------------------------------------------------------- /data/raw/private/readme.md: -------------------------------------------------------------------------------- 1 | # Raw Private Data 2 | 3 | Store raw data in this folder as it is collected or downloaded if the data cannot be publicly redistributed. For example, data versioning and sharing may be restricted because of large file sizes, licensing, ethics, privacy, or confidentiality.
Best practices are to include code to automate the process of downloading or simulating raw private data in the first step of the methods, or to include instructions here for accessing any private or restricted-access data. 4 | 5 | ## Instructions for accessing data 6 | Include instructions here, e.g., instructions to run the first code script in the `procedure/code` folder, or instructions on how to create or download the data. 7 | 8 | *This folder is ignored by Git*, with the exception of this `readme.md` file, through the following lines in `.gitignore`: 9 | 10 | ```gitignore 11 | # Ignore contents of raw private folder, with the exception of its readme file 12 | data/raw/private/** 13 | !readme.md 14 | ``` 15 | -------------------------------------------------------------------------------- /CITATION.cff: -------------------------------------------------------------------------------- 1 | cff-version: 1.2.0 2 | message: "If you use this template, please cite it as below." 3 | authors: 4 | - family-names: "Kedron" 5 | given-names: "Peter" 6 | orcid: "https://orcid.org/0000-0002-1093-3416" 7 | - family-names: "Holler" 8 | given-names: "Joseph" 9 | orcid: "https://orcid.org/0000-0002-2381-2699" 10 | title: "Template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences" 11 | version: 0.1 12 | doi: 10.17605/OSF.IO/5FGMC 13 | date-released: 2021-08-22 14 | url: "https://github.com/HEGSRR/HEGSRR-Template" 15 | license: BSD-3-Clause 16 | preferred-citation: 17 | type: generic 18 | authors: 19 | - family-names: "Kedron" 20 | given-names: "Peter" 21 | orcid: "https://orcid.org/0000-0002-1093-3416" 22 | - family-names: "Holler" 23 | given-names: "Joseph" 24 | orcid: "https://orcid.org/0000-0002-2381-2699" 25 | title: "Template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences" 26 | doi: 10.17605/OSF.IO/W29MQ 27 | year: 2021 28 | url: "https://github.com/HEGSRR/HEGSRR-Template" 29 | 
-------------------------------------------------------------------------------- /procedure/readme.md: -------------------------------------------------------------------------------- 1 | # Procedure 2 | Catalog all procedures used here in an *ordered* table documenting any code or other research procedure/protocol documents. Provide a brief description of the purpose of each procedure and piece of code. 3 | 4 | Catalog the files in [procedure_index.csv](procedure_index.csv) 5 | 6 | See the example table below, and modify the table to suit your research design. 7 | 8 | - `path`: the path to the file or directory, usually one of `code` for software code and scripts, `environment` for the hardware/software computational environment, or `protocol` for non-code protocols like 9 | - `name`: the file name, including extension 10 | - `purpose`: *very* brief description of the purpose of the file 11 | 12 | The *sequence* of procedures to be followed is implied by the *order* in the table and should be explicit in the pre-analysis plan and post-analysis report. 13 | 14 | path | name | purpose | 15 | -- | -- | -- | 16 | environment | readme.md | set up computational environment | 17 | code | script1.R | download and preprocess data | 18 | protocol | survey_irb.pdf | Institutional review board protocol for survey sampling and instrument | 19 | protocol | mapworkshop.pdf | participatory mapping workshop protocol | 20 | code | script2.R | run analysis | 21 | code | script3.R | generate visualizations for results | 22 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | BSD 3-Clause License 2 | 3 | Copyright (c) 2021, Peter Kedron and Joseph Holler 4 | All rights reserved. 5 | 6 | Redistribution and use in source and binary forms, with or without 7 | modification, are permitted provided that the following conditions are met: 8 | 9 | 1. 
Redistributions of source code must retain the above copyright notice, this 10 | list of conditions and the following disclaimer. 11 | 12 | 2. Redistributions in binary form must reproduce the above copyright notice, 13 | this list of conditions and the following disclaimer in the documentation 14 | and/or other materials provided with the distribution. 15 | 16 | 3. Neither the name of the copyright holder nor the names of its 17 | contributors may be used to endorse or promote products derived from 18 | this software without specific prior written permission. 19 | 20 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 21 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 22 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 23 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 24 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 25 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 26 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 27 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 28 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 30 | -------------------------------------------------------------------------------- /Template_LICENSE: -------------------------------------------------------------------------------- 1 | BSD 3-Clause License 2 | 3 | Copyright (c) 2021, Peter Kedron and Joseph Holler 4 | All rights reserved. 5 | 6 | Redistribution and use in source and binary forms, with or without 7 | modification, are permitted provided that the following conditions are met: 8 | 9 | 1. Redistributions of source code must retain the above copyright notice, this 10 | list of conditions and the following disclaimer. 11 | 12 | 2. 
Redistributions in binary form must reproduce the above copyright notice, 13 | this list of conditions and the following disclaimer in the documentation 14 | and/or other materials provided with the distribution. 15 | 16 | 3. Neither the name of the copyright holder nor the names of its 17 | contributors may be used to endorse or promote products derived from 18 | this software without specific prior written permission. 19 | 20 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 21 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 22 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 23 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 24 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 25 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 26 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 27 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 28 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
30 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Ignore contents of derived private folder, with the exception of its readme file 2 | data/derived/private/** 3 | !readme.md 4 | 5 | # Ignore contents of raw private folder, with the exception of its readme file 6 | data/raw/private/** 7 | !readme.md 8 | 9 | # Ignore contents of scratch folder, with the exception of its readme file 10 | data/scratch/** 11 | !readme.md 12 | 13 | # Ignore Microsoft Office system files 14 | *.tmp 15 | ~$*.doc* 16 | Backup of *.doc* 17 | ~$*.ppt* 18 | ~$*.xls* 19 | *.xlt 20 | 21 | # Ignore common R files 22 | .Rproj.user 23 | .Rhistory 24 | .RData 25 | .RDataTmp* 26 | .Rapp.history 27 | 28 | # Jupyter Notebook 29 | .ipynb_checkpoints 30 | 31 | # IPython 32 | profile_default/ 33 | ipython_config.py 34 | 35 | # Byte-compiled / optimized / DLL files 36 | # https://github.com/github/gitignore/blob/main/Python.gitignore 37 | __pycache__/ 38 | *.py[cod] 39 | *$py.class 40 | 41 | # C extensions 42 | *.so 43 | 44 | # Python Environments 45 | .env 46 | .venv 47 | env/ 48 | venv/ 49 | ENV/ 50 | env.bak/ 51 | venv.bak/ 52 | pipfile* 53 | 54 | # General MacOS files 55 | .DS_Store 56 | .AppleDouble 57 | .LSOverride 58 | 59 | # Thumbnails 60 | ._* 61 | 62 | # Files that might appear in the root of a volume 63 | .DocumentRevisions-V100 64 | .fseventsd 65 | .Spotlight-V100 66 | .TemporaryItems 67 | .Trashes 68 | .VolumeIcon.icns 69 | .com.apple.timemachine.donotpresent 70 | 71 | # Directories potentially created on remote AFP share 72 | .AppleDB 73 | .AppleDesktop 74 | Network Trash Folder 75 | Temporary Items 76 | .apdisk 77 | 78 | # Windows thumbnail cache files 79 | Thumbs.db 80 | Thumbs.db:encryptable 81 | ehthumbs.db 82 | ehthumbs_vista.db 83 | 84 | # Dump file 85 | *.stackdump 86 | 87 | # Folder config file 88 | [Dd]esktop.ini 89 | 90 | # Recycle Bin used on file 
shares 91 | $RECYCLE.BIN/ 92 | 93 | # Windows Installer files 94 | *.cab 95 | *.msi 96 | *.msix 97 | *.msm 98 | *.msp 99 | 100 | # Windows shortcuts 101 | *.lnk 102 | 103 | # Geopackage temporary files 104 | *.gpkg-shm 105 | *.gpkg-wal 106 | 107 | # Icloud 108 | *.icloud 109 | 110 | # Jupyter cache folder 111 | cache/** 112 | -------------------------------------------------------------------------------- /data/metadata/metadata_template.md: -------------------------------------------------------------------------------- 1 | - `Title`: Title of data source 2 | - `Abstract`: Brief description of the data source 3 | - `Spatial Coverage`: Specify the geographic extent of your study. This may be a place name and link to a feature in a gazetteer like GeoNames or OpenStreetMap, or a well-known text (WKT) representation of a bounding box. 4 | - `Spatial Resolution`: Specify the spatial resolution as a scale factor, a description of the level of detail of each unit of observation (including administrative level of administrative areas), and/or the cell size of a raster grid 5 | - `Spatial Representation Type`: Specify the model of spatial data representation, e.g. one of `vector`, `grid`, `textTable`, `tin` (triangulated irregular network), etc. If the type is `vector`, also specify the geometry type as in the OGC Simple Feature Access standard (https://www.ogc.org/publications/standard/sfa/), e.g. `POINT`, `LINESTRING`, `MULTIPOLYGON`, etc. 6 | - `Spatial Reference System`: Specify the geographic or projected coordinate system for the study 7 | - `Temporal Coverage`: Specify the temporal extent of your study---i.e. the range of time represented by the data observations. 8 | - `Temporal Resolution`: Specify the temporal resolution of your study---i.e.
the duration of time that each observation represents or the revisit period for repeated observations 9 | - `Lineage`: Describe and/or cite data sources and/or methodological steps taken or planned to create this data source, e.g.: 10 | - sampling scheme, including spatial sampling 11 | - target sample size and method for determining sample size 12 | - stopping criteria for data collection and sampling (e.g. sample size, time elapsed) 13 | - de-identification / anonymization 14 | - experimental manipulation 15 | - `Distribution`: Describe who will make the data available and how. 16 | - `Constraints`: Legal constraints for *access* or *use* to protect *privacy* or *intellectual property rights* 17 | - `Data Quality`: State any planned quality assessment 18 | - `Variables`: For each variable, enter the following information. If you have two or more variables per data source, you may want to present this information in table form (shown below) 19 | - `Label`: variable name as used in the data or code 20 | - `Alias`: intuitive natural language name 21 | - `Definition`: Short description or definition of the variable. Include measurement units in description. 22 | - `Type`: data type, e.g. character string, integer, real 23 | - `Accuracy`: e.g. uncertainty of measurements 24 | - `Domain`: Expected range (minimum and maximum) of numerical data, or codes or categories of nominal data, or reference to a standard codebook 25 | - `Missing Data Value(s)`: Values used to represent missing data 26 | - `Missing Data Frequency`: Frequency of missing data observations: not yet known for data to be collected 27 | 28 | | Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency | 29 | | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | 30 | | variable1 | ... | ... | ... | ... | ... | ... | ... | 31 | | variable2 | ... | ... | ... | ... | ... | ... | ...
| 32 | -------------------------------------------------------------------------------- /readme.md: -------------------------------------------------------------------------------- 1 | If you use this template for research, please [cite it](template_reference.bib): 2 | > Kedron, P., & Holler, J. (2023). Template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences. https://doi.org/10.17605/OSF.IO/W29MQ 3 | 4 | # Title of Study 5 | 6 | ## Contributors 7 | 8 | - First Name Last Name\*, email address, @githubname, ORCID link, affiliated institution(s) 9 | - First Name Last Name, email address, @githubname, ORCID link, affiliated institution(s) 10 | 11 | \* Corresponding author and creator 12 | 13 | ## Abstract 14 | 15 | Write a brief abstract about your research project. 16 | If the project is a reproduction or replication study, include the full citation with a statement 17 | This study is a *reproduction/replication* of: 18 | 19 | > citation to prior study 20 | 21 | A graphical abstract of the study could also be included as an image here. 22 | 23 | ## Study Metadata 24 | 25 | - `Key words`: Comma-separated list of keywords (tags) for searchability. Geographers often use one or two keywords each for: theory, geographic context, and methods. 26 | - `Subject`: select from the [BePress Taxonomy](http://digitalcommons.bepress.com/cgi/viewcontent.cgi?article=1008&context=reference) 27 | - `Date created`: date when project was started 28 | - `Date modified`: date of most recent revision 29 | - `Spatial Coverage`: Specify the geographic extent of your study. This may be a place name and link to a feature in a gazetteer like GeoNames or OpenStreetMap, or a well known text (WKT) representation of a bounding box. 
30 | - `Spatial Resolution`: Specify the spatial resolution as a scale factor, a description of the level of detail of each unit of observation (including administrative level of administrative areas), and/or the cell size of a raster grid 31 | - `Spatial Reference System`: Specify the geographic or projected coordinate system for the study 32 | - `Temporal Coverage`: Specify the temporal extent of your study---i.e. the range of time represented by the data observations. 33 | - `Temporal Resolution`: Specify the temporal resolution of your study---i.e. the duration of time that each observation represents or the revisit period for repeated observations 34 | - `Funding Name`: name of funding for the project 35 | - `Funding Title`: title of project grant 36 | - `Award info URI`: web address for award information 37 | - `Award number`: award number 38 | 39 | ## Related to 40 | 41 | - `OSF Project`: 42 | - `Pre-analysis Registration`: 43 | - `Post-analysis Report Registration`: 44 | - `Preprint`: 45 | - `Conference Presentation`: 46 | - `Publication`: 47 | - `Prior Study`: 48 | - `...`: 49 | 50 | ## Metadata for access 51 | 52 | - `Rights`: [LICENSE](LICENSE): BSD 3-Clause "New" or "Revised" 53 | - `Resource type`: Collection 54 | - `Resource language`: English 55 | - `Conforms to`: Template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences version 1.0, DOI:[10.17605/OSF.IO/W29MQ](https://doi.org/10.17605/OSF.IO/W29MQ) 56 | 57 | ## Compendium structure and contents 58 | 59 | This research compendium is structured with four main directories: 60 | 61 | - `data`: contains subdirectories for `raw` data and `derived` data.
62 | - `docs`: contains subdirectories for `manuscript`, `presentation`, and `report` 63 | - `procedure`: contains subdirectories for `code` or software scripts, information about the computational `environment` in which the research was conducted, and non-code research `protocols` 64 | - `results`: contains subdirectories for `figures`, formatted data `tables`, or `other` formats of research results. 65 | 66 | The data, procedures, and results of this repository are outlined in three tables: 67 | - Data: [data/data_index.csv](data/data_index.csv) 68 | - Procedures: [procedure/procedure_index.csv](procedure/procedure_index.csv) 69 | - Results: [results/results_index.csv](results/results_index.csv) 70 | 71 | Important local **documents** include: 72 | - Pre-analysis plan: [docs/report/preanalysis.pdf](docs/report/preanalysis.pdf) 73 | - Study report: [docs/report/report.pdf](docs/report/report.pdf) 74 | - Manuscript: [docs/manuscript/manuscript.pdf](docs/manuscript/manuscript.pdf) 75 | - Presentation: [docs/presentation/presentation.pdf](docs/presentation/presentation.pdf) 76 | 77 | #### Compendium reference 78 | 79 | The [template_readme.md](template_readme.md) file contains more information on the design of this template and references used in the design. 80 | The [Template_LICENSE](Template_LICENSE) file provides the BSD 3-Clause license for using this template. 81 | To cite the template, please use [template_reference.bib](template_reference.bib) or: 82 | > Kedron, P., & Holler, J. (2023). Template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences. 
https://doi.org/10.17605/OSF.IO/W29MQ 83 | -------------------------------------------------------------------------------- /template_readme.md: -------------------------------------------------------------------------------- 1 | # Template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences 2 | 3 | This template Git repository contains a folder structure, template documents, and best practice suggestions for conducting geographic research with a reproducible research compendium. 4 | The main [readme.md](readme.md) contains information about the research study. 5 | 6 | The [Template_LICENSE](Template_LICENSE) file provides the BSD 3-Clause license for using this template. 7 | To cite the template, please use [template_reference.bib](template_reference.bib) or: 8 | > Kedron, P., & Holler, J. (2023). Template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences. https://doi.org/10.17605/OSF.IO/W29MQ 9 | 10 | The folder structure presented here can be used to: 11 | 12 | 1. pre-register, document, and share original research in a reproducible manner, or 13 | 2. document and share a reproduction and/or replication of original research. 14 | 15 | An overview of the folder structure of this repository is provided below. The `readme.md` file contained in each folder provides details about the purpose of that folder and suggestions on its use. 16 | The authors should maintain the [data/data_index.csv](data/data_index.csv) file to list all raw and derived data, the [procedure/procedure_index.csv](procedure/procedure_index.csv) file with an ordered list of all procedures and/or code, and the [results/results_index.csv](results/results_index.csv) file with a list of all figures, tables, and other media produced by the research.
17 | The `docs/report/` folder contains templates to facilitate 1) the pre-registration of research plans, and 2) report the complete details of original, reproduction, or replication studies. 18 | 19 | ## Repository Overview 20 | 21 | 22 | |- docs/ # study documentation 23 | | +- report/ # reproduction plan, reproduction report 24 | | +- manuscript/ # manuscript components 25 | | +- presentation/ # presentation materials 26 | | 27 | |- data # study data 28 | | - raw/ # raw data, should not be altered 29 | | +- public/ # public data with version control 30 | | +- private/ # private data with no version control 31 | | +- derived/ # derived data 32 | | +- public/ # public data with version control 33 | | +- private/ # private data with no version control 34 | | +- scratch/ # temporary files that can be safely deleted or lost 35 | | +- metadata/ # documentation of metadata 36 | | 37 | |-procedure 38 | | +- environment/ # details of the computational environment 39 | | +- code/ # any programmatic code, clearly named and commented 40 | | +- protocols/ # any non-computational protocols 41 | | 42 | |- results # all output from workflows and analyses 43 | | +- figures/ # graphs, likely designated for manuscript 44 | | +- tables/ # tables, likely designated for manuscript 45 | | +- other/ # diagrams, images, and other non-graph graphics 46 | | 47 | |- readme.md # description of the study 48 | |- template_readme.md # description of repository design and references 49 | |- LICENSE # intellectual property license, ideally open source 50 | |- Template_LICENSE # BSD 3-Clause license for this template 51 | |- CITATION.cff # preferred citation for the research 52 | |- .gitignore # files to ignore from git tracking 53 | 54 | ## Reproducible Research Practices 55 | 56 | Every research project is different. This repository is designed to serve as a flexible guide capable of structuring work completed throughout the lifecycle of different types of research projects. 
57 | No matter the project type, a few key suggested practices when using this repository include: 58 | 59 | - Registering your pre-analysis plan with a service like the Open Science Framework at [https://osf.io/](https://osf.io/) or an equivalent, and adding crosslinks between your research repository and the pre-registered plan. 60 | - Keeping original, raw data in the `data/raw` folder. Do not alter these files during data analysis. 61 | - Keeping data derived from the raw data (e.g. subsets) separate from the raw data in the `data/derived` folder. 62 | - Keeping exploratory/experimental outputs in the `data/scratch` folder. *Files in this folder should be able to be deleted without negatively impacting the project*. 63 | - Limiting manual changes to data. *Conduct as much data processing and analysis as possible with code*. 64 | - Maintaining well-commented and human-readable code, e.g. following the [tidyverse style guide for R](https://style.tidyverse.org/) or the [PEP 8 Style Guide for Python](https://www.python.org/dev/peps/pep-0008/). 65 | - Creating a top-level `Makefile` or R Markdown file that documents computational work in executable form, and/or writing clear comments and instructions in the header of each procedure and code file and good descriptions in `procedure_index.csv`. 66 | - Documenting and/or packaging the computational environment in the `procedure/environment` folder. 67 | 68 | ## References 69 | 70 | The structure of this repository closely follows the excellent [rr-init](https://github.com/Reproducible-Science-Curriculum/rr-init) repository, which in turn follows Noble [(2009)](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000424). 71 | We have also incorporated structural ideas from [Gandrud (2015)](http://christophergandrud.github.io/RepResR-RStudio/) and Camerer et al. ([2016](https://osf.io/pfdyw/), [2018](https://osf.io/bzm54/)).
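
The raw/derived separation recommended under Reproducible Research Practices can be sketched in Python as follows. This is a minimal illustration, not part of the template: the helper names `find_project_root` and `derive_subset` are hypothetical, and the sketch assumes the `.here` marker file sits at the repository root and that inputs are plain CSV files in `data/raw/public`.

```python
from pathlib import Path
import csv


def find_project_root(start: Path, marker: str = ".here") -> Path:
    """Walk upward from `start` until a folder containing `marker` is found."""
    start = start.resolve()
    for folder in (start, *start.parents):
        if (folder / marker).exists():
            return folder
    raise FileNotFoundError(f"no {marker} file found at or above {start}")


def derive_subset(root: Path, raw_name: str, derived_name: str,
                  column: str, value: str) -> Path:
    """Write rows where `column == value` to data/derived/public.

    The raw file is opened read-only, so it is never altered.
    """
    raw_path = root / "data" / "raw" / "public" / raw_name
    out_path = root / "data" / "derived" / "public" / derived_name
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with raw_path.open(newline="") as src, out_path.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        writer.writerows(row for row in reader if row[column] == value)
    return out_path
```

Calling `derive_subset(find_project_root(Path.cwd()), "sites.csv", "sites_vt.csv", "state", "VT")` from a notebook in `procedure/code` would resolve paths from the repository root rather than the notebook folder; the file and column names here are placeholders.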
72 | 73 | ### Pre-registration Template 74 | 75 | A pre-registration template for studies involving geographic analyses. 76 | This template is modelled on similar templates developed by the [Open Science Framework (OSF)](http://osf.io/x5w7h), [AsPredicted](https://osf.io/fnsb6/), the [prereg package](https://github.com/crsh/prereg), and Van den Akker et al. [(2019)](https://doi.org/10.31234/osf.io/hvfmr). 77 | The OSF template is our most direct source. 78 | This template can be used to transparently plan and pre-register original geographic research. [Cite the OSF preregistration template and the licenses](https://osf.io/preprints/metaarxiv/epgjd/) 79 | 80 | ### Reproduction and Replication Template 81 | 82 | A template to facilitate the documentation and reporting of reproductions and replications of original geographic research. 83 | Stylistically, this template follows the [ReScience article template](https://github.com/ReScience/template), but also draws inspiration from Camerer et al. ([2016](https://osf.io/pfdyw/), [2018](https://osf.io/bzm54/)). 84 | Following Camerer et al., we suggest using the template to first document and share the procedures of the planned reproduction/replication before re-analysis begins. 85 | After the reproduction/replication is complete, we suggest then completing the template and sharing the report alongside the originally published planning document. 86 | 87 | Other examples of registered replication reports are available from the [Reproducibility Project](https://osf.io/s3hfr/), [registered replication projects](https://www.psychologicalscience.org/publications/replication/ongoing-projects) published by the Association for Psychological Science, and ReScience [C](http://rescience.github.io/) and [X](http://rescience.org/x).
88 | Users may also be interested in the [Transparency and Openness Promotion (TOP) Guidelines](https://www.cos.io/initiatives/top-guidelines), the [replication policy](https://royalsocietypublishing.org/rsos/replication-studies) of the Royal Society, or this example web-based [reproducibility workflow](https://odmap.wsl.ch/) for species distribution models which the authors converted into a web-based report generator. 89 | -------------------------------------------------------------------------------- /docs/report/analysis_plan.md: -------------------------------------------------------------------------------- 1 | # Title of Study 2 | 3 | ### Authors 4 | 5 | - First Name Last Name\*, email address, @githubname, ORCID link, affiliated institution(s) 6 | - First Name Last Name, email address, @githubname, ORCID link, affiliated institution(s) 7 | 8 | \* Corresponding author and creator 9 | 10 | ### Abstract 11 | 12 | Write a brief abstract about your research project. 13 | 14 | If the project is a reproduction or replication study, include a declaration of the study type with a full reference to the original study. 15 | For example: 16 | 17 | This study is a *replication* of: 18 | 19 | > citation to prior study 20 | 21 | A graphical abstract of the study could also be included as an image here. 22 | 23 | ### Study metadata 24 | 25 | - `Key words`: Comma-separated list of keywords (tags) for searchability. Geographers often use one or two keywords each for: theory, geographic context, and methods. 26 | - `Subject`: select from the [BePress Taxonomy](http://digitalcommons.bepress.com/cgi/viewcontent.cgi?article=1008&context=reference) 27 | - `Date created`: date when project was started 28 | - `Date modified`: date of most recent revision 29 | - `Spatial Coverage`: Specify the geographic extent of your study. This may be a place name and link to a feature in a gazetteer like GeoNames or OpenStreetMap, or a well known text (WKT) representation of a bounding box.
30 | - `Spatial Resolution`: Specify the spatial resolution as a scale factor, description of the level of detail of each unit of observation (including administrative level of administrative areas), and/or the cell size of a raster grid 31 | - `Spatial Reference System`: Specify the geographic or projected coordinate system for the study, e.g. EPSG:4326 32 | - `Temporal Coverage`: Specify the temporal extent of your study---i.e. the range of time represented by the data observations. 33 | - `Temporal Resolution`: Specify the temporal resolution of your study---i.e. the duration of time that each observation represents or the revisit period for repeated observations 34 | - `Funding Name`: name of funding for the project 35 | - `Funding Title`: title of project grant 36 | - `Award info URI`: web address for award information 37 | - `Award number`: award number 38 | 39 | #### Original study spatio-temporal metadata 40 | 41 | - `Spatial Coverage`: extent of original study 42 | - `Spatial Resolution`: resolution of original study 43 | - `Spatial Reference System`: spatial reference system of original study 44 | - `Temporal Coverage`: temporal extent of original study 45 | - `Temporal Resolution`: temporal resolution of original study 46 | 47 | ## Study design 48 | 49 | Describe how the study relates to prior literature, e.g. is it an **original study**, **meta-analysis study**, **reproduction study**, **reanalysis study**, or **replication study**? 50 | 51 | Also describe the original study archetype, e.g. is it **observational**, **experimental**, **quasi-experimental**, or **exploratory**? 52 | 53 | Enumerate specific **hypotheses** to be tested or **research questions** to be investigated here, and specify the type of method, statistical test or model to be used on the hypothesis or question. 54 | 55 | ## Materials and procedure 56 | 57 | ### Computational environment 58 | 59 | Define the hardware, operating system, and software requirements for the research.
60 | Include citations to important software projects, plugins or packages and their versions. 61 | 62 | ### Data and variables 63 | 64 | Describe the **data sources** and **variables** to be used. 65 | Data sources may include plans for observing and recording **primary data** or descriptions of **secondary data**. 66 | For secondary data sources with numerous variables, the analysis plan authors may focus on documenting only the variables intended for use in the study. 67 | 68 | Primary data sources for the study are to include ... . 69 | Secondary data sources for the study are to include ... . 70 | 71 | Each of the next subsections describes one data source. 72 | Complete standardized metadata for each data source. Either programmatically include standard metadata files into the analysis report or copy and fill out one metadata form (shown below for primary data source 1) for each data source. 73 | 74 | #### Primary data source1 name 75 | 76 | - `Abstract`: Brief description of the data source 77 | - `Spatial Coverage`: Specify the geographic extent of your study. This may be a place name and link to a feature in a gazetteer like GeoNames or OpenStreetMap, or a well known text (WKT) representation of a bounding box. 78 | - `Spatial Resolution`: Specify the spatial resolution as a scale factor, description of the level of detail of each unit of observation (including administrative level of administrative areas), and/or the cell size of a raster grid 79 | - `Spatial Representation Type`: Specify the model of spatial data representation, e.g. one of `vector`, `grid`, `textTable`, `tin` (triangulated irregular network), etc. If the type is `vector`, also specify the geometry type as in the OGC Simple Feature Access standard (https://www.ogc.org/publications/standard/sfa/), e.g. `POINT`, `LINESTRING`, `MULTIPOLYGON`, etc.
80 | - `Spatial Reference System`: Specify the geographic or projected coordinate system for the study 81 | - `Temporal Coverage`: Specify the temporal extent of your study---i.e. the range of time represented by the data observations. 82 | - `Temporal Resolution`: Specify the temporal resolution of your study---i.e. the duration of time that each observation represents or the revisit period for repeated observations 83 | - `Lineage`: Describe and/or cite data sources and/or methodological steps planned to create this data source. 84 | - sampling scheme, including spatial sampling 85 | - target sample size and method for determining sample size 86 | - stopping criteria for data collection and sampling (e.g. sample size, time elapsed) 87 | - de-identification / anonymization 88 | - experimental manipulation 89 | - `Distribution`: Describe who will make the data available and how. 90 | - `Constraints`: Legal constraints for *access* or *use* to protect *privacy* or *intellectual property rights* 91 | - `Data Quality`: State any planned quality assessment 92 | - `Variables`: For each variable, enter the following information. If you have two or more variables per data source, you may want to present this information in table form (shown below) 93 | - `Label`: variable name as used in the data or code 94 | - `Alias`: intuitive natural language name 95 | - `Definition`: Short description or definition of the variable. Include measurement units in description. 96 | - `Type`: data type, e.g. character string, integer, real 97 | - `Accuracy`: e.g.
uncertainty of measurements 98 | - `Domain`: Expected range of Maximum and Minimum of numerical data, or codes or categories of nominal data, or reference to a standard codebook 99 | - `Missing Data Value(s)`: Values used to represent missing data and frequency of missing data observations 100 | - `Missing Data Frequency`: Frequency of missing data observations: not yet known for data to be collected 101 | 102 | | Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency | 103 | | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | 104 | | variable1 | ... | ... | ... | ... | ... | ... | ... | 105 | | variable2 | ... | ... | ... | ... | ... | ... | ... | 106 | 107 | #### Primary data source2 name 108 | 109 | ... same form as above... 110 | 111 | #### Secondary data source1 name 112 | 113 | ... same form as above... 114 | 115 | #### Secondary data source2 name 116 | 117 | ... same form as above... 118 | 119 | ### Prior observations 120 | 121 | Prior experience with the study area, prior data collection, or prior observation of the data can compromise the validity of a study, e.g. through p-hacking. 122 | Therefore, disclose any prior experience or observations at the time of study pre-registration here, with example text below: 123 | 124 | At the time of this study pre-registration, the authors had _____ prior knowledge of the geography of the study region with regards to the ____ phenomena to be studied. 125 | This study is related to ____ prior studies by the authors 126 | 127 | For each primary data source, declare the extent to which authors had already engaged with the data: 128 | 129 | - [ ] no data collection has started 130 | - [ ] pilot test data has been collected 131 | - [ ] data collection is in progress and data has not been observed 132 | - [ ] data collection is in progress and __% of data has been observed 133 | - [ ] data collection is complete and data has been observed. 
Explain how authors have already manipulated / explored the data. 134 | 135 | For each secondary source, declare the extent to which authors had already engaged with the data: 136 | 137 | - [ ] data is not available yet 138 | - [ ] data is available, but only metadata has been observed 139 | - [ ] metadata and descriptive statistics have been observed 140 | - [ ] metadata and a pilot test subset or sample of the full dataset have been observed 141 | - [ ] the full dataset has been observed. Explain how authors have already manipulated / explored the data. 142 | 143 | If pilot test data has been collected or acquired, describe how the researchers observed and analyzed the pilot test, and the extent to which the pilot test influenced the research design. 144 | 145 | ### Bias and threats to validity 146 | 147 | Given the research design and primary data to be collected and/or secondary data to be used, discuss common threats to validity and the approach to mitigating those threats, with an emphasis on geographic threats to validity. 148 | 149 | These include: 150 | - uneven primary data collection due to geographic inaccessibility or other constraints 151 | - multiple hypothesis testing 152 | - edge or boundary effects 153 | - the modifiable areal unit problem 154 | - nonstationarity 155 | - spatial dependence or autocorrelation 156 | - temporal dependence or autocorrelation 157 | - spatial scale dependency 158 | - spatial anisotropies 159 | - confusion of spatial and a-spatial causation 160 | - ecological fallacy 161 | - uncertainty e.g. from spatial disaggregation, anonymization, differential privacy 162 | 163 | ### Data transformations 164 | 165 | Describe all data transformations planned to prepare data sources for analysis. 166 | This section should explain with the fullest detail possible how to transform data from the **raw** state at the time of acquisition or observation, to the pre-processed **derived** state ready for the main analysis. 
167 | Include steps to check and mitigate sources of **bias** and **threats to validity**. 168 | The method may anticipate **contingencies**, e.g. tests for normality and alternative decisions to make based on the results of the test. 169 | More specifically, describe all the **geographic** and **variable** transformations required to prepare input data as described in the data and variables section above to match the study's spatio-temporal characteristics as described in the study metadata and study design sections. 170 | Visual workflow diagrams may help communicate the methodology in this section. 171 | 172 | Examples of **geographic** transformations include coordinate system transformations, aggregation, disaggregation, spatial interpolation, distance calculations, zonal statistics, etc. 173 | 174 | Examples of **variable** transformations include standardization, normalization, constructed variables, imputation, classification, etc. 175 | 176 | Be sure to include any steps planned to **exclude** observations with *missing* or *outlier* data, to **group** observations by *attribute* or *geographic* criteria, or to **impute** missing data or apply spatial or temporal **interpolation**. 177 | 178 | ### Analysis 179 | 180 | Describe the methods of analysis that will directly test the hypotheses or provide results to answer the research questions. 181 | This section should explicitly define any spatial / statistical *models* and their *parameters*, including *grouping* criteria, *weighting* criteria, and *significance thresholds*. 182 | Also explain any follow-up analyses or validations. 183 | 184 | ## Results 185 | 186 | Describe how results are to be presented. 187 | 188 | ## Discussion 189 | 190 | Describe how the results are to be interpreted *vis-à-vis* each hypothesis or research question.
191 | 192 | ## Integrity Statement 193 | 194 | Include an integrity statement - The authors of this preregistration state that they completed this preregistration to the best of their knowledge and that no other preregistration exists pertaining to the same hypotheses and research. 195 | If a prior registration *does* exist, explain the rationale for revising the registration here. 196 | 197 | ## Acknowledgements 198 | 199 | - `Funding Name`: name of funding for the project 200 | - `Funding Title`: title of project grant 201 | - `Award info URI`: web address for award information 202 | - `Award number`: award number 203 | 204 | This report is based upon the template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences, DOI:[10.17605/OSF.IO/W29MQ](https://doi.org/10.17605/OSF.IO/W29MQ) 205 | 206 | ## References 207 | -------------------------------------------------------------------------------- /procedure/code/00-Python-environment-setup.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "lpUbJuwsgQJu" 7 | }, 8 | "source": [ 9 | "# Computational environment\n", 10 | "\n", 11 | "Note: Lines starting with `!` run in your **terminal**, not in Python.\n", 12 | "\n", 13 | "## Recording\n", 14 | "\n", 15 | "Create a [virtual environment](https://realpython.com/python-virtual-environments-a-primer/) to ensure reproducibility in your Python packages.\n", 16 | "\n", 17 | "The following creates a virtual environment with [`pipenv`](https://pipenv.pypa.io/en/latest/).\n", 18 | "Other tools exist too, such as [venv](https://docs.python.org/3/library/venv.html) or [conda](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html).\n", 19 | "\n", 20 | "Be sure to let Jupyter know what environment you are using - search for \"venv with jupyter\", for example.\n", 21 | "\n", 22 | "Document the tools 
you choose to use, and instructions for recovering the computational environment, inside the `procedure/environment/readme.md` file.\n", 23 | "\n", 24 | "### `pipenv`\n", 25 | "\n", 26 | "First install `pipenv` by running the chunk below:" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": null, 32 | "metadata": { 33 | "colab": { 34 | "base_uri": "https://localhost:8080/" 35 | }, 36 | "id": "YCGcY3lah0BU", 37 | "outputId": "cc60f5ec-03d5-4722-f16b-b3e096a186f4" 38 | }, 39 | "outputs": [ 40 | { 41 | "name": "stdout", 42 | "output_type": "stream", 43 | "text": [ 44 | "Collecting pipenv\n", 45 | " Downloading pipenv-2023.7.11-py3-none-any.whl (2.8 MB)\n", 46 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.8/2.8 MB\u001b[0m \u001b[31m16.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 47 | "\u001b[?25hRequirement already satisfied: certifi in /usr/local/lib/python3.10/dist-packages (from pipenv) (2023.5.7)\n", 48 | "Requirement already satisfied: setuptools>=67.0.0 in /usr/local/lib/python3.10/dist-packages (from pipenv) (67.7.2)\n", 49 | "Collecting virtualenv-clone>=0.2.5 (from pipenv)\n", 50 | " Downloading virtualenv_clone-0.5.7-py3-none-any.whl (6.6 kB)\n", 51 | "Collecting virtualenv>=20.17.1 (from pipenv)\n", 52 | " Downloading virtualenv-20.24.1-py3-none-any.whl (3.0 MB)\n", 53 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.0/3.0 MB\u001b[0m \u001b[31m28.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 54 | "\u001b[?25hCollecting distlib<1,>=0.3.6 (from virtualenv>=20.17.1->pipenv)\n", 55 | " Downloading distlib-0.3.7-py2.py3-none-any.whl (468 kB)\n", 56 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m468.9/468.9 kB\u001b[0m \u001b[31m26.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 57 | "\u001b[?25hRequirement already satisfied: filelock<4,>=3.12 in /usr/local/lib/python3.10/dist-packages (from virtualenv>=20.17.1->pipenv) 
(3.12.2)\n", 58 | "Requirement already satisfied: platformdirs<4,>=3.5.1 in /usr/local/lib/python3.10/dist-packages (from virtualenv>=20.17.1->pipenv) (3.8.1)\n", 59 | "Installing collected packages: distlib, virtualenv-clone, virtualenv, pipenv\n", 60 | "\u001b[33m WARNING: The script virtualenv-clone is installed in '/root/.local/bin' which is not on PATH.\n", 61 | " Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.\u001b[0m\u001b[33m\n", 62 | "\u001b[0m\u001b[33m WARNING: The script virtualenv is installed in '/root/.local/bin' which is not on PATH.\n", 63 | " Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.\u001b[0m\u001b[33m\n", 64 | "\u001b[0m\u001b[33m WARNING: The scripts pipenv and pipenv-resolver are installed in '/root/.local/bin' which is not on PATH.\n", 65 | " Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.\u001b[0m\u001b[33m\n", 66 | "\u001b[0mSuccessfully installed distlib-0.3.7 pipenv-2023.7.11 virtualenv-20.24.1 virtualenv-clone-0.5.7\n" 67 | ] 68 | } 69 | ], 70 | "source": [ 71 | "!pip install --user pipenv" 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": { 77 | "id": "F-1ndcK1iFYD" 78 | }, 79 | "source": [ 80 | "Then, install the packages you need using `pipenv install`.\n", 81 | "\n", 82 | "**Do not use** `pip`, since it will not record the install!\n", 83 | "\n", 84 | "We will install `pyhere`, a package to simplify directory management.\n", 85 | "\n", 86 | "Check out pyhere's documentation [here](https://pypi.org/project/pyhere/).\n", 87 | "\n", 88 | "**Note**: if you run into the error `pipenv: command not found`, then replace `pipenv` with `python -m pipenv`." 
89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": null, 94 | "metadata": { 95 | "colab": { 96 | "base_uri": "https://localhost:8080/" 97 | }, 98 | "id": "pKr_XDW4iEa5", 99 | "outputId": "eba58871-fe7d-4f2f-8268-9fa503f92d07" 100 | }, 101 | "outputs": [ 102 | { 103 | "name": "stdout", 104 | "output_type": "stream", 105 | "text": [ 106 | "\u001b[1mCreating a virtualenv for this project...\u001b[0m\n", 107 | "Pipfile: \u001b[33m\u001b[1m/content/Pipfile\u001b[0m\n", 108 | "\u001b[1mUsing default python from\u001b[0m \u001b[33m\u001b[1m/usr/bin/python3\u001b[0m \u001b[32m(3.10.6)\u001b[0m \u001b[1mto create virtualenv...\u001b[0m\n", 109 | "\u001b[2K\u001b[32m⠹\u001b[0m Creating virtual environment...\u001b[36mcreated virtual environment CPython3.10.6.final.0-64 in 1601ms\n", 110 | " creator CPython3Posix(dest=/root/.local/share/virtualenvs/content-cQIIIOO2, clear=False, no_vcs_ignore=False, global=False)\n", 111 | " seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/root/.local/share/virtualenv)\n", 112 | " added seed packages: pip==23.2, setuptools==68.0.0, wheel==0.40.0\n", 113 | " activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator\n", 114 | "\u001b[0m\n", 115 | "✔ Successfully created virtual environment!\n", 116 | "\u001b[2K\u001b[32m⠸\u001b[0m Creating virtual environment...\n", 117 | "\u001b[1A\u001b[2K\u001b[32mVirtualenv location: /root/.local/share/virtualenvs/content-cQIIIOO2\u001b[0m\n", 118 | "\u001b[1mCreating a Pipfile for this project...\u001b[0m\n", 119 | "\u001b[32m\u001b[1mInstalling pyhere...\u001b[0m\n", 120 | "\u001b[?25lResolving pyhere\u001b[33m...\u001b[0m\n", 121 | "\u001b[2K\u001b[1mAdding \u001b[0m\u001b[1;32mpyhere\u001b[0m to Pipfile's \u001b[1;33m[\u001b[0m\u001b[33mpackages\u001b[0m\u001b[1;33m]\u001b[0m \u001b[33m...\u001b[0m\n", 122 | "\u001b[2K✔ Installation Succeeded\n", 123 | 
"\u001b[2K\u001b[32m⠋\u001b[0m Installing pyhere...\n", 124 | "\u001b[1A\u001b[2K\u001b[1mPipfile.lock not found, creating...\u001b[0m\n", 125 | "Locking\u001b[0m \u001b[33m[packages]\u001b[0m dependencies...\u001b[0m\n", 126 | "\u001b[?25lBuilding requirements\u001b[33m...\u001b[0m\n", 127 | "\u001b[2KResolving dependencies\u001b[33m...\u001b[0m\n", 128 | "\u001b[2K✔ Success!\n", 129 | "\u001b[2K\u001b[32m⠧\u001b[0m Locking...\n", 130 | "\u001b[1A\u001b[2KLocking\u001b[0m \u001b[33m[dev-packages]\u001b[0m dependencies...\u001b[0m\n", 131 | "\u001b[1mUpdated Pipfile.lock (55a3de81a4921858ffb7a5cdc8cc04cf085bda69717d143b6346d23cf393900c)!\u001b[0m\n", 132 | "\u001b[1mInstalling dependencies from Pipfile.lock (93900c)...\u001b[0m\n", 133 | "To activate this project's virtualenv, run \u001b[33mpipenv shell\u001b[0m.\n", 134 | "Alternatively, run a command inside the virtualenv with \u001b[33mpipenv run\u001b[0m.\n" 135 | ] 136 | } 137 | ], 138 | "source": [ 139 | "!python -m pipenv install pyhere" 140 | ] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": { 145 | "id": "26gqe2mtjkwA" 146 | }, 147 | "source": [ 148 | "When you installed `pyhere`, `pipenv` created a virtual environment for you.\n", 149 | "\n", 150 | "You can see the virtualenv's location given above:\n", 151 | "\n", 152 | "```\n", 153 | "Virtualenv location: /root/.local/share/virtualenvs/content-cQIIIOO2\n", 154 | "```\n", 155 | "\n", 156 | "Next, follow [these instructions](https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html#pipenv) to launch Jupyter using the `pipenv` environment you just created.\n", 157 | "\n", 158 | "#### The Pipfile\n", 159 | "You will see two files in the current notebook folder; refresh JupyterLab's file explorer if you do not.\n", 160 | "\n", 161 | "When you are finished with the analysis, move **both** `Pipfile` and `Pipfile.lock` into the `/procedure/environment` folder." 
162 | ] 163 | }, 164 | { 165 | "cell_type": "markdown", 166 | "metadata": { 167 | "id": "o5wVGooof91k" 168 | }, 169 | "source": [ 170 | "### Record existing packages\n", 171 | "If you already have some code that imports Python packages, the `pigar` package can help you figure out which packages you are using.\n", 172 | "\n", 173 | "Comment out the first line to run the code below.\n", 174 | "Run the code once, when you are finished with the analysis and know what packages you are using.\n", 175 | "\n", 176 | "This will generate a `requirements.txt` in the `/procedure/environment` folder." 177 | ] 178 | }, 179 | { 180 | "cell_type": "code", 181 | "execution_count": null, 182 | "metadata": { 183 | "colab": { 184 | "base_uri": "https://localhost:8080/" 185 | }, 186 | "id": "0wzieMywf91l", 187 | "outputId": "3fca492a-80fc-454d-bfe8-4243253f2c88" 188 | }, 189 | "outputs": [ 190 | { 191 | "name": "stdout", 192 | "output_type": "stream", 193 | "text": [ 194 | "Requirement already satisfied: pigar in /usr/local/lib/python3.10/dist-packages (2.1.1)\n", 195 | "Requirement already satisfied: click>=8.1 in /usr/local/lib/python3.10/dist-packages (from pigar) (8.1.4)\n", 196 | "Requirement already satisfied: nbformat>=5.7 in /usr/local/lib/python3.10/dist-packages (from pigar) (5.9.1)\n", 197 | "Requirement already satisfied: aiohttp>=3.8 in /usr/local/lib/python3.10/dist-packages (from pigar) (3.8.4)\n", 198 | "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp>=3.8->pigar) (23.1.0)\n", 199 | "Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp>=3.8->pigar) (2.0.12)\n", 200 | "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp>=3.8->pigar) (6.0.4)\n", 201 | "Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /usr/local/lib/python3.10/dist-packages (from aiohttp>=3.8->pigar) (4.0.2)\n", 202 | 
"Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp>=3.8->pigar) (1.9.2)\n", 203 | "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp>=3.8->pigar) (1.4.0)\n", 204 | "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp>=3.8->pigar) (1.3.1)\n", 205 | "Requirement already satisfied: fastjsonschema in /usr/local/lib/python3.10/dist-packages (from nbformat>=5.7->pigar) (2.17.1)\n", 206 | "Requirement already satisfied: jsonschema>=2.6 in /usr/local/lib/python3.10/dist-packages (from nbformat>=5.7->pigar) (4.3.3)\n", 207 | "Requirement already satisfied: jupyter-core in /usr/local/lib/python3.10/dist-packages (from nbformat>=5.7->pigar) (5.3.1)\n", 208 | "Requirement already satisfied: traitlets>=5.1 in /usr/local/lib/python3.10/dist-packages (from nbformat>=5.7->pigar) (5.7.1)\n", 209 | "Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=2.6->nbformat>=5.7->pigar) (0.19.3)\n", 210 | "Requirement already satisfied: idna>=2.0 in /usr/local/lib/python3.10/dist-packages (from yarl<2.0,>=1.0->aiohttp>=3.8->pigar) (3.4)\n", 211 | "Requirement already satisfied: platformdirs>=2.5 in /usr/local/lib/python3.10/dist-packages (from jupyter-core->nbformat>=5.7->pigar) (3.8.1)\n", 212 | "\u001b[34m18:35:49\u001b[39m \u001b[31mdistribution \"blinker\" may be not editable: NotADirectoryError(20, 'Not a directory')\u001b[39m\n", 213 | "\u001b[33mRequirements file has been overwritten, no difference.\u001b[39m\n", 214 | "\u001b[32mRequirements has been written to /environment/requirements.txt.\u001b[39m\n" 215 | ] 216 | } 217 | ], 218 | "source": [ 219 | "%%script echo skipping\n", 220 | "!pip install pigar\n", 221 | "!python -m pigar generate -f ../environment/requirements.txt" 222 | ] 223 | }, 224 | { 225 | "cell_type": "markdown", 226 | 
"metadata": { 227 | "id": "6whdTQbphAFN" 228 | }, 229 | "source": [ 230 | "#### Cleanup\n", 231 | "Depending on your setup, the list generated by `pigar` may require cleanup.\n", 232 | "\n", 233 | "The goal of cleanup is for the `requirements.txt` to contain only comments (lines starting with #) and lines of the format `[package]==[version]`.\n", 234 | "\n", 235 | "Thus, version `1.15.post1` of the `CensusData` package can be represented as `CensusData==1.15.post1`.\n", 236 | "\n", 237 | "An example is packages installed with `conda`.\n", 238 | "These entries may look somewhat like this:\n", 239 | "```python\n", 240 | "# Editable install with no version control (pandas==1.3.5)\n", 241 | "-e /cvmfs/cybergis.illinois.edu/software/conda/cybergisx/python3-0.9.0/lib/python3.8/site-packages/pandas-1.3.5-py3.8.egg-info\n", 242 | "```\n", 243 | "Since this essentially installs `pandas` version `1.3.5`, you can replace these two lines with `pandas==1.3.5`.\n" 244 | ] 245 | }, 246 | { 247 | "cell_type": "markdown", 248 | "metadata": {}, 249 | "source": [ 250 | "## Recovering\n", 251 | "\n", 252 | "Depending on what is inside the `/procedure/environment` folder, you will recover the computational environment with different tools.\n", 253 | "\n", 254 | "### From a virtual environment\n", 255 | "\n", 256 | "If you have a `Pipfile` and a `Pipfile.lock`, run:" 257 | ] 258 | }, 259 | { 260 | "cell_type": "code", 261 | "execution_count": null, 262 | "metadata": {}, 263 | "outputs": [], 264 | "source": [ 265 | "!cd ../environment\n", 266 | "!pipenv sync" 267 | ] 268 | }, 269 | { 270 | "cell_type": "markdown", 271 | "metadata": {}, 272 | "source": [ 273 | "If you have a `Pipfile` but no `Pipfile.lock`, run:" 274 | ] 275 | }, 276 | { 277 | "cell_type": "code", 278 | "execution_count": null, 279 | "metadata": {}, 280 | "outputs": [], 281 | "source": [ 282 | "!cd ../environment\n", 283 | "!pipenv install\n", 284 | "!pipenv sync" 285 | ] 286 | }, 287 | { 288 | "cell_type": "markdown", 
289 | "metadata": {}, 290 | "source": [ 291 | "If you have an `environment.yml`, run:" 292 | ] 293 | }, 294 | { 295 | "cell_type": "code", 296 | "execution_count": null, 297 | "metadata": {}, 298 | "outputs": [], 299 | "source": [ 300 | "!conda env create -f ../environment/environment.yml" 301 | ] 302 | }, 303 | { 304 | "cell_type": "markdown", 305 | "metadata": {}, 306 | "source": [ 307 | "After you recover the virtual environment, activate it for the notebook environment you are using." 308 | ] 309 | }, 310 | { 311 | "cell_type": "markdown", 312 | "metadata": {}, 313 | "source": [ 314 | "### From a list of packages\n", 315 | "\n", 316 | "If you have a `requirements.txt`, then you may want to create a virtual environment with `venv` or `pipenv`.\n", 317 | "\n", 318 | "If you are working in a disposable environment, e.g. Google Colab or Binder, there is no need for a virtual environment;\n", 319 | "simply run the next code cell.\n", 320 | "\n", 321 | "If you choose `venv`, create a virtual environment, activate it, and then run:" 322 | ] 323 | }, 324 | { 325 | "cell_type": "code", 326 | "execution_count": null, 327 | "metadata": {}, 328 | "outputs": [], 329 | "source": [ 330 | "# in a disposable environment, run this cell directly\n", 331 | "!pip install -r ../environment/requirements.txt" 332 | ] 333 | }, 334 | { 335 | "cell_type": "markdown", 336 | "metadata": {}, 337 | "source": [ 338 | "If you choose `pipenv`, refer to the [pipenv instructions for importing from `requirements.txt`](https://docs.pipenv.org/basics/#importing-from-requirements-txt)." 
339 | ] 340 | } 341 | ], 342 | "metadata": { 343 | "language_info": { 344 | "name": "python" 345 | }, 346 | "orig_nbformat": 4 347 | }, 348 | "nbformat": 4, 349 | "nbformat_minor": 2 350 | } 351 | -------------------------------------------------------------------------------- /procedure/code/01-R-markdown.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Analysis" 3 | author: "HEGSRR" 4 | date: "`r Sys.Date()`" 5 | output: html_document 6 | editor_options: 7 | markdown: 8 | wrap: sentence 9 | knit: (function(inputFile, encoding) { 10 | rmarkdown::render(inputFile, encoding = encoding, output_dir = "../../docs") }) 11 | nocite: '@*' 12 | bibliography: "../../software.bib" 13 | --- 14 | 15 | # Instructions 16 | 17 | This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. 18 | For more details on using R Markdown, see the R Markdown documentation. 19 | In the header section above, you can configure [options for this document](https://bookdown.org/yihui/rmarkdown/html-document.html), including title, author(s), and additional style and output options. 20 | The `nocite` and `bibliography` lines automatically add a bibliography for the software packages you have used. 21 | Remove the `nocite` line to suppress references you haven't cited. 22 | You may delete this instruction section. 23 | 24 | # Abstract 25 | 26 | Write a brief abstract about your research project. 27 | 28 | If the project is a reproduction or replication study, include a declaration of the study type with a full reference to the original study. 29 | For example: 30 | 31 | This study is a *replication* of: 32 | 33 | > citation to prior study 34 | 35 | A graphical abstract of the study could also be included as an image here. 36 | 37 | # Study metadata 38 | 39 | - `Key words`: Comma-separated list of keywords (tags) for searchability. 
Geographers often use one or two keywords each for: theory, geographic context, and methods. 40 | - `Subject`: select from the [BePress Taxonomy](http://digitalcommons.bepress.com/cgi/viewcontent.cgi?article=1008&context=reference) 41 | - `Date created`: date when project was started 42 | - `Date modified`: date of most recent revision 43 | - `Spatial Coverage`: Specify the geographic extent of your study. This may be a place name and link to a feature in a gazetteer like GeoNames or OpenStreetMap, or a well-known text (WKT) representation of a bounding box. 44 | - `Spatial Resolution`: Specify the spatial resolution as a scale factor, description of the level of detail of each unit of observation (including administrative level of administrative areas), and/or the cell size of a raster grid 45 | - `Spatial Reference System`: Specify the geographic or projected coordinate system for the study, e.g. EPSG:4326 46 | - `Temporal Coverage`: Specify the temporal extent of your study---i.e. the range of time represented by the data observations. 47 | - `Temporal Resolution`: Specify the temporal resolution of your study---i.e. the duration of time that each observation represents or the revisit period for repeated observations 48 | - `Funding Name`: name of funding for the project 49 | - `Funding Title`: title of project grant 50 | - `Award info URI`: web address for award information 51 | - `Award number`: award number 52 | 53 | ## Original study spatio-temporal metadata 54 | 55 | - `Spatial Coverage`: extent of original study 56 | - `Spatial Resolution`: resolution of original study 57 | - `Spatial Reference System`: spatial reference system of original study 58 | - `Temporal Coverage`: temporal extent of original study 59 | - `Temporal Resolution`: temporal resolution of original study 60 | 61 | # Study design 62 | 63 | Describe how the study relates to prior literature, e.g. 
is it an **original study**, **meta-analysis study**, **reproduction study**, **reanalysis study**, or **replication study**? 64 | 65 | Also describe the original study archetype, e.g. is it **observational**, **experimental**, **quasi-experimental**, or **exploratory**? 66 | 67 | Enumerate specific **hypotheses** to be tested or **research questions** to be investigated here, and specify the type of method, statistical test or model to be used on the hypothesis or question. 68 | 69 | # Materials and procedure 70 | 71 | ## Computational environment 72 | 73 | ```{r environment-setup, include = FALSE} 74 | # record all the packages you are using here 75 | # this includes any calls to library(), require(), 76 | # and double colons such as here::i_am() 77 | packages <- c("tidyverse", "here") 78 | 79 | # force all conflicts to become errors 80 | # if you load dplyr and use filter(), R has to guess whether you mean dplyr::filter() or stats::filter() 81 | # the conflicted package forces you to be explicit about this 82 | # disable at your own peril 83 | # https://conflicted.r-lib.org/ 84 | require(conflicted) 85 | 86 | # load and install required packages 87 | # https://groundhogr.com/ 88 | if (!require(groundhog)) { 89 | install.packages("groundhog") 90 | require(groundhog) 91 | } 92 | 93 | # this date will be used to determine the versions of R and your packages 94 | # it is best practice to keep R and its packages up to date 95 | groundhog.day <- "2023-06-26" 96 | 97 | # this replaces any library() or require() calls 98 | groundhog.library(packages, groundhog.day) 99 | # you may need to install a correct version of R 100 | # you may need to respond OK in the console to permit groundhog to install packages 101 | # you may need to restart R and rerun this code to load installed packages 102 | # In RStudio, restart R with Session -> Restart R 103 | 104 | # record the R processing environment 105 | # alternatively, use devtools::session_info() for better results 106 | 
writeLines( 107 | capture.output(sessionInfo()), 108 | here("procedure", "environment", paste0("r-environment-", Sys.Date(), ".txt")) 109 | ) 110 | 111 | # save package citations 112 | knitr::write_bib(c(packages, "base"), file = here("software.bib")) 113 | 114 | # set up default knitr parameters 115 | # https://yihui.org/knitr/options/ 116 | knitr::opts_chunk$set( 117 | echo = FALSE, # Show outputs, but not code. Change to TRUE to show code as well 118 | fig.retina = 4, 119 | fig.width = 8, 120 | fig.path = paste0(here("results", "figures"), "/") 121 | ) 122 | ``` 123 | 124 | ## Data and variables 125 | 126 | Describe the **data sources** and **variables** to be used. 127 | Data sources may include plans for observing and recording **primary data** or descriptions of **secondary data**. 128 | For secondary data sources with numerous variables, the analysis plan authors may focus on documenting only the variables intended for use in the study. 129 | 130 | Primary data sources for the study are to include ... . 131 | Secondary data sources for the study are to include ... . 132 | 133 | Each of the next subsections describes one data source. 134 | 135 | ### Primary data source1 name 136 | 137 | - `Title`: Title of data source 138 | - `Abstract`: Brief description of the data source 139 | - `Spatial Coverage`: Specify the geographic extent of your study. This may be a place name and link to a feature in a gazetteer like GeoNames or OpenStreetMap, or a well-known text (WKT) representation of a bounding box. 140 | - `Spatial Resolution`: Specify the spatial resolution as a scale factor, description of the level of detail of each unit of observation (including administrative level of administrative areas), and/or the cell size of a raster grid 141 | - `Spatial Representation Type`: Specify the model of spatial data representation, e.g. one of `vector`, `grid`, `textTable`, `tin` (triangulated irregular network), etc. 
If the type is `vector`, also specify the geometry type as in the OGC Simple Feature Access standard (https://www.ogc.org/publications/standard/sfa/) , e.g. `POINT`, `LINESTRING`, `MULTIPOLYGON`, etc. 142 | - `Spatial Reference System`: Specify the geographic or projected coordinate system for the study 143 | - `Temporal Coverage`: Specify the temporal extent of your study---i.e. the range of time represented by the data observations. 144 | - `Temporal Resolution`: Specify the temporal resolution of your study---i.e. the duration of time for which each observation represents or the revisit period for repeated observations 145 | - `Lineage`: Describe and/or cite data sources and/or methodological steps planned to create this data source. 146 | - sampling scheme, including spatial sampling 147 | - target sample size and method for determining sample size 148 | - stopping criteria for data collection and sampling (e.g. sample size, time elapsed) 149 | - de-identification / anonymization 150 | - experimental manipulation 151 | - `Distribution`: Describe who will make the data available and how? 152 | - `Constraints`: Legal constraints for *access* or *use* to protect *privacy* or *intellectual property rights* 153 | - `Data Quality`: State any planned quality assessment 154 | - `Variables`: For each variable, enter the following information. If you have two or more variables per data source, you may want to present this information in table form (shown below) 155 | - `Label`: variable name as used in the data or code 156 | - `Alias`: intuitive natural language name 157 | - `Definition`: Short description or definition of the variable. Include measurement units in description. 158 | - `Type`: data type, e.g. character string, integer, real 159 | - `Accuracy`: e.g. 
uncertainty of measurements 160 | - `Domain`: Expected range (maximum and minimum) of numerical data, or codes or categories of nominal data, or reference to a standard codebook 161 | - `Missing Data Value(s)`: Values used to represent missing data 162 | - `Missing Data Frequency`: Frequency of missing data observations: not yet known for data to be collected 163 | 164 | | Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency | 165 | | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | 166 | | variable1 | ... | ... | ... | ... | ... | ... | ... | 167 | | variable2 | ... | ... | ... | ... | ... | ... | ... | 168 | 169 | ### Primary data source2 name 170 | 171 | ... same form as above... 172 | Metadata documents in the markdown `.md` format can be included directly with the `includeMarkdown()` function. 173 | This requires the `markdown` package. 174 | 175 | ### Secondary data source1 name 176 | 177 | - `Title`: Title of data source 178 | - `Abstract`: Brief description of the data source 179 | - `Spatial Coverage`: Specify the geographic extent of your study. This may be a place name and link to a feature in a gazetteer like GeoNames or OpenStreetMap, or a well-known text (WKT) representation of a bounding box. 180 | - `Spatial Resolution`: Specify the spatial resolution as a scale factor, description of the level of detail of each unit of observation (including administrative level of administrative areas), and/or the cell size of a raster grid 181 | - `Spatial Reference System`: Specify the geographic or projected coordinate system for the study 182 | - `Temporal Coverage`: Specify the temporal extent of your study---i.e. the range of time represented by the data observations. 183 | - `Temporal Resolution`: Specify the temporal resolution of your study---i.e. 
the duration of time for which each observation represents or the revisit period for repeated observations 184 | - `Lineage`: Describe and/or cite data sources and/or methodological steps used to create this data source 185 | - `Distribution`: Describe how the data is distributed, including any persistent identifier (e.g. DOI) or URL for data access 186 | - `Constraints`: Legal constraints for *access* or *use* to protect *privacy* or *intellectual property rights* 187 | - `Data Quality`: State result of quality assessment or state "Quality unknown" 188 | - `Variables`: For each variable, enter the following information. If you have two or more variables per data source, you may want to present this information in table form (shown below) 189 | - `Label`: variable name as used in the data or code 190 | - `Alias`: intuitive natural language name 191 | - `Definition`: Short description or definition of the variable. Include measurement units in description. 192 | - `Type`: data type, e.g. character string, integer, real 193 | - `Accuracy`: e.g. uncertainty of measurements 194 | - `Domain`: Range (Maximum and Minimum) of numerical data, or codes or categories of nominal data, or reference to a standard codebook 195 | - `Missing Data Value(s)`: Values used to represent missing data and frequency of missing data observations 196 | - `Missing Data Frequency`: Frequency of missing data observations 197 | 198 | | Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency | 199 | | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | 200 | | variable1 | ... | ... | ... | ... | ... | ... | ... | 201 | | variable2 | ... | ... | ... | ... | ... | ... | ... | 202 | 203 | ### Secondary data source2 name 204 | 205 | ... same form as above... 206 | 207 | ## Prior observations 208 | 209 | Prior experience with the study area, prior data collection, or prior observation of the data can compromise the validity of a study, e.g. 
through p-hacking. 210 | Therefore, disclose any prior experience or observations at the time of study pre-registration here, with example text below: 211 | 212 | At the time of this study pre-registration, the authors had _____ prior knowledge of the geography of the study region with regard to the ____ phenomena to be studied. 213 | This study is related to ____ prior studies by the authors. 214 | 215 | For each primary data source, declare the extent to which authors had already engaged with the data: 216 | 217 | - [ ] no data collection has started 218 | - [ ] pilot test data has been collected 219 | - [ ] data collection is in progress and data has not been observed 220 | - [ ] data collection is in progress and __% of data has been observed 221 | - [ ] data collection is complete and data has been observed. Explain how authors have already manipulated / explored the data. 222 | 223 | For each secondary source, declare the extent to which authors had already engaged with the data: 224 | 225 | - [ ] data is not available yet 226 | - [ ] data is available, but only metadata has been observed 227 | - [ ] metadata and descriptive statistics have been observed 228 | - [ ] metadata and a pilot test subset or sample of the full dataset have been observed 229 | - [ ] the full dataset has been observed. Explain how authors have already manipulated / explored the data. 230 | 231 | If pilot test data has been collected or acquired, describe how the researchers observed and analyzed the pilot test, and the extent to which the pilot test influenced the research design. 232 | 233 | ## Bias and threats to validity 234 | 235 | Given the research design and primary data to be collected and/or secondary data to be used, discuss common threats to validity and the approach to mitigating those threats, with an emphasis on geographic threats to validity. 
236 | 237 | These include: 238 | - uneven primary data collection due to geographic inaccessibility or other constraints 239 | - multiple hypothesis testing 240 | - edge or boundary effects 241 | - the modifiable areal unit problem 242 | - nonstationarity 243 | - spatial dependence or autocorrelation 244 | - temporal dependence or autocorrelation 245 | - spatial scale dependency 246 | - spatial anisotropies 247 | - confusion of spatial and a-spatial causation 248 | - ecological fallacy 249 | - uncertainty, e.g. from spatial disaggregation, anonymization, differential privacy 250 | 251 | ## Data transformations 252 | 253 | Describe all data transformations planned to prepare data sources for analysis. 254 | This section should explain in the fullest detail possible how to transform data from the **raw** state at the time of acquisition or observation to the pre-processed **derived** state ready for the main analysis. 255 | Include steps to check and mitigate sources of **bias** and **threats to validity**. 256 | The method may anticipate **contingencies**, e.g. tests for normality and alternative decisions to make based on the results of the test. 257 | More specifically, describe all the **geographic** and **variable** transformations required to prepare the input data described in the data and variables section above so that it matches the study's spatio-temporal characteristics described in the study metadata and study design sections. 258 | Visual workflow diagrams may help communicate the methodology in this section. 259 | 260 | Examples of **geographic** transformations include coordinate system transformations, aggregation, disaggregation, spatial interpolation, distance calculations, zonal statistics, etc. 261 | 262 | Examples of **variable** transformations include standardization, normalization, constructed variables, imputation, classification, etc. 
263 | 264 | Be sure to include any steps planned to **exclude** observations with *missing* or *outlier* data, to **group** observations by *attribute* or *geographic* criteria, or to **impute** missing data or apply spatial or temporal **interpolation**. 265 | 266 | ## Analysis 267 | 268 | Describe the methods of analysis that will directly test the hypotheses or provide results to answer the research questions. 269 | This section should explicitly define any spatial / statistical *models* and their *parameters*, including *grouping* criteria, *weighting* criteria, and *significance thresholds*. 270 | Also explain any follow-up analyses or validations. 271 | 272 | # Results 273 | 274 | Describe how results are to be presented. 275 | 276 | # Discussion 277 | 278 | Describe how the results are to be interpreted *vis-à-vis* each hypothesis or research question. 279 | 280 | # Integrity Statement 281 | 282 | Include an integrity statement, for example: The authors of this preregistration state that they completed this preregistration to the best of their knowledge and that no other preregistration exists pertaining to the same hypotheses and research. 283 | If a prior registration *does* exist, explain the rationale for revising the registration here. 
284 | 285 | # Acknowledgements 286 | 287 | - `Funding Name`: name of funding for the project 288 | - `Funding Title`: title of project grant 289 | - `Award info URI`: web address for award information 290 | - `Award number`: award number 291 | 292 | This report is based upon the template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences, DOI:[10.17605/OSF.IO/W29MQ](https://doi.org/10.17605/OSF.IO/W29MQ) 293 | 294 | # References 295 | -------------------------------------------------------------------------------- /procedure/code/01-Jupyter_notebook.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "7gUZzqUXf91d" 7 | }, 8 | "source": [ 9 | "# Analysis\n", 10 | "\n", 11 | "Template for Jupyter notebooks running Python." 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": { 17 | "id": "ynQbJHvcVm55" 18 | }, 19 | "source": [ 20 | "Version 0.1.0 \\| First Created July 12, 2023 \\| Updated August 01, 2023\n", 21 | "\n", 22 | "## Jupyter Notebook\n", 23 | "\n", 24 | "This is a Jupyter Notebook document. 
For more details on using a Jupyter Notebook, see the Jupyter Notebook documentation.\n", 25 | "\n" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "# Title of Study\n", 33 | "\n", 34 | "### Authors\n", 35 | "\n", 36 | "- First Name Last Name\\*, email address, @githubname, ORCID link, affiliated institution(s)\n", 37 | "- First Name Last Name, email address, @githubname, ORCID link, affiliated institution(s)\n", 38 | "\n", 39 | "\\* Corresponding author and creator\n", 40 | "\n" 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "### Abstract\n", 48 | "\n", 49 | "Write a brief abstract about your research project.\n", 50 | "\n", 51 | "If the project is a reproduction or replication study, include a declaration of the study type with a full reference to the original study.\n", 52 | "For example:\n", 53 | "\n", 54 | "This study is a *replication* of:\n", 55 | "\n", 56 | "> citation to prior study\n", 57 | "\n", 58 | "A graphical abstract of the study could also be included as an image here.\n", 59 | "\n" 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "### Study metadata\n", 67 | "\n", 68 | "- `Key words`: Comma-separated list of keywords (tags) for searchability. Geographers often use one or two keywords each for: theory, geographic context, and methods.\n", 69 | "- `Subject`: select from the [BePress Taxonomy](http://digitalcommons.bepress.com/cgi/viewcontent.cgi?article=1008&context=reference)\n", 70 | "- `Date created`: date when project was started\n", 71 | "- `Date modified`: date of most recent revision\n", 72 | "- `Spatial Coverage`: Specify the geographic extent of your study. 
This may be a place name and link to a feature in a gazetteer like GeoNames or OpenStreetMap, or a well-known text (WKT) representation of a bounding box.\n", 73 | "- `Spatial Resolution`: Specify the spatial resolution as a scale factor, description of the level of detail of each unit of observation (including administrative level of administrative areas), and/or the cell size of a raster grid\n", 74 | "- `Spatial Reference System`: Specify the geographic or projected coordinate system for the study, e.g. EPSG:4326\n", 75 | "- `Temporal Coverage`: Specify the temporal extent of your study---i.e. the range of time represented by the data observations.\n", 76 | "- `Temporal Resolution`: Specify the temporal resolution of your study---i.e. the duration of time that each observation represents or the revisit period for repeated observations\n", 77 | "- `Funding Name`: name of funding for the project\n", 78 | "- `Funding Title`: title of project grant\n", 79 | "- `Award info URI`: web address for award information\n", 80 | "- `Award number`: award number\n", 81 | "\n", 82 | "#### Original study spatio-temporal metadata\n", 83 | "\n", 84 | "- `Spatial Coverage`: extent of original study\n", 85 | "- `Spatial Resolution`: resolution of original study\n", 86 | "- `Spatial Reference System`: spatial reference system of original study\n", 87 | "- `Temporal Coverage`: temporal extent of original study\n", 88 | "- `Temporal Resolution`: temporal resolution of original study\n", 89 | "\n" 90 | ] 91 | }, 92 | { 93 | "cell_type": "markdown", 94 | "metadata": {}, 95 | "source": [ 96 | "## Study design\n", 97 | "\n", 98 | "Describe how the study relates to prior literature, e.g. is it an **original study**, **meta-analysis study**, **reproduction study**, **reanalysis study**, or **replication study**?\n", 99 | "\n", 100 | "Also describe the original study archetype, e.g. 
is it **observational**, **experimental**, **quasi-experimental**, or **exploratory**?\n", 101 | "\n", 102 | "Enumerate specific **hypotheses** to be tested or **research questions** to be investigated here, and specify the type of method, statistical test or model to be used on the hypothesis or question.\n", 103 | "\n", 104 | "## Materials and procedure" 105 | ] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "metadata": { 110 | "id": "lpUbJuwsgQJu" 111 | }, 112 | "source": [ 113 | "## Computational environment\n", 114 | "\n", 115 | "Maintaining a reproducible computational environment requires some conscious choices in package management.\n", 116 | "\n", 117 | "Please refer to `00-Python-environment-setup.ipynb` for details.\n", 118 | "\n" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": null, 124 | "metadata": {}, 125 | "outputs": [], 126 | "source": [ 127 | "# Import modules, define directories\n", 128 | "from pyhere import here\n", 129 | "\n", 130 | "# You can define your own shortcuts for file paths:\n", 131 | "path = {\n", 132 | " \"dscr\": here(\"data\", \"scratch\"),\n", 133 | " \"drpub\": here(\"data\", \"raw\", \"public\"),\n", 134 | " \"drpriv\": here(\"data\", \"raw\", \"private\"),\n", 135 | " \"ddpub\": here(\"data\", \"derived\", \"public\"),\n", 136 | " \"ddpriv\": here(\"data\", \"derived\", \"private\"),\n", 137 | " \"rfig\": here(\"results\", \"figures\"),\n", 138 | " \"roth\": here(\"results\", \"other\"),\n", 139 | " \"rtab\": here(\"results\", \"tables\"),\n", 140 | " \"dmet\": here(\"data\", \"metadata\")\n", 141 | "}" 142 | ] 143 | }, 144 | { 145 | "cell_type": "markdown", 146 | "metadata": { 147 | "id": "dEwHjXmlVXZI" 148 | }, 149 | "source": [ 150 | "### Data and variables\n", 151 | "\n", 152 | "Describe the **data sources** and **variables** to be used.\n", 153 | "Data sources may include plans for observing and recording **primary data** or descriptions of **secondary data**.\n", 154 | "For secondary 
data sources with numerous variables, the analysis plan authors may focus on documenting only the variables intended for use in the study.\n", 155 | "\n", 156 | "Primary data sources for the study are to include ... .\n", 157 | "Secondary data sources for the study are to include ... .\n", 158 | "\n", 159 | "Each of the next subsections describes one data source.\n", 160 | "\n" 161 | ] 162 | }, 163 | { 164 | "cell_type": "markdown", 165 | "metadata": {}, 166 | "source": [ 167 | "#### Primary data source1 name\n", 168 | "\n", 169 | "**Standard Metadata**\n", 170 | "\n", 171 | "- `Abstract`: Brief description of the data source\n", 172 | "- `Spatial Coverage`: Specify the geographic extent of your study. This may be a place name and link to a feature in a gazetteer like GeoNames or OpenStreetMap, or a well-known text (WKT) representation of a bounding box.\n", 173 | "- `Spatial Resolution`: Specify the spatial resolution as a scale factor, description of the level of detail of each unit of observation (including administrative level of administrative areas), and/or the cell size of a raster grid\n", 174 | "- `Spatial Reference System`: Specify the geographic or projected coordinate system for the study\n", 175 | "- `Temporal Coverage`: Specify the temporal extent of your study---i.e. the range of time represented by the data observations.\n", 176 | "- `Temporal Resolution`: Specify the temporal resolution of your study---i.e. the duration of time that each observation represents or the revisit period for repeated observations\n", 177 | "- `Lineage`: Describe and/or cite data sources and/or methodological steps planned to create this data source.\n", 178 | " - sampling scheme, including spatial sampling\n", 179 | " - target sample size and method for determining sample size\n", 180 | " - stopping criteria for data collection and sampling (e.g. 
sample size, time elapsed)\n", 181 | " - de-identification / anonymization\n", 182 | " - experimental manipulation\n", 183 | "- `Distribution`: Describe who will make the data available and how?\n", 184 | "- `Constraints`: Legal constraints for *access* or *use* to protect *privacy* or *intellectual property rights*\n", 185 | "- `Data Quality`: State any planned quality assessment\n", 186 | "- `Variables`: For each variable, enter the following information. If you have two or more variables per data source, you may want to present this information in table form (shown below)\n", 187 | " - `Label`: variable name as used in the data or code\n", 188 | " - `Alias`: intuitive natural language name\n", 189 | " - `Definition`: Short description or definition of the variable. Include measurement units in description.\n", 190 | " - `Type`: data type, e.g. character string, integer, real\n", 191 | " - `Accuracy`: e.g. uncertainty of measurements\n", 192 | " - `Domain`: Expected range of Maximum and Minimum of numerical data, or codes or categories of nominal data, or reference to a standard codebook\n", 193 | " - `Missing Data Value(s)`: Values used to represent missing data and frequency of missing data observations\n", 194 | " - `Missing Data Frequency`: Frequency of missing data observations: not yet known for data to be collected\n", 195 | "\n", 196 | "| Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |\n", 197 | "| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |\n", 198 | "| variable1 | ... | ... | ... | ... | ... | ... | ... |\n", 199 | "| variable2 | ... | ... | ... | ... | ... | ... | ... |\n", 200 | "\n" 201 | ] 202 | }, 203 | { 204 | "cell_type": "markdown", 205 | "metadata": {}, 206 | "source": [ 207 | "#### Primary data source2 name\n", 208 | "\n", 209 | "... 
same form as above...\n", 210 | "\n" 211 | ] 212 | }, 213 | { 214 | "cell_type": "markdown", 215 | "metadata": {}, 216 | "source": [ 217 | "#### Secondary data source1 name\n", 218 | "\n", 219 | "**Standard Metadata**\n", 220 | "\n", 221 | "- `Abstract`: Brief description of the data source\n", 222 | "- `Spatial Coverage`: Specify the geographic extent of your study. This may be a place name and link to a feature in a gazetteer like GeoNames or OpenStreetMap, or a well-known text (WKT) representation of a bounding box.\n", 223 | "- `Spatial Resolution`: Specify the spatial resolution as a scale factor, a description of the level of detail of each unit of observation (including administrative level of administrative areas), and/or the distance (cell size) of a raster grid\n", 224 | "- `Spatial Reference System`: Specify the geographic or projected coordinate system for the study\n", 225 | "- `Temporal Coverage`: Specify the temporal extent of your study---i.e. the range of time represented by the data observations.\n", 226 | "- `Temporal Resolution`: Specify the temporal resolution of your study---i.e. the duration of time that each observation represents or the revisit period for repeated observations\n", 227 | "- `Lineage`: Describe and/or cite data sources and/or methodological steps used to create this data source\n", 228 | "- `Distribution`: Describe how the data is distributed, including any persistent identifier (e.g. DOI) or URL for data access\n", 229 | "- `Constraints`: Legal constraints for *access* or *use* to protect *privacy* or *intellectual property rights*\n", 230 | "- `Data Quality`: State result of quality assessment or state \"Quality unknown\"\n", 231 | "- `Variables`: For each variable, enter the following information.
If you have two or more variables per data source, you may want to present this information in table form (shown below)\n", 232 | " - `Label`: variable name as used in the data or code\n", 233 | " - `Alias`: intuitive natural language name\n", 234 | " - `Definition`: Short description or definition of the variable. Include measurement units in description.\n", 235 | " - `Type`: data type, e.g. character string, integer, real\n", 236 | " - `Accuracy`: e.g. uncertainty of measurements\n", 237 | " - `Domain`: Range (maximum and minimum) of numerical data, or codes or categories of nominal data, or reference to a standard codebook\n", 238 | " - `Missing Data Value(s)`: Values used to represent missing data\n", 239 | " - `Missing Data Frequency`: Frequency of missing data observations\n", 240 | "\n", 241 | "| Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |\n", 242 | "| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |\n", 243 | "| variable1 | ... | ... | ... | ... | ... | ... | ... |\n", 244 | "| variable2 | ... | ... | ... | ... | ... | ... | ... |\n", 245 | "\n" 246 | ] 247 | }, 248 | { 249 | "cell_type": "markdown", 250 | "metadata": {}, 251 | "source": [ 252 | "#### Secondary data source2 name\n", 253 | "\n", 254 | "... same form as above...\n", 255 | "\n" 256 | ] 257 | }, 258 | { 259 | "cell_type": "markdown", 260 | "metadata": {}, 261 | "source": [ 262 | "### Prior observations \n", 263 | "\n", 264 | "Prior experience with the study area, prior data collection, or prior observation of the data can compromise the validity of a study, e.g.
through p-hacking.\n", 265 | "Therefore, disclose any prior experience or observations at the time of study pre-registration here, with example text below:\n", 266 | "\n", 267 | "At the time of this study pre-registration, the authors had _____ prior knowledge of the geography of the study region with regard to the ____ phenomena to be studied.\n", 268 | "This study is related to ____ prior studies by the authors.\n", 269 | "\n", 270 | "For each primary data source, declare the extent to which authors had already engaged with the data:\n", 271 | "\n", 272 | "- [ ] no data collection has started\n", 273 | "- [ ] pilot test data has been collected\n", 274 | "- [ ] data collection is in progress and data has not been observed\n", 275 | "- [ ] data collection is in progress and __% of data has been observed\n", 276 | "- [ ] data collection is complete and data has been observed. Explain how authors have already manipulated / explored the data.\n", 277 | "\n", 278 | "For each secondary source, declare the extent to which authors had already engaged with the data:\n", 279 | "\n", 280 | "- [ ] data is not available yet\n", 281 | "- [ ] data is available, but only metadata has been observed\n", 282 | "- [ ] metadata and descriptive statistics have been observed\n", 283 | "- [ ] metadata and a pilot test subset or sample of the full dataset have been observed\n", 284 | "- [ ] the full dataset has been observed.
Explain how authors have already manipulated / explored the data.\n", 285 | "\n", 286 | "If pilot test data has been collected or acquired, describe how the researchers observed and analyzed the pilot test, and the extent to which the pilot test influenced the research design.\n", 287 | "\n" 288 | ] 289 | }, 290 | { 291 | "cell_type": "markdown", 292 | "metadata": {}, 293 | "source": [ 294 | "### Bias and threats to validity\n", 295 | "\n", 296 | "Given the research design and primary data to be collected and/or secondary data to be used, discuss common threats to validity and the approach to mitigating those threats, with an emphasis on geographic threats to validity.\n", 297 | "\n", 298 | "These include:\n", 299 | " - uneven primary data collection due to geographic inaccessibility or other constraints\n", 300 | " - multiple hypothesis testing\n", 301 | " - edge or boundary effects\n", 302 | " - the modifiable areal unit problem\n", 303 | " - nonstationarity\n", 304 | " - spatial dependence or autocorrelation\n", 305 | " - temporal dependence or autocorrelation\n", 306 | " - spatial scale dependency\n", 307 | " - spatial anisotropies\n", 308 | " - confusion of spatial and a-spatial causation\n", 309 | " - ecological fallacy\n", 310 | " - uncertainty, e.g. from spatial disaggregation, anonymization, differential privacy\n", 311 | "\n" 312 | ] 313 | }, 314 | { 315 | "cell_type": "markdown", 316 | "metadata": {}, 317 | "source": [ 318 | "### Data transformations\n", 319 | "\n", 320 | "Describe all data transformations planned to prepare data sources for analysis.\n", 321 | "This section should explain in the fullest detail possible how to transform data from the **raw** state at the time of acquisition or observation to the pre-processed **derived** state ready for the main analysis.\n", 322 | "Include steps to check and mitigate sources of **bias** and **threats to validity**.\n", 323 | "The method may anticipate **contingencies**, e.g.
tests for normality and alternative decisions to make based on the results of the test.\n", 324 | "More specifically, describe all the **geographic** and **variable** transformations required to prepare the input data, as described in the data and variables section above, to match the study's spatio-temporal characteristics as described in the study metadata and study design sections.\n", 325 | "Visual workflow diagrams may help communicate the methodology in this section.\n", 326 | "\n", 327 | "Examples of **geographic** transformations include coordinate system transformations, aggregation, disaggregation, spatial interpolation, distance calculations, zonal statistics, etc.\n", 328 | "\n", 329 | "Examples of **variable** transformations include standardization, normalization, constructed variables, imputation, classification, etc.\n", 330 | "\n", 331 | "Be sure to include any steps planned to **exclude** observations with *missing* or *outlier* data, to **group** observations by *attribute* or *geographic* criteria, or to **impute** missing data or apply spatial or temporal **interpolation**.\n", 332 | "\n" 333 | ] 334 | }, 335 | { 336 | "cell_type": "markdown", 337 | "metadata": {}, 338 | "source": [ 339 | "### Analysis\n", 340 | "\n", 341 | "Describe the methods of analysis that will directly test the hypotheses or provide results to answer the research questions.\n", 342 | "This section should explicitly define any spatial / statistical *models* and their *parameters*, including *grouping* criteria, *weighting* criteria, and *significance thresholds*.\n", 343 | "Also explain any follow-up analyses or validations.\n", 344 | "\n" 345 | ] 346 | }, 347 | { 348 | "cell_type": "markdown", 349 | "metadata": {}, 350 | "source": [ 351 | "## Results\n", 352 | "\n", 353 | "Describe how results are to be presented.\n", 354 | "\n" 355 | ] 356 | }, 357 | { 358 | "cell_type": "markdown", 359 | "metadata": {}, 360 | "source": [ 361 | "## Discussion\n", 362 | "\n", 363 | "Describe how the
results are to be interpreted *vis-à-vis* each hypothesis or research question.\n", 364 | "\n" 365 | ] 366 | }, 367 | { 368 | "cell_type": "markdown", 369 | "metadata": {}, 370 | "source": [ 371 | "## Integrity Statement\n", 372 | "\n", 373 | "Include an integrity statement, for example: The authors of this preregistration state that they completed this preregistration to the best of their knowledge and that no other preregistration exists pertaining to the same hypotheses and research.\n", 374 | "If a prior registration *does* exist, explain the rationale for revising the registration here.\n", 375 | "\n" 376 | ] 377 | }, 378 | { 379 | "cell_type": "markdown", 380 | "metadata": {}, 381 | "source": [ 382 | "# Acknowledgements\n", 383 | "\n", 384 | "- `Funding Name`: name of funding for the project\n", 385 | "- `Funding Title`: title of project grant\n", 386 | "- `Award info URI`: web address for award information\n", 387 | "- `Award number`: award number\n", 388 | "\n", 389 | "This report is based upon the template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences, DOI:[10.17605/OSF.IO/W29MQ](https://doi.org/10.17605/OSF.IO/W29MQ)" 390 | ] 391 | }, 392 | { 393 | "cell_type": "markdown", 394 | "metadata": {}, 395 | "source": [ 396 | "## References" 397 | ] 398 | } 399 | ], 400 | "metadata": { 401 | "colab": { 402 | "provenance": [] 403 | }, 404 | "kernelspec": { 405 | "display_name": "Python 3", 406 | "name": "python3" 407 | }, 408 | "language_info": { 409 | "name": "python" 410 | }, 411 | "orig_nbformat": 4 412 | }, 413 | "nbformat": 4, 414 | "nbformat_minor": 0 415 | } 416 | --------------------------------------------------------------------------------