├── .nojekyll ├── CNAME ├── images ├── favicon.png ├── pdf-cover.pdf └── netherlands-escience-center-logo-RGB.png ├── .docsifytopdfrc.yml ├── .gitignore ├── .pre-commit-config.yaml ├── lychee.toml ├── styles.css ├── .github ├── dependabot.yml ├── PULL_REQUEST_TEMPLATE └── workflows │ ├── link-checker.yml │ ├── link-checker-pr.yml │ └── upload-pdf.yml ├── privacy.md ├── technology ├── technology_overview.md ├── user_experience.md ├── datasets.md └── gpu.md ├── _sidebar.md ├── language_guides ├── languages_overview.md ├── fortran.md ├── rust.md ├── bash.md ├── r.md ├── javascript.md ├── ccpp.md └── python.md ├── README.md ├── index.html ├── CITATION.cff ├── best_practices.md ├── CONTRIBUTING.md └── LICENSE /.nojekyll: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /CNAME: -------------------------------------------------------------------------------- 1 | guide.esciencecenter.nl -------------------------------------------------------------------------------- /images/favicon.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NLeSC/guide/HEAD/images/favicon.png -------------------------------------------------------------------------------- /.docsifytopdfrc.yml: -------------------------------------------------------------------------------- 1 | contents: 2 | - _sidebar.md 3 | pathToPublic: guide-nlesc.pdf 4 | -------------------------------------------------------------------------------- /images/pdf-cover.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NLeSC/guide/HEAD/images/pdf-cover.pdf -------------------------------------------------------------------------------- /images/netherlands-escience-center-logo-RGB.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/NLeSC/guide/HEAD/images/netherlands-escience-center-logo-RGB.png -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # files for JetBrains editors: 2 | **/*.iml 3 | .idea 4 | 5 | # VS Code 6 | .vscode 7 | 8 | # Mac OS 9 | .DS_Store 10 | 11 | 12 | -------------------------------------------------------------------------------- /.pre-commit-config.yaml: -------------------------------------------------------------------------------- 1 | repos: 2 | - repo: https://github.com/rbubley/mirrors-prettier 3 | rev: v3.7.4 4 | hooks: 5 | - id: prettier 6 | -------------------------------------------------------------------------------- /lychee.toml: -------------------------------------------------------------------------------- 1 | # Lychee configuration file 2 | # See https://github.com/lycheeverse/lychee/blob/master/lychee.example.toml 3 | exclude_all_private = true 4 | include_mail = false 5 | no_progress = true 6 | verbose = "info" 7 | -------------------------------------------------------------------------------- /styles.css: -------------------------------------------------------------------------------- 1 | /* General theme*/ 2 | body { 3 | --theme-color: #009fe3; 4 | } 5 | 6 | /* Sidebar element order */ 7 | .sidebar { 8 | display: flex; 9 | flex-direction: column; 10 | } 11 | .sidebar .app-name { 12 | order: 1; 13 | margin: 10px 10px 0 10px; 14 | } 15 | .sidebar .search { 16 | order: 2; 17 | } 18 | .sidebar .sidebar-nav { 19 | order: 3; 20 | } 21 | -------------------------------------------------------------------------------- /.github/dependabot.yml: -------------------------------------------------------------------------------- 1 | # Please see the documentation for all configuration options: 2 | # 
https://docs.github.com/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file 3 | 4 | version: 2 5 | updates: 6 | # Set update schedule for GitHub Actions 7 | - package-ecosystem: "github-actions" 8 | directory: "/" 9 | schedule: 10 | # Check for updates to GitHub Actions every week 11 | interval: "weekly" 12 | -------------------------------------------------------------------------------- /privacy.md: -------------------------------------------------------------------------------- 1 | # Privacy policy 2 | 3 | We collect anonymised user data that helps us to monitor the effectiveness of our website. 4 | No personally identifiable information is recorded and no cookies containing such information are set in your browser session. 5 | 6 | 7 |
8 | 9 | -------------------------------------------------------------------------------- /technology/technology_overview.md: -------------------------------------------------------------------------------- 1 | # Technology Guides 2 | 3 | _Page maintainer: Patrick Bos_ [@egpbos](https://github.com/egpbos) 4 | 5 | These chapters are based on our experiences with using specific software technologies. 6 | 7 | The main audience is RSEs familiar with basic computing and programming concepts. 8 | 9 | The purpose of these chapters is for someone unfamiliar with the specific technology to get a quick overview of the most important concepts, practices and tools, without going into too much detail (we provide links to further reading material for more). 10 | -------------------------------------------------------------------------------- /_sidebar.md: -------------------------------------------------------------------------------- 1 | - [Introduction](/README.md) 2 | - [Best practices](/best_practices.md) 3 | - [Language Guides](/language_guides/languages_overview.md) 4 | - [Bash](/language_guides/bash.md) 5 | - [JavaScript and TypeScript](/language_guides/javascript.md) 6 | - [Python](/language_guides/python.md) 7 | - [R](/language_guides/r.md) 8 | - [C and C++](/language_guides/ccpp.md) 9 | - [Fortran](/language_guides/fortran.md) 10 | - [Rust](/language_guides/rust.md) 11 | - [Technology Guides](/technology/technology_overview.md) 12 | - [GPU programming](/technology/gpu.md) 13 | - [UX - User Experience](/technology/user_experience.md) 14 | - [Datasets](/technology/datasets.md) 15 | - [Contributing to this Guide](/CONTRIBUTING.md) 16 | - [Privacy](/privacy.md) 17 | -------------------------------------------------------------------------------- /.github/PULL_REQUEST_TEMPLATE: -------------------------------------------------------------------------------- 1 | # Changes in this PR 2 | 6 | 7 | 8 | 9 | # Checklist 10 | 14 | 15 | ## SIGNIFICANT changes / additions, e.g. 
new chapters 16 | - [ ] I checked whether the contribution fits in [The Turing Way](https://github.com/the-turing-way/the-turing-way) before considering contributing to this Guide. 17 | - [ ] I discussed my contribution in an issue and took into account feedback. 18 | 19 | ## ALL contributions 20 | - [ ] I previewed my changes locally using e.g. `python3 -m http.server 4000` and confirmed they work correctly. 21 | - [ ] I checked for broken links, e.g. using the link checker GitHub Action workflow, or locally by using ``docker run --init -it -v `pwd`:/docs lycheeverse/lychee /docs --config=docs/lychee.toml``, at least for the files I changed. 22 | - [ ] My name was added to the `CITATION.cff` file. 23 | -------------------------------------------------------------------------------- /.github/workflows/link-checker.yml: -------------------------------------------------------------------------------- 1 | name: Link Checker 2 | on: 3 | workflow_dispatch: 4 | push: 5 | branches: 6 | - main 7 | schedule: 8 | - cron: "0 4 * * *" 9 | jobs: 10 | linkChecker: 11 | runs-on: ubuntu-latest 12 | steps: 13 | - uses: actions/checkout@v6 14 | - name: Link Checker 15 | uses: lycheeverse/lychee-action@v2 16 | id: lychee 17 | with: 18 | # note: args has a long default value; when you override it, make sure you don't accidentally forget to include the default options you want! see https://github.com/lycheeverse/lychee-action/blob/master/action.yml 19 | args: --verbose --no-progress './**/*.md' './**/*.html' './**/*.rst' --accept '100..=103,200..=299, 429' --exclude nlesc.sharepoint.com --exclude support.posit.co --exclude www.intel.com --exclude reddit.com --exclude jsfiddle.net 20 | env: 21 | # This token is included to avoid github.com requests to error out with status 429 (too many requests). It only works for GitHub requests (also other GitHub REST API calls), not for the rest of the web. 
22 | GITHUB_TOKEN: ${{secrets.TOKEN_GITHUB}} 23 | -------------------------------------------------------------------------------- /.github/workflows/link-checker-pr.yml: -------------------------------------------------------------------------------- 1 | name: Link Checker for Pull requests 2 | on: pull_request 3 | jobs: 4 | changedFiles: 5 | runs-on: ubuntu-latest 6 | outputs: 7 | files: ${{ steps.changed-markdown-files.outputs.all_changed_files }} 8 | steps: 9 | - uses: actions/checkout@v6 10 | - name: Get changed markdown files 11 | id: changed-markdown-files 12 | uses: tj-actions/changed-files@v47 13 | with: 14 | # Avoid using single or double quotes for multiline patterns 15 | files: | 16 | **.md 17 | matrix: true 18 | 19 | linkChecker: 20 | runs-on: ubuntu-latest 21 | needs: changedFiles 22 | if: ${{ needs.changedFiles.outputs.files != '' && toJSON(fromJSON(needs.changedFiles.outputs.files)) != '[]' }} 23 | strategy: 24 | matrix: 25 | file: ${{ fromJSON(needs.changedFiles.outputs.files) }} 26 | fail-fast: false 27 | steps: 28 | - uses: actions/checkout@v6 29 | with: 30 | fetch-depth: 2 31 | - name: download Lychee 32 | run: | 33 | wget https://github.com/lycheeverse/lychee/releases/download/lychee-v0.18.1/lychee-x86_64-unknown-linux-gnu.tar.gz 34 | tar xzf lychee-x86_64-unknown-linux-gnu.tar.gz 35 | - name: Check all this file's additions for broken links 36 | run: | 37 | export base_sha=$(git rev-parse ${{ github.sha }}^) 38 | git diff -U0 ${base_sha} ${{ github.event.pull_request.head.sha }} -- ${{ matrix.file }} | grep -v "+++" | grep "^+" | cut -c 2- | ./lychee --exclude nlesc.sharepoint.com --exclude support.posit.co --exclude www.intel.com --exclude reddit.com --exclude jsfiddle.net - 39 | -------------------------------------------------------------------------------- /language_guides/languages_overview.md: -------------------------------------------------------------------------------- 1 | # Language Guides 2 | 3 | _Page maintainer: Patrick Bos_ 
[@egpbos](https://github.com/egpbos) 4 | 5 | This chapter provides practical info on each of the main programming languages of the Netherlands eScience Center. 6 | 7 | This info is intentionally high level: it tries to provide "default" options and mostly links to more info. 8 | 9 | Each chapter should contain: 10 | 11 | - Intro: philosophy, typical use cases. 12 | - Recommended sources of information 13 | - Installing compilers and runtimes 14 | - Editors and IDEs 15 | - Coding style conventions 16 | - Building and packaging code 17 | - Testing 18 | - Code quality analysis tools and services 19 | - Debugging and Profiling 20 | - Logging 21 | - Writing documentation 22 | - Recommended additional packages and libraries 23 | - Available templates 24 | 25 | ## Preferred Languages 26 | 27 | At the Netherlands eScience Center we prefer Java and Python over C++ and Perl, as these languages in general produce more sustainable code. It is not always possible to choose which libraries we use, as almost all projects have existing code as a starting point. 28 | 29 | (In alphabetical order) 30 | 31 | - Java 32 | - JavaScript (preferably TypeScript) 33 | - Python 34 | - OpenCL and CUDA 35 | - R 36 | 37 | ## Selecting tools and libraries 38 | 39 | On GitHub there is a concept of an "awesome list" that collects awesome libraries and tools on some topic. For instance, here is one for Python: https://github.com/vinta/awesome-python 40 | 41 | Now, someone has been smart enough to see the pattern, and has created an awesome list of awesome lists: https://awesome.re/ 42 | 43 | Highly recommended to get some inspiration on available tools and libraries! 44 | 45 | ## Development Services 46 | 47 | To do development in any language you first need infrastructure (code hosting, CI, etc.). Luckily a lot is available for free now.
48 | 49 | See this list: https://github.com/ripienaar/free-for-dev 50 | -------------------------------------------------------------------------------- /.github/workflows/upload-pdf.yml: -------------------------------------------------------------------------------- 1 | # Generates a PDF for the full guide and uploads it to Zenodo 2 | ## This action triggers when there is a new release of the guide 3 | ## Manual release of this action also triggers upload to the Zenodo Sandbox 4 | name: Generate PDF and upload to Zenodo 5 | on: 6 | # Trigger manually via the Actions tab 7 | workflow_dispatch: 8 | # Trigger when you publish a release via GitHub's release page 9 | release: 10 | types: 11 | - published 12 | 13 | jobs: 14 | publish: 15 | runs-on: ubuntu-latest 16 | steps: 17 | - name: Checkout the contents of your repository 18 | uses: actions/checkout@v6 19 | 20 | - name: Change absolute paths to relative 21 | run: perl -pi -e 's@\]\(\/@\]\(@' _sidebar.md 22 | 23 | - name: Pull Docker image 24 | run: docker pull ghcr.io/kernoeb/docker-docsify-pdf:main 25 | 26 | - name: Generate PDF using the Docker image 27 | run: | 28 | docker run --rm --privileged \ 29 | -v "${{ github.workspace }}/":/home/node/docs:rw \ 30 | -v "${{ github.workspace }}/":/home/node/pdf:rw \ 31 | -v "${{ github.workspace }}/images/pdf-cover.pdf":/home/node/resources/cover.pdf:rw \ 32 | --user $(id -u):$(id -g) \ 33 | -e "PDF_OUTPUT_NAME=guide-nlesc.pdf" \ 34 | -e "NO_SANDBOX=true" \ 35 | ghcr.io/kernoeb/docker-docsify-pdf:main 36 | 37 | - name: Generate .zenodo.json from CITATION.cff 38 | uses: citation-file-format/cffconvert-github-action@2.0.0 39 | with: 40 | args: "--format zenodo --outfile .zenodo.json" 41 | 42 | - name: Create a draft snapshot on Zenodo Sandbox 43 | if: github.event_name == 'workflow_dispatch' 44 | env: 45 | GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} 46 | ZENODO_SANDBOX_ACCESS_TOKEN: ${{ secrets.ZENODO_SANDBOX_ACCESS_TOKEN }} 47 | uses: zenodraft/action@0.13.3 48 | with: 49 | 
concept: 277497 # doesn't matter which it is, it is only for testing 50 | publish: false 51 | sandbox: true 52 | filenames: guide-nlesc.pdf 53 | metadata: .zenodo.json 54 | 55 | - name: Create a new draft snapshot in the Zenodo record 56 | if: github.event_name == 'release' 57 | env: 58 | GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} 59 | ZENODO_ACCESS_TOKEN: ${{ secrets.ZENODO_ACCESS_TOKEN }} 60 | uses: zenodraft/action@0.13.3 61 | with: 62 | concept: 4020564 63 | publish: false # let the user press the publish button manually 64 | sandbox: false 65 | filenames: guide-nlesc.pdf 66 | metadata: .zenodo.json 67 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4020564.svg)](https://doi.org/10.5281/zenodo.4020564)[![Link Checker](https://github.com/NLeSC/guide/actions/workflows/link-checker.yml/badge.svg)](https://github.com/NLeSC/guide/actions/workflows/link-checker.yml) 2 | 3 | # Guide 4 | 5 | This is a guide to research software development at the Netherlands eScience Center. 6 | It is a living document, written by and for our research software engineers (RSEs) and our collaborators. 7 | 8 | We write it for two reasons: 9 | 10 | 1. To have a trusted source for quickly getting started on selected software development topics. 11 | We hope this will help RSEs (including our future selves!) to get off to a flying start on new projects in software/technological areas they are not yet familiar with. 12 | 2. To discuss and reach consensus on such topics/areas. 13 | This in itself is valuable experience! 14 | Discussing your practices can be confronting and a bit uncomfortable, but often teaches you new tricks and points of view. 15 | 16 | Openness and collaboration are at the heart of the eScience Center, which is why we develop and share these guidelines in the open. 
17 | [Join us!](#contributing) 18 | 19 | ## Contents 20 | 21 | To get started, check out the checklist of generic research software engineering advice 22 | in the [Best Practices](/best_practices.md) chapter. 23 | This chapter lists the most important overall attention points while developing research software. 24 | For more details, the sections refer to selected resources in community guides that we collaborate with. 25 | 26 | If you are looking for more in-depth advice on using a specific programming language, have a look at the [language guides](/language_guides/languages_overview.md). 27 | Here we catalogue our experiences with the languages we use the most in our research software development projects. 28 | We also provide [technology guides](/technology/technology_overview.md) on digital technologies we use often in our projects with research partners. 29 | 30 | ## Resources 31 | 32 | All of the text in this guide is backed by our own experiences in developing high-quality research software. 33 | However, we also learn from and share knowledge with other community-driven research software guides. 34 | The two most important of these are [The Turing Way](https://book.the-turing-way.org/index.html) and the 35 | [Research Software Quality Kit](http://everse.software/RSQKit/). 36 | Their scope is slightly different, but we collaborate with them when we can. 37 | 38 | ## Contributing 39 | 40 | Please consider contributing to this book! 41 | It is a great way to make a long-lasting impact by sharing your time-tested knowledge and expertise. 42 | You'll hone your writing skills while you're at it. 43 | 44 | See the [Contributing to this Guide](/CONTRIBUTING.md) chapter if you want to know more about how you can help, or ask one of the editors.
45 | Currently the editorial team consists of: 46 | 47 | - Bouwe Andela [@bouweandela](https://github.com/bouweandela) (research software engineer) 48 | - Carlos Martínez Ortiz [@c-martinez](https://github.com/c-martinez) (community manager) 49 | - Patrick Bos [@egpbos](https://github.com/egpbos) (technology lead) 50 | -------------------------------------------------------------------------------- /language_guides/fortran.md: -------------------------------------------------------------------------------- 1 | # Fortran 2 | 3 | _Page maintainer: Gijs van den Oord_ [@goord](https://github.com/goord) 4 | 5 | **Disclaimer: In general the Netherlands eScience Center does not recommend using Fortran. However, in some cases it is the only viable option, for instance if a project builds upon existing code written in this language. This section will be restricted to Fortran90, which captures the majority of Fortran source code.** 6 | 7 | A second use case is extremely performance-critical dense 8 | numerical compute workloads for which no alternative exists. In this case it is recommended to keep the Fortran part of the application minimal, using a high-level language like Python for program control flow, IO, and user interface. 9 | 10 | ## Recommended sources of information 11 | 12 | - [Fortran90 best practices](https://github.com/certik/fortran90.org/blob/master/src/best-practices.rst) 13 | - [Fortran wiki](http://fortranwiki.org/fortran/show/HomePage) 14 | - [Fortran90 handbook](http://micro.ustc.edu.cn/Fortran/Fortran%2090%20Handbook.pdf) 15 | 16 | ## Compilers 17 | 18 | - **gfortran**: the official GNU Fortran compiler and part of the gcc compiler suite. 19 | - **ifort**: the Intel Fortran compiler, widely used in academia and industry because of its superior performance, but 20 | unfortunately this is commercial software so not recommended.
The same holds for the Portland compiler **pgfortran**. 21 | 22 | ## Debuggers and diagnostic tools 23 | 24 | There exist many commercial performance profiling tools by Intel and the Portland Group which we shall not discuss here. The most important freely available alternatives are: 25 | 26 | - **gdb**: the GNU debugger. Use the **-g** compiler option to compile with debugging symbols. 27 | - **gprof**: the GNU profiler, part of GNU binutils. Use the **-pg** compiler option to compile with profiling enabled. 28 | - **valgrind**: to detect memory leaks. 29 | 30 | ## Editors and IDEs 31 | 32 | Most lightweight editors provide Fortran syntax highlighting. Vim and Emacs are the most widely used, but for code 33 | completion and refactoring tools one might consider the [CBFortran](http://cbfortran.sourceforge.net/) distribution of Code::Blocks. 34 | 35 | ## Coding style conventions 36 | 37 | If working on an existing code base, adopt the existing conventions. Otherwise we recommend the 38 | standard conventions, described in the [official documentation](https://github.com/certik/fortran90.org/blob/master/src/best-practices.rst#fortran-style-guide) and the [Fortran company style guide](http://www.fortran.com/). We would like to add the following advice: 39 | 40 | - Use free-form text input style (the default), with a maximal line width well below the 132 characters imposed by the Fortran90 standard. 41 | - When a method does not need to alter any data in any module and returns a single value, use a function for it; otherwise use a subroutine. Minimize the latter to a reasonable extent. 42 | - Use the intent attributes in subroutine variable declarations, as they make the code much easier to understand. 43 | - Use a performance-driven approach to the architecture; do not use the object-oriented features of Fortran90 if they slow down execution. Encapsulation by modules is perfectly acceptable.
44 | - Add concise comments to modules and routines, and add comments to less obvious lines of code. 45 | - Provide a test suite with your code, containing both unit and integration tests. Both automake and cmake provide test 46 | suite functionality; if you create your makefile yourself, add a separate testing target. 47 | -------------------------------------------------------------------------------- /index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | Netherlands eScience Center Guide 6 | 7 | 8 | 12 | 16 | 17 | 18 | 19 | 20 | 21 |
22 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 93 | 94 | 95 | -------------------------------------------------------------------------------- /technology/user_experience.md: -------------------------------------------------------------------------------- 1 | # User Experience (UX) 2 | 3 | _Page maintainer: Jesus Garcia_ [@ctwhome](https://github.com/ctwhome) 4 | 5 | User Experience Design (UX) is a broad, holistic science that combines many cognitive and brain science disciplines like psychology and sociology, content strategies, and arts and aesthetics by following human-centered approaches. 6 | 7 | > Human-centred design is an approach to interactive systems development that aims to make systems usable and useful by focusing on the users, their needs and requirements, and applying human factors/ergonomics and usability knowledge and techniques. This approach enhances effectiveness and efficiency, improves human well-being, user satisfaction, accessibility, sustainability, and counteracts possible adverse effects on human health, safety, and performance. [Wikipedia](https://en.wikipedia.org/wiki/Human-centered_design) 8 | 9 | ## Table of contents 10 | 11 | - UX disciplines 12 | - Design thinking process 13 | - Designing software 14 | - Tools and Resources 15 | 16 | ### UX disciplines 17 | 18 | The principles and indications taught by [interaction-design.org](https://www.interaction-design.org/literature) can be useful in the process of creating research software. 19 | 20 | The main UX disciplines are: 21 | 22 | 1. **User research**: understanding the people who use a product or system through observations. 23 | 2. **Information architecture**: identifying and organizing information within a system in a purposeful and meaningful way. 24 | 3. **Interaction design**: designing a product or system's interactive behaviors with a specific focus on their use. 25 | 4. **Usability evaluation**: measuring the quality of a user's experience when interacting with a product or system.
26 | 5. **Accessibility evaluation**: measuring the quality of a product or system to be accessed irrespective of personal abilities and device properties. 27 | 6. **Visual design**: designing the visual attributes of a product or system in an aesthetically pleasing way. 28 | 29 | The well-known UX umbrella diagram represents the different disciplines of UX: 30 | 31 | 32 | 33 | _Author/Copyright holder: J.G. Gonzalez and The Netherlands eScience Center. Copyright: Apache License 2.0_ 34 | 35 | ### Design Thinking 36 | 37 | Design thinking is an approach, mindset, or ideology for product development. According to the [Interaction Design Foundation (IxDF)](https://interaction-design.org), design thinking achieves all these advantages at the same time: 38 | 39 | - It is a user-centered process that starts with user data, creates design artifacts that address real and not imaginary user needs, and then tests those artifacts with real users. 40 | - It leverages the collective expertise and establishes a shared language and buy-in amongst your team. 41 | - It encourages innovation by exploring multiple avenues for the same problem. 42 | 43 | 44 | 45 | _Author/Copyright holder: Teo Yu Siang and Interaction Design Foundation. Copyright licence: CC BY-NC-SA 3.0_ 46 | 47 | You can find more information about Design Thinking on the [IxDF page](https://www.interaction-design.org/literature/topics/design-thinking). 48 | 49 | ### Designing software 50 | 51 | Heuristics, commonly known as 'rules of thumb,' play a significant role when users interact with software. The Nielsen Norman Group has a list of [10 Usability Heuristics for User Interface Design](https://www.nngroup.com/articles/ten-usability-heuristics/) to consider when developing software. 52 | 53 | #### Designing Lovable software 54 | 55 | When delivering software iteratively, one of the common approaches to follow is to define a Minimum Viable Product that contains the minimum requirements.
What is often forgotten in this approach is to deliver software that attracts and engages users. When developing research software, researchers should present the new and innovative outcomes in a way that feels comfortable and easy to use from the very beginning, eliminating any cognitive burden that the software's interaction may include. 56 | 57 | 58 | 59 | _Author/Copyright holder: J.G. Gonzalez and The Netherlands eScience Center. Copyright: Apache License 2.0_ 60 | 61 | While the MVP (Minimum Viable Product) approach focuses on providing users with a way to explore the product and understand its main intent, the MLP (Minimum Lovable Product) approach focuses on essential features instead of the bare minimum expected from a class of software. Going beyond bare functionality, the attention is directed towards a great user experience. The outcome must contain all elements in the pyramid: **functional, reliable, usable, and pleasurable.** 62 | 63 | ### Tools and resources 64 | 65 | Collaborative, real-time, online, and multiplatform design tools for visual design, prototyping, and IxD testing: 66 | 67 | - [Figma](https://www.figma.com/) 68 | - [Miro](https://miro.com/) 69 | - [Whimsical](https://whimsical.com/) 70 | -------------------------------------------------------------------------------- /language_guides/rust.md: -------------------------------------------------------------------------------- 1 | # Rust 2 | 3 | _Page maintainer: [Rodrigo V. Honorato](https://github.com/rvhonorato)_ 4 | 5 | Rust is a modern programming language designed to provide high 6 | performance while enforcing memory safety through its unique ownership system 7 | and borrow checker. Developed by Mozilla and first released in 2015, 8 | Rust has rapidly gained popularity for its ability to prevent common 9 | programming errors at compile time.
It is commonly categorized as a systems 10 | programming language, but over the last few years its ecosystem has grown 11 | considerably and Rust is being adopted as a general-purpose programming language. 12 | 13 | Rust is increasingly adopted in **research software** for its unique blend of 14 | speed, safety, and modern tooling. It powers everything from 15 | high-throughput DNA sequencing pipelines to climate simulations, where even 16 | minor memory errors could invalidate results. By eliminating entire classes 17 | of bugs (e.g., null pointers, data races, type mismatches), Rust lets 18 | researchers focus on science, not on debugging. 19 | 20 | It is, however, a **low-level** language, which gives you direct control over 21 | hardware and memory (like [C/C++](./ccpp.md)). For comparison, [Python](./python.md) 22 | is a **high-level** language that prioritizes readability by abstracting these 23 | details - in Python you don't ever need to think about allocating or freeing 24 | memory as the interpreter takes care of it, making the code slower but much 25 | easier to program. In a **low-level** language you need to manage it yourself. 26 | Because Rust runs "closer to the metal", it achieves blazing-fast performance - 27 | similar to [C/C++](./ccpp.md) - while avoiding common memory-safety and 28 | concurrency bugs. 29 | 30 | Here are some of Rust's key characteristics: 31 | 32 | - **Memory Safety**: Rust's unique ownership system guarantees memory safety at compile 33 | time, eliminating crashes from null pointers, dangling references, or double frees. 34 | 35 | - **Type Safety**: Strict compile-time checks catch type errors in variables, data types, 36 | and operations before the program runs, so there are fewer surprises at runtime. 37 | 38 | - **Zero-Cost Abstractions**: High-level syntax (e.g., iterators, traits) compiles 39 | to machine code as efficiently as hand-written low-level code.
40 | 41 | - **Fearless Concurrency**: Built-in rules prevent data races, letting you 42 | write safe, parallel code without runtime crashes. 43 | 44 | - **Expressive Enums & Pattern Matching**: Enums can hold data, and `match` 45 | ensures all cases are handled, so no edge cases are forgotten. 46 | 47 | - **Traits for Polymorphism**: Define shared behavior across types without 48 | runtime overhead. 49 | 50 | - **Rich Ecosystem**: Tools like [Cargo](https://doc.rust-lang.org/cargo/) 51 | (package manager), [Clippy](https://doc.rust-lang.org/stable/clippy/usage.html) 52 | (linting), [crates.io](https://crates.io) (libraries) 53 | and [rustdoc](https://doc.rust-lang.org/stable/rustdoc/) (documentation) 54 | streamline development. 55 | 56 | ```rust 57 | // Ownership in action: the compiler tracks who "owns" data. 58 | fn main() { 59 | // Let's declare a string; here `s` owns it 60 | let s = String::from("hello"); 61 | 62 | // Borrow `s` as a read-only reference (no ownership transfer) 63 | let len = calculate_length(&s); 64 | 65 | // `s` still owns the data and we can use it 66 | println!("'{}' has length {}", s, len); 67 | } 68 | 69 | fn calculate_length(s: &str) -> usize { 70 | s.len() 71 | } 72 | ``` 73 | 74 | ## Getting started 75 | 76 | To get started you will first need to install Rust. This can be done via [`rustup`](https://rustup.rs), 77 | which is a command line tool for managing Rust versions and tools. 78 | 79 | On Linux/macOS: 80 | 81 | ```bash 82 | curl --proto '=https' --tlsv1.2 https://sh.rustup.rs -sSf | sh 83 | ``` 84 | 85 | On Windows, [see the instructions here](https://forge.rust-lang.org/infra/other-installation-methods.html#other-ways-to-install-rustup). 86 | 87 | Cargo is Rust's build system and package manager and is installed by `rustup`.
88 | You can use it to create a project: 89 | 90 | ```bash 91 | cargo new rust_project 92 | ``` 93 | 94 | This will create the project folder structure, add a `Cargo.toml` and a `src/main.rs` 95 | which contains a placeholder "Hello world", so you can already build this 96 | `rust_project`: 97 | 98 | ```bash 99 | cd rust_project 100 | cargo build --release # using --release will build the optimized binary 101 | ./target/release/rust_project # execute the binary 102 | ``` 103 | 104 | ## Learning 105 | 106 | Rust's unique approach to memory management (ownership, borrowing and lifetimes) and 107 | the strict compiler can feel daunting at first - especially if you are accustomed 108 | to high-level languages like [Python](./python.md) or [JavaScript](./javascript.md). 109 | Learning Rust can be challenging as some new concepts, such as the borrow checker, 110 | may take time to be internalized. 111 | 112 | > Keep in mind that in the long run all the effort pays off. The code produced 113 | > will be faster while having _fewer bugs_ (thanks to the opinionated compiler), 114 | > and you will learn _transferable skills_ that will make you a better programmer 115 | > in other languages. The general mindset should be **start small and embrace 116 | > the compiler**. 117 | 118 | To learn it, you only need: 119 | 120 | - [The Rust Book](https://doc.rust-lang.org/book/): This 121 | is the official book and it is very well written and easy to follow. It contains 122 | all the information you need to gain a deep understanding of Rust. It also includes 123 | a fully guided tutorial on how to write a guessing game as your first project. 124 | - [Rust by Example](https://doc.rust-lang.org/rust-by-example/): This contains 125 | smaller examples of how to use the language, and it is a good complement to 126 | the book or when you need to quickly look up how to do something.
127 | - [Rustlings](https://rustlings.cool): Fully interactive exercises 128 | that will help you get used to the syntax and the concepts of the language - 129 | it is paired with the book, so you should be doing the exercises as you go 130 | through the book. 131 | - [Rust Playground](https://play.rust-lang.org/): Lets you experiment with Rust 132 | online in your browser 133 | 134 | 🦀 135 | -------------------------------------------------------------------------------- /technology/datasets.md: -------------------------------------------------------------------------------- 1 | # Working with tabular data 2 | 3 | _Page maintainers: Suvayu Ali_ [@suvayu](https://github.com/suvayu) _, Flavio Hafner_ [@f-hafner](https://github.com/f-hafner) _and Reggie Cushing_ [@recap](https://github.com/recap) 4 | 5 | There are several solutions available to you as an RSE, with their own pros and cons. You should evaluate which one works best for your project, and project partners, and pick one. Sometimes it might be, that you need to combine two different types of technologies. Here are some examples from our experience. 6 | 7 | You will encounter datasets in various file formats like: 8 | 9 | - CSV/Excel 10 | - Parquet 11 | - HDF5/NetCDF 12 | - JSON/JSON-LD 13 | 14 | Or local database files like SQLite. It is important to note, the various trade-offs between these formats. For instance, doing a random seek is difficult with a large dataset for non-binary formats like: CSV, Excel, or JSON. In such cases you should consider formats like Parquet, or HDF5/NetCDF. Non-binary files can also be imported into local databases like SQLite or DuckDB. Below we compare some options to work with datasets in these formats. 15 | 16 | It's also good to know about [Apache Arrow](https://arrow.apache.org), which is not itself a file format, but a specification for a memory layout of (binary) data. 17 | There is an ecosystem of libraries for all major languages to handle data in this format. 
18 | It is used as the back-end of [many data handling projects](https://arrow.apache.org/powered_by/), including several mentioned in this chapter. 19 | 20 | ## Local database 21 | 22 | When you have a relational dataset, it is recommended that you use a database. Using local databases like SQLite and DuckDB can be very easy because they require no setup. But they come with some limitations; for instance, multiple users cannot write to the database simultaneously. 23 | 24 | SQLite is a transactional database, so if you have a dataset that is changing with time (e.g. you are adding new rows), it would be more appropriate. However, in research we often work with static databases, and are interested mostly in analytical tasks. For such a case, DuckDB is a more appropriate alternative. Between the two, 25 | 26 | - DuckDB can also create views (virtual tables) from other sources like files or other databases, but with SQLite you always have to import the data before running any queries. 27 | - DuckDB is multi-threaded. This can be an advantage for large databases, where aggregation queries tend to be faster than in SQLite. 28 | - However, if you have a really large dataset, say hundreds of millions of rows, and want to perform a deeply nested query, it would require a substantial amount of memory, making it infeasible to run on personal laptops. 29 | - There are options to customize memory handling, and push what is possible on a single machine. 30 | 31 | You need to limit the memory usage to prevent the operating system or shell from preemptively killing the DuckDB process. You can choose a value of about 50% of your system's RAM. 32 | 33 | ```sql 34 | SET memory_limit = '5GB'; 35 | ``` 36 | 37 | By default, DuckDB spills over to disk when memory usage grows beyond the above limit.
You can verify the temporary directory by running: 38 | 39 | ```sql 40 | SELECT current_setting('temp_directory') AS temp_directory; 41 | ``` 42 | 43 | Note that if your query is deeply nested, you should have sufficient disk space for DuckDB to use; e.g. for 4 nested levels of `INNER JOIN` combined with a `GROUP BY`, we observed a disk spillover of 30 times the size of the original dataset. However, we found that spilling to disk was not always reliable. 44 | 45 | In such borderline cases, it might be possible to address the limitation by splitting the workload into chunks and aggregating later, or by considering one of the alternatives mentioned below. 46 | - You can also optimize the queries for DuckDB, but that requires a deeper dive into the documentation, and understanding how DuckDB query optimisation works. 47 | 48 | - Both databases support setting (unique) indexes. Indexes are useful and sometimes necessary: 49 | - For both DuckDB and SQLite, unique indexes help ensure data integrity. 50 | - For SQLite, indexes are crucial to improve the performance of queries. However, having more indexes makes writing new records to the database slower. So it's again a trade-off between query and write speed. 51 | 52 | # Useful libraries 53 | 54 | ## Database APIs 55 | 56 | - [SQLAlchemy](https://www.sqlalchemy.org/) 57 | - In Python, interfacing to SQL databases like SQLite, MySQL or PostgreSQL is often done using [SQLAlchemy](https://www.sqlalchemy.org/), which is an Object Relational Mapper (ORM) that allows you to map tables to Python classes. Note that you still need to use a lot of manual SQL outside of Python to manage the database. However, SQLAlchemy allows you to use the data in a Pythonic way once you have the database layout figured out. 58 | 59 | ## Data processing libraries on a single machine 60 | 61 | - Pandas 62 | - The standard tool for working with dataframes, and widely used in analytics or machine learning workflows.
Note however how Pandas uses memory, because certain APIs create copies, while others do not. So if you are chaining multiple operations, it is preferable to use APIs that avoid copies. 63 | - Vaex 64 | - Vaex is an alternative that focuses on out-of-core processing (larger than memory), and has some lazy evaluation capabilities. 65 | - Polars 66 | - An alternative to Pandas (started in 2020), which is primarily written in Rust. Compared to pandas, it is multi-threaded and does lazy evaluation with query optimisation, so much more performant. However since it is newer, documentation is not as complete. It also allows you to write your own custom extensions in Rust. 67 | - [Apache Datafusion](https://datafusion.apache.org/) 68 | - A very fast, extensible query engine for building high-quality data-centric systems in [Rust](http://rustlang.org/), using the [Apache Arrow](https://arrow.apache.org/) in-memory format. DataFusion offers SQL and Dataframe APIs, excellent [performance](https://benchmark.clickhouse.com/), built-in support for CSV, Parquet, JSON, and Avro, extensive customization, and a great community. 69 | 70 | ## Distributed/multi-node data processing libraries 71 | 72 | - Dask 73 | - `dask.dataframe` and `dask.array` provides the same API as pandas and numpy respectively, making it easy to switch. 74 | - When working with multiple nodes, it requires communication across nodes (which is network bound). 
75 | - Ray 76 | - Apache Spark 77 | -------------------------------------------------------------------------------- /technology/gpu.md: -------------------------------------------------------------------------------- 1 | # GPU Programming Languages 2 | 3 | _Page maintainer: Alessio Sclocco_ [@isazi](https://github.com/isazi) 4 | 5 | ## Learning Resources 6 | 7 | - Carpentries GPU Programming course 8 | - [Lesson material](https://carpentries-incubator.github.io/lesson-gpu-programming/) 9 | - Introduction to CUDA C 10 | - [Slides](http://developer.download.nvidia.com/compute/developertrainingmaterials/presentations/cuda_language/Introduction_to_CUDA_C.pptx) 11 | - [Video](http://on-demand.gputechconf.com/gtc/2012/video/S0624-Monday-Introduction-to-CUDA-C.mp4) 12 | - Introduction to OpenACC 13 | - [Slides](http://developer.download.nvidia.com/compute/developertrainingmaterials/presentations/openacc/Introduction_To_OpenACC.pptx) 14 | - Introduction to HIP Programming 15 | - [Video](https://www.youtube.com/watch?v=3ejUwypP0bI) 16 | - SYCL Introduction and Best Practices 17 | - [Video](https://www.youtube.com/watch?v=TbkrODiVDQY) 18 | - CSCS GPU Programming with Julia 19 | - [Course recordings](https://github.com/omlins/julia-gpu-course) 20 | 21 | ## Documentation 22 | 23 | - CUDA 24 | - [C programming guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html) 25 | - [Runtime API](https://docs.nvidia.com/cuda/cuda-runtime-api/) 26 | - [Driver API](https://docs.nvidia.com/cuda/cuda-driver-api/index.html) 27 | - [Fortran programming guide](https://docs.nvidia.com/hpc-sdk/compilers/cuda-fortran-prog-guide/index.html) 28 | - HIP 29 | - [Kernel language syntax](https://rocm.docs.amd.com/projects/HIP/en/latest/reference/kernel_language.html) 30 | - [Runtime API](https://rocm.docs.amd.com/projects/HIP/en/latest/reference/hip_runtime_api_reference.html) 31 | - SYCL 32 | - [Specification](https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html) 33 | 
- [Reference guide](https://www.khronos.org/files/sycl/sycl-2020-reference-guide.pdf) 34 | - OpenCL 35 | - [Guide](https://github.com/KhronosGroup/OpenCL-Guide) 36 | - [API](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_API.html) 37 | - [OpenCL C specification](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html) 38 | - [Reference guide](https://www.khronos.org/files/opencl30-reference-guide.pdf) 39 | - OpenACC 40 | - [Programming guide](https://www.openacc.org/sites/default/files/inline-files/OpenACC_Programming_Guide_0_0.pdf) 41 | - [Reference guide](https://www.openacc.org/sites/default/files/inline-files/API%20Guide%202.7.pdf) 42 | - OpenMP 43 | - [Reference guide](https://www.openmp.org/wp-content/uploads/OpenMPRef-5.0-111802-web.pdf) 44 | 45 | ## Overview of Libraries 46 | 47 | - CUDA 48 | - [cuBLAS](http://docs.nvidia.com/cuda/cublas/index.html) 49 | - [NVBLAS](http://docs.nvidia.com/cuda/nvblas/index.html) 50 | - [cuFFT](http://docs.nvidia.com/cuda/cufft/index.html) 51 | - [cuGRAPH](https://docs.rapids.ai/api/cugraph/stable/) 52 | - [cuRAND](http://docs.nvidia.com/cuda/curand/index.html) 53 | - [cuSPARSE](http://docs.nvidia.com/cuda/cusparse/index.html) 54 | - HIP 55 | - [hipBLAS](https://rocm.docs.amd.com/projects/hipBLAS/en/latest/index.html) 56 | - [hipFFT](https://rocm.docs.amd.com/projects/hipFFT/en/latest/index.html) 57 | - [hipRAND](https://rocm.docs.amd.com/projects/hipRAND/en/latest/index.html) 58 | - [hipSPARSE](https://rocm.docs.amd.com/projects/hipSPARSE/en/latest/index.html) 59 | - SYCL 60 | - [OneAPI BLAS](https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/2025-0/blas-routines.html) 61 | - [OneAPI FFT](https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/2025-0/fourier-transform-functions.html) 62 | - [OneAPI sparse](https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/2025-0/sparse-blas-routines.html) 63 | - [OneAPI random 
number generators](https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/2025-0/random-number-generators.html) 64 | - OpenCL 65 | - [CLBlast](https://github.com/CNugteren/CLBlast) 66 | - [clFFT](https://github.com/clMathLibraries/clFFT) 67 | 68 | ## Source-to-source Translation 69 | 70 | - CUDA to HIP 71 | - [hipify](https://github.com/ROCm/HIPIFY) 72 | - CUDA to SYCL 73 | - [SYCLomatic](https://github.com/oneapi-src/SYCLomatic) 74 | - CUDA to OpenCL 75 | - [cutocl](https://github.com/benvanwerkhoven/cutocl) 76 | 77 | ## Foreign Function Interfaces 78 | 79 | - C++ 80 | - CUDA 81 | - [cudawrappers](https://github.com/nlesc-recruit/cudawrappers) 82 | - OpenCL 83 | - [CLHPP](https://github.com/KhronosGroup/OpenCL-CLHPP) 84 | - Python 85 | - CUDA 86 | - [PyCuda](https://mathema.tician.de/software/pycuda/) 87 | - [CuPy](https://cupy.dev/) 88 | - [cuda-python](https://nvidia.github.io/cuda-python/) 89 | - HIP 90 | - [PyHIP](https://github.com/jatinx/PyHIP) 91 | - SYCL 92 | - [dpctl](https://github.com/IntelPython/dpctl) 93 | - OpenCL 94 | - [PyOpenCL](https://mathema.tician.de/software/pyopencl/) 95 | - Julia 96 | - CUDA 97 | - [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl) 98 | - HIP 99 | - [AMDGPU.jl](https://github.com/JuliaGPU/AMDGPU.jl) 100 | - SYCL 101 | - [oneAPI.jl](https://github.com/JuliaGPU/oneAPI.jl) 102 | - Java 103 | - CUDA 104 | - [JCuda](http://www.jcuda.org/) 105 | - OpenCL 106 | - [JOCL](http://www.jocl.org/) 107 | 108 | ## High-Level Abstractions 109 | 110 | - C++ 111 | - [Kokkos](https://github.com/kokkos/kokkos) 112 | - [Raja](https://github.com/LLNL/RAJA) 113 | - Python 114 | - [Numba](https://numba.pydata.org/) 115 | - [pykokkos](https://github.com/kokkos/pykokkos) 116 | 117 | ## Debugging and Profiling Tools 118 | 119 | - CUDA 120 | - [Nsight Systems](https://developer.nvidia.com/nsight-systems) 121 | - [Nsight Compute](https://developer.nvidia.com/nsight-compute) 122 | -
[CUDA-GDB](http://docs.nvidia.com/cuda/cuda-gdb/index.html) 123 | - [compute-sanitizer](https://docs.nvidia.com/compute-sanitizer/index.html) 124 | - HIP 125 | - [omniperf](https://github.com/AMDResearch/omniperf) 126 | - [rocprof](https://github.com/ROCm/rocprofiler) 127 | - SYCL 128 | - [oneprof](https://github.com/intel/pti-gpu/tree/master/tools/oneprof) 129 | - [onetrace](https://github.com/intel/pti-gpu/tree/master/tools/onetrace) 130 | 131 | ## Performance Optimization 132 | 133 | - [PRACE best practice guide on modern accelerators](https://zenodo.org/records/5839488) 134 | - [CUDA best practices](https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html) 135 | - [OneAPI SYCL best practices](https://www.intel.com/content/www/us/en/docs/oneapi/programming-guide/2025-0/optimize-your-sycl-applications.html) 136 | 137 | ## Auto-tuning 138 | 139 | - Kernel Tuner 140 | - [GitHub repository](https://github.com/KernelTuner/kernel_tuner) 141 | - [Documentation](https://kerneltuner.github.io/kernel_tuner/stable/) 142 | - [Tutorial](https://github.com/KernelTuner/kernel_tuner_tutorial) 143 | -------------------------------------------------------------------------------- /CITATION.cff: -------------------------------------------------------------------------------- 1 | # This CITATION.cff file was generated with cffinit. 2 | # Visit https://bit.ly/cffinit to generate yours today! 3 | 4 | cff-version: 1.2.0 5 | title: Netherlands eScience Center - Software Development Guide 6 | message: "If you use this guide, please cite it." 7 | type: software 8 | authors: 9 | - affiliation: Netherlands eScience Center 10 | family-names: Drost 11 | given-names: Niels 12 | orcid: "https://orcid.org/0000-0001-9795-7981" 13 | - affiliation: Netherlands eScience Center 14 | family-names: Spaaks 15 | given-names: Jurriaan H. 
16 | orcid: "https://orcid.org/0000-0002-7064-4069" 17 | - affiliation: Netherlands eScience Center 18 | family-names: Andela 19 | given-names: Bouwe 20 | - affiliation: Netherlands eScience Center 21 | family-names: Veen 22 | given-names: Lourens 23 | - affiliation: Netherlands eScience Center 24 | family-names: Zwaan 25 | name-particle: van der 26 | given-names: Janneke M. 27 | orcid: "https://orcid.org/0000-0002-8329-7000" 28 | - affiliation: Netherlands eScience Center 29 | family-names: Verhoeven 30 | given-names: Stefan 31 | orcid: "https://orcid.org/0000-0002-5821-2060" 32 | - affiliation: Netherlands eScience Center 33 | family-names: Bos 34 | given-names: Patrick 35 | orcid: "https://orcid.org/0000-0002-6033-960X" 36 | - family-names: Kuzak 37 | given-names: Mateusz 38 | orcid: "https://orcid.org/0000-0003-0087-6021" 39 | - affiliation: Netherlands eScience Center 40 | family-names: Werkhoven 41 | name-particle: van 42 | given-names: Ben 43 | orcid: "https://orcid.org/0000-0002-7508-3272" 44 | - affiliation: Netherlands eScience Center 45 | family-names: Attema 46 | given-names: Jisk 47 | orcid: "https://orcid.org/0000-0002-0948-1176" 48 | - affiliation: Netherlands eScience Center 49 | family-names: Hidding 50 | given-names: Johannes 51 | - family-names: Hees 52 | name-particle: van 53 | given-names: Vincent 54 | orcid: "https://orcid.org/0000-0003-0182-9008" 55 | - affiliation: Netherlands eScience Center 56 | family-names: Martinez-Ortiz 57 | given-names: Carlos 58 | orcid: "https://orcid.org/0000-0001-5565-7577" 59 | - affiliation: Netherlands eScience Center 60 | family-names: Spreeuw 61 | given-names: Hanno 62 | orcid: "https://orcid.org/0000-0002-5057-0322" 63 | - family-names: Borgdorff 64 | given-names: Joris 65 | orcid: "https://orcid.org/0000-0001-7911-9490" 66 | - family-names: Leinweber 67 | given-names: Katrin 68 | - affiliation: Netherlands eScience Center 69 | family-names: Diblen 70 | given-names: Faruk 71 | - affiliation: Netherlands 
eScience Center 72 | family-names: Oord 73 | name-particle: van den 74 | given-names: Gijs 75 | - affiliation: Netherlands eScience Center 76 | family-names: Goncalves 77 | given-names: Romulo 78 | orcid: "https://orcid.org/0000-0003-2225-1428" 79 | - affiliation: Netherlands eScience Center 80 | family-names: Kuzniar 81 | given-names: Arnold 82 | orcid: "https://orcid.org/0000-0003-1711-7961" 83 | - affiliation: Netherlands eScience Center 84 | family-names: Kuppevelt 85 | name-particle: van 86 | given-names: Dafne 87 | - affiliation: Netherlands eScience Center 88 | family-names: Weel 89 | given-names: Berend 90 | - affiliation: Netherlands eScience Center 91 | family-names: Meijer 92 | given-names: Christiaan 93 | - affiliation: Netherlands eScience Center 94 | family-names: Maassen 95 | given-names: Jason 96 | orcid: "https://orcid.org/0000-0002-8172-4865" 97 | - affiliation: Netherlands eScience Center 98 | family-names: Rodríguez-Sánchez 99 | given-names: Pablo 100 | orcid: "https://orcid.org/0000-0002-2855-940X" 101 | - affiliation: Netherlands eScience Center 102 | family-names: Klaver 103 | given-names: Tom 104 | - affiliation: Netherlands eScience Center 105 | family-names: Hage 106 | name-particle: van 107 | given-names: Willem Robert 108 | orcid: "https://orcid.org/0000-0002-6478-3003" 109 | - affiliation: Netherlands eScience Center 110 | family-names: Zapata 111 | given-names: Felipe 112 | orcid: "https://orcid.org/0000-0001-8286-677X" 113 | - affiliation: Netherlands eScience Center 114 | family-names: Bakker 115 | given-names: Tom 116 | - affiliation: Netherlands eScience Center 117 | family-names: Rijn 118 | name-particle: van 119 | given-names: Sander 120 | orcid: "https://orcid.org/0000-0001-6159-041X" 121 | - affiliation: Journal of Open Source Software 122 | family-names: Niemeyer 123 | given-names: Kyle 124 | - affiliation: Netherlands eScience Center 125 | family-names: Wehner 126 | given-names: Jens 127 | - affiliation: Netherlands eScience 
Center 128 | family-names: Burg 129 | name-particle: van der 130 | given-names: Sven 131 | - affiliation: Netherlands eScience Center 132 | family-names: Siqueira 133 | given-names: Abel 134 | - affiliation: Netherlands eScience Center 135 | family-names: Vreede 136 | given-names: Barbara 137 | - affiliation: Netherlands eScience Center 138 | family-names: Schnober 139 | given-names: Carsten 140 | - affiliation: Netherlands eScience Center 141 | family-names: Chandramouli 142 | given-names: Pranav 143 | - affiliation: Utrecht University 144 | family-names: Oberman 145 | given-names: Hanne 146 | - affiliation: Netherlands eScience Center 147 | family-names: Lüken 148 | given-names: Malte 149 | - affiliation: Netherlands eScience Center 150 | family-names: Isazi 151 | given-names: Alessio 152 | - affiliation: "Datadog, Inc." 153 | family-names: Lev 154 | given-names: Ofek 155 | - affiliation: Netherlands eScience Center 156 | family-names: Cahen 157 | given-names: Ewan 158 | - affiliation: Netherlands eScience Center 159 | family-names: Ali 160 | given-names: Suvayu 161 | - affiliation: Netherlands eScience Center 162 | family-names: Hafner 163 | given-names: Flavio 164 | - affiliation: Netherlands eScience Center 165 | family-names: Cushing 166 | given-names: Reggie 167 | - affiliation: Netherlands eScience Center 168 | family-names: Kasalica 169 | given-names: Vedran 170 | orcid: "https://orcid.org/0000-0002-0097-1056" 171 | - affiliation: Utrecht University 172 | family-names: Vargas Honorato 173 | given-names: Rodrigo 174 | orcid: "https://orcid.org/0000-0001-5267-3002" 175 | repository-code: "https://github.com/NLeSC/guide" 176 | abstract: >- 177 | This is a guide to software development and projects at 178 | the Netherlands eScience Center. It both serves as a 179 | source of information for exactly how we work at the 180 | eScience Center, and as a basis for discussions and 181 | reaching consensus on this topic. 
182 | license: CC-BY-4.0 183 | -------------------------------------------------------------------------------- /best_practices.md: -------------------------------------------------------------------------------- 1 | # Best Practices for Software Development 2 | 3 | In this chapter we give an overview of the best practices for software development at the Netherlands eScience Center, including a rationale. 4 | 5 | ## Checklists 6 | 7 | An easy way to make sure you did not forget anything important is to use a well curated checklist. 8 | Great examples can be found via [FAIR Software NL](https://fair-software.nl/recommendations/checklist). 9 | [The Turing Way](https://book.the-turing-way.org) has specific topical checklists at the end of each of their chapters. 10 | 11 | ## Version control 12 | 13 | Use a version control tool like `git` to track changes in your codebase. 14 | This allows you to retrace your steps when debugging, keep your repository clean, easily collaborate with others asynchronously and more. 15 | More info: [The Turing Way chapter on Version Control](https://book.the-turing-way.org/reproducible-research/vcs), [RSQkit chapter on Version Control](http://everse.software/RSQKit/using_version_control). 16 | 17 | **At the Netherlands eScience Center:** we always use version control and we preferably use GitHub as our online repository and collaboration platform (see the [Project Management Protocol on our intranet](https://nlesc.sharepoint.com/sites/home/SitePages/Project-procedures.aspx) (only accessible to Netherlands eScience Center employees)). 18 | 19 | ## Testing 20 | 21 | Tests are important for two reasons: 1. confirming the expected workings of your code while developing for the first time and 2. making sure your features keep working when later on you or others modify the implementation. 22 | [The Turing Way gives an overview of the many ways to test code](https://book.the-turing-way.org/reproducible-research/testing). 
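As a minimal sketch of both reasons, consider the Python example below. The `mean` function and the test names are hypothetical and only serve as illustration; the tests are written as plain functions with `assert` statements, which a test runner such as pytest would normally discover and run for you.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers.

    Hypothetical function, used only to have something to test.
    """
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


def test_mean_of_known_values():
    # Reason 1: confirm the expected behaviour while developing.
    assert mean([1, 2, 3, 4]) == 2.5


def test_mean_rejects_empty_input():
    # Reason 2: if a later change silently drops the empty-input check,
    # this test fails and flags the regression.
    try:
        mean([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty input")
```

Keeping such tests next to the code and running them on every change is what turns the first reason into the second: a check written once during development keeps guarding the behaviour afterwards.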
23 | 24 | ## Code Reviews 25 | 26 | The most effective tool for improving software quality (and sharing knowledge at the same time) is doing code reviews. 27 | Have a look at the [The Turing Way chapter on Code Reviewing](https://book.the-turing-way.org/reproducible-research/reviewing) to learn more about ways to do this. 28 | 29 | ## Documentation 30 | 31 | Developed programs should be documented at multiple levels, from code comments, through API documentation, to installation and usage documentation. 32 | Comments at each level should take into account different target audiences, from experienced developers, to end users with no programming skills. 33 | In the [Turing Way chapter on Code Documentation](https://book.the-turing-way.org/reproducible-research/code-documentation) you will find a great overview of the how and why of documentation. 34 | 35 | ## Code Quality 36 | 37 | Ways to improve code quality are described in the [Code quality](https://book.the-turing-way.org/reproducible-research/code-quality.html) chapter on the Turing Way. 38 | 39 | Explore [online tools for software quality improvement](https://book.the-turing-way.org/reproducible-research/code-quality/code-quality-style.html#online-services-providing-software-quality-checks). Additionally, check our [language guides](/language_guides/languages_overview.md) for language-specific recommendations. 40 | [RSQKit: Research Software Quality Kit](https://everse.software/RSQKit/) also has many useful guides including software quality. These guides are result of an international collaboration primarily focusing on research software quality. 41 | 42 | ### EditorConfig 43 | 44 | The eScience Center provides a [shared config file](https://raw.githubusercontent.com/NLeSC/exemplum/master/.editorconfig) for IDEs and text editors. This file helps standardize coding styles across projects. 
45 | 46 | ### Namespaces 47 | 48 | If your programming language supports namespaces, use your organization or project-specific namespace. 49 | 50 | **At the Netherlands eScience Center:**, the recommended namespace is **nl.esciencecenter**, or adapt it to a namespace that aligns with your project's context. 51 | 52 | ## Use standards 53 | 54 | Standard files and protocols should always be a primary choice. 55 | Using standards improves the interoperability of your software, thereby improving its usefulness. 56 | Examples include exchange formats like Unicode, NetCDF, and W3C web standards, and protocols like HTTP, TCP, TLS. 57 | 58 | ## Licensing 59 | 60 | Since source code is protected by copyright, to allow people to use your code it needs a license. 61 | For more information, see [The Turing Way chapter on licensing](https://the-turing-way.netlify.app/reproducible-research/licensing) or the [RSQkit Licensing software task](http://everse.software/RSQKit/licensing_software). 62 | 63 | **At the Netherlands eScience Center:** our first choice is the Apache v2 license. 64 | See the [Project Management Protocol on our intranet](https://nlesc.sharepoint.com/sites/home/SitePages/Project-procedures.aspx) (only accessible to Netherlands eScience Center employees) for more details on licensing and our intellectual property policies. 65 | 66 | ## Software management plans 67 | 68 | The Netherlands eScience Center and [NWO](https://www.nwo.nl/en) have authored the [practical guide to software management plans](https://doi.org/10.5281/zenodo.7248877) ([see also](https://www.esciencecenter.nl/national-guidelines-for-software-management-plans/)). 69 | For our projects we recommend using [our Software Sustainability Protocol](https://doi.org/10.5281/zenodo.1451750), which is based on these guidelines. 70 | For more information you can also [read here](https://github.com/the-turing-way/the-turing-way/issues/2419). 
71 | 72 | ## Releases 73 | 74 | Releases are a way to mark or point to a particular milestone in software development. 75 | This is useful for users and collaborators, for example when reporting "I found a bug running version x". 76 | For publications that refer to software, referring to a specific release enhances reproducibility. 77 | See [the RSQkit task on Creating code releases](http://everse.software/RSQKit/releasing_software) for the most essential guidelines. 78 | The Turing Way offers many related tips in their [chapter on Making Research Objects Citable](https://book.the-turing-way.org/communication/citable), like how to make code citable with `CITATION.cff` files. 79 | 80 | ## Packaging 81 | 82 | A related, but separate topic is packaging, which allows users to conveniently install your released software. 83 | Most [languages](/language_guides/languages_overview) and operating systems have their particular ways of doing this. 84 | The Turing Way offers advice on [making reproducible environments](https://book.the-turing-way.org/reproducible-research/renv), in which packaging is an essential component. 85 | 86 | ## Know your tools 87 | 88 | In addition to the advice on the best practices above, knowing the 89 | tools that are available for software development can really help you get 90 | things done faster. 91 | 92 | ### Learn how to use the command line efficiently 93 | 94 | Read the chapter on using [Bash](/language_guides/bash.md). 95 | 96 | ### Use an editor that helps you develop 97 | 98 | Commonly used editors and their ecosystem of plugins can really help you write 99 | better code faster. 100 | Note that for each of the editors and environments listed below, it is important 101 | to configure them such that they support the programming languages that you are 102 | developing in. 103 | 104 | Below is a list of editors that support many programming languages.
105 | 106 | Integrated Development Environments (IDEs): 107 | 108 | - [Visual Studio Code](https://code.visualstudio.com/) - modern editor with extensive plugin ecosystem that can make it as powerful as most IDEs 109 | - [JetBrains IDEs](https://www.jetbrains.com/ides/) - specialized IDEs for Python, C++, Java and web, all using the IntelliJ framework 110 | - [Eclipse](https://www.eclipse.org/ide/) - a bit older but still nice 111 | 112 | Text editors: 113 | 114 | - [vim](https://www.vim.org/) - classic text editor 115 | - [emacs](https://www.gnu.org/software/emacs/) - classic text editor 116 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing to this Guide 2 | 3 | - [Who? You!](#who_you) 4 | - [Audience](#audience) 5 | - [Scope](#scope) 6 | - [How?](#how) 7 | - [Technical details (docsify)](#technical-details) 8 | - [Zen of the Guide](#zen-of-the-guide) 9 | 10 | # Who? You! 11 | 12 | This guide is primarily written by the Research Software Engineers at the Netherlands eScience Center. 13 | Contributions by anyone (also outside the Center) are most welcome! 14 | 15 | ## Page maintainers 16 | 17 | While everybody is encouraged to contribute where they can, we appoint maintainers for specific pages to regularly keep things up to date and think along with contributors. 18 | To see who is responsible for which part of the guide see the maintainer listed at the top of a page. 19 | If you are interested in becoming a chapter owner for a page that is listed as _unmaintained_, please open a pull request to add your name instead of _unmaintained_. 20 | 21 | ## Editorial board 22 | 23 | The editors make sure content is in line with [the scope](#scope), that it is maintainable and that it is maintained. 
24 | In practice they will: 25 | 26 | - track, lead towards satisfactory conclusion of and when necessary (in case of disagreement) decide on issues, discussions and pull requests, 27 | - flag content that needs to be updated or removed, 28 | - ask for input from page maintainers or other contributors, 29 | - periodically organize sprints to work on content together with everyone interested in contributing; usually in the form of a "Book Dash" together with The Turing Way contributors, 30 | 31 | and do any other regular editing tasks. 32 | 33 | Currently the team consists of: 34 | 35 | - Bouwe Andela [@bouweandela](https://github.com/bouweandela) (research software engineer) 36 | - Carlos Martínez Ortiz [@c-martinez](https://github.com/c-martinez) (community manager) 37 | - Patrick Bos [@egpbos](https://github.com/egpbos) (technology lead) 38 | 39 | # Audience 40 | 41 | Our eScience Center _RSEs_ are the prototypical audience members, in particular those starting out in some unfamiliar area of technology. 42 | Some characteristics include: 43 | 44 | - They are interested in _intermediate to advanced level_ best practices. If there are already ten easily found blog posts about it, it doesn't have to be in the Guide. 45 | - They are a _programmer or researcher_ that is already familiar with some other programming language or software-related technology. 46 | - They may be generally interested (in particular topics of eScience practice and research software development in general or how this is done at the eScience Center specifically), but their main aim is towards _practical_ application, not to create a literature study of the current landscape of (research) software. 47 | 48 | # Scope 49 | 50 | To make sure the information in this guide stays relevant and up to date it is intentionally low on technical details. 51 | The guide contains and links to best practices we use to code and develop research software in our projects. 
52 | 53 | The main goal: having information available about research software engineering best practices for our colleagues, collaborators and other interested people. 54 | It can be information that you can give a colleague starting in some area, for instance, a new language or a new technology. 55 | 56 | 80% of this goal will be met by [the Turing Way](https://book.the-turing-way.org/). 57 | For everything else: we have the Guide. 58 | 59 | We focus on eScience Center-specific best practices. 60 | These can be generic and complete or specific and highly curated. 61 | It depends! 62 | For instance, eScience Center-specific content (e.g. we prefer `git` over `svn`) should be in the Guide, while content of interest to a general audience (e.g. it is good practice to use a version control system) should go in The Turing Way. 63 | When in doubt, discuss your doubts in an issue. 64 | 65 | A few things are excluded: 66 | 67 | 1. Project-related practices (planning, communication, stakeholders, management, etc.). These we gather on our intranet pages. 68 | 2. Project output is gathered on the [Research Software Directory](https://research-software-directory.org/organisations/netherlands-escience-center?tab=software&order=is_featured). 69 | 3. Generic research software engineering advice that can be added to [The Turing Way](https://github.com/the-turing-way/the-turing-way). 70 | 71 | In practice, this means the Guide (for now) will mostly consist of language guides and technology guides. 72 | 73 | It can also sometimes function as a staging/draft area for eventually moving content to the Turing Way. 74 | However, we will urge you to contribute to the Turing Way directly. 75 | 76 | ## For significant changes / additions, especially new chapters 77 | 78 | Please check if your contribution fits in [The Turing Way](https://github.com/the-turing-way/the-turing-way) before considering contributing to this guide.
79 | Feel free to ask the [editors](#editorial-board) if you are unsure or open an [issue](https://github.com/NLeSC/guide/issues) to discuss it. 80 | If it does not fit, please open an [issue](https://github.com/NLeSC/guide/issues) to discuss your planned contribution before starting to work on it, to avoid disappointment later. 81 | 82 | # How? 83 | 84 | ## Style, form 85 | 86 | A well written piece of advice should contain the following information: 87 | 88 | 1. What, e.g. _version control_ 89 | 2. Why, e.g. _why version control is a good idea_ 90 | 3. Short how / tl;dr: Recommend one solution for readers who don't want to spend time reading about all possible options, e.g. _at NLeSC we use git with GitHub because..._ This is where NLeSC specific info should go if it makes sense to do so. 91 | 4. Long how: also explain other options for implementing advice, e.g. _here's a list of some more version control programs and/or services which we can recommend_. 92 | 93 | ## Technical 94 | 95 | Please use branches and pull requests to contribute content. If you are not part of the Netherlands eScience Center organization but would still like to contribute please do by submitting a pull request from a fork. 96 | 97 | ```shell 98 | git clone https://github.com/NLeSC/guide.git 99 | cd guide 100 | git branch newbranch 101 | git checkout newbranch 102 | ``` 103 | 104 | Please install [pre-commit](https://pre-commit.com/) and enable the pre-commit 105 | hooks by running 106 | 107 | ```shell 108 | pre-commit install 109 | ``` 110 | 111 | to automatically format your changes when committing. 112 | 113 | Add your new awesome feature, fix bugs, make other changes. 114 | 115 | To preview changes locally, host the repo with a static file web server: 116 | 117 | ```shell 118 | python3 -m http.server 4000 119 | ``` 120 | 121 | to view the documentation in a web browser (default address: http://localhost:4000). 
122 | 123 | To check if there are any broken links use [lychee](https://github.com/lycheeverse/lychee) in a Docker container: 124 | 125 | ```shell 126 | docker run --init -it -v `pwd`:/docs lycheeverse/lychee /docs --config=docs/lychee.toml 127 | ``` 128 | 129 | If everything works as it should, `git add`, `commit` and `push` like normal. 130 | 131 | If you have made a significant contribution to the guide, please make sure to add yourself to the `CITATION.cff` file so your name can be included in the list of authors of the guide. 132 | 133 | ## Create a PDF file 134 | 135 | We host a PDF version of the guide on [Zenodo](https://doi.org/10.5281/zenodo.4020564). 136 | To update it a [new release](https://github.com/NLeSC/guide/releases) needs to be made of the guide. This will trigger a GitHub action to create a new Zenodo version with the PDF file. 137 | 138 | # Technical details 139 | 140 | The basics of how the Guide is implemented. 141 | 142 | The Guide is rendered by [docsify](https://docsify.js.org) and hosted on GitHub Pages. 143 | Deployment is "automatic" from the main branch, because docsify requires no build step into static HTML pages, but rather generates HTML dynamically from the MarkDown files in the Guide repository. 144 | The only configuration that was necessary for this automatic deployment is: 145 | 146 | 1. The [index.html](https://github.com/NLeSC/guide/blob/main/index.html) file in the root directory that loads docsify. 147 | 2. The empty [.nojekyll](https://github.com/NLeSC/guide/blob/main/.nojekyll) file, which tells GitHub that we're not dealing with Jekyll here (the GitHub Pages default). 148 | 3. Telling GitHub in the Settings -> Pages menu to load the Pages content from the root directory. 149 | 4. The [\_sidebar.md](https://github.com/NLeSC/guide/blob/main/_sidebar.md) file for the table of contents. 
150 | 151 | Plugins that we use: 152 | 153 | - The [docsify full text search plugin](https://docsify.js.org/#/plugins?id=full-text-search) 154 | - The [docsify Google Analytics plugin](https://docsify.js.org/#/plugins?id=google-analytics) 155 | - [Prism](https://docsify.js.org/#/language-highlight) is used for language highlighting. 156 | 157 | If you want to change anything in this part, please discuss in an issue. 158 | 159 | # Zen of the Guide 160 | 161 | 0. Help your colleagues. 162 | 1. Citing is better than copying. 163 | 2. Copying is better than rewriting from scratch. 164 | 3. ... but leaving out is often even better. 165 | 4. Don't state the obvious. 166 | 5. Don't assume that something is obvious. 167 | 6. Snippets are friends. 168 | 7. Remove outdated content. 169 | 8. Better yet, update outdated content. 170 | 9. Your practices are just _your_ practices. Best practices are shared practices. $N>1$. 171 | 10. Our best practices are just _our_ best practices. We don't have to agree with everyone. 172 | 11. Best practices are timeless (at least for a year or so). 173 | 12. Best practices are never set in stone. They are set in the Guide. 174 | 13. Best practices are not always practices. 175 | 14. ~~Best practices are not always best practices.~~ 176 | 15. Kill your darlings. 177 | 16. Consider The Turing Way first. 178 | 17. Sharing is better than guiding. 179 | 18. Guiding is better than turning a blind eye. 180 | 19. This Guide shall be under your pillow. 181 | -------------------------------------------------------------------------------- /language_guides/bash.md: -------------------------------------------------------------------------------- 1 | # Bash 2 | 3 | _Page maintainer: Bouwe Andela_ [@bouweandela](https://github.com/bouweandela) 4 | 5 | Bash is both a command line interface, 6 | also known as a **shell**, and a scripting language. 7 | On most Linux distributions, the Bash shell is the default way of interacting 8 | with the system. 
9 | Zsh is an alternative shell that understands much of the Bash scripting language; 10 | it is the default shell on recent versions of macOS. 11 | Both Bash and Zsh are available for most operating systems. 12 | 13 | At the Netherlands eScience Center, Bash is the recommended shell scripting 14 | language because it is the most commonly used shell language and therefore the 15 | most convenient for collaboration. 16 | To facilitate mutual understanding, it is also recommended that you are aware of 17 | the shell that your collaborators are using and that you write documentation 18 | with this in mind. 19 | Using the same shell as your collaborators is a simple way of making sure you 20 | are always on the same page. 21 | 22 | In this chapter, a short introduction and best practices for both interactive 23 | use and use in scripts will be given. 24 | An excellent tutorial introducing Bash can be found 25 | [here](https://swcarpentry.github.io/shell-novice/). 26 | If you have not used Bash or another shell before, it is recommended that you 27 | follow the tutorial before reading on. 28 | Learning to use Bash is highly recommended, because after some initial learning, 29 | you will be more efficient and have a better understanding of what is going on 30 | than when clicking buttons in the graphical user interface of your operating 31 | system or integrated development environment. 32 | 33 | ## Interactive use 34 | 35 | If you are a (research) software engineer, it is highly recommended that you 36 | learn 37 | 38 | - the [keyboard shortcuts](#Bash-keyboard-shortcuts) 39 | - how to configure [Bash aliases](#Bash-aliases) 40 | - the name and function of [commonly used command line tools](#Commonly-used-command-line-tools) 41 | 42 | ### Bash keyboard shortcuts 43 | 44 | See this introduction to 45 | [Bash keyboard shortcuts](https://www.tecmint.com/linux-command-line-bash-shortcut-keys/) 46 | to get started.
47 | Note that Bash can also be configured such that it uses the _vi_ keyboard 48 | shortcuts instead of the default _emacs_ ones, which can be useful if you 49 | [prefer vi](https://skeptics.stackexchange.com/questions/17492/does-emacs-cause-emacs-pinky). 50 | 51 | ### Bash aliases 52 | 53 | [Bash aliases](https://linuxize.com/post/how-to-create-bash-aliases/) 54 | allow you to define shorthands for commands you use often. 55 | Typically these are defined in the `~/.bashrc` or `~/.bash_aliases` file. 56 | 57 | ### Commonly used command line tools 58 | 59 | It is recommended that you know at least the names and use of the following 60 | command line tools. 61 | The details of how to use a tool exactly can easily be found by searching the 62 | internet or using `man` to read the manual, but you will be vastly more 63 | efficient if you already know the name of the command you are looking for. 64 | 65 | **Working with files** 66 | 67 | - `ls` - List files and directories 68 | - `tree` - Graphical representation of a directory structure 69 | - `cd` - Change working directory 70 | - `pwd` - Show current working directory 71 | - `cp` - Copy a file or directory 72 | - `mv` - Move a file or directory 73 | - `rm` - Remove a file or directory 74 | - `mkdir` - Make a new directory 75 | - `touch` - Make a new empty file or update its access and modification time to the current time 76 | - `chmod` - Change the permissions on a file or directory 77 | - `chown` - Change the owner of a file or directory 78 | - `find` - Search for files and directories on the file system 79 | - `locate`, `updatedb` - Search for files and directories quickly using a database 80 | - `tar` - (Un)pack .tar or .tar.gz files 81 | - `unzip` - Unpack .zip files 82 | - `df`, `du` - Show free space on disk, show disk space usage of files/folders 83 | 84 | **Working with text** 85 | 86 | Here we list the most commonly used Bash tools that are built to manipulate 87 | _lines of text_. 
88 | The nice thing about these tools is that you can combine them by streaming the 89 | output of one tool to become the input of the next tool. 90 | Have a look at the 91 | [tutorial](https://swcarpentry.github.io/shell-novice/04-pipefilter.html) 92 | for an introduction. 93 | This can be done by creating 94 | [pipelines](https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Pipelines) 95 | with the pipe operator `|` and by redirecting text to output streams or files 96 | using 97 | [redirection operators](https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Redirections) 98 | like `>` for output and `<` for input to a command from a text file. 99 | 100 | - `echo` - Repeat some text 101 | - `diff` - Show the difference between two text files 102 | - `grep` - Search for lines of text matching a simple string or regular expressions 103 | - `sed` - Edit lines of text using regular expressions 104 | - `cut` - Select columns from text 105 | - `cat` - Print the content of a file 106 | - `head` - Print the first n lines 107 | - `tail` - Print the last n lines 108 | - `tee` - Read from standard input and write to standard output and file 109 | - `less` - Read text 110 | - `sort` - Sort lines of text 111 | - `uniq` - Keep unique lines 112 | - `wc` - Count words/lines 113 | - `nano`, `emacs`, `vi` - Interactive text editors found on most Unix systems 114 | 115 | **Working with programs** 116 | 117 | - `man` - Read the manual 118 | - `ps` - Print all currently running programs 119 | - `top` - Interactively display all currently running programs 120 | - `kill` - Stop a running program 121 | - `\time` - Collect statistics about resource usage such as runtime, memory use, storage access (the `\` in front is needed to run the `time` program instead of the bash builtin function with the same name) 122 | - `which` - Find which file will be executed when you run a command 123 | - `xargs` - Run programs with arguments in parallel 124 | 125 | **Working with 
remote systems** 126 | 127 | - `ssh` - Connect to a shell on a remote computer 128 | - `rsync` - Copy files between computers using SSH 129 | - `lftp` - Copy files between computers using FTP 130 | - `wget`, `curl` - Copy a file using HTTP(S) or make a request to a remote API 131 | - `scp`, `sftp`, `ftp` - Simple tools for transferring files over (S)FTP - not recommended 132 | - `who` - Show who is logged on 133 | - `screen` - Run multiple Bash sessions and keep them running even when you log out 134 | 135 | **Installing software** 136 | 137 | - `apt` - The default package manager on Debian-based Linux distributions 138 | - `yum`, `dnf` - The default package manager on Red Hat/Fedora-based Linux distributions 139 | - `brew` - A package manager for macOS 140 | - `conda` - A package manager that supports many operating systems 141 | - `pip` - The Python package manager 142 | - `docker`, `singularity` - Run an entire Linux operating system including software from a [container](https://www.docker.com/resources/what-container) 143 | 144 | **Miscellaneous** 145 | 146 | - `bash`, `zsh` - The command to start Bash/Zsh 147 | - `history` - View all past commands 148 | - `fg`, `bg` - Move a program to the foreground, background, useful with Ctrl+Z 149 | - `su` - Switch user 150 | - `sudo` - Run a command with root permissions 151 | 152 | For further inspiration, see this 153 | [extensive list of command line tools](https://fossbytes.com/a-z-list-linux-command-line-reference/). 154 | 155 | ## Scripts 156 | 157 | It is possible to write Bash scripts. 158 | This is done by writing the commands that you would normally use on the command 159 | line in a text file and then running that file, e.g. with `bash some-file.sh`. 160 | 161 | However, doing this is only recommended if there really are no other options. 162 | If you have the option to write a Python script instead, that is the recommended 163 | way to go.
164 | This will bring you all the advantages of a fully-fledged programming language 165 | (such as libraries, frameworks for testing and documentation) and Python is the 166 | recommended programming language at the Netherlands eScience Center. 167 | If you do not mind having an extra dependency and would like to use the features 168 | and commands available in the shell from Python, the 169 | [sh](https://sh.readthedocs.io) library is a nice option. 170 | 171 | Disclaimer: if you are an experienced Bash developer, there might be situations 172 | where using a Bash script solves your problem faster or in a more portable way 173 | than a Python script. 174 | Do take a moment to think about whether such a solution is easy to 175 | contribute to for collaborators and will be easy to maintain in the future, as 176 | the number of features, supported systems, and code paths grows. 177 | 178 | When writing a Bash script, always use 179 | [`shellcheck`](https://www.shellcheck.net/) 180 | to make sure that your Bash script is as likely to do what you think it should 181 | do as possible. 182 | 183 | In addition to that, always start the script with 184 | 185 | ```bash 186 | set -euo pipefail 187 | ``` 188 | 189 | This will stop the script if there is 190 | 191 | - `-e` a command that exits with a non-zero exit code 192 | - `-o pipefail` a command anywhere in a pipeline that exits with a non-zero exit code (by default only the last command in a pipeline counts) 193 | - `-u` an undefined variable in your script 194 | 195 | An exit code other than zero usually indicates that an error occurred.
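The effect of each option can be seen in a small experiment. The following is a minimal sketch, not part of any existing tooling: the helper `exit_code_of` and the variable `hypothetical_unset_variable` are made-up names used only for illustration. Each snippet runs in a separate child Bash process, so that a failing snippet does not stop the demonstration script itself.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run a snippet in a child Bash process and print
# the exit code of that process.
exit_code_of() {
    local code=0
    bash -c "$1" >/dev/null 2>&1 || code=$?
    echo "$code"
}

# Without -e, the snippet carries on after the failing command `false`.
without_e=$(exit_code_of 'false; echo "still running"')

# With -e, the snippet stops at `false` and reports a non-zero exit code.
with_e=$(exit_code_of 'set -e; false; echo "never printed"')

# Without pipefail, a pipeline reports the exit code of its last command.
without_pipefail=$(exit_code_of 'false | true')

# With pipefail, a failure anywhere in the pipeline is reported.
with_pipefail=$(exit_code_of 'set -o pipefail; false | true')

# With -u, expanding a variable that was never set is an error.
with_u=$(exit_code_of 'set -u; echo "$hypothetical_unset_variable"')

echo "without -e: $without_e, with -e: $with_e"
echo "without pipefail: $without_pipefail, with pipefail: $with_pipefail"
echo "with -u: $with_u"
```

The snippets without `-e` and without `pipefail` report exit code 0 even though a command failed, while the three snippets that enable an option report a non-zero exit code.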
196 | If needed, you can temporarily allow this kind of error for a single line by 197 | wrapping it like this 198 | 199 | ```bash 200 | set +e 201 | false # A command that returns a non-zero exit code 202 | set -e 203 | ``` 204 | 205 | ## Further resources 206 | 207 | - [Bash Tutorial](https://swcarpentry.github.io/shell-novice/) 208 | - [Bash Cheat sheet](https://devhints.io/bash) 209 | - The [Bash Reference Manual](https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html) or use `man bash` 210 | - [Oh My Zsh](https://ohmyz.sh/) offers an extensive set of themes and shortcuts for the Zsh 211 | -------------------------------------------------------------------------------- /language_guides/r.md: -------------------------------------------------------------------------------- 1 | # R 2 | 3 | _Page maintainers: [Malte Lüken](https://github.com/maltelueken) and [Pablo Rodríguez-Sánchez](https://github.com/PabRod)_ . 4 | 5 | ## What is R? 6 | 7 | R is a functional programming language and software environment for statistical computing and graphics: https://www.r-project.org/. 8 | 9 | ### Philosophy and typical use cases 10 | 11 | R is particularly popular in the social, health, and biological sciences where it is used for statistical modeling. R can also be used for signal processing (e.g. FFT), machine learning, image analyses, and natural language processing. The R syntax is similar to that of Matlab and Python in terms of compactness and readability, which makes it a good prototyping language for science. 12 | 13 | One of the strengths of R is the large number of available open source statistical packages, often developed by domain experts. For example, R-package [Seewave](http://rug.mnhn.fr/seewave/) is specialised in sound analyses. Packages are typically released on CRAN [The Comprehensive R Archive Network](http://cran.r-project.org). 14 | 15 | ### Some crucial differences with Python 16 | 17 | Are you familiar with Python? 
Then kickstart your R journey by reading this [blog post](https://towardsdatascience.com/the-starter-guide-for-transitioning-your-python-projects-to-r-8de4122b04ad). 18 | 19 | ### Recommended sources of information 20 | 21 | All R functions come with documentation in a standardized format. Some R packages have their own Google group. Further, Stack Overflow and standard search engines can lead you to answers to issues. 22 | 23 | If you prefer books, consider the following resources: 24 | 25 | - [R for Data Science](https://r4ds.had.co.nz/) by Hadley Wickham, 26 | - [Advanced R](https://adv-r.hadley.nz/) by Hadley Wickham, 27 | - [Writing better R code](http://www.bioconductor.org/help/course-materials/2013/CSAMA2013/friday/afternoon/R-programming.pdf) by Laurent Gatto. 28 | 29 | ## Getting started 30 | 31 | ### Setting up R 32 | 33 | To install R, follow the detailed description on the [CRAN website](http://cran.r-project.org). 34 | 35 | #### IDE 36 | 37 | R programs can be written in any text editor. R code can be run from the command line or interactively within the R environment, which can be started with the `R` command in the shell. To quit the R environment, type `q()`. 38 | 39 | That said, it is highly recommended to use an integrated development environment (IDE). The most popular one is [RStudio / Posit](https://posit.co/products/open-source/rstudio/). It is free and quite powerful. It features an editor with code completion, a command line environment, a file manager, a package manager and history lookup, among others. 40 | 41 | It comes with many menus and key bindings (visible when you hover your mouse over the menu item). For instance, you can run code sections by selecting them and pressing `Ctrl+Enter`. 42 | 43 | Note that you will have to install RStudio in addition to installing R, and that updating RStudio does not automatically update R or the other way around. 44 | 45 | Within RStudio you can work on ad-hoc code or create a project.
Compared with Python an R project is a bit like a virtual environment as it preserves the workspace and installed packages for that project. Creating a project is needed to build an R package. A project is created via the menu at the top of the screen. 46 | 47 | ### Installing compilers and runtimes 48 | 49 | Not needed as most functions in R are already compiled in C, nevertheless R has compiling functionality as described in the [R manual](https://stat.ethz.ch/R-manual/R-devel/library/compiler/html/compile.html). See [overview by Hadley Wickham](http://r-pkgs.had.co.nz/src.html). 50 | 51 | ## Coding style conventions 52 | 53 | We recommend following the [Tidyverse style guide](https://style.tidyverse.org/). 54 | Its guidelines can be automatically followed using linters such as: 55 | 56 | - [styler](https://github.com/r-lib/styler) 57 | - [lintr](https://github.com/r-lib/lintr) 58 | 59 | ### The `<-` operator 60 | 61 | Assigning variables with `<-` instead of `=` is recommended, although **most** of the time both are equivalent. 62 | 63 | If you are interested in the controversy around assignment operators, check out this [blog post](https://csgillespie.wordpress.com/2010/11/16/assignment-operators-in-r-vs/). 64 | 65 | ### `%>%` and `|>` 66 | 67 | The symbols `%>%` and `|>` represent the pipe operator. 68 | The first one is part of the `magrittr` package, and it gained so much popularity that a similar operator, `|>`, was added as part of native R since version 4.1.0. For details on the differences between the two, see this [blog post](https://www.tidyverse.org/blog/2023/04/base-vs-magrittr-pipe/). 69 | They just add syntactic sugar to the way we pass a variable to a function. 
70 | The example below shows their basic behavior: 71 | 72 | ```r 73 | var %>% my_function(params) 74 | # Is equivalent to 75 | my_function(var, params) 76 | ``` 77 | 78 | These operators are pretty useful for composing functions, and very often appear concatenated: 79 | 80 | ```r 81 | grades |> remove_nans() |> mean() |> print() 82 | ``` 83 | 84 | You can think of it as a production chain, where an object (the `grades`) passes through three machines: one that removes the `NaN`s, another one that takes the mean, and a last one that prints the result. 85 | 86 | ## Recommended additional packages and libraries 87 | 88 | One of the strengths of R is its community, which creates and maintains a constellation of packages. 89 | Very rarely will you use just base R. 90 | Here we give you a list of commonly used packages, starting with one that solves the first problem you'll encounter... how to manage that many packages! 91 | 92 | ### Managing environments with `renv` 93 | 94 | [`renv`](https://rstudio.github.io/renv/articles/renv.html) allows you to create and manage a library of dependencies on a per-project basis. It also keeps track of the specific versions of each package used in the project, which is great for reproducibility... and avoiding future headaches! 95 | 96 | ### Plotting with basic functions and ggplot2 and ggvis 97 | 98 | For a general impression of plotting with R, see: https://www.r-graph-gallery.com/all-graphs 99 | 100 | The basic R installation comes with a wide range of functions to plot data to a window on your screen or to a file. If you need to quickly inspect your data or create a custom-made static plot then the basic functions offer the building blocks to do the job. There is a [Statmethods.net tutorial with some examples of plotting options in R](http://www.statmethods.net/graphs/index.html). 101 | 102 | However, externally contributed plotting packages may offer easier syntax or convenient templates for creating plots.
The most popular and powerful contributed graphics package is [ggplot2](https://ggplot2.tidyverse.org/); see this [tutorial](https://www.statmethods.net/advgraphs/ggplot2.html) for an introduction. Interactive plots can be made with the [ggvis](https://github.com/rstudio/ggvis) package and embedded in web applications. 103 | 104 | In summary, it is good to familiarize yourself with both the basic plotting functions and the contributed graphics packages. In theory, the basic plot functions can do everything that ggplot2 can do; it is mostly a matter of how much you like either syntax and how much freedom you need to tailor the visualisation to your use case. 105 | 106 | ### Building interactive web applications with shiny 107 | 108 | Thanks to [shiny.app](https://shiny.posit.co/) it is possible to make interactive web applications in R without the need to write JavaScript or HTML. 109 | 110 | ### Building reports with knitr 111 | 112 | [knitr](https://yihui.name/knitr/) is an R package designed to build dynamic reports in R. It can generate new PDF or HTML documents on the fly, with the results of computations embedded inside. 113 | 114 | ### Preparing data for analysis 115 | 116 | There are packages that ease tidying up messy data, e.g. [tidyr](https://github.com/hadley/tidyr) and [reshape2](https://github.com/hadley/reshape). The idea of tidy and messy data is explained in a [tidy data](http://vita.had.co.nz/papers/tidy-data.html) paper by Hadley Wickham. There is also the Google group [manipulatr](https://groups.google.com/forum/#!forum/manipulatr) to discuss topics related to data manipulation in R. 117 | 118 | ### Speeding up code 119 | 120 | Speeding up code always starts with knowing where your bottlenecks are.
121 | The following resource on profiling will help you find them: 122 | 123 | - Introduction to [profiling in R](https://bookdown.org/rdpeng/rprogdatascience/profiling-r-code.html) 124 | 125 | Some rules of thumb that can quickly improve your code are the following: 126 | 127 | - Avoid loops, use `apply` functionals instead 128 | - Try to use vectorized functions 129 | - Check out the [`purrr`](https://purrr.tidyverse.org/) package 130 | - If you really need more speed, consider calling `C++` code using [`Rcpp`](https://www.rcpp.org/). 131 | 132 | For a deeper introduction to the many optimization methods, check the free ebook: 133 | 134 | - [Efficient R programming](https://csgillespie.github.io/efficientR/), by Colin Gillespie and Robin Lovelace. 135 | 136 | ## Package development 137 | 138 | ### Building R packages 139 | 140 | There is a great tutorial written by Hadley Wickham describing all the nitty-gritty of building your own package in R. It's called [R packages](http://r-pkgs.had.co.nz). 141 | For a quicker introduction, consider this Carpentries [lesson on R packages](https://carpentries-incubator.github.io/lesson-R-packaging/), which originated and was developed at our Center! 142 | 143 | ### Package documentation 144 | 145 | Read the [Documentation](http://r-pkgs.had.co.nz/man.html) chapter of Hadley's [R packages](http://r-pkgs.had.co.nz) book for details about documenting R code. 146 | 147 | Customarily, R uses `.Rd` files in the `/man` directory for documentation. These files and folders are automatically created by RStudio when you create a new project from your existing R-function files. 148 | 149 | Function-level comments starting with `#'` are used by `roxygen` to automatically generate the `.Rd` files. This means that you **don't have to edit the `.Rd` files directly**. 150 | 151 | R function documentation offers plenty of space to document the functionality, including code examples, literature references, and links to related functions.
Nevertheless, it can sometimes be helpful for the user to also have a more general description of the package with, for example, use cases. You can do this with a `vignette`. 152 | 153 | Read more about vignettes in the [Package documentation](http://r-pkgs.had.co.nz/vignettes.html) chapter of Hadley's [R packages](http://r-pkgs.had.co.nz) book. 154 | Read more about `roxygen` syntax on its [GitHub page](https://github.com/yihui/roxygen2). `roxygen` will also populate the `NAMESPACE` file, which is necessary to manage package-level imports. 155 | 156 | ## Available templates 157 | 158 | Most of the templating is natively managed by the [`usethis`](https://usethis.r-lib.org/) package. 159 | It contains functions that create the boilerplate for you, reducing the burden on your memory and reducing chances for errors. 160 | In the snippet below you can see how it feels to use it. 161 | 162 | ```r 163 | usethis::create_package("mypackage") # Creates a package structure in the (example) folder "mypackage" 164 | usethis::use_readme_md() # Adds a readme 165 | usethis::use_apache_license() # Adds an Apache License 166 | usethis::use_testthat() # Adds the testing infrastructure 167 | usethis::use_citation() # Adds a citation file 168 | # etc... 169 | 170 | ``` 171 | 172 | Having said this, these others can serve as inspiration: 173 | 174 | - https://rapporter.github.io/rapport/ 175 | - https://shiny.posit.co/r/articles/build/templates/ 176 | - https://bookdown.org/yihui/rmarkdown/document-templates.html 177 | 178 | ## Testing, Checking, Debugging and Profiling 179 | 180 | ### Testing and checking 181 | 182 | [Testthat](https://github.com/hadley/testthat) is a testing package by Hadley Wickham. The [Testing chapter](http://r-pkgs.had.co.nz/tests.html) of the book [R packages](http://r-pkgs.had.co.nz) describes in detail the testing process in R using `testthat`. Further, [testthat: Get Started with Testing](https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf) by Wickham may also provide a good starting point.
183 | 184 | See also [checking](http://r-pkgs.had.co.nz/check.html) and [testing](http://r-pkgs.had.co.nz/tests.html) R packages. Note that within RStudio, R package checks and tests can be run via simple toolbar clicks. 185 | 186 | ### Continuous integration 187 | 188 | [Continuous integration](https://book.the-turing-way.org/reproducible-research/ci) should be done with an online service. We recommend using GitHub Actions. 189 | 190 | ### Debugging and Profiling 191 | 192 | Debugging is possible in RStudio, see [Debugging with RStudio](https://support.posit.co/hc/en-us/articles/205612627-Debugging-with-RStudio). For profiling tips, see the [Profiling chapter of Advanced R](http://adv-r.had.co.nz/Profiling.html). 193 | 194 | ## Not in this tutorial yet: 195 | 196 | - Logging 197 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Attribution 4.0 International 2 | 3 | ======================================================================= 4 | 5 | Creative Commons Corporation ("Creative Commons") is not a law firm and 6 | does not provide legal services or legal advice. Distribution of 7 | Creative Commons public licenses does not create a lawyer-client or 8 | other relationship. Creative Commons makes its licenses and related 9 | information available on an "as-is" basis. Creative Commons gives no 10 | warranties regarding its licenses, any material licensed under their 11 | terms and conditions, or any related information. Creative Commons 12 | disclaims all liability for damages resulting from their use to the 13 | fullest extent possible. 14 | 15 | Using Creative Commons Public Licenses 16 | 17 | Creative Commons public licenses provide a standard set of terms and 18 | conditions that creators and other rights holders may use to share 19 | original works of authorship and other material subject to copyright 20 | and certain other rights specified in the public license below.
The 21 | following considerations are for informational purposes only, are not 22 | exhaustive, and do not form part of our licenses. 23 | 24 | Considerations for licensors: Our public licenses are 25 | intended for use by those authorized to give the public 26 | permission to use material in ways otherwise restricted by 27 | copyright and certain other rights. Our licenses are 28 | irrevocable. Licensors should read and understand the terms 29 | and conditions of the license they choose before applying it. 30 | Licensors should also secure all rights necessary before 31 | applying our licenses so that the public can reuse the 32 | material as expected. Licensors should clearly mark any 33 | material not subject to the license. This includes other CC- 34 | licensed material, or material used under an exception or 35 | limitation to copyright. More considerations for licensors: 36 | wiki.creativecommons.org/Considerations_for_licensors 37 | 38 | Considerations for the public: By using one of our public 39 | licenses, a licensor grants the public permission to use the 40 | licensed material under specified terms and conditions. If 41 | the licensor's permission is not necessary for any reason--for 42 | example, because of any applicable exception or limitation to 43 | copyright--then that use is not regulated by the license. Our 44 | licenses grant only permissions under copyright and certain 45 | other rights that a licensor has authority to grant. Use of 46 | the licensed material may still be restricted for other 47 | reasons, including because others have copyright or other 48 | rights in the material. A licensor may make special requests, 49 | such as asking that all changes be marked or described. 50 | Although not required by our licenses, you are encouraged to 51 | respect those requests where reasonable. 
More_considerations 52 | for the public: 53 | wiki.creativecommons.org/Considerations_for_licensees 54 | 55 | ======================================================================= 56 | 57 | Creative Commons Attribution 4.0 International Public License 58 | 59 | By exercising the Licensed Rights (defined below), You accept and agree 60 | to be bound by the terms and conditions of this Creative Commons 61 | Attribution 4.0 International Public License ("Public License"). To the 62 | extent this Public License may be interpreted as a contract, You are 63 | granted the Licensed Rights in consideration of Your acceptance of 64 | these terms and conditions, and the Licensor grants You such rights in 65 | consideration of benefits the Licensor receives from making the 66 | Licensed Material available under these terms and conditions. 67 | 68 | 69 | Section 1 -- Definitions. 70 | 71 | a. Adapted Material means material subject to Copyright and Similar 72 | Rights that is derived from or based upon the Licensed Material 73 | and in which the Licensed Material is translated, altered, 74 | arranged, transformed, or otherwise modified in a manner requiring 75 | permission under the Copyright and Similar Rights held by the 76 | Licensor. For purposes of this Public License, where the Licensed 77 | Material is a musical work, performance, or sound recording, 78 | Adapted Material is always produced where the Licensed Material is 79 | synched in timed relation with a moving image. 80 | 81 | b. Adapter's License means the license You apply to Your Copyright 82 | and Similar Rights in Your contributions to Adapted Material in 83 | accordance with the terms and conditions of this Public License. 84 | 85 | c. 
Copyright and Similar Rights means copyright and/or similar rights 86 | closely related to copyright including, without limitation, 87 | performance, broadcast, sound recording, and Sui Generis Database 88 | Rights, without regard to how the rights are labeled or 89 | categorized. For purposes of this Public License, the rights 90 | specified in Section 2(b)(1)-(2) are not Copyright and Similar 91 | Rights. 92 | 93 | d. Effective Technological Measures means those measures that, in the 94 | absence of proper authority, may not be circumvented under laws 95 | fulfilling obligations under Article 11 of the WIPO Copyright 96 | Treaty adopted on December 20, 1996, and/or similar international 97 | agreements. 98 | 99 | e. Exceptions and Limitations means fair use, fair dealing, and/or 100 | any other exception or limitation to Copyright and Similar Rights 101 | that applies to Your use of the Licensed Material. 102 | 103 | f. Licensed Material means the artistic or literary work, database, 104 | or other material to which the Licensor applied this Public 105 | License. 106 | 107 | g. Licensed Rights means the rights granted to You subject to the 108 | terms and conditions of this Public License, which are limited to 109 | all Copyright and Similar Rights that apply to Your use of the 110 | Licensed Material and that the Licensor has authority to license. 111 | 112 | h. Licensor means the individual(s) or entity(ies) granting rights 113 | under this Public License. 114 | 115 | i. Share means to provide material to the public by any means or 116 | process that requires permission under the Licensed Rights, such 117 | as reproduction, public display, public performance, distribution, 118 | dissemination, communication, or importation, and to make material 119 | available to the public including in ways that members of the 120 | public may access the material from a place and at a time 121 | individually chosen by them. 122 | 123 | j. 
Sui Generis Database Rights means rights other than copyright 124 | resulting from Directive 96/9/EC of the European Parliament and of 125 | the Council of 11 March 1996 on the legal protection of databases, 126 | as amended and/or succeeded, as well as other essentially 127 | equivalent rights anywhere in the world. 128 | 129 | k. You means the individual or entity exercising the Licensed Rights 130 | under this Public License. Your has a corresponding meaning. 131 | 132 | 133 | Section 2 -- Scope. 134 | 135 | a. License grant. 136 | 137 | 1. Subject to the terms and conditions of this Public License, 138 | the Licensor hereby grants You a worldwide, royalty-free, 139 | non-sublicensable, non-exclusive, irrevocable license to 140 | exercise the Licensed Rights in the Licensed Material to: 141 | 142 | a. reproduce and Share the Licensed Material, in whole or 143 | in part; and 144 | 145 | b. produce, reproduce, and Share Adapted Material. 146 | 147 | 2. Exceptions and Limitations. For the avoidance of doubt, where 148 | Exceptions and Limitations apply to Your use, this Public 149 | License does not apply, and You do not need to comply with 150 | its terms and conditions. 151 | 152 | 3. Term. The term of this Public License is specified in Section 153 | 6(a). 154 | 155 | 4. Media and formats; technical modifications allowed. The 156 | Licensor authorizes You to exercise the Licensed Rights in 157 | all media and formats whether now known or hereafter created, 158 | and to make technical modifications necessary to do so. The 159 | Licensor waives and/or agrees not to assert any right or 160 | authority to forbid You from making technical modifications 161 | necessary to exercise the Licensed Rights, including 162 | technical modifications necessary to circumvent Effective 163 | Technological Measures. For purposes of this Public License, 164 | simply making modifications authorized by this Section 2(a) 165 | (4) never produces Adapted Material. 166 | 167 | 5. 
Downstream recipients. 168 | 169 | a. Offer from the Licensor -- Licensed Material. Every 170 | recipient of the Licensed Material automatically 171 | receives an offer from the Licensor to exercise the 172 | Licensed Rights under the terms and conditions of this 173 | Public License. 174 | 175 | b. No downstream restrictions. You may not offer or impose 176 | any additional or different terms or conditions on, or 177 | apply any Effective Technological Measures to, the 178 | Licensed Material if doing so restricts exercise of the 179 | Licensed Rights by any recipient of the Licensed 180 | Material. 181 | 182 | 6. No endorsement. Nothing in this Public License constitutes or 183 | may be construed as permission to assert or imply that You 184 | are, or that Your use of the Licensed Material is, connected 185 | with, or sponsored, endorsed, or granted official status by, 186 | the Licensor or others designated to receive attribution as 187 | provided in Section 3(a)(1)(A)(i). 188 | 189 | b. Other rights. 190 | 191 | 1. Moral rights, such as the right of integrity, are not 192 | licensed under this Public License, nor are publicity, 193 | privacy, and/or other similar personality rights; however, to 194 | the extent possible, the Licensor waives and/or agrees not to 195 | assert any such rights held by the Licensor to the limited 196 | extent necessary to allow You to exercise the Licensed 197 | Rights, but not otherwise. 198 | 199 | 2. Patent and trademark rights are not licensed under this 200 | Public License. 201 | 202 | 3. To the extent possible, the Licensor waives any right to 203 | collect royalties from You for the exercise of the Licensed 204 | Rights, whether directly or through a collecting society 205 | under any voluntary or waivable statutory or compulsory 206 | licensing scheme. In all other cases the Licensor expressly 207 | reserves any right to collect such royalties. 208 | 209 | 210 | Section 3 -- License Conditions. 
211 | 212 | Your exercise of the Licensed Rights is expressly made subject to the 213 | following conditions. 214 | 215 | a. Attribution. 216 | 217 | 1. If You Share the Licensed Material (including in modified 218 | form), You must: 219 | 220 | a. retain the following if it is supplied by the Licensor 221 | with the Licensed Material: 222 | 223 | i. identification of the creator(s) of the Licensed 224 | Material and any others designated to receive 225 | attribution, in any reasonable manner requested by 226 | the Licensor (including by pseudonym if 227 | designated); 228 | 229 | ii. a copyright notice; 230 | 231 | iii. a notice that refers to this Public License; 232 | 233 | iv. a notice that refers to the disclaimer of 234 | warranties; 235 | 236 | v. a URI or hyperlink to the Licensed Material to the 237 | extent reasonably practicable; 238 | 239 | b. indicate if You modified the Licensed Material and 240 | retain an indication of any previous modifications; and 241 | 242 | c. indicate the Licensed Material is licensed under this 243 | Public License, and include the text of, or the URI or 244 | hyperlink to, this Public License. 245 | 246 | 2. You may satisfy the conditions in Section 3(a)(1) in any 247 | reasonable manner based on the medium, means, and context in 248 | which You Share the Licensed Material. For example, it may be 249 | reasonable to satisfy the conditions by providing a URI or 250 | hyperlink to a resource that includes the required 251 | information. 252 | 253 | 3. If requested by the Licensor, You must remove any of the 254 | information required by Section 3(a)(1)(A) to the extent 255 | reasonably practicable. 256 | 257 | 4. If You Share Adapted Material You produce, the Adapter's 258 | License You apply must not prevent recipients of the Adapted 259 | Material from complying with this Public License. 260 | 261 | 262 | Section 4 -- Sui Generis Database Rights. 
263 | 264 | Where the Licensed Rights include Sui Generis Database Rights that 265 | apply to Your use of the Licensed Material: 266 | 267 | a. for the avoidance of doubt, Section 2(a)(1) grants You the right 268 | to extract, reuse, reproduce, and Share all or a substantial 269 | portion of the contents of the database; 270 | 271 | b. if You include all or a substantial portion of the database 272 | contents in a database in which You have Sui Generis Database 273 | Rights, then the database in which You have Sui Generis Database 274 | Rights (but not its individual contents) is Adapted Material; and 275 | 276 | c. You must comply with the conditions in Section 3(a) if You Share 277 | all or a substantial portion of the contents of the database. 278 | 279 | For the avoidance of doubt, this Section 4 supplements and does not 280 | replace Your obligations under this Public License where the Licensed 281 | Rights include other Copyright and Similar Rights. 282 | 283 | 284 | Section 5 -- Disclaimer of Warranties and Limitation of Liability. 285 | 286 | a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE 287 | EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS 288 | AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF 289 | ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, 290 | IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION, 291 | WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR 292 | PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, 293 | ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT 294 | KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT 295 | ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU. 296 | 297 | b. 
TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE 298 | TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, 299 | NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, 300 | INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, 301 | COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR 302 | USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN 303 | ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR 304 | DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR 305 | IN PART, THIS LIMITATION MAY NOT APPLY TO YOU. 306 | 307 | c. The disclaimer of warranties and limitation of liability provided 308 | above shall be interpreted in a manner that, to the extent 309 | possible, most closely approximates an absolute disclaimer and 310 | waiver of all liability. 311 | 312 | 313 | Section 6 -- Term and Termination. 314 | 315 | a. This Public License applies for the term of the Copyright and 316 | Similar Rights licensed here. However, if You fail to comply with 317 | this Public License, then Your rights under this Public License 318 | terminate automatically. 319 | 320 | b. Where Your right to use the Licensed Material has terminated under 321 | Section 6(a), it reinstates: 322 | 323 | 1. automatically as of the date the violation is cured, provided 324 | it is cured within 30 days of Your discovery of the 325 | violation; or 326 | 327 | 2. upon express reinstatement by the Licensor. 328 | 329 | For the avoidance of doubt, this Section 6(b) does not affect any 330 | right the Licensor may have to seek remedies for Your violations 331 | of this Public License. 332 | 333 | c. For the avoidance of doubt, the Licensor may also offer the 334 | Licensed Material under separate terms or conditions or stop 335 | distributing the Licensed Material at any time; however, doing so 336 | will not terminate this Public License. 337 | 338 | d. 
Sections 1, 5, 6, 7, and 8 survive termination of this Public 339 | License. 340 | 341 | 342 | Section 7 -- Other Terms and Conditions. 343 | 344 | a. The Licensor shall not be bound by any additional or different 345 | terms or conditions communicated by You unless expressly agreed. 346 | 347 | b. Any arrangements, understandings, or agreements regarding the 348 | Licensed Material not stated herein are separate from and 349 | independent of the terms and conditions of this Public License. 350 | 351 | 352 | Section 8 -- Interpretation. 353 | 354 | a. For the avoidance of doubt, this Public License does not, and 355 | shall not be interpreted to, reduce, limit, restrict, or impose 356 | conditions on any use of the Licensed Material that could lawfully 357 | be made without permission under this Public License. 358 | 359 | b. To the extent possible, if any provision of this Public License is 360 | deemed unenforceable, it shall be automatically reformed to the 361 | minimum extent necessary to make it enforceable. If the provision 362 | cannot be reformed, it shall be severed from this Public License 363 | without affecting the enforceability of the remaining terms and 364 | conditions. 365 | 366 | c. No term or condition of this Public License will be waived and no 367 | failure to comply consented to unless expressly agreed to by the 368 | Licensor. 369 | 370 | d. Nothing in this Public License constitutes or may be interpreted 371 | as a limitation upon, or waiver of, any privileges and immunities 372 | that apply to the Licensor or You, including from the legal 373 | processes of any jurisdiction or authority. 374 | 375 | 376 | ======================================================================= 377 | 378 | Creative Commons is not a party to its public 379 | licenses. 
Notwithstanding, Creative Commons may elect to apply one of 380 | its public licenses to material it publishes and in those instances 381 | will be considered the “Licensor.” The text of the Creative Commons 382 | public licenses is dedicated to the public domain under the CC0 Public 383 | Domain Dedication. Except for the limited purpose of indicating that 384 | material is shared under a Creative Commons public license or as 385 | otherwise permitted by the Creative Commons policies published at 386 | creativecommons.org/policies, Creative Commons does not authorize the 387 | use of the trademark "Creative Commons" or any other trademark or logo 388 | of Creative Commons without its prior written consent including, 389 | without limitation, in connection with any unauthorized modifications 390 | to any of its public licenses or any other arrangements, 391 | understandings, or agreements concerning use of licensed material. For 392 | the avoidance of doubt, this paragraph does not form part of the 393 | public licenses. 394 | 395 | Creative Commons may be contacted at creativecommons.org. 396 | 397 | -------------------------------------------------------------------------------- /language_guides/javascript.md: -------------------------------------------------------------------------------- 1 | # JavaScript 2 | 3 | _Page maintainer: Ewan Cahen_ [@ewan-escience](https://github.com/ewan-escience) 4 | 5 | [JavaScript](https://en.wikipedia.org/wiki/JavaScript) (JS) is a programming language that is one of the three (together with [HTML](https://en.wikipedia.org/wiki/HTML) and [CSS](https://en.wikipedia.org/wiki/CSS)) core technologies of the web. It is essential if you want to write interactive webpages or web applications, because JavaScript is, apart from [WebAssembly](https://webassembly.org/), the only programming language that runs in modern browsers. 
Furthermore, JS can also run [outside of the browser](/language_guides/javascript?id=javascript-outside-of-the-browser), e.g. for running short scripts or full-blown servers. 6 | 7 | ## Getting started 8 | 9 | A good introductory tutorial on JavaScript is [this one from W3Schools](https://www.w3schools.com/js/). 10 | 11 | Another source of information for JavaScript (and web development in general) is the [MDN Web Docs](https://developer.mozilla.org/en-US/docs/Learn). 12 | 13 | ## Frameworks 14 | 15 | Many people will jump straight to using a framework when building a web application. We, however, recommend that you learn the fundamentals first and get an impression of what problems frameworks are trying to solve for you. Read, for example, this article on [how the web works](https://developer.mozilla.org/en-US/docs/Learn/Getting_started_with_the_web/How_the_Web_works) and have a look at this [introduction to the DOM](https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model/Introduction). 16 | 17 | A good video summary on the history of frameworks and the problems they try to solve can be found [here](https://www.youtube.com/watch?v=EPir6uxr1o8). 18 | 19 | Before you pick a framework, you should first consider what you are trying to build. 20 | 21 | - If you're building a (more traditional) website with mostly static content, like an info page for an event or a blog, whose content doesn't adapt to the visitor, consider using a [static site generator](https://jamstack.org/generators/) like [Jekyll](https://jekyllrb.com/) or [Hugo](https://gohugo.io/), or [Docusaurus](https://docusaurus.io/) for writing documentation. An advantage of this is that static sites can be hosted on [GitHub for free](https://pages.github.com/), which uses Jekyll by default (but you can use other static site generators as well).
22 | - If you're building a website that is not very interactive, but that many people have to edit and for which a static site generator is too technical, consider using [WordPress](https://wordpress.org/). Many hosting providers support WordPress out of the box. 23 | - When you need light interactivity, the options above can be combined with libraries like [jQuery](https://jquery.com/), [Alpine.js](https://alpinejs.dev/) or [htmx](https://htmx.org/), or you can write the JavaScript yourself. 24 | - When you want to build a website that has high interactivity with its users, something you would call an "application" rather than a "website", consider using [htmx](https://htmx.org/) or one of the JavaScript frameworks below. 25 | 26 | Currently, the most popular frameworks are (ordered by popularity according to the [Stack Overflow 2024 Developer Survey](https://survey.stackoverflow.co/2024/technology#1-web-frameworks-and-technologies)): 27 | 28 | - [React](https://react.dev/) 29 | - [Angular](https://angular.dev/) 30 | - [Vue.js](https://vuejs.org/) 31 | - [Svelte](https://svelte.dev/) 32 | - [SolidJS](https://www.solidjs.com/) 33 | 34 | ### React 35 | 36 | [React](https://react.dev/) is a framework which can be used to create interactive user interfaces by combining components. It is developed by Facebook. It is by far the most popular framework, resulting in a huge choice of libraries and a lot of available documentation. Contrary to most other frameworks, React apps are typically written in [JSX](https://react.dev/learn/writing-markup-with-jsx) instead of plain HTML, CSS and JS. 37 | 38 | Where other frameworks like Angular and Vue.js include rendering, routing, and state management functionality, React only does rendering, so other libraries must be used for routing and state management. 39 | [Redux](https://redux.js.org/) can be used to let state changes flow through React components.
[React Router](https://reactrouter.com/) can be used to navigate the application using URLs. Or you can use a so-called "[meta-framework](https://prismic.io/blog/javascript-meta-frameworks-ecosystem)" like [Next.js](https://nextjs.org/). 40 | 41 | To create a React application, the official documentation recommends [starting with a meta-framework](https://react.dev/learn/start-a-new-react-project). Alternatively, you can use the tool [Create React App](https://create-react-app.dev/), optionally [with TypeScript](https://create-react-app.dev/docs/getting-started#creating-a-typescript-app). 42 | 43 | ### Angular 44 | 45 | [Angular](https://angular.dev/) is an application framework by Google written in [TypeScript](https://www.typescriptlang.org/). It is a full-blown framework, with many features included. It is therefore used more in enterprises and is probably overkill for your average scientific project. Read more about what Angular is [in the documentation](https://angular.dev/overview). 46 | 47 | To create an Angular application, see the [installation docs](https://angular.dev/installation). 48 | 49 | Angular also has a meta-framework called [Analog](https://analogjs.org/). 50 | 51 | ### Vue.js 52 | 53 | [Vue.js](https://vuejs.org/) is an open-source JavaScript framework for building user interfaces. Read about the use cases for Vue and reasons to use it [in their introduction](https://vuejs.org/guide/introduction.html). 54 | 55 | To create a Vue application, read the [quick start](https://vuejs.org/guide/quick-start). It also has info on using [TypeScript with Vue](https://vuejs.org/guide/typescript/overview). 56 | 57 | A meta-framework for Vue is [Nuxt](https://nuxt.com/). 58 | 59 | ### Svelte 60 | 61 | Svelte is a UI framework that differs from most other frameworks in that it uses a compiler before shipping JavaScript to the client. Svelte applications are written in HTML, CSS and JS.
Read more about Svelte in their [overview](https://svelte.dev/docs/svelte/overview). 62 | 63 | In their [documentation](https://svelte.dev/docs/svelte/getting-started), they recommend using their meta-framework [SvelteKit](https://svelte.dev/docs/kit/introduction) to create a Svelte application. It also [supports TypeScript](https://svelte.dev/docs/svelte/typescript). 64 | 65 | ### Solid.js 66 | 67 | Solid is a UI framework that focuses on performance and developer friendliness. Like React, it uses [JSX](https://docs.solidjs.com/concepts/understanding-jsx). Read more about Solid [here](https://docs.solidjs.com/). 68 | 69 | To create a Solid application, check out the [quick start](https://docs.solidjs.com/quick-start). They also [support TypeScript](https://docs.solidjs.com/configuration/typescript). 70 | 71 | Solid has a meta-framework called [SolidStart](https://start.solidjs.com/). 72 | 73 | ## JavaScript outside of the browser 74 | 75 | Most JavaScript is run in web browsers, but if you want to run it outside of a browser (e.g. as a server or to run a script locally), you'll need a JavaScript **runtime**. These are the main runtimes available: 76 | 77 | - [Node.js](https://nodejs.org) is the most used runtime, mainly because it was the only available runtime for a long time. This gives the advantage that there is a lot of documentation available (official and unofficial, e.g. forums) and that many tools are available for Node.js. It comes with a [package manager (npm)](https://www.npmjs.com/) that allows you to install packages from a huge library. Its installation instructions can be found [here](https://nodejs.org/en/learn/getting-started/how-to-install-nodejs).
78 | - [Deno](https://deno.com/) can be seen as a successor to Node.js and tries to improve on it in a few ways, most notably: 79 | - [built-in support](https://docs.deno.com/runtime/fundamentals/typescript/) for TypeScript 80 | - a better [security model](https://docs.deno.com/runtime/fundamentals/security/) 81 | - built-in tooling, like a [linter and formatter](https://docs.deno.com/runtime/fundamentals/linting_and_formatting/) 82 | - [compiling](https://docs.deno.com/runtime/reference/cli/compiler/) to standalone executables 83 | 84 | Its installation instructions can be found [here](https://docs.deno.com/runtime/getting_started/installation/). 85 | 86 | - [Bun](https://bun.sh/) is the youngest runtime of the three. Its focus is on speed, reduced complexity and enhanced developer productivity (read more [here](https://bun.sh/docs)). Just like Deno, it comes with [built-in TypeScript support](https://bun.sh/docs/runtime/typescript) and can [compile to standalone executables](https://bun.sh/docs/bundler/executables), and it aims to be fully [compatible with Node.js](https://bun.sh/docs/runtime/nodejs-apis). Its installation instructions can be found [here](https://bun.sh/docs/installation). 87 | 88 | A more comprehensive comparison can be found [in this guide](https://zerotomastery.io/blog/deno-vs-node-vs-bun-comparison-guide/). 89 | 90 | ### Which runtime to choose? 91 | 92 | To answer this question, you should consider what is important for you and your project. 93 | 94 | Choose Node.js if: 95 | 96 | - you need a stable, mature and well-established runtime with a large community around it; 97 | - you need to use dependencies that should most likely "just work"; 98 | - you cannot convince the people you work with to install something else; 99 | - you don't need any particular feature of any of its competitors.
100 | 101 | Choose Deno if: 102 | 103 | - you want a relatively mature runtime with a lot of features built in; 104 | - you want out-of-the-box TypeScript support; 105 | - you like its security model; 106 | - you want a complete package with a linter and formatter included; 107 | - you don't mind spending some time if something does not work right away. 108 | 109 | Choose Bun if: 110 | 111 | - you are willing to take a risk using a relatively new runtime; 112 | - you want out-of-the-box TypeScript support; 113 | - you want to use one of Bun's particular features; 114 | - you need maximum performance (though you should benchmark for your use case first and consider using a different programming language). 115 | 116 | ## Editors and IDEs 117 | 118 | These are some good JavaScript editors: 119 | 120 | - [WebStorm](https://www.jetbrains.com/webstorm/) by JetBrains. It is free (as in monetary cost) for [non-commercial use](https://www.jetbrains.com/legal/docs/toolbox/license_non-commercial/); otherwise you have to buy a licence. Most of its features are also available in other JetBrains IDEs, like [IntelliJ IDEA Ultimate](https://www.jetbrains.com/idea/), [PyCharm Professional](https://www.jetbrains.com/pycharm/) and [Rider](https://www.jetbrains.com/rider/). You can compare the products of JetBrains [here](https://www.jetbrains.com/products/compare/?product=webstorm&product=idea). Note that the free version of WebStorm will [collect data](https://blog.jetbrains.com/blog/2024/10/24/webstorm-and-rider-are-now-free-for-non-commercial-use/#anonymous-data-collection) anonymously, _without_ the option to disable it. WebStorm comes with a lot of [functionality included](https://www.jetbrains.com/webstorm/features/), but also gives access to a [Marketplace of plugins](https://plugins.jetbrains.com/). 121 | - [Visual Studio Code](https://code.visualstudio.com), an open source and free (as in monetary cost) editor by Microsoft.
By default, it collects [telemetry data](https://code.visualstudio.com/docs/getstarted/telemetry), but that can be [disabled](https://code.visualstudio.com/docs/getstarted/telemetry#_disable-telemetry-reporting). VS Code has a [limited feature set](https://code.visualstudio.com/docs/editor/whyvscode) out of the box, which can be enhanced with [extensions](https://marketplace.visualstudio.com/vscode). 122 | 123 | ## Debugging 124 | 125 | In web development, debugging is typically done in the browser. Read [this article from W3Schools](https://www.w3schools.com/js/js_debugging.asp) for more info. 126 | 127 | Each browser has documentation on its [dev tools](https://en.wikipedia.org/wiki/Web_development_tools): 128 | 129 | - [Firefox](https://firefox-source-docs.mozilla.org/devtools-user/) 130 | - [Chrome](https://developer.chrome.com/docs/devtools) 131 | - [Edge](https://learn.microsoft.com/en-us/microsoft-edge/devtools-guide-chromium/overview) 132 | - [Safari](https://developer.apple.com/safari/tools/) 133 | 134 | There are also debugging guides for the various JS runtimes: 135 | 136 | - [Node.js](https://nodejs.org/en/learn/getting-started/debugging) 137 | - [Deno](https://docs.deno.com/runtime/fundamentals/debugging/) 138 | - [Bun](https://bun.sh/docs/runtime/debugger) 139 | 140 | When using a (meta-)framework, also have a look at its documentation. 141 | 142 | Sometimes, the JavaScript code in the browser is not an exact copy of the code you see in your development environment, for example because the original source code is minified/uglified or transpiled before it's loaded in the browser. 143 | All major browsers can now deal with this through so-called [source maps](https://web.dev/articles/source-maps), which tell the browser which symbol/line in a JavaScript file corresponds to which line in the human-readable source code. 144 | Look for the 'create sourcemaps' option when using minification/uglification/transpiling tools.
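
As a small illustration of the debugging workflow described in this section, the sketch below uses the standard `debugger` statement to set a breakpoint from within the code. The `mean` function is a hypothetical helper invented for this example; it is not part of any library mentioned in this guide.

```javascript
// Hypothetical helper, used only to have something to step through.
function mean(values) {
  if (values.length === 0) {
    throw new RangeError("mean() needs at least one value");
  }
  let sum = 0;
  for (const value of values) {
    // With browser dev tools open or a runtime inspector attached,
    // execution pauses here on every iteration, so `sum` and `value`
    // can be inspected. Without an attached debugger this statement
    // has no effect.
    debugger;
    sum += value;
  }
  return sum / values.length;
}

console.log(mean([1, 2, 3, 6])); // prints 3
```

With Node.js, one way to try this is `node inspect script.js` (the built-in command-line debugger) or `node --inspect-brk script.js` followed by attaching the browser dev tools; `script.js` is a placeholder file name. In a browser, the same statement pauses execution as soon as the dev tools are open.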
145 | 146 | ## Hosting data files 147 | 148 | To display web pages (HTML files) with JavaScript, you can't use any file system URL due to safety restrictions. 149 | You should use a [web server](https://developer.mozilla.org/en-US/docs/Learn/Common_questions/Web_mechanics/What_is_a_web_server) (which may still serve files that are local). 150 | A simple web server can be started from the directory you want to host files with: 151 | 152 | ```bash 153 | python3 -m http.server 8000 154 | ``` 155 | 156 | 157 | 158 | Then open the web browser to http://localhost:8000. 159 | 160 | ## Documentation :id=js-docs 161 | 162 | [JSDoc](https://jsdoc.app/) (similar to [JavaDoc](https://www.baeldung.com/javadoc)), parses your JavaScript files and automatically generates HTML documentation, based on the JSDoc comments you put in the code. 163 | 164 | ## Testing 165 | 166 | The various runtimes have testing functionality included, so you don't have to install extra dependencies: 167 | 168 | - [Node.js](https://nodejs.org/en/learn/test-runner/introduction) 169 | - [Deno](https://docs.deno.com/runtime/fundamentals/testing/) 170 | - [Bun](https://bun.sh/guides/test/run-tests) 171 | 172 | If these don't suffice, a nice overview of popular testing frameworks can be found [here](https://raygun.com/blog/javascript-unit-testing-frameworks/). 173 | 174 | ### Testing with browsers 175 | 176 | To interact with web browsers use [Selenium](https://www.selenium.dev/). 177 | 178 | ## Coding style 179 | 180 | ### Formatters 181 | 182 | A formatter is a tool to make your source code look consistent and easy to look at. In web development, the most used formatter is [Prettier](https://prettier.io/), which can [integrate with many editors](https://prettier.io/docs/en/editors). You could [set up a GitHub action](https://akhilaariyachandra.com/blog/prettier-in-github-actions) that rejects pull requests that are not formatted properly. 
183 | 184 | When using Deno, you can also use its [built-in formatter](https://docs.deno.com/runtime/fundamentals/linting_and_formatting/#formatting). 185 | 186 | An alternative to Prettier is [Biome](https://biomejs.dev/), which also includes a linter. 187 | 188 | In any case, remember to use tabs for indentation for the [purpose of accessibility](https://old.reddit.com/r/javascript/comments/c8drjo/nobody_talks_about_the_real_reason_to_use_tabs/). 189 | 190 | ### Linters 191 | 192 | A linter is a tool to check your code quality, in order to prevent bugs. The most used linter is [ESLint](https://eslint.org/). It has [many integrations](https://eslint.org/docs/latest/use/integrations) 193 | 194 | When using Deno, you can also use its [built-in linter](https://docs.deno.com/runtime/fundamentals/linting_and_formatting/#linting). 195 | 196 | An alternative to ESLint is [Biome](https://biomejs.dev/), which also includes a formatter. 197 | 198 | Also have a look at the [Airbnb JavaScript Style Guide](https://github.com/airbnb/javascript) or the W3Schools page on [JavaScript best practices](https://www.w3schools.com/js/js_best_practices.asp). 199 | 200 | ### Code quality analysis tools and services 201 | 202 | For more in-depth analyses, you can use a code quality and analysis tool. 203 | 204 | - [SonarCloud](https://sonarcloud.io) is an open platform to manage code quality which can also show code coverage and count test results over time. It easily [integrates with GitHub](https://github.com/marketplace/sonarcloud). 205 | - [Codacy](https://www.codacy.com) can analyze [many different languages](https://docs.codacy.com/getting-started/supported-languages-and-tools/) using open source tools. It also offers [GitHub integration](https://docs.codacy.com/repositories-configure/integrations/github-integration/). 206 | - [Code climate](https://codeclimate.com/quality) can analyze JavaScript (and Ruby, PHP). 
Can analyze Java (best supported), C, C++, Python, JavaScript and TypeScript. 207 | 208 | ## Showing code examples 209 | 210 | You can use [jsfiddle](https://jsfiddle.net/), which shows you a live preview of your web page while you fiddle with the underlying HTML, JavaScript and CSS code. 211 | 212 | ## TypeScript 213 | 214 | https://www.typescriptlang.org/ 215 | 216 | TypeScript is a typed superset of JavaScript which compiles to plain JavaScript. TypeScript adds static typing to JavaScript, which makes it easier to scale a project up in both contributors and lines of code. 217 | 218 | At the Netherlands eScience Center we prefer TypeScript to JavaScript as it will lead to more sustainable software. 219 | 220 | This section highlights the differences with JavaScript. For topics without significant differences, like IDEs, code style etc., see the respective JavaScript section. 221 | 222 | ### Getting Started 223 | 224 | To learn about TypeScript, the following resources are available: 225 | 226 | - Official [TypeScript documentation](https://www.typescriptlang.org/docs/) and [tutorial](https://www.typescriptlang.org/docs/handbook/intro.html) 227 | - [Single video tutorial](https://www.youtube.com/watch?v=d56mG7DezGs) and [playlist tutorial](https://www.youtube.com/playlist?list=PL4cUxeGkcC9gUgr39Q_yD6v-bSyMwKPUI) 228 | - Tutorials on debugging TypeScript in [Chrome](https://blog.logrocket.com/how-to-debug-typescript-chrome/) and [Firefox](https://hacks.mozilla.org/2019/09/debugging-typescript-in-firefox-devtools/). If you are using a framework, consult the documentation of that framework for additional ways of debugging. 229 | - [The Definitive TypeScript 5.0 Guide](https://www.sitepen.com/blog/update-the-definitive-typescript-guide) 230 | - The [W3Schools TypeScript tutorial](https://www.w3schools.com/typescript/index.php) 231 | 232 | ### Quickstart 233 | 234 | To install the TypeScript compiler, check out the [official documentation](https://www.typescriptlang.org/download/).
Note that Deno and Bun support TypeScript [out of the box](/language_guides/javascript?id=javascript-outside-of-the-browser). 235 | 236 | ### Dealing with Types 237 | 238 | In TypeScript, variables are typed and these types are checked. 239 | This implies that when using libraries, the types of these libraries need to be installed. 240 | More and more libraries ship with type declarations in them so they can be used directly. These libraries will have a "types" (or "typings") key in their `package.json`. 241 | When a library does not ship with type declarations, the library's `@types/` package must be installed using npm: 242 | 243 | ```shell 244 | npm install --save-dev @types/ 245 | ``` 246 | 247 | For example, say we want to use the `react` package, which we installed using `npm`: 248 | 249 | ```shell 250 | npm install react --save 251 | ``` 252 | 253 | To be able to use its functionality in TypeScript we need to install the typings. 254 | 255 | Install them with: 256 | 257 | ```shell 258 | npm install --save-dev @types/react 259 | ``` 260 | 261 | The `--save-dev` flag saves this installation to the `package.json` file as a development dependency. 262 | Do not use `--save` for types because a production build will have been transpiled to JavaScript and has no use for TypeScript types. 263 | 264 | ### Debugging 265 | 266 | In web development, debugging is typically done in the browser. 267 | TypeScript cannot be run directly in the web browser, so it must be transpiled to JavaScript. To map a breakpoint in the browser to a line in the original TypeScript file, [source maps](https://www.html5rocks.com/en/tutorials/developertools/sourcemaps/) are required. Most frameworks have a project build system which generates source maps.
For more info, see the [Javascript section on debugging](/language_guides/javascript?id=debugging) 268 | 269 | ### Documentation 270 | 271 | Just like [JSDoc](/language_guides/javascript?id=js-docs) for JavaScript, [TypeDoc](https://typedoc.org/) can automatically generate HTML documentation for your code. 272 | -------------------------------------------------------------------------------- /language_guides/ccpp.md: -------------------------------------------------------------------------------- 1 | # C and C++ 2 | 3 | _Page maintainer: Johan Hidding_ [@jhidding](https://github.com/jhidding) 4 | 5 | C++ is one of the hardest languages to learn. Entering a project where C++ coding is needed should not be taken lightly. This guide focusses on tools and documentation for use of C++ in an open-source environment. 6 | 7 | ### Standards 8 | 9 | The latest ratified standard of C++ is C++17. The first standardised version of C++ is from 1998. The next version of C++ is scheduled for 2020. With these updates (especially the 2011 one) the preferred style of C++ changed drastically. As a result, a program written in 1998 looks very different from one from 2018, but it still compiles. There are many videos on Youtube describing some of these changes and how they can be used to make your code look better (i.e. more maintainable). This goes with a warning: Don't try to be too smart; other people still have to understand your code. 10 | 11 | ## Practical use 12 | 13 | ### Compilers 14 | 15 | There are two main-stream open-source C++ compilers. 16 | 17 | - [GCC](https://gcc.gnu.org/) 18 | - [LLVM - CLANG](http://llvm.org/) 19 | 20 | Overall, these compilers are more or less similar in terms of features, language support, compile times and (perhaps most importantly) performance of the generated binaries. 21 | The generated binary performance does differ for specific algorithms. 
22 | See for instance [this Phoronix benchmark for a comparison of GCC 9 and Clang 7/8](https://www.phoronix.com/scan.php?page=article&item=gcc9-stage3-skylake). 23 | 24 | MacOS (XCode) has a custom branch of `clang`, which misses some features like OpenMP support, and its own libcxx, which misses some standard library things like the very useful `std::filesystem` module. 25 | It is nevertheless recommended to use it as much as possible to maintain binary compatibility with the rest of macOS. 26 | 27 | If you need every last erg of performance, some cluster environments have the Intel compiler installed. 28 | 29 | These compilers come with a lot of options. Some basic literacy in GCC and CLANG: 30 | 31 | - `-O` changes optimisation levels 32 | - `-std=c++xx` sets the C++ standard used 33 | - `-I*path*` add path to search for include files 34 | - `-o*file*` output file 35 | - `-c` only compile, do not link 36 | - `-Wall` be more verbose with warnings 37 | 38 | And linker flags: 39 | 40 | - `-l*library*` links to a library 41 | - `-L*path*` add path to search for libraries 42 | - `-shared` make a shared library 43 | - `-Wl,-z,defs` ensures all symbols are accounted for when linking to a shared object 44 | 45 | ### Interpreter 46 | 47 | There **is** a C++ interpreter called [Cling](https://rawgit.com/vgvassilev/cling/master/www/index.html). 48 | This also comes with a [Jupyter notebook kernel](http://jupyter.org/try). 49 | 50 | ### Build systems 51 | 52 | There are several build systems that handle C/C++. 53 | Currently, [the CMake system is most popular](https://www.jetbrains.com/research/devecosystem-2018/cpp/). 54 | It is not actually a build system itself; it generates build files based on (in theory) platform-independent and compiler-independent configuration files. 55 | It can generate Makefiles, but also [Ninja](https://ninja-build.org/) files, which gives much faster build times, NMake files for Windows and more. 
56 | Some popular IDEs keep automatic count for CMake, or are even completely built around it ([CLion](http://www.jetbrains.com/clion/)). 57 | The major drawback of CMake is the confusing documentation, but this is generally made up for in terms of community support. 58 | When Googling for ways to write your CMake files, make sure you look for "modern CMake", which is a style that has been gaining traction in the last few years and makes everything better (e.g. dependency management, but also just the CMake files themselves). 59 | 60 | Traditionally, the auto-tools suite (AutoConf and AutoMake) was _the_ way to build things on Unix; you'll probably know the three command salute: 61 | 62 | > ./configure --prefix=~/.local 63 | ... 64 | > make -j4 65 | ... 66 | > make install 67 | 68 | With either one of these two (CMake or Autotools), any moderately experienced user should be able to compile your code (if it compiles). 69 | 70 | There are many other systems. 71 | Microsoft Visual Studio has its own project model / build system and a library like Qt also forces its own build system on you. 72 | We do not recommend these if you don't also supply an option for building with CMake or Autotools. 73 | Another modern alternative that has been gaining attention mainly in the GNU/Gnome/Linux world is [Meson](http://mesonbuild.com/), which is also based on [Ninja](https://ninja-build.org/). 74 | 75 | ### Package management 76 | 77 | There is no standard package manager like `pip`, `npm` or `gem` for C++. 78 | This means that you will have to choose depending on your particular circumstances what tool to use for installing libraries and, possibly, packaging the tools you yourself built. 79 | Some important factors include: 80 | 81 | - Whether or not you have root/admin access to your system 82 | - What kind of environment/ecosystem you are working in. For instance: 83 | - There are many tools targeted specifically at HPC/cluster environments. 84 | - Specific communities (e.g. 
NLP research or bioinformatics) may have gravitated towards specific tools, so you'll probably want to use those for maximum impact. 85 | - Whether software is packaged at all; many C/C++ tools only come in source form, hopefully with [build setup configuration](#build-systems). 86 | 87 | #### Yes root access 88 | 89 | If you have root/admin access to your system, the first go-to for libraries may be your OS package manager. 90 | If the target package is not in there, try to see if there is an equivalent library that is, and see what kind of software uses it. 91 | 92 | #### No root access 93 | 94 | A good, cross-platform option nowadays is to use [`miniconda`](https://conda.io/miniconda.html), which works on Linux, macOS and Windows. 95 | The `conda-forge` channel especially has a lot of C++ libraries. 96 | Specify that you want to use this channel with command line option `-c conda-forge`. 97 | The `bioconda` channel in turn builds upon the `conda-forge` libraries, hosting a lot of bioinformatics tools. 98 | 99 | #### Managing non-packaged software 100 | 101 | If you do have to install a program that depends on a specific version of a library, which in turn depends on a specific version of another library, you enter what is called _dependency hell_. 102 | Some agility in compiling and installing libraries is essential. 103 | 104 | You can install libraries in `/usr/local` or in `${HOME}/.local` if you aren't root, but there you have no package management. 105 | 106 | Many HPC administrations provide [environment modules](https://modules.readthedocs.io/en/latest/) (`module avail`), which allow you to easily populate your `$PATH` and other environment variables to find the respective package. You can also write your own module files to solve your _dependency hell_. 107 | 108 | A lot of libraries come with a package description for `pkg-config`. 109 | These descriptions are typically installed in `/usr/lib/pkgconfig`.
110 | You can point `pkg-config` to your additional libraries by setting the `PKG_CONFIG_PATH` environment variable. 111 | This also helps for instance when trying to automatically locate dependencies from CMake, which has `pkg-config` support as a fallback for when libraries don't support CMake's `find_package`. 112 | 113 | If you want to keep things organized on systems where you use multiple versions of the same software for different projects, a simple solution is to use something like `xstow`. 114 | [XStow](http://xstow.sourceforge.net/) is a poor man's package manager. 115 | You install each library in its own directory (`~/.local/pkg/` for instance), then running `xstow` will create symlinks to the files in the `~/.local` directory (one above the XStow package directory). 116 | Using XStow in this way allows you to keep a single additional search path when compiling your next library. 117 | 118 | #### Packaging software 119 | 120 | In case you find the manual compilation too cumbersome, or want to conveniently distribute software (your own or perhaps one of your project's dependencies that the author did not package themselves), you'll have to build your own package. 121 | The above solutions are good defaults for this, but there are some additional options that are widely used. 122 | 123 | - For distribution to root/admin users: system package managers (Linux: `apt`, `yum`, `pacman`; macOS: Homebrew, MacPorts) 124 | - For distribution to any users: [Conda](https://conda.io/miniconda.html) and [Conan](https://conan.io/) are cross-platform (Linux, macOS, Windows) 125 | - For distribution to HPC/cluster users: see options below 126 | 127 | When choosing which system to build your package for, it is important to consider your target audience. 128 | If any of these tools is already widely used in your audience, pick that one. 129 | If not, it is really up to your personal preferences, as all tools have their pros and cons.
130 | Some general guidelines could be: 131 | 132 | - prefer multi-platform over single platform 133 | - prefer widely used over obscure (even if it's technically magnificent, if nobody uses it, it's useless for distributing your software) 134 | - prefer multi-language over single language (especially for C++, because it is so often used to build libraries that power higher level languages) 135 | 136 | But, as the state of the package management ecosystem shows, in practice, there will be many exceptions to these guidelines. 137 | 138 | #### HPC/cluster environments 139 | 140 | If the system uses `module`, one way to install software yourself is to use [Easybuild](https://easybuild.readthedocs.io/en/latest/), which makes installing modules in your home directory quite easy. 141 | Many recipes (called Easyblocks) for building packages or whole toolchains are [available online](https://easybuild.readthedocs.io/en/latest/version-specific/Supported_software.html). 142 | These are written in Python. 143 | 144 | A similar package that is used a lot in the bioinformatics community is [guix](https://hpc.guix.info/). 145 | With guix, you can create virtual environments, much like those in Python `virtualenv` or Conda. 146 | You can also create relocatable binaries to use your binaries on systems that do not have guix installed. 147 | This makes it easy to test your packages on your laptop before deploying to a cluster system. 148 | 149 | A package that is gaining traction at the moment for HPC environments is [spack](https://spack.readthedocs.io/en/latest/). 150 | Spack allows you to pick from many compilers. When installing packages, it compiles every package from scratch. This allows you to tailor compilation flags and such to take fullest advantage of your cluster's hardware, which can be essential in HPC situations. 151 | 152 | #### Near future: Modules 153 | 154 | Note that C++20 will bring Modules, which can be used as an alternative to including (precompiled) header files.
155 | This will allow for easier packaging and will probably cause the package management landscape to change considerably. 156 | For this reason, it may be wise at this time to keep your options open and keep an eye on developments within the different package management solutions. 157 | 158 | ### Editors 159 | 160 | This is largely a matter of taste, but not always. 161 | 162 | In theory, given that there are many good command line tools available for working with C(++) code, any code editor will do to write C(++). 163 | Some people also prefer to avoid relying on IDEs too much; by helping your memory they can also help you to write less maintainable code. 164 | People of this persuasion would usually recommend any of the following editors: 165 | 166 | - Vim, recommended plugins: 167 | - [NERDTree](https://github.com/scrooloose/nerdtree) file explorer. 168 | - [editorconfig](https://github.com/editorconfig/editorconfig-vim) 169 | - [stl.vim](https://www.vim.org/scripts/script.php?script_id=4293) adds STL to syntax highlighting 170 | - [Syntastic](https://github.com/scrooloose/syntastic) 171 | - Integrated debugging using [Clewn](http://clewn.sourceforge.net/) 172 | - Emacs: 173 | - Has GDB mode for debugging. 174 | - More modern editors: Atom / Sublime Text / VS Code 175 | - Rich plugin ecosystem 176 | - Easier on the eyes... I mean modern OS/GUI integration 177 | 178 | In practice, sometimes you run into large/complex existing projects and navigating these can be really hard, especially when you just start working on the project. 179 | In these cases, an IDE can really help. 180 | Intelligent code suggestions, easy jumping between code segments in different files, integrated debugging, testing, VCS, etc. can make the learning curve a lot less steep. 
181 | Good/popular IDEs are 182 | 183 | - CLion 184 | - Visual Studio (Windows only, but many people swear by it) 185 | - Eclipse 186 | 187 | ### Code and program quality analysis 188 | 189 | C++ (and C) compilers come with built in linters and tools to check that your program runs correctly, make sure you use those. In order to find issues, it is probably a good idea to use both compilers (and maybe the valgrind memcheck tool too), because they tend to detect different problems. 190 | 191 | #### Automatic Formatting with clang-format 192 | 193 | While most IDEs and some editors offer automatic formatting of files, [clang-format](http://clang.llvm.org/docs/ClangFormat.html) is a standalone tool, which offers sensible defaults and a huge range of customisation options. Integrating it into the CI workflow guarantees that checked in code adheres to formatting guidelines. 194 | 195 | #### Static code analysis with GCC 196 | 197 | To use the GCC linter, use the following set of compiler flags when compiling C++ code: 198 | 199 | ``` 200 | -O2 -Wall -Wextra -Wcast-align -Wcast-qual -Wctor-dtor-privacy -Wdisabled-optimization -Wformat=2 201 | -Winit-self -Wlogical-op -Wmissing-declarations -Wmissing-include-dirs -Wnoexcept -Wold-style-cast 202 | -Woverloaded-virtual -Wredundant-decls -Wshadow -Wsign-conversion -Wsign-promo -Wstrict-null-sentinel 203 | -Wstrict-overflow=5 -Wswitch-default -Wundef -Wno-unused 204 | ``` 205 | 206 | and these flags when compiling C code: 207 | 208 | ``` 209 | -O2 -Wall -Wextra -Wformat-nonliteral -Wcast-align -Wpointer-arith -Wbad-function-cast 210 | -Wmissing-prototypes -Wstrict-prototypes -Wmissing-declarations -Winline -Wundef 211 | -Wnested-externs -Wcast-qual -Wshadow -Wwrite-strings -Wno-unused-parameter 212 | -Wfloat-equal 213 | ``` 214 | 215 | Use at least optimization level 2 (`-O2`) to have GCC perform code analysis up to a level where you get all warnings. Use the `-Werror` flag to turn warnings into errors, i.e. 
your code won't compile if you have warnings. See this [post](https://stackoverflow.com/questions/5088460/flags-to-enable-thorough-and-verbose-g-warnings) for an explanation of why this is a reasonable selection of warning flags. 216 | 217 | #### Static code analysis with Clang (LLVM) 218 | 219 | Clang has the very convenient flag 220 | 221 | ``` 222 | -Weverything 223 | ``` 224 | 225 | A good strategy is probably to start out using this flag and then disable any warnings that you do not find useful. 226 | 227 | #### Static code analysis with cppcheck 228 | 229 | An additional good tool that detects many issues is cppcheck. Most editors/IDEs have plugins to use it automatically. 230 | 231 | #### Dynamic program analysis using `-fsanitize` 232 | 233 | Both GCC and Clang allow you to compile your code with the `-fsanitize=` flag, which will instrument your program to detect various errors quickly. The most useful option is probably 234 | 235 | ``` 236 | -fsanitize=address -O2 -fno-omit-frame-pointer -g 237 | ``` 238 | 239 | which is a fast memory error detector. There are also other options available like `-fsanitize=thread` and `-fsanitize=undefined`. See the GCC man page or the [Clang online manual](https://clang.llvm.org/docs/index.html) for more information. 240 | 241 | #### Dynamic program analysis using the valgrind suite of tools 242 | 243 | The [valgrind suite of tools](http://valgrind.org/info/tools.html) has tools similar to what is provided by the `-fsanitize` compiler flag as well as various profiling tools. Using the valgrind tool memcheck to detect memory errors is typically slower than using compiler provided option, so this might be something you will want to do less often. You will probably want to compile your code with debug symbols enabled (`-g`) in order to get useful output with memcheck. 
When using the profilers, keep in mind that a [statistical profiler](https://en.wikipedia.org/wiki/Profiling_%28computer_programming%29#Statistical_profilers) may give you more realistic results. 244 | 245 | ### Automated code refactoring 246 | 247 | Sometimes you have to update large parts of your code base a little bit, like when you move from one standard to another or change a function definition. Although this can be accomplished with a `sed` command using regular expressions, this approach is dangerous: it easily breaks if you use macros, if your code is not formatted consistently, etc. [Clang-tidy](https://clang.llvm.org/extra/clang-tidy/) can do these things and many more. Because it works on the compiler's abstract syntax tree instead of the raw source text, it is both much more robust and more powerful. 248 | 249 | ### Debugging 250 | 251 | Most of your time programming C(++) will probably be spent on debugging. 252 | At some point, surrounding every line of your code with `printf("here %d", i++);` will no longer avail you and you will need a more powerful tool. 253 | With a debugger, you can inspect the program while it is running. 254 | You can pause it, either at random points when you feel like it or, more usually, at so-called breakpoints that you specified in advance, for instance at a certain line in your code, or when a certain function is called. 255 | When paused, you can inspect the current values of variables, manually step forward in the code line by line (or by function, or to the next breakpoint) and even change values and continue running. 256 | Learning to use these powerful tools is a very good time investment. 257 | There are some really good CppCon videos about debugging on YouTube. 258 | 259 | - GDB - the GNU Debugger, many graphical front-ends are based on GDB. 260 | - LLDB - the LLVM debugger. This is the go-to GDB alternative for the LLVM toolchain, especially on macOS where GDB is hard to set up.
261 | - DDD - primitive GUI frontend for GDB. 262 | - The IDEs mentioned above either have custom built-in debuggers or provide an interface to GDB or LLDB. 263 | 264 | ## Libraries 265 | 266 | Historically, many C and C++ projects have seemed rather hesitant about using external dependencies (perhaps due to the poor dependency management situation mentioned above). 267 | However, many good (scientific) computing libraries are available today that you should consider using if applicable. 268 | Here follows a list of libraries that we recommend and/or have experience with. 269 | These can typically be installed from a wide range of [package managers](#package-management). 270 | 271 | ### Usual suspects 272 | 273 | These scientific libraries are well known, widely used and have a lot of good online documentation. 274 | 275 | - [GNU Scientific Library (GSL)](https://www.gnu.org/software/gsl/doc/html/index.html) 276 | - [FFTW](http://www.fftw.org): Fastest Fourier Transform in the West 277 | - [Open MPI](https://www.open-mpi.org). Use with caution, since it will strongly define the structure of your code, which may or may not be desirable. 278 | 279 | ### Boost 280 | 281 | This is what the Google style guide has to say about Boost: 282 | 283 | > - **Definition:** The Boost library collection is a popular collection of peer-reviewed, free, open-source C++ libraries. 284 | > - **Pros:** Boost code is generally very high-quality, is widely portable, and fills many important gaps in the C++ standard library, such as type traits and better binders. 285 | > - **Cons:** Some Boost libraries encourage coding practices which can hamper readability, such as metaprogramming and other advanced template techniques, and an excessively "functional" style of programming. 286 | 287 | As a general rule, don't use Boost when there is equivalent STL functionality.
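
To illustrate the rule about preferring the standard library over Boost, here is a minimal sketch, assuming a C++17 compiler. It uses `std::filesystem` and `std::optional`, which cover much of what Boost.Filesystem and Boost.Optional were commonly used for. The helper `find_extension` is a hypothetical name made up for this example.

```cpp
#include <filesystem>
#include <optional>
#include <string>

// Standard library alias; with Boost this would have been boost::filesystem.
namespace fs = std::filesystem;

// Hypothetical helper: return the file extension of a path (including the
// leading dot), or an empty optional when the path has no extension.
std::optional<std::string> find_extension(const fs::path& p) {
    if (!p.has_extension()) {
        return std::nullopt;
    }
    return p.extension().string();
}
```

A call such as `find_extension("data/run01.h5")` yields an optional holding `".h5"`, while `find_extension("README")` yields an empty optional. With older GCC versions you may still need to link with `-lstdc++fs`.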
288 | 289 | ### xtensor 290 | 291 | [xtensor](http://github.com/xtensor-stack/xtensor) is a modern (C++14) N-dimensional tensor (array, matrix, etc) library for numerical work in the style of Python's NumPy. 292 | It aims for maximum performance (and in most cases it succeeds) and has an active development community. 293 | This library features, among other things: 294 | 295 | - Lazy-evaluation: only calculate when necessary. 296 | - Extensible template expressions: automatically optimize many subsequent operations into one "kernel". 297 | - NumPy style syntax, including broadcasting. 298 | - C++ STL style interfaces for easy integration with STL functionality. 299 | - [Very low-effort integration with today's main data science languages Python](https://blog.esciencecenter.nl/irregular-data-in-pandas-using-c-88ce311cb9ef?gi=23ebfce3ae77), R and Julia. 300 | This all makes xtensor a very interesting choice compared to similar older libraries like Eigen and Armadillo. 301 | 302 | ### General purpose, I/O 303 | 304 | - Configuration file reading and writing: 305 | - [yaml-cpp](https://github.com/jbeder/yaml-cpp): A YAML parser and emitter in C++ 306 | - [JSON for Modern C++](https://nlohmann.github.io/json/) 307 | - Command line argument parsing: 308 | - [argagg](https://github.com/vietjtnguyen/argagg) 309 | - [Clara](https://github.com/catchorg/Clara) 310 | - [fmt](https://github.com/fmtlib/fmt): pythonic string formatting 311 | - [hdf5](https://github.com/HDFGroup/hdf5): The popular HDF5 binary format C++ interface. 312 | 313 | ### Parallel processing 314 | 315 | - [oneAPI Threading Building Blocks](https://oneapi-src.github.io/oneTBB/) (oneTBB): template library for task parallelism 316 | - [ZeroMQ](http://zeromq.org): lower level flexible communication library with a unified interface for message passing between threads and processes, but also between separate machines via TCP. 
317 | 318 | ## Style 319 | 320 | ### Style guides 321 | 322 | Good style is not just about layout and linting on trailing whitespace. It will mean the difference between a blazing fast code and a broken one. 323 | 324 | - [C++ Core Guidelines](http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines) 325 | - [Guidelines Support Library](https://github.com/Microsoft/GSL) 326 | - [Google Style Guide](https://google.github.io/styleguide/cppguide.html) 327 | - [Google Style Guide - github](https://github.com/google/styleguide) Contains the CppLint linter. 328 | 329 | ### Project layout 330 | 331 | A C++ project will usually have directories `/src` for source codes, `/doc` for Doxygen output, `/test` for testing code. Some people like to put header files in `/include`. In C++ though, many header files will contain functioning code (templates and inline functions). This makes the separation between code and interface a bit murky. 332 | In this case, it can make more sense to put headers and implementation in the same tree, but different communities will have different opinions on this. 333 | A third option that is sometimes used is to make separate "template implementation" header files. 334 | 335 | ## Sustainability 336 | 337 | ### Testing 338 | 339 | Use [Google Test](https://github.com/google/googletest). 340 | It is light-weight, good and is used a lot. 341 | [Catch2](https://github.com/catchorg/Catch2) is also pretty good, well maintained and has native support in the CLion 342 | IDE. 343 | 344 | ### Documentation 345 | 346 | Use [Doxygen](http://www.doxygen.nl/). It is the de-facto standard way of inlining documentation into comment sections of your code. The output is very ugly. Mini-tutorial: run `doxygen -g` (preferably inside a `doc` folder) in a new project to set things up, from then on, run `doxygen` to (re-)generate the documentation. 347 | 348 | A newer but less mature option is [cldoc](http://jessevdk.github.io/cldoc/). 
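
As a rough idea of what Doxygen-style inline documentation looks like, consider the sketch below. The function `mean` is a hypothetical example, not taken from any project; the tags `@brief`, `@param`, `@return` and `@throws` are standard Doxygen commands.

```cpp
#include <numeric>
#include <stdexcept>
#include <vector>

/**
 * @brief Compute the arithmetic mean of a sequence of values.
 *
 * @param values The input values; must not be empty.
 * @return The sum of all values divided by their count.
 * @throws std::invalid_argument if @p values is empty.
 */
double mean(const std::vector<double>& values) {
    if (values.empty()) {
        throw std::invalid_argument("mean: input must not be empty");
    }
    // Accumulate in double precision, then divide by the number of elements.
    return std::accumulate(values.begin(), values.end(), 0.0) /
           static_cast<double>(values.size());
}
```

Running `doxygen` on a project containing this file should pick up the comment block and render it as the documentation entry for `mean`.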
349 | 350 | ## Resources 351 | 352 | ### Online 353 | 354 | - [CppCon videos](https://www.youtube.com/user/CppCon): Many really good talks recorded at the various CppCon meetings. 355 | - [CppReference.com](http://en.cppreference.com/w/) 356 | - [C++ Annotations](http://www.icce.rug.nl/documents/cplusplus/) 357 | - [CPlusPlus.com](http://www.cplusplus.com/) 358 | - [Modern C++, according to Microsoft](https://msdn.microsoft.com/en-us/library/hh279654.aspx) 359 | 360 | ### Books 361 | 362 | - Bjarne Stroustrup - The C++ Programming Language 363 | - Scott Meyers - Effective Modern C++ 364 | -------------------------------------------------------------------------------- /language_guides/python.md: -------------------------------------------------------------------------------- 1 | # Python 2 | 3 | _Page maintainer: Bouwe Andela_ [@bouweandela](https://github.com/bouweandela) 4 | 5 | Python is the "dynamic language of choice" of the Netherlands eScience Center. 6 | We use it for data analysis and data science projects, and for many other types of projects: workflow management, visualization, natural language processing, web-based tools and much more. 7 | It is a good default choice for many kinds of projects due to its generic nature, its large and broad ecosystem of third-party modules and its compact syntax which allows for rapid prototyping. 8 | It is not the language of maximum performance, although in many cases performance critical components can be easily replaced by modules written in faster, compiled languages like C(++) or Cython. 9 | 10 | The philosophy of Python is summarized in the [Zen of Python](https://www.python.org/dev/peps/pep-0020/). 11 | In Python, this text can be retrieved with the `import this` command. 12 | 13 | ## Project setup 14 | 15 | When starting a new Python project, consider using our [Python template](https://github.com/NLeSC/python-template).
This template provides a basic project structure, so you can spend less time setting up and configuring your new Python packages, and comply with the software guide right from the start. 16 | 17 | ## Use Python 3, avoid 2 18 | 19 | Python 2 and Python 3 have co-existed for a long time, but [starting from 2020, development of Python 2 is officially abandoned](https://www.python.org/doc/sunset-python-2/), meaning Python 2 will no longer be improved, even in case of security issues. 20 | If you are creating a new package, use Python 3. 21 | It is possible to write Python that is both Python 2 and Python 3 compatible (e.g. using [Six](https://pypi.org/project/six/)), but only do this when you are 100% sure that your package won't be used otherwise. 22 | If you need Python 2 because of old, incompatible Python 2 libraries, strongly consider upgrading those libraries to Python 3 or replacing them altogether. 23 | Building and/or using Python 2 is probably discouraged even more than, say, using Fortran 77, since at least Fortran 77 compilers are still being maintained. 24 | 25 | - [Six](https://pypi.org/project/six/): Python 2 and 3 Compatibility Library 26 | - [2to3](https://docs.python.org/2/library/2to3.html): Automated Python 2 to 3 code translation 27 | - [python-modernize](https://github.com/mitsuhiko/python-modernize): wrapper around 2to3 28 | 29 | ## Learning Python 30 | 31 | - A popular way to learn Python is by doing it the hard way at http://learnpythonthehardway.org/ 32 | - Using [`pylint`](https://www.pylint.org) and [`yapf`](https://github.com/google/yapf) while learning Python is an easy way to get familiar with best practices and commonly used coding styles 33 | 34 | ## Dependencies and package management 35 | 36 | To install Python packages use `pip` or `conda` (or both, see also [what is the difference between pip and conda?](http://stackoverflow.com/questions/20994716/what-is-the-difference-between-pip-and-conda)). 
37 | 38 | If you are planning on distributing your code at a later stage, be aware that your choice of package management may affect your packaging process. See [Building and packaging](#building-and-packaging-code) for more info. 39 | 40 | ### Use virtual environments 41 | 42 | We strongly recommend creating isolated "virtual environments" for each Python project. 43 | These can be created with `venv` or with `conda`. 44 | Advantages over installing packages system-wide or in a single user folder: 45 | 46 | - Installs Python modules when you are not root. 47 | - Contains all Python dependencies so the environment keeps working after an upgrade. 48 | - Keeps environments clean for each project, so you don't get more than you need (and can easily reproduce that minimal working situation). 49 | - Lets you select the Python version per environment, so you can test code compatibility between Python versions 50 | 51 | ### Pip + a virtual environment 52 | 53 | If you don't want to use `conda`, create isolated Python environments with the standard library [`venv`](https://docs.python.org/3/library/venv.html) module. 54 | If you are still using Python 2, [`virtualenv`](https://virtualenv.pypa.io/en/latest/) and [`virtualenvwrapper`](https://virtualenvwrapper.readthedocs.org) can be used instead. 55 | 56 | With `venv` and `virtualenv`, `pip` is used to install all dependencies. An increasing number of packages are using [`wheel`](http://pythonwheels.com), so `pip` downloads and installs them as binaries. This means they have no build dependencies and are much faster to install. 57 | 58 | If the installation of a package fails because of its non-Python extensions or system library dependencies and you are not root, you could switch to `conda` (see below). 59 | 60 | ### Conda 61 | 62 | [Conda](http://conda.pydata.org/docs/) can be used instead of venv and pip, since it is both an environment manager and a package manager. 
It easily installs binary dependencies, like Python itself or system libraries. 63 | Installation of packages that are not using `wheel`, but have a lot of non-Python code, is much faster with Conda than with `pip` because Conda does not compile the package, it only downloads compiled packages. 64 | The disadvantage of Conda is that the package needs to have a Conda build recipe. 65 | Many Conda build recipes already exist, but they are less common than the `setuptools` configuration that generally all Python packages have. 66 | 67 | There are two main "official" distributions of Conda: [Anaconda](https://docs.anaconda.com/anaconda/install/) and [Miniconda](https://docs.conda.io/projects/miniconda/en/latest/) (and variants of the latter like miniforge, explained below). 68 | Anaconda is large and contains a lot of common packages, like numpy and matplotlib, whereas Miniconda is very lightweight and only contains Python. If you need more, the `conda` command acts as a package manager for Python packages. 69 | If installation with the `conda` command is too slow for your purposes, it is recommended that you use [`mamba`](https://github.com/mamba-org/mamba) instead. 70 | 71 | For environments where you do not have admin rights (e.g. DAS-6) either Anaconda or Miniconda is highly recommended since the installation is very straightforward. 72 | The installation of packages through Conda is very robust. 73 | 74 | A possible downside of Anaconda is the fact that this is offered by a commercial supplier, but we don't foresee any vendor lock-in issues, because all packages are open source and can still be obtained elsewhere. 75 | Do note that since 2020, [Anaconda has started to ask money from large institutes](https://www.anaconda.com/blog/anaconda-commercial-edition-faq) for downloading packages from their [main channel (called the `default` channel)](https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/channels.html#what-is-a-conda-channel) through `conda`. 
76 | This does not apply to universities and most research institutes, but could apply to some government institutes that also perform research and definitely applies to large for-profit companies. 77 | Be aware of this when choosing the distribution channel for your package. 78 | An alternative, community-driven Conda distribution that avoids this problem altogether because it only installs packages from `conda-forge` by default is [miniforge](https://github.com/conda-forge/miniforge). 79 | Miniforge includes both the faster `mamba` as well as the traditional `conda`. 80 | 81 | ## Building and packaging code 82 | 83 | ### Making an installable package 84 | 85 | To create an installable Python package you will have to create a `pyproject.toml` file. 86 | This will contain three kinds of information: metadata about your project, information on how to build and install your package, and configuration settings for any tools your project may use. Our [Python template](https://github.com/NLeSC/python-template) already does this for you. 87 | 88 | #### Project metadata 89 | 90 | Your project metadata will be under the `[project]` header, and includes such information as the name, version number, description and dependencies. 91 | The [Python Packaging User Guide](https://packaging.python.org/en/latest/specifications/pyproject-toml/#declaring-project-metadata-the-project-table) has more information on what else can or should be added here. 92 | For your dependencies, you should keep version constraints to a minimum; use, in order of descending preference: no constraints, lower bounds, lower + upper bounds, exact versions. 93 | Use of `requirements.txt` is discouraged, unless necessary for something specific, see the [discussion here](https://github.com/NLeSC/guide/issues/156). 
94 | 95 | It is best to keep track of direct dependencies for your project from the start and list these in your `pyproject.toml`. 96 | If instead you are writing a new `pyproject.toml` for an existing project, a recommended way to find all direct dependencies is by running your code in a clean environment (probably by running your test suite) and installing one by one the dependencies that are missing, as reported by the ensuing errors. 97 | It is possible to find the full list of currently installed packages with `pip freeze` or `conda list`, but note that this is not ideal for listing dependencies in `pyproject.toml`, because it also lists all dependencies of the dependencies that you use. 98 | 99 | #### Build system 100 | 101 | Besides specifying your project's own metadata, you also have to specify a build-system under the `[build-system]` header. 102 | We currently recommend using [`hatchling`](https://pypi.org/project/hatchling/) or [`setuptools`](https://setuptools.pypa.io/en/latest/build_meta.html). 103 | Note that Python's build system landscape is still in flux, so be sure to look up the current practices in the [packaging guide's section on build backends](https://packaging.python.org/en/latest/tutorials/packaging-projects/#choosing-a-build-backend) and [authoritative blogs like this one](https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html). 104 | One important thing to note is that invoking `setup.py` directly (for example `python setup.py install`) has been officially deprecated, so we should migrate configuration from `setup.py` and `setup.cfg` to `pyproject.toml`. 105 | 106 | #### Tool configuration 107 | 108 | Finally, `pyproject.toml` can be used to specify the configuration for any other tools like `pytest`, `ruff` and `mypy` your project may use. 109 | Each of these gets its own section in your `pyproject.toml` instead of its own file, saving you from having dozens of such files in your project.
110 | 111 | #### Installation 112 | 113 | When the `pyproject.toml` is written, your package can be installed with 114 | 115 | ``` 116 | pip install -e . 117 | ``` 118 | 119 | The `-e` flag will install your package in editable mode, i.e. it will create a symlink to your package in the installation location instead of copying the package. This is convenient when developing, because any changes you make to the source code will immediately be available for use in the installed version. 120 | 121 | Set up continuous integration to test your installation setup. 122 | You can use `pyroma` as a linter for your installation configuration. 123 | 124 | ### Packaging and distributing your package 125 | 126 | For packaging your code, you can either use `pip` or `conda`. Neither of them is [better than the other](https://jakevdp.github.io/blog/2016/08/25/conda-myths-and-misconceptions/) -- they are different; use the one which is more suitable for your project. `pip` may be more suitable for distributing pure python packages, and it provides some support for binary dependencies using [`wheels`](http://pythonwheels.com). `conda` may be more suitable when you have external dependencies which cannot be packaged in a wheel. 127 | 128 | #### Build via the [Python Package Index (PyPI)](https://pypi.org) so that the package can be installed with pip 129 | 130 | - [General instructions](https://packaging.python.org/en/latest/tutorials/packaging-projects/) 131 | - We recommend to configure GitHub Actions to upload the package to PyPI automatically for each release. 132 | - For new repositories, it is recommended to use [trusted publishing](https://docs.pypi.org/trusted-publishers/) because it is more secure than using secret tokens from GitHub. 133 | - For a workflow using secret tokens instead, see this [example workflow in DIANNA](https://github.com/dianna-ai/dianna/blob/main/.github/workflows/release.yml). 
134 | - You can follow [these instructions](https://packaging.python.org/en/latest/guides/publishing-package-distribution-releases-using-github-actions-ci-cd-workflows/) to set up GitHub Actions workflows with trusted publishing. 135 | - The [`verbose`](https://github.com/marketplace/actions/pypi-publish#for-debugging) option for PyPI workflows is useful to see why a workflow failed. 136 | - To avoid unnecessary workflow runs, you can follow the example in the [sirup package](https://github.com/ivory-tower-private-power/sirup/blob/main/.github/workflows/release.yml): manually trigger pushes to PyPI and investigate potential bugs during this process with a manual upload. 137 | - Manual uploads with twine 138 | - Because PyPI and Test PyPI have required Two-Factor Authentication since January 2024, you need to mimic GitHub's trusted publishing to publish manually with `twine`. 139 | - You can follow the section on "The manual way" as described [here](https://docs.pypi.org/trusted-publishers/using-a-publisher/). 140 | - Additional guidelines: 141 | - Packages should be uploaded to PyPI using [your own account](https://pypi.org/account/register) 142 | - For packages developed in a team or organization, it is recommended that you create a team or organizational account on PyPI and add that as a collaborator with the owner role. This will allow your team or organization to maintain the package even if individual contributors at some point move on to do other things. At the Netherlands eScience Center, we are a fairly small organization, so we use a single backup account (`nlesc`). 143 | - When distributing code through PyPI, non-Python files (such as `requirements.txt`) will not be packaged automatically; you need to [add them to](https://stackoverflow.com/questions/1612733/including-non-python-files-with-setup-py) a `MANIFEST.in` file.
144 | - To test whether your distribution will work correctly before uploading to PyPI, you can run `python -m build` in the root of your repository. Then try installing your package with `pip install dist/tar.gz.` 145 | - `python -m build` will also build [Python wheels](http://pythonwheels.com/), the current standard for [distributing](https://packaging.python.org/distributing/#wheels) Python packages. This will work out of the box for pure Python code, without C extensions. If C extensions are used, each OS needs to have its own wheel. The [manylinux](https://github.com/pypa/manylinux) Docker images can be used for building wheels compatible with multiple Linux distributions. Wheel building can be automated using GitHub Actions or another CI solution, where you can build on all three major platforms using a build matrix. 146 | 147 | #### [Build using conda](https://conda-forge.org/docs/maintainer/adding_pkgs.html) 148 | 149 | - **Make use of [conda-forge](https://conda-forge.org/) whenever possible**, since it provides many automated build services that save you tons of work, compared to using your own conda repository. It also has a very active community for when you need help. 150 | - Use BioConda or custom channels (hosted on GitHub) as alternatives if need be. 151 | 152 | ## Editors and IDEs 153 | 154 | Every major text editor supports Python, either natively or through plugins. 155 | At the Netherlands eScience Center, some popular editors or IDEs are: 156 | 157 | - [vscode](https://code.visualstudio.com/) holds the middle ground between a lightweight text editor and a full-fledged language-dedicated IDE. 158 | - [vim](https://realpython.com/blog/python/vim-and-python-a-match-made-in-heaven/) or `emacs` (don't forget to install plugins to get the most out of these two), two versatile classic powertools that can also be used through remote SSH connection when needed. 
159 | - JetBrains [PyCharm](https://www.jetbrains.com/pycharm/) is the Python-specific IDE of choice. [PyCharm Community Edition](https://www.jetbrains.com/pycharm) is free and open source; the source code is available in the [python folder of the IntelliJ repository](https://github.com/JetBrains/intellij-community/tree/master/python). 160 | 161 | ## Coding style conventions 162 | 163 | The style guide for Python code is [PEP8](http://www.python.org/dev/peps/pep-0008/) and for docstrings it is [PEP257](https://www.python.org/dev/peps/pep-0257/). We highly recommend following these conventions, as they are widely agreed upon to improve readability. To make following them significantly easier, we recommend using a linter. 164 | 165 | Many linters exist for Python. 166 | The most popular one is currently [Ruff](https://github.com/astral-sh/ruff). 167 | Although it is new (see the website for the complete function parity comparison with alternatives), it works well and has an active community. 168 | An alternative is [`prospector`](https://github.com/landscapeio/prospector), a tool for running a suite of linters, including, among others, [pycodestyle](https://github.com/PyCQA/pycodestyle), [pydocstyle](https://github.com/PyCQA/pydocstyle), [pyflakes](https://pypi.python.org/pypi/pyflakes), [pylint](https://www.pylint.org/), [mccabe](https://github.com/PyCQA/mccabe) and [pyroma](https://github.com/regebro/pyroma). 169 | Some of these tools have seen decreasing community support recently, but it is still a good alternative, having been a defining community default for years. 170 | 171 | Most of the above tools can be integrated in text editors and IDEs for convenience. 172 | 173 | Autoformatting tools like [`yapf`](https://github.com/google/yapf) and [`black`](https://black.readthedocs.io/en/stable/index.html) can automatically format code for optimal readability.
`yapf` is configurable to suit your (team's) preferences, whereas `black` enforces the style chosen by the `black` authors. The [`isort`](http://timothycrosley.github.io/isort/) package automatically formats and groups all imports in a standard, readable way. 174 | 175 | Ruff can do autoformatting as well and can function as a drop-in replacement of `black` and `isort`. 176 | 177 | ## Type hints 178 | 179 | Since [PEP 484](https://peps.python.org/pep-0484/), which was first implemented in Python 3.5 (released in 2015), Python has gained the ability to add type information to variables. 180 | These are not types, as in typed languages; they are _hints_. 181 | Naively, one could say they are a new type of documentation. 182 | However, in practice they are far more than this, because they do have their own special syntax rules and are thus parsable. 183 | In fact, some tools have started to make use of this in runtime modules as well, making them more than hints for tools like Pydantic, FastAPI and Typer (all described below). 184 | See [this guide](https://realpython.com/python-type-checking/) to learn more about type hints. 185 | 186 | Some tools to know about that make use of type hints: 187 | 188 | - [Type checkers](https://www.infoworld.com/article/2260170/4-python-type-checkers-to-keep-your-code-clean.html) are static code 189 | analysis tools that check your code based on the type hints you provide. It is highly recommended that you use a type checker. 190 | Choose [mypy](https://mypy-lang.org/) if you are unsure which one to choose. 191 | - Tools to build documentation from source code have extensions that can show type hints in the generated documentation to make your code easier to understand. 
Popular examples are [sphinx autodoc](https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#confval-autodoc_typehints), [sphinx autoapi](https://sphinx-autoapi.readthedocs.io/en/latest/how_to.html#how-to-include-type-annotations-as-types-in-rendered-docstrings), and [mkdocstrings](https://mkdocstrings.github.io/). 192 | - [Pydantic](https://docs.pydantic.dev/latest/) is a widely used data validation library that allows you to automatically validate instances of dataclasses at runtime. This means that for this tool the type hints are no longer just hints or a form of documentation, but have actual effects. Essentially, a fully Pydantic-enriched application (in "strict mode") is like having Mypy at runtime (there is also a "tolerant" mode that lets some common types slip through without errors). In effect, it adds runtime type enforcement to the parts of your code that use it. 193 | - Most editors nowadays make use of type hints for autocompletion. 194 | If the editor knows the type of your variable, for instance, it can autocomplete attributes or methods of that class. 195 | 196 | We recommend using type hints, where possible and _practical_. 197 | Type hints are still being actively developed; not everything one would like to be able to express in a compact way can yet be achieved. 198 | This is why, for instance, [NumPy](https://numpy.org/) arrays and machine learning library (e.g. [Pytorch](https://pytorch.org/), [Tensorflow](https://www.tensorflow.org/)) "tensor" types still (in 2024) have awkward type hinting. 199 | Crucial information that one would typically want to encode for array type input arguments are shapes, but this is not yet possible. 200 | Other important libraries, like [Matplotlib](https://matplotlib.org/), have very complex functions that take in many possible types of arguments, leading to overly complex variable types. 201 | Such huge types clutter your code tremendously, so they are not typically encouraged.
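The difference between hints and enforced types can be shown in a few lines. The following is a minimal sketch using only the standard library; the `Measurement` class and `mean_temperature` function are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass
from typing import get_type_hints


@dataclass
class Measurement:
    """A hypothetical record used only for this illustration."""

    station: str
    temperature: float


def mean_temperature(measurements: list[Measurement]) -> float:
    """Return the mean temperature of a non-empty list of measurements."""
    return sum(m.temperature for m in measurements) / len(measurements)


# Hints are stored on the function, so tools can inspect them at runtime.
hints = get_type_hints(mean_temperature)

# Python itself does not enforce hints: this mismatched value is accepted
# silently. A static type checker such as mypy would report it.
unchecked = Measurement(station="De Bilt", temperature="warm")
```

Running a type checker over this file would flag the last statement, while plain Python executes it without complaint, which is why we recommend adding a type checker to your workflow.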
202 | 203 | ## Testing 204 | 205 | Use [pytest](https://docs.pytest.org/) as the basis for your testing setup. 206 | This is preferred over the `unittest` standard library, because it has a much more concise syntax and supports many useful features. 207 | 208 | It [has many plugins](https://docs.pytest.org/en/stable/plugins.html). 209 | For linting, we have found `pytest-pycodestyle`, `pytest-pydocstyle`, `pytest-mypy` and `pytest-flake8` to be useful. 210 | Other plugins we had good experience with are `pytest-cov`, `pytest-html`, `pytest-xdist` and `pytest-nbmake`. 211 | 212 | Creating mocks can also be done within the pytest framework by using the `mocker` fixture provided by the `pytest-mock` plugin or by using `MagicMock` and `patch` from `unittest.mock`. 213 | For a general explanation about mocking, see the [standard library docs on mocking](https://docs.python.org/3/library/unittest.mock.html). 214 | 215 | To run your test suite, it can be convenient to use `tox`. 216 | Testing with `tox` allows for keeping the testing environment separate from your development environment. 217 | The development environment will typically accumulate (old) packages during development that interfere with testing; this problem is avoided by testing with `tox`. 218 | 219 | ### Code coverage 220 | 221 | When you have tests it is also good to see which source code is exercised by the test suite. 222 | [Code coverage](https://book.the-turing-way.org/reproducible-research/testing/testing-guidance#aim-to-have-a-good-code-coverage) can be measured with the [coverage](https://coverage.readthedocs.io) Python package. 223 | The coverage package can also generate HTML reports that show which lines were covered. 224 | Most test runners have the coverage package integrated. 225 | 226 | The code coverage reports can be published online using a code quality service or code coverage services.
227 | Preferably, use one of the code quality services that also handle code coverage, listed [below](#Code_quality_analysis_tools_and_services). 228 | If this is not possible or does not fit, use a generic code coverage service such as [Codecov](https://about.codecov.io/) or [Coveralls](https://coveralls.io/). 229 | 230 | ## Code quality analysis tools and services 231 | 232 | Code quality services are explained in [The Turing Way](https://book.the-turing-way.org/reproducible-research/code-quality/code-quality-style.html#online-services-providing-software-quality-checks). 233 | There are multiple code quality services available for Python, all of which have their pros and cons. 234 | See [The Turing Way](https://book.the-turing-way.org/reproducible-research/code-quality/code-quality-resources.html) for links to lists of possible services. 235 | We currently set up [Sonarcloud](https://sonarcloud.io/) by default in our [Python template](https://github.com/NLeSC/python-template). 236 | To reproduce the Sonarcloud pipeline locally, you can use [SonarLint](https://www.sonarlint.org/) in your IDE. 237 | If you use another editor, perhaps it is more convenient to pick another service like Codacy or Codecov. 238 | 239 | ## Debugging and profiling 240 | 241 | ### Debugging 242 | 243 | - Python has its own debugger called [pdb](https://docs.python.org/3/library/pdb.html). It is a part of the Python distribution. 244 | - [pudb](https://github.com/inducer/pudb) is a console-based Python debugger which can easily be installed using pip. 245 | - If you are looking for IDEs with debugging capabilities, see the [Editors and IDEs section](#editors-and-ides). 246 | - If you are using Windows, [Python Tools for Visual Studio](https://github.com/Microsoft/PTVS) adds Python support for Visual Studio. 247 | - If you would like to integrate [pdb](https://docs.python.org/3/library/pdb.html) with `vim`, you can use [Pyclewn](https://sourceforge.net/projects/pyclewn).
248 | 249 | - List of other available software can be found on the [Python wiki page on debugging tools](https://wiki.python.org/moin/PythonDebuggingTools). 250 | 251 | - If you are looking for some tutorials to get started: 252 | - https://pymotw.com/2/pdb 253 | - https://github.com/spiside/pdb-tutorial 254 | - https://www.jetbrains.com/help/pycharm/2016.3/debugging.html 255 | - https://waterprogramming.wordpress.com/2015/09/10/debugging-in-python-using-pycharm/ 256 | - http://www.pydev.org/manual_101_run.html 257 | 258 | ### Profiling 259 | 260 | There are a number of available profiling tools that are suitable for different situations. 261 | 262 | - [cProfile](https://docs.python.org/2/library/profile.html) measures number of function calls and how much CPU time they take. The output can be further analyzed using the `pstats` module. 263 | - For more fine-grained, line-by-line CPU time profiling, two modules can be used: 264 | - [line_profiler](https://github.com/rkern/line_profiler) provides a function decorator that measures the time spent on each line inside the function. 265 | - [pprofile](https://github.com/vpelletier/pprofile) is less intrusive; it simply times entire Python scripts line-by-line. It can give output in callgrind format, which allows you to study the statistics and call tree in `kcachegrind` (often used for analyzing c(++) profiles from `valgrind`). 266 | 267 | More realistic profiling information can usually be obtained by using statistical or sampling profilers. The profilers listed below all create nice flame graphs. 268 | 269 | - [vprof](https://github.com/nvdv/vprof) 270 | - [Pyflame](https://github.com/uber/pyflame) 271 | - [nylas-perftools](https://github.com/nylas/nylas-perftools) 272 | 273 | ## Logging 274 | 275 | - [logging](https://docs.python.org/3/library/logging.html) module is the most commonly used tool to track events in Python code. 
276 | - Tutorials: 277 | - [Official Python Logging Tutorial](https://docs.python.org/3/howto/logging.html#logging-basic-tutorial) 278 | - http://docs.python-guide.org/en/latest/writing/logging 279 | - [Python logging best practices](https://www.datadoghq.com/blog/python-logging-best-practices/) 280 | 281 | ## Documentation 282 | 283 | It is recommended that you [write documentation](https://book.the-turing-way.org/reproducible-research/code-documentation) for your projects and publish it on an interactive webpage. 284 | A popular and recommended solution for hosting documentation is [Read the Docs](https://readthedocs.org). 285 | It can automatically build documentation for projects hosted on [GitHub, GitLab, and Bitbucket](https://docs.readthedocs.io/en/stable/reference/git-integration.html). 286 | 287 | ### Building documentation 288 | 289 | There are several tools for building webpages with documentation. 290 | At the eScience Center, we mostly use [Sphinx](http://www.sphinx-doc.org/en/master/usage/quickstart.html) (more established) and [MkDocs](https://www.mkdocs.org/getting-started/) (newer). 291 | 292 | User guides and other text documents are typically written in [Markdown](https://www.markdownguide.org/getting-started/) or [reStructuredText](https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html). Sphinx supports both formats, while MkDocs only supports Markdown. Markdown has the advantage that it's easier to read 293 | for humans so it may be easier to work with and contribute to. reStructuredText is easier to read for computers so may be more suitable for complex projects. 294 | 295 | Python uses [Docstrings](https://pandas.pydata.org/docs/development/contributing_docstring.html#about-docstrings-and-standards) for code documentation. You can read a detailed description of docstring usage in [PEP 257](https://www.python.org/dev/peps/pep-0257/). Both Sphinx and MkDocs can generate documentation webpages from docstrings. 
296 | There are two popular Sphinx extensions for generating documentation: [autoapi](https://sphinx-autoapi.readthedocs.io) (newer and more lightweight) and [autodoc](https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html) (more established). 297 | For MkDocs the [mkdocstrings](https://mkdocstrings.github.io/) package is available. 298 | We recommend using the [NumPy documentation style](https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard), as that is widely used in the scientific Python ecosystem. 299 | 300 | You can also integrate entire Jupyter notebooks into your documentation with [nbsphinx](https://nbsphinx.readthedocs.io) or 301 | [mkdocs-jupyter](https://github.com/danielfrg/mkdocs-jupyter). 302 | This way, your demo notebooks, for instance, can double as documentation. 303 | Of course, the notebooks will not be interactive in the compiled webpage, but they will include all code and output cells and you can easily link to an interactive version from the compiled documentation. 304 | 305 | It is recommended that you [routinely test any code examples in your documentation](https://docs.pytest.org/en/stable/how-to/doctest.html). 
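As a sketch of how a NumPy-style docstring and a tested code example fit together, the function below carries an `Examples` section that the standard library `doctest` module can execute. The function `rescale` is a hypothetical example written for this illustration. In a real project you would more likely let pytest collect these examples with its `--doctest-modules` option than run them by hand as is done here.

```python
import doctest


def rescale(values, factor=1.0):
    """Multiply every value by a constant factor.

    Parameters
    ----------
    values : list of float
        Input values.
    factor : float, optional
        Multiplication factor, by default 1.0.

    Returns
    -------
    list of float
        The rescaled values.

    Examples
    --------
    >>> rescale([1.0, 2.0], factor=2.0)
    [2.0, 4.0]
    """
    return [value * factor for value in values]


# Collect and run only the examples in the docstring of `rescale`.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(rescale, name="rescale", module=False,
                        globs={"rescale": rescale}):
    runner.run(test)

# `results` holds the number of failed and attempted examples.
results = runner.summarize(verbose=False)
```

If the implementation and the documented output ever drift apart, the failed count becomes non-zero, so the documentation cannot silently go stale.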
306 | 307 | ## Recommended additional packages and libraries 308 | 309 | ### General scientific 310 | 311 | - [NumPy](http://www.numpy.org/) 312 | - [SciPy](https://www.scipy.org/) 313 | - [Pandas](http://pandas.pydata.org/) data analysis toolkit 314 | - [scikit-learn](http://scikit-learn.org/): machine learning in Python 315 | - [Cython](http://cython.org/) speed up Python code by using C types and calling C functions 316 | - [dask](http://dask.pydata.org) larger than memory arrays and parallel execution 317 | 318 | ### IPython and Jupyter notebooks (aka IPython notebooks) 319 | 320 | [IPython](https://ipython.org/) is an interactive Python interpreter -- very much the same as the standard Python interactive interpreter, but with some [extra features](http://ipython.readthedocs.io/en/stable/interactive/index.html) (tab completion, shell commands, in-line help, etc). 321 | 322 | [Jupyter](http://jupyter.org/) notebooks (formerly known as IPython notebooks) are browser-based interactive Python environments. They incorporate the same features as the IPython console, plus some extras like in-line plotting. [Look at some examples](https://nbviewer.jupyter.org/github/ipython/ipython/blob/4.0.x/examples/IPython%20Kernel/Index.ipynb) to find out more. Within a notebook you can alternate code with Markdown comments (and even LaTeX), which is great for reproducible research. 323 | [Notebook extensions](https://github.com/ipython-contrib/jupyter_contrib_nbextensions) add extra functionality to notebooks. 324 | [JupyterLab](https://github.com/jupyterlab/jupyterlab) is a web-based environment with a lot of improvements and integrated tools. 325 | 326 | Jupyter notebooks contain data that makes it hard to nicely keep track of code changes using version control. If you are using git, 327 | you can [add filters that automatically remove output cells and unneeded metadata from your notebooks](http://timstaley.co.uk/posts/making-git-and-jupyter-notebooks-play-nice/).
328 | If you do choose to keep output cells in the notebooks (which can be useful to showcase your code's capabilities statically from GitHub), use [ReviewNB](https://www.reviewnb.com/) to automatically create nice visual diffs in your GitHub pull request threads. 329 | It is good practice to restart the kernel and run the notebook from start to finish in one go before saving and committing, so you are sure that everything works as expected. 330 | 331 | ### Visualization 332 | 333 | - [Matplotlib](http://matplotlib.org) has been the standard in scientific visualization. It supports quick-and-dirty plotting through the `pyplot` submodule. Its object-oriented interface can be somewhat arcane, but is highly customizable and runs natively on many platforms, making it compatible with all major OSes and environments. It supports most sources of data, including native Python objects, NumPy and Pandas. 334 | - [Seaborn](http://stanford.edu/~mwaskom/software/seaborn/index.html) is a Python visualisation library based on Matplotlib and aimed towards statistical analysis. It supports NumPy, Pandas, SciPy and statsmodels. 335 | - Web-based: 336 | - [Bokeh](https://github.com/bokeh/bokeh) is an interactive web plotting library for Python. 337 | - [Plotly](https://plot.ly/) is another platform for interactive plotting through a web browser, including in Jupyter notebooks. 338 | - [altair](https://github.com/ellisonbg/altair) is a _grammar of graphics_ style declarative statistical visualization library. It does not render visualizations itself, but rather outputs Vega-Lite JSON data. This can lead to a simplified workflow. 339 | - [ggplot](https://github.com/yhat/ggpy) is a plotting library ported from R. 340 | 341 | ### Parallelisation 342 | 343 | CPython (the official and mainstream Python implementation) is not built for parallel processing due to the [global interpreter lock](https://wiki.python.org/moin/GlobalInterpreterLock).
Note that the GIL only applies to actual Python code, so compiled modules such as `numpy` do not suffer from it. 344 | 345 | Having said that, there are many ways to run Python code in parallel: 346 | 347 | - The [multiprocessing](https://docs.python.org/3/library/multiprocessing.html) module is the standard way to do parallel execution on one or multiple machines. It circumvents the GIL by creating multiple Python processes. 348 | - A much simpler alternative in Python 3 is the [`concurrent.futures`](https://docs.python.org/3/library/concurrent.futures.html) module. 349 | - [IPython / Jupyter notebooks have built-in parallel and distributed computing capabilities](https://ipython.org/ipython-doc/3/parallel/). 350 | - Many modules have parallel capabilities or can be compiled to have them. 351 | - At the eScience Center, we have developed the [Noodles package](https://research-software-directory.org/software/noodles) for creating computational workflows and automatically parallelizing them by dispatching independent subtasks to parallel and/or distributed systems. 352 | 353 | ### Web Frameworks 354 | 355 | There are convenient Python web frameworks available: 356 | 357 | - [flask](http://flask.pocoo.org/) 358 | - [CherryPy](https://cherrypy.dev/) 359 | - [Django](https://www.djangoproject.com/) 360 | - [bottle](http://bottlepy.org/) (similar to flask, but a bit more light-weight for a JSON-REST service) 361 | - [FastAPI](https://fastapi.tiangolo.com): again, similar to flask in functionality, but uses modern Python features like async and type hints with runtime behavioral effects. 362 | 363 | We have recommended `flask` in the past, but FastAPI has become more popular recently.
364 | 365 | ### NLP/text mining 366 | 367 | - [nltk](http://www.nltk.org/) Natural Language Toolkit 368 | - [Pattern](https://github.com/clips/pattern): web/text mining module 369 | - [gensim](https://radimrehurek.com/gensim/): Topic modeling 370 | 371 | ### Creating programs with command line arguments 372 | 373 | - For run-time configuration via command-line options, the built-in [`argparse`](https://docs.python.org/library/argparse.html) module usually suffices. 374 | - A more complete solution is [`ConfigArgParse`](https://github.com/bw2/ConfigArgParse). This (almost) drop-in replacement for `argparse` allows you to not only specify configuration options via command-line options, but also via (ini or yaml) configuration files and via environment variables. 375 | - Other popular libraries are [`click`](https://click.palletsprojects.com) and [`fire`](https://google.github.io/python-fire/). 376 | - [Typer](https://typer.tiangolo.com): make a command-line application by using type hints with runtime effects. Very low on boilerplate for simple cases, but also allows for more complex cases. Uses `click` internally. 377 | --------------------------------------------------------------------------------