5 |
--------------------------------------------------------------------------------
/content/images/intro-centralDogma-proteoforms.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jessegmeyerlab/proteomics-tutorial/HEAD/content/images/intro-centralDogma-proteoforms.png
--------------------------------------------------------------------------------
/content/images/Fig_12_Biological interpretation.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jessegmeyerlab/proteomics-tutorial/HEAD/content/images/Fig_12_Biological interpretation.png
--------------------------------------------------------------------------------
/content/images/Fig_12_Biological_interpretation.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jessegmeyerlab/proteomics-tutorial/HEAD/content/images/Fig_12_Biological_interpretation.png
--------------------------------------------------------------------------------
/pull_request_template.md:
--------------------------------------------------------------------------------
1 | Please provide a description of your changes:
2 |
3 | Please check the following about your suggested changes:
4 | - [ ] The new text has one sentence per line
5 | - [ ] Cited preprints are listed as issues for discussion and peer review
6 |
--------------------------------------------------------------------------------
/ci/README.md:
--------------------------------------------------------------------------------
1 | # Continuous integration tools
2 |
3 | This directory contains tools and files for continuous integration (CI).
4 | Specifically, [`deploy.sh`](deploy.sh) runs on successful `main` branch builds that are not pull requests.
5 | The contents of `../webpage` are committed to the `gh-pages` branch.
6 | The contents of `../output` are committed to the `output` branch.
7 |
8 | For more information on the CI implementation, see the CI setup documentation in `SETUP.md`.
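
The deployment rule above can be approximated in Python when reasoning about whether a given build will deploy. This is a minimal, hypothetical sketch rather than the logic the workflow actually runs. `should_deploy` and `DEPLOY_TARGETS` are illustrative names. The sketch assumes the decision rests on the `GITHUB_EVENT_NAME` and `GITHUB_REF` variables that GitHub Actions sets, with `DEFAULT_BRANCH` falling back to `main` as it does in `deploy.sh`. It leaves out the check that the build succeeded.

```python
import os
from typing import Mapping

# Directory -> branch mapping described in this README (illustrative constant).
DEPLOY_TARGETS = {
    "../webpage": "gh-pages",
    "../output": "output",
}


def should_deploy(env: Mapping[str, str] = os.environ) -> bool:
    """Return True for non-pull-request builds of the default branch.

    Hypothetical helper: assumes GitHub Actions' GITHUB_EVENT_NAME and
    GITHUB_REF variables; an unset or empty DEFAULT_BRANCH falls back to
    "main", matching ${DEFAULT_BRANCH:-main} in deploy.sh.
    """
    default_branch = env.get("DEFAULT_BRANCH") or "main"
    is_pull_request = env.get("GITHUB_EVENT_NAME") == "pull_request"
    on_default_branch = env.get("GITHUB_REF") == f"refs/heads/{default_branch}"
    return on_default_branch and not is_pull_request


if __name__ == "__main__":
    if should_deploy():
        for directory, branch in DEPLOY_TARGETS.items():
            print(f"would commit {directory} to {branch}")
    else:
        print("skipping deployment")
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the helper easy to exercise with simulated events.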
9 |
--------------------------------------------------------------------------------
/ci/install-spellcheck.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | ## install-spellcheck.sh: run during a CI build to install Pandoc spellcheck dependencies.
4 |
5 | # Set options for extra caution & debugging
6 | set -o errexit \
7 | -o pipefail
8 |
9 | sudo apt-get update --yes
10 | sudo apt-get install --yes aspell aspell-en
11 | wget --directory-prefix=build/pandoc/filters \
12 | https://github.com/pandoc/lua-filters/raw/13c3fa7e97206413609a48a82575cb43137e037f/spellcheck/spellcheck.lua
13 |
--------------------------------------------------------------------------------
/content/manual-references.json:
--------------------------------------------------------------------------------
1 | [
2 | {
3 | "id": "url:https://github.com/manubot/rootstock",
4 | "type": "webpage",
5 | "URL": "https://github.com/manubot/rootstock",
6 | "title": "manubot/rootstock GitHub repository",
7 | "container-title": "GitHub",
8 | "issued": {
9 | "date-parts": [
10 | [
11 | 2019
12 | ]
13 | ]
14 | },
15 | "author": [
16 | {
17 | "given": "Daniel",
18 | "family": "Himmelstein"
19 | }
20 | ]
21 | }
22 | ]
23 |
--------------------------------------------------------------------------------
/content/images/orcid.svg:
--------------------------------------------------------------------------------
1 |
2 |
5 |
--------------------------------------------------------------------------------
/content/90.Author-contributions.md:
--------------------------------------------------------------------------------
1 | ## Author contributions
2 | JGM was assigned last author as the project initiator and leader.
3 | RLM was assigned second-to-last author based on their leadership role in curating all sections.
4 | All other authors were ordered by estimating their contributions using a quantitative score.
5 | Each author's score was the number of sentences (lines) they added plus 33 lines for each figure they added.
6 | Scores were adjusted for confounding factors, such as split contributions between multiple contributors.
7 | Authors with similar scores were assigned equal contributions.
8 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Generated manuscript output files
2 | output/*
3 | !output/README.md
4 |
5 | webpage/v
6 |
7 | # When PDF building fails, a temporary symlink named images in the root
8 | # directory is not removed.
9 | /images
10 |
11 | # Manubot cache directory
12 | ci/cache
13 |
14 | # Pandoc filters downloaded during continuous integration setup
15 | build/pandoc/filters/spellcheck.lua
16 |
17 | # Python
18 | __pycache__/
19 | *.pyc
20 |
21 | # Jupyter Notebook
22 | .ipynb_checkpoints
23 |
24 | # Misc temporary files
25 | *.bak
26 |
27 | # System specific files
28 |
29 | ## Linux
30 | *~
31 | .Trash-*
32 |
33 | ## macOS
34 | .DS_Store
35 | ._*
36 | .Trashes
37 |
38 | ## Windows
39 | Thumbs.db
40 | [Dd]esktop.ini
41 |
42 | ## Text Editors
43 | .vscode
44 |
--------------------------------------------------------------------------------
/content/images/mastodon.svg:
--------------------------------------------------------------------------------
1 |
2 |
5 |
--------------------------------------------------------------------------------
/ci/install.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | ## install.sh: run during an AppVeyor build to install the conda environment
4 | ## and the optional Pandoc spellcheck dependencies.
5 |
6 | # Set options for extra caution & debugging
7 | set -o errexit \
8 | -o pipefail
9 |
10 | wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh \
11 | --output-document miniforge.sh
12 | bash miniforge.sh -b -p $HOME/miniconda
13 | source $HOME/miniconda/etc/profile.d/conda.sh
14 | hash -r
15 | conda config \
16 | --set always_yes yes \
17 | --set changeps1 no
18 | mamba env create --quiet --file build/environment.yml
19 | mamba list --name manubot
20 | conda activate manubot
21 |
22 | # Install Spellcheck filter for Pandoc
23 | if [ "${SPELLCHECK:-}" = "true" ]; then
24 | bash ci/install-spellcheck.sh
25 | fi
--------------------------------------------------------------------------------
/content/89.acknowledgements.md:
--------------------------------------------------------------------------------
1 | ## Acknowledgements {.page_break_before}
2 |
3 | The authors thank Phil Wilmarth for helpful input.
4 | Identification of certain commercial equipment, instruments, software, or materials does not imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the products identified are necessarily the best available for the purpose.
5 | The authors thank Dasom Hwang for help with graphic design.
6 | The authors thank Anthony Gitter and Daniel Himmelstein for assistance using manubot.
7 | The authors thank Jordan Burton and Pierre-Alexander Mücke for minor edits to the text.
8 | This manuscript was written collaboratively using manubot [@DOI:10.1371/journal.pcbi.1007128].
9 | The live, evolving version, to which anyone can contribute, is available at https://github.com/jessegmeyerlab/proteomics-tutorial/tree/main.
10 |
--------------------------------------------------------------------------------
/content/images/twitter.svg:
--------------------------------------------------------------------------------
1 |
2 |
5 |
--------------------------------------------------------------------------------
/content/01.abstract.md:
--------------------------------------------------------------------------------
1 | ## Abstract {.page_break_before}
2 |
3 | Proteomics is the large-scale study of protein structure and function from biological systems through protein identification and quantification.
4 | "Shotgun proteomics" or "bottom-up proteomics" is the prevailing strategy, in which proteins are hydrolyzed into peptides that are analyzed by mass spectrometry.
5 | Proteomics can be applied to questions ranging from simple protein identification to studies of proteoforms, protein-protein interactions, protein structural alterations, absolute and relative protein quantification, post-translational modifications, and protein stability.
6 | To enable this range of experiments, there are diverse strategies for proteome analysis.
7 | The nuances of how proteomic workflows differ can be challenging for new practitioners to understand.
8 | Here, we provide a comprehensive overview of different proteomics methods.
9 | We cover topics ranging from biochemistry basics and protein extraction to biological interpretation and orthogonal validation.
10 | We expect this manuscript will serve as a handbook for researchers who are new to the field of bottom-up proteomics.
11 |
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/paper-discussion.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: Paper discussion
3 | about: Enable discussion of a paper to include in the tutorial
4 | title: ''
5 | labels: ''
6 | assignees: ''
7 |
8 | ---
9 |
10 | ## Paper Title Here
11 |
12 |
13 |
14 | Link to the paper:
15 |
16 | Citation in manubot format:
17 |
18 |
19 |
20 | 1. What section does this paper fit into? (select at least one or add your own)
21 |
22 | - [ ] Introduction
23 | - [ ] Protein extraction
24 | - [ ] Proteolysis
25 | - [ ] Peptide and protein labeling
26 | - [ ] Enrichment of proteins or modifications
27 | - [ ] Peptide purification
28 | - [ ] Types of mass spectrometers used for proteomics
29 | - [ ] Peptide ionization
30 | - [ ] Data acquisition (targeted and untargeted DDA and DIA)
31 | - [ ] Basics of data analysis
32 | - [ ] Biological interpretation
33 | - [ ] Experiment design and considerations not discussed elsewhere
34 |
35 | 2. Please provide a short summary of the work (1-2 sentences):
36 |
37 | 3. Any additional details
38 |
--------------------------------------------------------------------------------
/output/README.md:
--------------------------------------------------------------------------------
1 | # Generated citation / reference files
2 |
3 | The `output` branch contains files automatically generated by the manuscript build process.
4 | It consists of the contents of the `output` directory of the `main` branch.
5 | These files are not tracked in `main`, but instead written to the `output` branch by continuous integration builds.
6 |
7 | ## Files
8 |
9 | This directory contains the following files:
10 |
11 | + [`citations.tsv`](citations.tsv) is a table of citations extracted from the manuscript and the corresponding standard citations and citation IDs.
12 | + [`manuscript.md`](manuscript.md) is a markdown document of all manuscript sections, with citation strings replaced by citation IDs.
13 | + [`references.json`](references.json) is a CSL-JSON file of bibliographic item metadata ([see specification](https://github.com/citation-style-language/schema/blob/master/csl-data.json)) for all references.
14 | + [`variables.json`](variables.json) contains the variables passed to the Jinja2 templater, including those generated automatically by Manubot and those provided by the user via the `--template-variables-path` option.
15 |
16 | Pandoc consumes `manuscript.md` and `references.json` to create the formatted manuscript, which is exported to `manuscript.html`, `manuscript.pdf`, and optionally `manuscript.docx`.
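
Because `references.json` is plain CSL-JSON, it can be inspected with a few lines of Python. CSL-JSON is a JSON array of items, each with an `id`, in the same shape as `content/manual-references.json`. The helper below is a hypothetical sketch, not a Manubot command, and `reference_titles` is an illustrative name. The usage block assumes it runs from a checkout of the `output` branch.

```python
import json
from pathlib import Path
from typing import Dict


def reference_titles(csl_json: str) -> Dict[str, str]:
    """Map each CSL-JSON item's "id" to its "title" ("" when absent)."""
    items = json.loads(csl_json)
    return {item["id"]: item.get("title", "") for item in items}


if __name__ == "__main__":
    # Assumes references.json is in the current directory.
    text = Path("references.json").read_text(encoding="utf-8")
    for csl_id, title in reference_titles(text).items():
        print(f"{csl_id}\t{title}")
```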
17 |
--------------------------------------------------------------------------------
/content/images/github.svg:
--------------------------------------------------------------------------------
1 |
2 |
5 |
--------------------------------------------------------------------------------
/contributing.md:
--------------------------------------------------------------------------------
1 | # Contribution guidelines for the proteomics tutorial (adapted from [deep-review](https://github.com/greenelab/deep-review/edit/master/CONTRIBUTING.md))
2 |
3 | Please see [`USAGE.md`](USAGE.md) for information on how to use Manubot to write the manuscript.
4 | Below you'll find information on the contribution workflow for this proteomics tutorial.
5 |
6 | ## Issues
7 |
8 | We use issues for discussion of papers, section outlines, and other structural components of the paper.
9 |
10 | ## Pull requests
11 |
12 | Contributions to the article operate on a pull request model.
13 | We expect participants to actively review pull requests.
14 | We'd love to have you ask questions, clarify points, and jump in and edit the text.
15 |
16 | ## Authorship
17 |
18 | What qualifies as authorship?
19 | We use the [ICMJE Guidelines](http://www.icmje.org/recommendations/browse/roles-and-responsibilities/defining-the-role-of-authors-and-contributors.html).
20 | We expect authors to contribute to the overall design by participating in issues and to contribute to the text by writing or revising sections through pull requests.
21 | Note that, for authorship, these contributions should be substantial and intellectual.
22 |
23 | ## Peer review
24 |
25 | All pull requests will undergo peer review.
26 | Participants in this project should review proposed changes (pull requests), which can be done using [GitHub's review interface](https://help.github.com/articles/about-pull-request-reviews/ "GitHub: about pull request reviews").
27 | They should suggest modifications or, potentially, directly edit the pull request to make suggested changes.
28 | As a reviewer, it's helpful to note the type of review you performed:
29 | did you read cited literature, look over the text in detail, or are you just supporting the concept?
30 |
31 | Before a repository maintainer merges a pull request, there must be at least one affirmative review.
32 | If there is any unaddressed criticism or disapproval, a repository maintainer will determine how to proceed and may wait for additional feedback.
33 |
--------------------------------------------------------------------------------
/webpage/README.md:
--------------------------------------------------------------------------------
1 | # Output directory containing the formatted manuscript
2 |
3 | The [`gh-pages`](https://github.com/$REPO_SLUG/tree/gh-pages) branch hosts the contents of this directory at .
4 | The permalink for this webpage version is .
5 | To redirect to the permalink for the latest manuscript version at any time, use the link .
6 |
7 | ## Files
8 |
9 | This directory contains the following files, which are mostly ignored on the `main` branch:
10 |
11 | + [`index.html`](index.html) is an HTML manuscript.
12 | + [`manuscript.pdf`](manuscript.pdf) is a PDF manuscript.
13 |
14 | The `v` directory contains directories for each manuscript version.
15 | In general, a version is identified by the commit hash of the source content that created it.
16 |
17 | ### Timestamps
18 |
19 | The `*.ots` files in version directories are OpenTimestamps proofs, which can be used to verify that a manuscript existed at or before a given time.
20 | [OpenTimestamps](https://opentimestamps.org/) uses the Bitcoin blockchain to attest to file hash existence.
21 | The `deploy.sh` script run during continuous deployment creates the `.ots` files through its `manubot webpage` call.
22 | There is a delay before timestamps get confirmed by a Bitcoin block.
23 | Therefore, `.ots` files are initially incomplete and should be upgraded at a later time, so that they no longer rely on the availability of a calendar server to verify.
24 | The `manubot webpage` call during continuous deployment identifies files matched by `webpage/v/**/*.ots` and attempts to upgrade them.
25 | You can also upgrade timestamps manually by running the following in the `gh-pages` branch:
26 |
27 | ```shell
28 | ots upgrade v/*/*.ots
29 | rm v/*/*.ots.bak
30 | git add v/*/*.ots
31 | ```
32 |
33 | At this time, verifying timestamps with the `ots verify` command requires running a local Bitcoin node with JSON-RPC configured.
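
The file discovery that `manubot webpage` performs before upgrading timestamps can be approximated in Python. This is a hypothetical sketch, not Manubot's implementation, and `find_timestamps` and `upgrade_command` are illustrative names. The script assumes it runs from the root of a `gh-pages` checkout, where `v/` sits at the top level. It only calls `ots upgrade` when the `ots` client is found on `PATH`. As with the manual commands, the leftover `.ots.bak` files still need to be removed and the upgraded files committed.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def find_timestamps(root: Path) -> List[Path]:
    """Return OpenTimestamps files under root/v, sorted for stable output.

    "**" matches zero or more directories, so both v/*.ots and
    v/<version>/*.ots are found; backups ending in .ots.bak are not.
    """
    return sorted(root.glob("v/**/*.ots"))


def upgrade_command(paths: List[Path]) -> List[str]:
    """Build the same `ots upgrade` invocation used in the manual steps."""
    return ["ots", "upgrade", *(str(path) for path in paths)]


if __name__ == "__main__":
    # Assumes the current directory is a gh-pages checkout.
    timestamps = find_timestamps(Path("."))
    if not timestamps:
        print("no .ots files found under v/")
    elif shutil.which("ots") is None:
        print("`ots` not installed; would run:", " ".join(upgrade_command(timestamps)))
    else:
        subprocess.run(upgrade_command(timestamps), check=True)
```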
34 |
35 | ## Source
36 |
37 | The manuscripts in this directory were built from
38 | [`$COMMIT`](https://github.com/$REPO_SLUG/commit/$COMMIT).
39 |
--------------------------------------------------------------------------------
/.github/workflows/ai-revision.yaml:
--------------------------------------------------------------------------------
1 | name: ai-revision
2 | on:
3 |   workflow_dispatch:
4 |     inputs:
5 |       branch:
6 |         description: 'Branch to revise'
7 |         required: true
8 |         type: string
9 |         default: 'main'
10 |       file_names:
11 |         description: 'File names to revise'
12 |         required: false
13 |         type: string
14 |         default: ''
15 |       model:
16 |         description: 'Language model'
17 |         required: true
18 |         type: string
19 |         default: 'text-davinci-003'
20 |       branch_name:
21 |         description: 'Output branch'
22 |         required: true
23 |         type: string
24 |         default: 'ai-revision-davinci'
25 |
26 | jobs:
27 |   ai-revise:
28 |     name: AI Revise
29 |     runs-on: ubuntu-latest
30 |     permissions:
31 |       contents: write
32 |       pull-requests: write
33 |     defaults:
34 |       run:
35 |         shell: bash --login {0}
36 |     steps:
37 |       - name: Checkout Repo
38 |         uses: actions/checkout@v3
39 |         with:
40 |           ref: ${{ inputs.branch }}
41 |       - name: Install environment
42 |         uses: actions/setup-python@v4
43 |         with:
44 |           python-version: '3.11'
45 |       - name: Install Manubot AI revision dependencies
46 |         run: |
47 |           # install using the same URL used for manubot in build/environment.yml
48 |           manubot_line=$(grep "github.com/manubot/manubot" build/environment.yml)
49 |           manubot_url=$(echo "$manubot_line" | awk -F"- " '{print $2}')
50 |
51 |           pip install ${manubot_url}#egg=manubot[ai-rev]
52 |       - name: Revise manuscript
53 |         env:
54 |           OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
55 |           AI_EDITOR_LANGUAGE_MODEL: ${{ inputs.model }}
56 |           AI_EDITOR_FILENAMES_TO_REVISE: ${{ inputs.file_names }}
57 |           # More variables can be specified to control the behavior of the model:
58 |           # https://github.com/manubot/manubot-ai-editor/blob/main/libs/manubot_ai_editor/env_vars.py
59 |         run: manubot ai-revision --content-directory content/
60 |       - name: Create Pull Request
61 |         uses: peter-evans/create-pull-request@v4
62 |         with:
63 |           commit-message: "revise using AI model\n\nUsing the OpenAI model ${{ inputs.model }}"
64 |           title: 'AI-based revision using ${{ inputs.model }}'
65 |           author: OpenAI model ${{ inputs.model }}
66 |           add-paths: |
67 |             content/*.md
68 |           branch: ${{ inputs.branch_name }}
69 |           draft: true
70 |
--------------------------------------------------------------------------------
/ci/deploy.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | ## deploy.sh: run during a CI build to deploy manuscript outputs to the output and gh-pages branches on GitHub.
4 |
5 | # Set options for extra caution & debugging
6 | set -o errexit \
7 | -o nounset \
8 | -o pipefail
9 |
10 | # set environment variables for GitHub Actions
11 | REPO_SLUG=${GITHUB_REPOSITORY}
12 | COMMIT=${GITHUB_SHA}
13 | BRANCH=${DEFAULT_BRANCH:-main}
14 |
15 | # Add commit hash to the README
16 | OWNER_NAME="$(dirname "$REPO_SLUG")"
17 | REPO_NAME="$(basename "$REPO_SLUG")"
18 | export REPO_SLUG COMMIT OWNER_NAME REPO_NAME
19 | envsubst < webpage/README.md > webpage/README-complete.md
20 | mv webpage/README-complete.md webpage/README.md
21 |
22 | # Configure git
23 | git config --global push.default simple
24 | git config --global user.email "$(git log --max-count=1 --format='%ae')"
25 | git config --global user.name "$(git log --max-count=1 --format='%an')"
26 | git checkout "$BRANCH"
27 |
28 | # Configure deployment credentials
29 | MANUBOT_DEPLOY_VIA_SSH=true
30 | git remote set-url origin "git@github.com:$REPO_SLUG.git"
31 | if [ -v MANUBOT_SSH_PRIVATE_KEY ] && [ "$MANUBOT_SSH_PRIVATE_KEY" != "" ]; then
32 | echo >&2 "[INFO] Detected MANUBOT_SSH_PRIVATE_KEY. Will deploy via SSH."
33 | elif [ -v MANUBOT_ACCESS_TOKEN ] && [ "$MANUBOT_ACCESS_TOKEN" != "" ]; then
34 | echo >&2 "[INFO] Detected MANUBOT_ACCESS_TOKEN. Will deploy via HTTPS."
35 | MANUBOT_DEPLOY_VIA_SSH=false
36 | git remote set-url origin "https://$MANUBOT_ACCESS_TOKEN@github.com/$REPO_SLUG.git"
37 | else
38 | echo >&2 "[INFO] Missing MANUBOT_SSH_PRIVATE_KEY and MANUBOT_ACCESS_TOKEN. Will deploy via SSH."
39 | fi
40 |
41 | if [ $MANUBOT_DEPLOY_VIA_SSH = "true" ]; then
42 | # Decode and add SSH key
43 | eval "$(ssh-agent -s)"
44 | (
45 | set +o xtrace # disable xtrace in subshell for private key operations
46 | if [ -v MANUBOT_SSH_PRIVATE_KEY ]; then
47 | base64 --decode <<< "$MANUBOT_SSH_PRIVATE_KEY" | ssh-add -
48 | else
49 | echo >&2 "Deployment will fail since neither of the following environment variables is set: MANUBOT_ACCESS_TOKEN or MANUBOT_SSH_PRIVATE_KEY."
50 | fi
51 | )
52 | fi
53 |
54 | # Fetch and create gh-pages and output branches
55 | git remote set-branches --add origin gh-pages output
56 | git fetch origin gh-pages:gh-pages output:output || \
57 | echo >&2 "[INFO] could not fetch gh-pages or output from origin."
58 |
59 | # Configure versioned webpage and timestamp
60 | manubot webpage \
61 | --timestamp \
62 | --no-ots-cache \
63 | --checkout=gh-pages \
64 | --version="$COMMIT"
65 |
66 | # Commit message
67 | MESSAGE="\
68 | $(git log --max-count=1 --format='%s')
69 | [ci skip]
70 |
71 | This build is based on
72 | https://github.com/$REPO_SLUG/commit/$COMMIT.
73 |
74 | This commit was created by the following CI build and job:
75 | $CI_BUILD_WEB_URL
76 | $CI_JOB_WEB_URL
77 | "
78 |
79 | # Deploy the manubot outputs to output
80 | ghp-import \
81 | --push \
82 | --branch=output \
83 | --message="$MESSAGE" \
84 | output
85 |
86 | # Deploy the webpage directory to gh-pages
87 | ghp-import \
88 | --no-jekyll \
89 | --follow-links \
90 | --push \
91 | --branch=gh-pages \
92 | --message="$MESSAGE" \
93 | webpage
94 |
95 | if [ $MANUBOT_DEPLOY_VIA_SSH = "true" ]; then
96 | # Workaround https://github.com/travis-ci/travis-ci/issues/8082
97 | ssh-agent -k
98 | fi
99 |
--------------------------------------------------------------------------------
/content/00.front-matter.md:
--------------------------------------------------------------------------------
1 | {##
2 | This file contains a Jinja2 front-matter template that adds version and authorship information.
3 | Changing the Jinja2 templates in this file may cause incompatibility with Manubot updates.
4 | Pandoc automatically inserts title from metadata.yaml, so it is not included in this template.
5 | ##}
6 |
7 | {## Uncomment & edit the following line to reference a preprinted or published version of the manuscript.
8 | _A DOI-citable version of this manuscript is available at _.
9 | ##}
10 |
11 | {## Template to insert build date and source ##}
12 |
13 | This manuscript
14 | {% if manubot.ci_source is defined and manubot.ci_source.provider == "appveyor" -%}
15 | ([permalink]({{manubot.ci_source.artifact_url}}))
16 | {% elif manubot.html_url_versioned is defined -%}
17 | ([permalink]({{manubot.html_url_versioned}}))
18 | {% endif -%}
19 | was automatically generated
20 | {% if manubot.ci_source is defined -%}
21 | from [{{manubot.ci_source.repo_slug}}@{{manubot.ci_source.commit | truncate(length=7, end='', leeway=0)}}](https://github.com/{{manubot.ci_source.repo_slug}}/tree/{{manubot.ci_source.commit}})
22 | {% endif -%}
23 | on {{manubot.generated_date_long}}.
24 |
25 |
26 | {% if manubot.date_long != manubot.generated_date_long -%}
27 | Published: {{manubot.date_long}}
28 | {% endif %}
29 |
30 | ## Authors
31 |
32 | {## Template for listing authors ##}
33 | {% for author in manubot.authors %}
34 | + **{{author.name}}**
35 | {% if author.corresponding is defined and author.corresponding == true -%}^[✉](#correspondence)^{%- endif -%}
36 |
37 | {%- set has_ids = false %}
38 | {%- if author.orcid is defined and author.orcid is not none %}
39 | {%- set has_ids = true %}
40 | {.inline_icon width=16 height=16}
41 | [{{author.orcid}}](https://orcid.org/{{author.orcid}})
42 | {%- endif %}
43 | {%- if author.github is defined and author.github is not none %}
44 | {%- set has_ids = true %}
45 | · {.inline_icon width=16 height=16}
46 | [{{author.github}}](https://github.com/{{author.github}})
47 | {%- endif %}
48 | {%- if author.twitter is defined and author.twitter is not none %}
49 | {%- set has_ids = true %}
50 | · {.inline_icon width=16 height=16}
51 | [{{author.twitter}}](https://twitter.com/{{author.twitter}})
52 | {%- endif %}
53 | {%- if author.mastodon is defined and author.mastodon is not none and author["mastodon-server"] is defined and author["mastodon-server"] is not none %}
54 | {%- set has_ids = true %}
55 | · {.inline_icon width=16 height=16}
56 | [\@{{author.mastodon}}@{{author["mastodon-server"]}}](https://{{author["mastodon-server"]}}/@{{author.mastodon}})
57 | {%- endif %}
58 | {%- if has_ids %}
59 |
60 | {%- endif %}
61 |
62 | {%- if author.affiliations is defined and author.affiliations|length %}
63 | {{author.affiliations | join('; ')}}
64 | {%- endif %}
65 | {%- if author.funders is defined and author.funders|length %}
66 | · Funded by {{author.funders | join('; ')}}
67 | {%- endif %}
68 |
69 | {% endfor %}
70 |
71 | ::: {#correspondence}
72 | ✉ — Correspondence possible via {% if manubot.ci_source is defined -%}[GitHub Issues](https://github.com/{{manubot.ci_source.repo_slug}}/issues){% else %}GitHub Issues{% endif %}
73 | {% if manubot.authors|map(attribute='corresponding')|select|max -%}
74 | or email to
75 | {% for author in manubot.authors|selectattr("corresponding") -%}
76 | {{ author.name }} \<{{ author.email }}\>{{ ", " if not loop.last else "." }}
77 | {% endfor %}
78 | {% endif %}
79 | :::
80 |
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | # Contributor Covenant Code of Conduct
2 |
3 | ## Our Values
4 |
5 | In this project, we seek to provide an environment for researchers from all backgrounds and disciplines to contribute to the consolidation and synthesis of scientific information related to proteomics.
6 | Our project will be most successful when we create a space where a diverse group of people can collaboratively approach emerging information with curiosity and creativity.
7 | As a result, one of our primary goals is to make the online community associated with the repository welcoming to all members of the scientific community.
8 | Creativity thrives when people feel supported, accepted, and encouraged.
9 | For this reason, inclusivity is one of the central tenets of this project.
10 |
11 | ## Our Pledge
12 |
13 | In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to make participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.
14 |
15 | ## Our Standards
16 |
17 | Examples of behavior that contributes to creating a positive environment include:
18 |
19 | * Using welcoming and inclusive language
20 | * Being respectful of differing viewpoints and experiences
21 | * Gracefully accepting constructive criticism
22 | * Focusing on what is best for the community
23 | * Showing empathy towards other community members
24 |
25 | Examples of unacceptable behavior by participants include:
26 |
27 | * The use of sexualized language or imagery and unwelcome sexual attention or advances
28 | * Trolling, insulting/derogatory comments, and personal or political attacks
29 | * Public or private harassment
30 | * Publishing others' private information, such as a physical or electronic address, without explicit permission
31 | * Other conduct which could reasonably be considered inappropriate in a professional setting
32 |
33 | ## Our Responsibilities
34 |
35 | Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.
36 |
37 | Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
38 |
39 | ## Scope
40 |
41 | This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community.
42 | Examples of representing a project or community include acting as an appointed representative at an online or offline event, such as for media coverage or for conferences or other presentations.
43 |
44 | ## Enforcement
45 |
46 | Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at jessegmeyerlab@gmail.com.
47 | All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate to the circumstances.
48 | The project team is obligated to maintain confidentiality with regard to the reporter of an incident.
49 | Further details of specific enforcement policies may be posted separately.
50 |
51 | Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.
52 | Only contributors in good standing with respect to the code of conduct will be offered the opportunity to review and approve the final manuscript, which is required for authorship under the ICMJE criteria.
53 |
54 | ## Attribution
55 |
56 | This Code of Conduct is adapted from [the COVID review](https://github.com/greenelab/covid19-review/edit/master/CODE_OF_CONDUCT.md), which was adapted from the [Contributor Covenant](https://www.contributor-covenant.org), [version 1.4](https://www.contributor-covenant.org/version/1/4/code-of-conduct.html).
57 |
58 | For answers to common questions about this code of conduct, see https://www.contributor-covenant.org/faq
59 |
--------------------------------------------------------------------------------
/.github/workflows/manubot.yaml:
--------------------------------------------------------------------------------
1 | name: Manubot
2 | on:
3 |   push:
4 |     branches:
5 |       - main
6 |       - master
7 |   pull_request:
8 |     branches:
9 |       - main
10 |       - master
11 |   # NOTE: scheduled workflows are supported as of 2022-09-27 (example commented below)
12 |   # https://github.com/community/community/discussions/12269#discussioncomment-3747667
13 |   # schedule:
14 |   #   - cron: '40 10 1 * *'  # https://crontab.guru/#40_10_1_*_*
15 |   workflow_dispatch:
16 |     inputs:
17 |       BUILD_PDF:
18 |         type: boolean
19 |         description: generate PDF output
20 |         default: true
21 |       BUILD_DOCX:
22 |         type: boolean
23 |         description: generate DOCX output
24 |         default: false
25 |       BUILD_LATEX:
26 |         type: boolean
27 |         description: generate LaTeX output
28 |         default: false
29 |       SPELLCHECK:
30 |         type: boolean
31 |         description: Check spelling
32 |         default: true
33 |       MANUBOT_USE_DOCKER:
34 |         type: boolean
35 |         description: Use Docker to generate PDF
36 |         default: true
37 | jobs:
38 |   manubot:
39 |     name: Manubot
40 |     runs-on: ubuntu-latest
41 |     permissions:
42 |       contents: write
43 |     env:
44 |       GITHUB_PULL_REQUEST_SHA: ${{ github.event.pull_request.head.sha }}
45 |       # Set SPELLCHECK to true/false for whether to check spelling in this action.
46 |       # For workflow dispatch jobs, this SPELLCHECK setting will be overridden by the user input.
47 |       SPELLCHECK: true
48 |     defaults:
49 |       run:
50 |         shell: bash --login {0}
51 |     steps:
52 |       - name: Checkout Repository
53 |         uses: actions/checkout@v3
54 |         with:
55 |           # fetch entire commit history to support get_rootstock_commit
56 |           fetch-depth: 0
57 |       - name: Set Environment Variables (Workflow Dispatch)
58 |         if: github.event_name == 'workflow_dispatch'
59 |         run: |
60 |           echo "BUILD_PDF=${{ github.event.inputs.BUILD_PDF }}" >> $GITHUB_ENV
61 |           echo "BUILD_DOCX=${{ github.event.inputs.BUILD_DOCX }}" >> $GITHUB_ENV
62 |           echo "BUILD_LATEX=${{ github.event.inputs.BUILD_LATEX }}" >> $GITHUB_ENV
63 |           echo "SPELLCHECK=${{ github.event.inputs.SPELLCHECK }}" >> $GITHUB_ENV
64 |           echo "MANUBOT_USE_DOCKER=${{ github.event.inputs.MANUBOT_USE_DOCKER }}" >> $GITHUB_ENV
65 |       - name: Set Environment Variables
66 |         run: |
67 |           TRIGGERING_SHA=${GITHUB_PULL_REQUEST_SHA:-$GITHUB_SHA}
68 |           echo "TRIGGERING_SHA_7=${TRIGGERING_SHA::7}" >> $GITHUB_ENV
69 |           echo "TRIGGERING_SHA: $TRIGGERING_SHA"
70 |           DEFAULT_BRANCH=${{ github.event.repository.default_branch }}
71 |           echo "DEFAULT_BRANCH=${DEFAULT_BRANCH}" >> $GITHUB_ENV
72 |           echo "DEFAULT_BRANCH_REF=refs/heads/$DEFAULT_BRANCH" >> $GITHUB_ENV
73 |           echo "DEFAULT_BRANCH=$DEFAULT_BRANCH"
74 |       - name: Cache
75 |         uses: actions/cache@v3
76 |         with:
77 |           path: ci/cache
78 |           key: ci-cache-${{ github.ref }}
79 |           restore-keys: |
80 |             ci-cache-${{ env.DEFAULT_BRANCH_REF }}
81 |       - name: Install Environment
82 |         uses: conda-incubator/setup-miniconda@v2
83 |         with:
84 |           activate-environment: manubot
85 |           environment-file: build/environment.yml
86 |           auto-activate-base: false
87 |           miniforge-variant: Mambaforge
88 |           miniforge-version: 'latest'
89 |           use-mamba: true
90 |       - name: Install Spellcheck
91 |         if: env.SPELLCHECK == 'true'
92 |         run: bash ci/install-spellcheck.sh
93 |       - name: Build Manuscript
94 |         run: bash build/build.sh
95 |       - name: Upload Artifacts
96 |         uses: actions/upload-artifact@v3
97 |         with:
98 |           name: manuscript-${{ github.run_id }}-${{ env.TRIGGERING_SHA_7 }}
99 |           path: output
100 |       - name: Deploy Manuscript
101 |         if: github.ref == env.DEFAULT_BRANCH_REF && github.event_name != 'pull_request' && !github.event.repository.fork
102 |         env:
103 |           MANUBOT_ACCESS_TOKEN: ${{ secrets.GITHUB_TOKEN }}
104 |           MANUBOT_SSH_PRIVATE_KEY: ${{ secrets.MANUBOT_SSH_PRIVATE_KEY }}
105 |           CI_BUILD_WEB_URL: https://github.com/${{ github.repository }}/commit/${{ github.sha }}/checks
106 |           CI_JOB_WEB_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
107 |         run: bash ci/deploy.sh
108 |
--------------------------------------------------------------------------------
/content/17.knowledgebases.md:
--------------------------------------------------------------------------------
1 | ## 15. Data Repositories and Knowledge Bases {.page_break_before}
2 |
3 | ### Proteomics raw data repositories
4 | An essential part of the proteomics publication cycle is raw data sharing.
5 | This is important so that others can reproduce results and utilize data for new investigations.
6 | Computational researchers may use published data to develop new algorithms or combine multiple datasets into a meta study.
7 | There are many websites that serve as data repositories for publication.
8 | These include PRIDE [@DOI:10.1093/nar/gkj138; @DOI:10.1093/nar/gkab1038], MassIVE [@PMID:30172843], and Chorus [@URL:https://chorusproject.org/pages/index.html].
9 |
10 | ### PeptideAtlas and SRMAtlas
11 | It would be beneficial to analyze all the data _in toto_ to derive a knowledge base of all detectable proteins in an organism.
12 | A challenge in attempting this is that, given the vast array of software for MS data analysis, results from different studies are not directly comparable or combinable, because false discovery rates (FDR) accumulate when dataset results are combined.
13 | For example, if we combine 3 datasets that were each filtered to 1% FDR, the FDR of the combined dataset can approach 3%, because the random false-positive hits are unlikely to be shared across datasets while the correct identifications largely overlap.
14 | To address this, the PeptideAtlas project was started in 2005 to ingest as many publicly available datasets as possible per organism, search all of the data together through a single pipeline, and control the FDR of the combined result at 1% at the protein level [@PMID:16052627; @PMID:16381952].
15 | The PeptideAtlas website (www.peptideatlas.org) is a multi-organism, publicly accessible compendium of peptides identified in large sets of tandem mass spectrometry proteomics experiments.
16 | Mass spectrometer output files are collected for human, mouse, yeast, and many other organisms of research interest, and searched using the latest search engines and genome-derived protein sequences.
17 | All results of sequence and spectral library searching on PeptideAtlas are processed through the Trans-Proteomic Pipeline (TPP), which assigns every result a probability of correct identification in a uniform manner to ensure a high-quality database, along with false discovery rates at the whole-atlas level.
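The FDR accumulation described above can be illustrated with a minimal sketch. It assumes the worst case: every dataset reports the same correct peptides, but each dataset's false positives are unique to it. The identifiers are hypothetical labels, not real peptide-spectrum matches, and `combined_fdr` is an illustrative helper, not part of any PeptideAtlas or TPP tool.

```python
def combined_fdr(n_true: int, n_false_per_set: int, n_sets: int) -> float:
    """FDR of the union of several result lists, assuming (worst case) that
    all datasets share the same true identifications while every dataset's
    false positives are unique to that dataset."""
    true_ids = {f"true_{i}" for i in range(n_true)}  # hypothetical labels
    union = set(true_ids)
    false_ids = set()
    for s in range(n_sets):
        # Each dataset contributes its own non-overlapping random hits.
        batch = {f"false_{s}_{j}" for j in range(n_false_per_set)}
        false_ids |= batch
        union |= batch
    return len(false_ids) / len(union)


# 990 true + 10 false identifications gives 1% FDR for each dataset alone.
for k in (1, 2, 3):
    print(k, round(combined_fdr(990, 10, k), 4))
```

With these numbers the loop prints 0.01, 0.0198 and 0.0294. The combined FDR approaches, but does not quite reach, 3% because the shared true hits are counted only once in the denominator.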
18 |
19 | The most recognizable MS data compendium is the Human PeptideAtlas, which has been built yearly since 2005 to capture all of the peptide sequence evidence for the current human proteome (Figure 17A).
20 | As of 2023, the Human PeptideAtlas covers over 93% of the human proteome: over 189K MS runs and 3.6B spectra have been searched, resulting in 3.4M identified peptides and 17,245 identified proteins out of the 19,600 total proteins possible.
21 | The number of identified proteins has increased year over year as new public data have become available (Figure 17B).
22 |
23 | {#fig:peptide-atlas tag="17" width="100%"}
27 |
28 | The PeptideAtlas ecosystem has two components for presenting selected reaction monitoring (SRM) targeted peptide assays, the first of which is the PeptideAtlas SRM Experiment Library (PASSEL), which enables submission, dissemination, and reuse of SRM experimental results from the analysis of biological samples [@PMID:22318887; @PMID:24939129].
29 | PASSEL acts as a data repository: researchers can deposit their SRM data alongside journal publication, and other users can search existing data to obtain the parameters needed to replicate the assays in their own laboratories.
30 | The other component is the SRMAtlas website, which provides definitive coordinates for targeted SRM assays of all possible proteins within an organism, so that the respective peptides can be conclusively identified in biological samples.
31 | As an example, the Human SRMAtlas provides data on 166,174 synthetic proteotypic human peptides, providing multiple, independent assays to quantify any human protein and numerous splice variants, non-synonymous mutations, and post-translational modifications [@PMID:27453469].
32 | The data are freely accessible as a resource at http://www.srmatlas.org/.
33 |
34 | ### Other knowledgebases
35 | There are many other knowledgebases that will be useful to proteomics researchers.
36 | These include the Proteomics Standards Initiative (PSI) [@DOI:10.1021/acs.jproteome.7b00370], ProteomeTools [@DOI:10.1038/nmeth.4153], and iProX [@DOI:10.1093/nar/gky869].
37 | Panorama [@DOI:10.1021/pr5006636] is a resource for sharing processed proteomics data including the extracted ion chromatograms, which can improve transparency by enabling easy data inspection on the web.
38 |
--------------------------------------------------------------------------------
/content/03.biochemistry-basics.md:
--------------------------------------------------------------------------------
1 | ## 1. Biochemistry Basics {.page_break_before}
2 |
3 | ### Proteins
4 |
5 | Proteins are large biomolecules (biopolymers) made up of a backbone of amino acids linked by peptide bonds.
6 | They perform various functions in living organisms ranging from structural roles to functional involvement in cellular signaling and the catalysis of chemical reactions (enzymes).
7 | Proteins are made up of 20 standard amino acids, and their sequence is encoded in their corresponding genes (not counting the rarer genetically encoded selenocysteine and pyrrolysine, the latter found only in some microorganisms, or residues such as hydroxyproline that arise from post-translational modification).
8 | The human genome is predicted to encode approximately 19,778 canonical proteins (see www.neXtProt.org) [@PMID:36318223].
9 | Each protein is present at a different abundance depending on the cell type or bodily fluid.
10 | Previous studies have shown that protein abundance within a cell can span at least seven orders of magnitude, up to 20,000,000 copies per cell, and that protein abundance distributions are tissue-specific [@DOI:10.1038/msb.2011.82; @DOI:10.1016/j.cell.2020.08.036].
11 | In human blood, protein abundances can span more than ten orders of magnitude, and a few highly abundant proteins account for most of the total protein mass, making blood plasma one of the most challenging matrices for mass spectrometry-based proteomics.
12 | Due to genetic variation, alternative splicing, and co- and post-translational modifications (PTMs), multiple different proteoforms can be produced from a single gene (**Figure 1**) [@DOI:10.1038/nmeth.2369; @DOI:10.1038/s41587-023-01714-x].
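As a rough worked example of the cellular dynamic range mentioned above, assume, for illustration only, that the least abundant proteins are present at about one copy per cell. The span up to 20,000,000 copies per cell then corresponds to:

$$
\log_{10}\left(\frac{2 \times 10^{7}\ \text{copies}}{1\ \text{copy}}\right) \approx 7.3
$$

This is just over seven orders of magnitude. A lower floor (for example, proteins present in only a fraction of cells) would widen the range further.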
13 |
14 | {#fig:proteoforms tag="1" width="100%"}
19 |
20 | #### PTMs
21 |
22 | After protein biosynthesis, enzymatic and nonenzymatic processes can alter proteins through proteolysis or covalent chemical modification of amino acid side chains.
23 | Post-translational modifications (PTMs) are important biological regulators contributing to the diversity and function of the cellular proteome.
24 | Proteins can be post-translationally modified through enzymatic and non-enzymatic reactions _in vivo_ and _in vitro_ [@doi:10.1093/database/baab012].
25 | PTMs can be reversible or irreversible, and they change protein function in multiple ways, for example by altering substrate–enzyme interactions, subcellular localization or protein-protein interactions [@PMID:33826699; @PMID:24217768].
26 |
27 | More than 400 biological PTMs have been discovered in both prokaryotic and eukaryotic cells.
28 | Many more chemical artifact modifications, such as carbamylation, can be introduced during sample preparation.
29 | Biological modifications are crucial in controlling protein functions and signal transduction pathways [@PMID:23887885].
30 | The most commonly studied and biologically relevant post-translational modifications include phosphorylation (Ser, Thr, Tyr, His), glycosylation (Arg, Asn, Cys, Ser, Thr, Tyr, Trp), disulfide bonds (Cys-Cys), ubiquitination (Lys, Cys, Ser, Thr, N-term), succinylation (Lys), methylation (Arg, Lys, His, Glu, Asn, Cys), oxidation (especially Met, Trp, His, Cys), acetylation (Lys, N-term), and lipidations [@DOI:10.1042/BCJ20220251].
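The residue specificities listed above determine where a given PTM can occur. The following minimal sketch locates candidate sites in a sequence and counts the possible modified/unmodified combinations. It assumes each site is independent and binary (modified or not), which is a deliberate simplification that ignores enzyme specificity and combinations of different PTMs. The dictionary holds only a subset of the list, and the sequence is hypothetical.

```python
# Subset of the residue specificities listed above (one-letter amino acid codes).
# N-terminal modifications are ignored in this sketch.
PTM_RESIDUES = {
    "phosphorylation": {"S", "T", "Y", "H"},
    "acetylation": {"K"},
    "succinylation": {"K"},
    "oxidation": {"M", "W", "H", "C"},
}


def candidate_sites(sequence: str, ptm: str) -> list[int]:
    """Return 1-based positions of residues that could in principle carry `ptm`."""
    allowed = PTM_RESIDUES[ptm]
    return [pos for pos, aa in enumerate(sequence, start=1) if aa in allowed]


def count_site_patterns(n_sites: int) -> int:
    """Number of modified/unmodified combinations over n independent sites."""
    return 2 ** n_sites


seq = "MSTKYHPLSR"  # hypothetical sequence for illustration
sites = candidate_sites(seq, "phosphorylation")
print(sites, count_site_patterns(len(sites)))  # [2, 3, 5, 6, 9] 32
```

Even five candidate phosphosites allow 32 distinct phosphorylation patterns. This helps explain why a single gene can give rise to many proteoforms.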
31 |
32 | PTMs can alter a protein's function, activity, structure, spatiotemporal status, and interactions with other proteins or small molecules.
33 | Phosphorylation, for example, regulates signal transduction pathways and gene expression [@PMID:28656226] as well as apoptosis [@PMID:23088365; @PMID:11368354].
34 | Ubiquitination generally regulates protein degradation [@PMID:16738015], SUMOylation regulates chromatin structure, DNA repair, transcription, and cell-cycle progression [@PMID:26601932; @PMID:29079793], and palmitoylation regulates the maintenance of the structural organization of exosome-like extracellular vesicle membranes [@PMID:30251702].
35 | Glycosylation is a ubiquitous modification that regulates various T cell functions, such as cellular migration, T cell receptor signaling, cell survival, and apoptosis [@PMID:22288421; @PMID:18846099].
36 | Deregulation of PTMs is linked to cellular stress and diseases [@doi:10.1038/s41570-020-00223-8].
37 |
38 | Several non-MS methods exist to study PTMs, including _in vitro_ PTM reaction tests with colorimetric assays, radioactive isotope-labeled substrates, Western blotting with PTM-specific antibodies and superbinders, and peptide and protein arrays [@PMID:11062466; @PMID:12323352; @PMID:35613471].
39 | While effective, these approaches have many limitations, such as low efficiency and the difficulty of producing pan-specific antibodies.
40 | MS-based proteomics approaches are currently the predominant tool for identifying and quantifying changes in PTMs.
41 |
42 |
--------------------------------------------------------------------------------
/setup.bash:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | # Setup Manubot
3 | # Based on https://github.com/manubot/rootstock/blob/main/SETUP.md
4 | # This is designed to be run from the bash terminal
5 |
6 | # Stop on first error.
7 | set -e
8 |
9 | usage() {
10 | echo "Usage: $0 [--owner text] [--repo text] [--yes] [--ssh]" >&2
11 | echo "Guides the user through the creation of a new Manubot repository for their manuscript." >&2
12 | echo >&2
13 | echo "If no options are supplied a fully interactive process is used." >&2
14 | echo "OWNER and REPO refer to the details of your manuscript repo location:" >&2
15 | echo "i.e. https://github.com/OWNER/REPO." >&2
16 | echo >&2
17 | echo "Options:" >&2
18 | echo " -o --owner GitHub user or organization name." >&2
19 | echo " -r --repo Name of the repository for your new manuscript." >&2
20 | echo " -y --yes Non-interactive mode. Continue script without asking for confirmation that the repo exists." >&2
21 | echo " -s --ssh Use SSH to authenticate GitHub account. HTTPS is used by default." >&2
22 | echo " Option only effective if --yes is also set, otherwise answer given in user interaction takes precedence." >&2
23 | echo " -h --help Display usage information." >&2
24 | exit 1; }
25 |
26 | # Check if to continue
27 | check(){
28 | while true
29 | do
30 | echo "Once you have created your repo press enter to continue setup,"
31 | read -r -p "or type exit to quit now: " input
32 |
33 | case $input in
34 | "")
35 | echo
36 | echo "Continuing Setup..."
37 | echo
38 | break
39 | ;;
40 | [eE][xX][iI][tT])
41 | exit 1
42 | ;;
43 | *)
44 | echo
45 | echo "Invalid input, try again..."
46 | echo
47 | ;;
48 | esac
49 | done
50 | }
51 |
52 | # Option strings
53 | SHORT=o:r:hys
54 | LONG=owner:,repo:,help,yes,ssh
55 |
56 | YES=0 # continue when prompted
57 | AUTH=0 # used https or ssh auth
58 |
59 | # read the options
60 | if ! OPTS=$(getopt --options "$SHORT" --long "$LONG" --name "$0" -- "$@"); then
61 | echo "Failed to parse options...exiting." >&2; exit 1
62 | fi
63 |
64 | eval set -- "$OPTS"
65 |
66 | # extract options and their arguments into variables.
67 | while true ; do
68 | case "$1" in
69 | -o | --owner )
70 | shift;
71 | OWNER=$1
72 | shift
73 | ;;
74 | -r | --repo )
75 | REPO="$2"
76 | shift 2
77 | ;;
78 | -y | --yes )
79 | YES=1
80 | shift
81 | ;;
82 | -s | --ssh )
83 | AUTH=1;
84 | shift
85 | ;;
86 | -- )
87 | shift
88 | break
89 | ;;
90 | -h | --help )
91 | shift
92 | usage
93 | exit 1
94 | ;;
95 | *)
96 | echo "Internal error!"
97 | exit 1
98 | ;;
99 | esac
100 | done
101 |
102 | if [ -z "${OWNER}" ] || [ -z "${REPO}" ]; then
103 | echo "This script will take you through the setup process for Manubot."
104 | echo "First, we need to specify where to create the GitHub repo for your manuscript."
105 | echo
106 | echo "The URL will take this format: https://github.com/OWNER/REPO."
107 | echo "OWNER is your username or organization"
108 | echo "REPO is the name of your repository"
109 | echo
110 | read -r -p "Type in the OWNER now:" input
111 | OWNER=$input
112 | read -r -p "Type in the REPO now:" input
113 | REPO=$input
114 | fi
115 |
116 | # If using interactive mode, check remote repo exists.
117 | if [[ "$YES" == '0' ]]; then
118 | while true
119 | do
120 | echo
121 | read -r -p "Have you manually created https://github.com/${OWNER}/${REPO}? [y/n] " input
122 |
123 | case $input in
124 | [yY][eE][sS]|[yY])
125 |
126 | echo
127 | echo "Continuing Setup..."
128 | echo
129 | break
130 | ;;
131 | [nN][oO]|[nN])
132 | echo
133 | echo "Go to https://github.com/new and create https://github.com/${OWNER}/${REPO}"
134 | echo "Note: the new repo must be completely empty or the script will fail."
135 | echo
136 | check
137 | break
138 | ;;
139 | *)
140 | echo
141 | echo "Invalid input, try again..."
142 | echo
143 | ;;
144 | esac
145 | done
146 | else
147 | echo "Setting up https://github.com/${OWNER}/${REPO}"
148 | fi
149 |
150 | # Clone manubot/rootstock
151 | echo
152 | echo "Cloning Rootstock..."
153 | echo
154 | git clone --single-branch https://github.com/manubot/rootstock.git "${REPO}"
155 | cd "${REPO}"
156 |
157 | echo
158 | echo "Setup tracking of remote..."
159 |
160 | # Configure remotes
161 | git remote add rootstock https://github.com/manubot/rootstock.git
162 |
163 | # Check auth method
164 | if [[ "$YES" == '0' ]]; then
165 | while true
166 | do
167 | echo
168 | read -r -p "Would you like to use SSH to authenticate your GitHub account? [y/n] " input
169 |
170 | case $input in
171 | [yY][eE][sS]|[yY])
172 | AUTH=1
173 | break
174 | ;;
175 | [nN][oO]|[nN])
176 | AUTH=0
177 | break
178 | ;;
179 | *)
180 | echo
181 | echo "Invalid input, try again..."
182 | echo
183 | ;;
184 | esac
185 | done
186 | fi
187 |
188 | case $AUTH in
189 | 0)
190 | echo
191 | echo "Setting origin URL using its https web address"
192 | echo
193 | git remote set-url origin https://github.com/${OWNER}/${REPO}.git
194 | ;;
195 | 1)
196 | echo
197 | echo "Setting origin URL using SSH"
198 | echo
199 | git remote set-url origin git@github.com:$OWNER/$REPO.git
200 | ;;
201 | esac
202 |
203 | git push --set-upstream origin main
204 |
205 | # To use GitHub Actions only:
206 | echo "Setup for GitHub Actions ONLY..."
207 | # remove AppVeyor config
208 | git rm .appveyor.yml
209 | # remove ci/install.sh (only used by AppVeyor)
210 | git rm ci/install.sh
211 |
212 | # Update README
213 | echo "Updating README..."
214 | # Perform substitutions
215 | sed "s/manubot\/rootstock/${OWNER}\/${REPO}/g" README.md > tmp && mv -f tmp README.md
216 | sed "s/manubot\.github\.io\/rootstock/${OWNER}\.github\.io\/${REPO}/g" README.md > tmp && mv -f tmp README.md
217 |
218 | echo "Committing rebrand..."
219 | git add --update
220 | git commit --message "Brand repo to $OWNER/$REPO"
221 | git push origin main
222 | echo
223 | echo "Setup complete"
224 | echo
225 | echo "The new repo has been created at $(pwd)"
226 | echo
227 | echo "A good first step is to modify content/metadata.yaml with the relevant information for your manuscript."
228 | echo
229 |
--------------------------------------------------------------------------------
/content/images/MS_f101.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/LICENSE-CC0.md:
--------------------------------------------------------------------------------
1 | # CC0 1.0 Universal
2 |
3 | ```
4 | CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED HEREUNDER.
5 | ```
6 |
7 | ### Statement of Purpose
8 |
9 | The laws of most jurisdictions throughout the world automatically confer exclusive Copyright and Related Rights (defined below) upon the creator and subsequent owner(s) (each and all, an "owner") of an original work of authorship and/or a database (each, a "Work").
10 |
11 | Certain owners wish to permanently relinquish those rights to a Work for the purpose of contributing to a commons of creative, cultural and scientific works ("Commons") that the public can reliably and without fear of later claims of infringement build upon, modify, incorporate in other works, reuse and redistribute as freely as possible in any form whatsoever and for any purposes, including without limitation commercial purposes. These owners may contribute to the Commons to promote the ideal of a free culture and the further production of creative, cultural and scientific works, or to gain reputation or greater distribution for their Work in part through the use and efforts of others.
12 |
13 | For these and/or other purposes and motivations, and without any expectation of additional consideration or compensation, the person associating CC0 with a Work (the "Affirmer"), to the extent that he or she is an owner of Copyright and Related Rights in the Work, voluntarily elects to apply CC0 to the Work and publicly distribute the Work under its terms, with knowledge of his or her Copyright and Related Rights in the Work and the meaning and intended legal effect of CC0 on those rights.
14 |
15 | 1. __Copyright and Related Rights.__ A Work made available under CC0 may be protected by copyright and related or neighboring rights ("Copyright and Related Rights"). Copyright and Related Rights include, but are not limited to, the following:
16 |
17 | i. the right to reproduce, adapt, distribute, perform, display, communicate, and translate a Work;
18 |
19 | ii. moral rights retained by the original author(s) and/or performer(s);
20 |
21 | iii. publicity and privacy rights pertaining to a person's image or likeness depicted in a Work;
22 |
23 | iv. rights protecting against unfair competition in regards to a Work, subject to the limitations in paragraph 4(a), below;
24 |
25 | v. rights protecting the extraction, dissemination, use and reuse of data in a Work;
26 |
27 | vi. database rights (such as those arising under Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, and under any national implementation thereof, including any amended or successor version of such directive); and
28 |
29 | vii. other similar, equivalent or corresponding rights throughout the world based on applicable law or treaty, and any national implementations thereof.
30 |
31 | 2. __Waiver.__ To the greatest extent permitted by, but not in contravention of, applicable law, Affirmer hereby overtly, fully, permanently, irrevocably and unconditionally waives, abandons, and surrenders all of Affirmer's Copyright and Related Rights and associated claims and causes of action, whether now known or unknown (including existing as well as future claims and causes of action), in the Work (i) in all territories worldwide, (ii) for the maximum duration provided by applicable law or treaty (including future time extensions), (iii) in any current or future medium and for any number of copies, and (iv) for any purpose whatsoever, including without limitation commercial, advertising or promotional purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each member of the public at large and to the detriment of Affirmer's heirs and successors, fully intending that such Waiver shall not be subject to revocation, rescission, cancellation, termination, or any other legal or equitable action to disrupt the quiet enjoyment of the Work by the public as contemplated by Affirmer's express Statement of Purpose.
32 |
33 | 3. __Public License Fallback.__ Should any part of the Waiver for any reason be judged legally invalid or ineffective under applicable law, then the Waiver shall be preserved to the maximum extent permitted taking into account Affirmer's express Statement of Purpose. In addition, to the extent the Waiver is so judged Affirmer hereby grants to each affected person a royalty-free, non transferable, non sublicensable, non exclusive, irrevocable and unconditional license to exercise Affirmer's Copyright and Related Rights in the Work (i) in all territories worldwide, (ii) for the maximum duration provided by applicable law or treaty (including future time extensions), (iii) in any current or future medium and for any number of copies, and (iv) for any purpose whatsoever, including without limitation commercial, advertising or promotional purposes (the "License"). The License shall be deemed effective as of the date CC0 was applied by Affirmer to the Work. Should any part of the License for any reason be judged legally invalid or ineffective under applicable law, such partial invalidity or ineffectiveness shall not invalidate the remainder of the License, and in such case Affirmer hereby affirms that he or she will not (i) exercise any of his or her remaining Copyright and Related Rights in the Work or (ii) assert any associated claims and causes of action with respect to the Work, in either case contrary to Affirmer's express Statement of Purpose.
34 |
35 | 4. __Limitations and Disclaimers.__
36 |
37 | a. No trademark or patent rights held by Affirmer are waived, abandoned, surrendered, licensed or otherwise affected by this document.
38 |
39 | b. Affirmer offers the Work as-is and makes no representations or warranties of any kind concerning the Work, express, implied, statutory or otherwise, including without limitation warranties of title, merchantability, fitness for a particular purpose, non infringement, or the absence of latent or other defects, accuracy, or the present or absence of errors, whether or not discoverable, all to the greatest extent permissible under applicable law.
40 |
41 | c. Affirmer disclaims responsibility for clearing rights of other persons that may apply to the Work or any use thereof, including without limitation any person's Copyright and Related Rights in the Work. Further, Affirmer disclaims responsibility for obtaining any necessary consents, permissions or other rights required for any use of the Work.
42 |
43 | d. Affirmer understands and acknowledges that Creative Commons is not a party to this document and has no duty or obligation with respect to this CC0 or use of the Work.
44 |
--------------------------------------------------------------------------------
/content/11.peptide-ionization.md:
--------------------------------------------------------------------------------
1 | ## 9. Peptide Ionization {.page_break_before}
2 |
3 | As early as the late 1950s, derivatization reagents were used to make peptides volatile enough for electron impact ionization analysis [@DOI:10.1021/ja01518a069].
4 | Eventually this led to GC-MS analysis of derivatized peptides for sequencing [@DOI:10.1021/ja00802a048].
5 | In the early 1980s, fast atom bombardment (FAB) enabled peptide ionization and sequencing by MS/MS [@DOI:10.1021/ac00234a035], but difficulty interfacing FAB with LC limited its utility [@DOI:10.1021/ja3094313].
6 | Two new soft ionization techniques, matrix-assisted laser desorption/ionization (MALDI) and electrospray ionization (ESI), were applied to peptides around 1990 and revolutionized the field of proteomics by making high-throughput ionization of peptides easy.
7 | These two techniques were so impactful that the 2002 Nobel Prize in Chemistry was co-awarded to John Fenn (ESI) and Koichi Tanaka (MALDI) "for their development of soft desorption ionization methods for mass spectrometric analyses of biological macromolecules" [@URL:https://www.nobelprize.org/prizes/chemistry/2002/summary].
8 |
9 | ### MALDI
10 | The term "matrix-assisted laser desorption" was coined by Hillenkamp and Karas in 1985, although this original paper applied the technique only to dipeptides [@URL:https://pubs.acs.org/doi/abs/10.1021/ac00291a042].
11 | It was Koichi Tanaka who first applied this idea to proteins above 10,000 Daltons in size and published a paper in the Proceedings of the 2nd Japan-China Joint Symposium on Mass spectrometry in 1987 (Tanaka, K., Ido, Y., Akita, S., Yoshida, Y. and Yoshida, T. (1987) Detection of high mass molecules by laser desorption time-of-flight mass spectrometry. Proceedings of the 2nd Japan-China Joint Symposium on Mass spectrometry, 185-187), and then in a follow-up paper published in 1988 [@DOI:10.1002/rcm.1290020802].
12 | A few months later, Karas and Hillenkamp also applied MALDI to proteins above 10 kDa [@DOI:10.1021/ac00171a028].
13 | This led to some controversy over who should have won the Nobel Prize [@URL:https://web.archive.org/web/20070517202246/http://cmbi.bjmu.edu.cn/news/0212/55.htm], because many in the community felt that Hillenkamp and Karas had introduced the technology several years earlier, even though Koichi Tanaka was the first to apply it to proteins, a year before Hillenkamp and Karas.
14 |
15 | MALDI first requires the peptide sample to be co-crystallized with a matrix molecule, which is usually a volatile, low molecular-weight, organic aromatic compound (**Figure 6**).
16 | Some examples of such compounds are α-cyano-4-hydroxycinnamic acid (CHCA), 2,5-dihydroxybenzoic acid (DHB), sinapinic acid, and ferulic acid [@PMID:23681820].
17 | Subsequently, the sample is placed in a vacuum chamber, where it is irradiated with a laser, usually at 337 nm [@DOI:10.1021/cr010375i].
18 | The matrix absorbs this laser energy and then transfers that energy, along with its free protons, to the co-crystallized peptides without significantly fragmenting them.
19 | The matrix and co-crystallized sample form a plume, and the volatile matrix, which is ionized first, transfers its protons to the peptides.
20 | The weakly acidic conditions used, together with the acidic nature of the matrix, allow easy proton transfer, so that the peptides become ionized and can be accelerated by the electric field in the mass spectrometer.
21 | These ionized peptides often form metastable ions, many of which eventually fragment [@DOI:10.1021/ac00099a029].
22 | However, this fragmentation can take several milliseconds, so the mass spectrometry analysis can be performed before it occurs.
23 | Peptides ionized by MALDI almost always carry a single charge and are therefore observed as [M+H]+ ions.
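Because MALDI produces mostly singly charged [M+H]+ ions, the observed m/z is simply the peptide's neutral mass plus one proton. A minimal sketch follows, assuming monoisotopic masses of unmodified residues only (no fixed modifications such as carbamidomethylated cysteine). The function names are illustrative.

```python
# Monoisotopic residue masses (Da) of the 20 standard, unmodified amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O completing the free N- and C-termini
PROTON = 1.00728   # mass of a proton (Da)


def neutral_mass(peptide: str) -> float:
    """Monoisotopic mass of the uncharged peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mh_plus(peptide: str) -> float:
    """m/z of the singly protonated [M+H]+ ion (z = 1, so m/z equals its mass)."""
    return neutral_mass(peptide) + PROTON


print(round(mh_plus("PEPTIDE"), 3))  # 800.367
```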
24 |
25 | #### MALDI Mechanism
26 | {#fig:MALDI-mechanism tag="6" width="100%"}
29 |
30 | According to PubMed, the number of publications related to MALDI peaked in 2013 and has been steadily declining.
31 | Concurrently, the usage of MALDI for bottom-up proteomics has subsided in favor of the better depth and throughput possible from using ESI.
32 | MALDI is still widely used for mass spectrometry imaging of proteins and metabolites [@PMID:29155564].
33 |
34 | ### Electrospray Ionization
35 | ESI was first applied to peptides by John Fenn and coworkers in 1989 [@DOI:10.1126/science.2675315].
36 | Concepts related to ESI were published at least as early as 1882, when Lord Rayleigh described the number of charges that could assemble on the surface of a droplet [@DOI:10.1080/14786448208628425].
37 | ESI is usually coupled with reversed-phase liquid chromatography (LC) of peptides, with the LC directly interfaced to the mass spectrometer.
38 | A high voltage (~ 2 kV) is applied between the spray needle and the mass spectrometer (**Figure 7**).
39 | As solvent exits the needle, it forms droplets that take on charge at their surface, and through a debated mechanism, those charges are transferred to the peptides.
40 | The liquid phase is generally kept acidic to help impart protons easily to the analytes.
41 |
42 | Tryptic peptides ionized by ESI usually carry one charge on the side chain of their C-terminal residue (Arg or Lys) and one charge at their N-terminal amine.
43 | Peptides can have more than one charge if they have a longer peptide backbone, have histidine residues, or have missed cleavages leaving extra Arg and Lys.
44 | In most cases, peptides ionized by ESI are observed at more than one charge state.
45 | Evidence suggests that the distribution of peptide charge states can be manipulated through chemical additives [@PMID:22610994].
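The charge-state behavior described above can be made concrete with a small calculation. An ion carrying z protons appears at m/z = (M + z × 1.007276)/z, where M is the neutral monoisotopic mass. The site count below is only a rough heuristic taken from the rule of thumb in this section: one N-terminal charge plus one per Lys, Arg, or His. It is an assumption, not a charge-state predictor, and the function names are illustrative.

```python
from __future__ import annotations

PROTON = 1.007276  # mass of a proton in daltons


def basic_sites(sequence: str) -> int:
    """Heuristic count of protonation sites: N-terminal amine plus K, R, and H side chains."""
    return 1 + sum(sequence.upper().count(aa) for aa in "KRH")


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of the [M+zH]z+ ion for a peptide of neutral mass M."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON) / charge


def charge_ladder(neutral_mass: float, max_charge: int) -> dict[int, float]:
    """m/z values for charge states 1..max_charge."""
    return {z: mz(neutral_mass, z) for z in range(1, max_charge + 1)}


if __name__ == "__main__":
    # Hypothetical peptide with a His and an internal Lys (a missed cleavage).
    peptide = "AHGLKDER"
    for z, value in charge_ladder(1500.0, basic_sites(peptide)).items():
        print(f"{z}+  m/z {value:.4f}")
```

Under this sketch, a peptide with a neutral mass of 1,000.0 Da appears at m/z 501.0073 as a 2+ ion. The same molecule is therefore spread over several m/z values in a single spectrum.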
46 |
47 | #### Electrospray Mechanism
48 | {#fig:ESI-mechanism tag="7" width="100%"}
50 |
51 | The main goal of ESI is the production of gas-phase ions from electrolyte ions in solution.
52 | During the process of ionization, the solution emerging from the electrospray needle or capillary is distorted into a Taylor cone and charged droplets are formed.
53 | The charged droplets subsequently decrease in size due to solvent evaporation.
54 | As the droplets shrink, the charge density and Coulombic repulsion increase.
55 | This process destabilizes the droplets until the Coulombic repulsion between the charges exceeds the surface tension (the Rayleigh limit), at which point they undergo fission (Coulomb explosion) [@PMID:19551695] [@PMID:23134552].
56 | Typical bottom-up proteomics experiments use acidic analyte solutions, whose excess of protons leads to the formation of positively charged analyte ions.
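Lord Rayleigh's criterion gives the maximum charge a droplet can hold before fission: $q_R = 8\pi\sqrt{\varepsilon_0 \gamma r^3}$, where $\gamma$ is the surface tension and $r$ is the droplet radius. The sketch below evaluates this limit. It assumes a surface tension of about 0.072 N/m, which is roughly that of pure water near room temperature. Real ESI solvents containing organic modifier have lower surface tension.

```python
import math

EPSILON_0 = 8.8541878128e-12         # vacuum permittivity, F/m
ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs


def rayleigh_limit(radius_m: float, surface_tension_n_per_m: float) -> float:
    """Maximum charge (C) a droplet of the given radius can hold before Coulomb fission."""
    return 8 * math.pi * math.sqrt(EPSILON_0 * surface_tension_n_per_m * radius_m ** 3)


def rayleigh_limit_charges(radius_m: float, surface_tension_n_per_m: float) -> float:
    """The same limit expressed as a number of elementary charges."""
    return rayleigh_limit(radius_m, surface_tension_n_per_m) / ELEMENTARY_CHARGE


if __name__ == "__main__":
    water_surface_tension = 0.072  # N/m, approximate value for water (assumption)
    for radius_um in (10.0, 1.0, 0.1):
        n = rayleigh_limit_charges(radius_um * 1e-6, water_surface_tension)
        print(f"r = {radius_um:5.1f} um -> about {n:.3g} elementary charges")
```

Because $q_R$ scales with $r^{3/2}$, a droplet that loses solvent while keeping its charge must eventually reach this limit. This is why solvent evaporation leads to fission.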
57 |
58 |
59 |
60 |
61 |
--------------------------------------------------------------------------------
/content/02.introduction.md:
--------------------------------------------------------------------------------
1 | ## Introduction {.page_break_before}
2 |
3 | Proteomics is the large-scale study of protein structure and function.
4 | The term "proteomics" is thought to have been coined by Marc R. Wilkins.
5 | Proteins are translated from messenger RNA (mRNA) transcripts, which are in turn transcribed from the DNA genome.
6 | Although the genome encodes potential cellular functions and states, the study of proteins in all their forms is necessary to truly understand biology.
7 |
8 | Currently, proteomics can be performed with various methods.
9 | Mass spectrometry has emerged within the past few decades as the premier tool for comprehensive proteome analysis.
10 | The ability of mass spectrometry (MS) to detect charged chemicals enables the identification of peptide sequences and modifications for diverse biological investigations.
11 | Alternative (commercial) methods based on affinity interactions of antibodies or DNA aptamers have been developed, namely [Olink](https://olink.com/) and [SomaScan](https://somalogic.com/somascan-platform/).
12 | There are also nascent methods that are either recently commercialized or still under development and not yet applicable to whole proteomes, such as motif scanning using antibodies, variants of N-terminal degradation, and nanopores [@DOI:10.1038/s41565-023-01462-8; @DOI:10.1021/jacs.2c13465; @DOI:10.1038/nnano.2016.267; @DOI:10.1021/jacs.1c11758].
13 | Another approach uses parallel immobilization of peptides with total internal reflection microscopy and sequential Edman degradation [@DOI:10.1038/nbt.4278].
14 | However, by far the most common method for proteomics is based on mass spectrometry coupled to liquid chromatography (LC).
15 |
16 | Modern proteomics had its roots in the early 1980s with the analysis of peptides by mass spectrometry using low-efficiency ion sources.
17 | One pioneer in the field was Don Hunt, who described sequencing of peptides using tandem mass spectrometry after chemical ionization with isobutane in 1981 [@DOI:10.1002/bms.1200080909].
18 | Another pioneer was Klaus Biemann, who, for example, worked with Brad Gibson to report peptide identification from fast atom bombardment [@DOI:10.1073/pnas.81.7.1956].
19 | Progress started ramping up around the year 1990 with the introduction of soft ionization methods that enabled, for the first time, efficient transfer of large biomolecules into the gas phase without destroying them [@DOI:10.1126/science.2675315; @DOI:10.1002/rcm.1290020802].
20 | Shortly afterward, the first computer algorithm for matching peptides to a database was introduced [@PMID:24226387].
21 | Another major milestone, which enabled the identification of over 1,000 proteins, was improved chromatography upstream of MS analysis [@DOI:10.1021/ac010617e].
22 | As the volume of data exploded, methods for statistical analysis transitioned from the wild west of ad hoc empirical analysis to modern informatics based on statistical models [@DOI:10.1021/ac0341261] and false discovery rate [@DOI:10.1038/nmeth1019].
23 |
24 |
25 | Two strategies of mass spectrometry-based proteomics differ fundamentally by whether proteins are analyzed as a whole chain or cleaved into peptides before analysis: "top-down" versus "bottom-up".
26 | Bottom-up proteomics (also referred to as shotgun proteomics) is defined by the intentional hydrolysis of proteins into peptide pieces using enzymes called proteases [@DOI:10.1038/nature19949].
27 | Therefore, bottom-up proteomics does not actually measure proteins, but instead infers protein presence and abundance from identified peptides [@DOI:10.1021/ac0341261].
28 | Sometimes, proteins are inferred from only one peptide sequence representing a small fraction of the total protein sequence predicted from the genome.
29 | In contrast, top-down proteomics attempts to measure intact proteins [@DOI:10.1021/ja973655h; @DOI:10.1038/nmeth.2369; @DOI:10.1021/ac0415657; @DOI:10.1039/C9MO00154A].
30 | The potential benefit of top-down proteomics is the ability to measure the many varied proteoforms [@DOI:10.1038/nmeth.2369; @DOI:10.1126/science.aat1884; @DOI:10.1038/nchembio.2576].
31 | However, due to myriad analytical challenges, the depth of protein coverage that is achievable by top-down proteomics is considerably less than that of bottom-up proteomics [@DOI:10.1021/acs.analchem.7b04747].
32 |
33 | In this tutorial we focus on the bottom-up proteomics workflow.
34 | The most common version of this workflow generally comprises the following steps.
35 | First, proteins in a biological sample must be extracted.
36 | Usually this is achieved by mechanically lysing cells or tissue while denaturing and solubilizing the proteins and disrupting DNA to minimize interference in analysis procedures.
37 | Next, proteins are hydrolyzed into peptides, most often using the protease trypsin, which generates peptides with basic C-terminal amino acids (arginine and lysine) to aid in fragment ion series production during tandem mass spectrometry (MS/MS).
38 | Peptides can also be generated by chemical reactions that induce residue specific hydrolysis, such as cyanogen bromide that cleaves after methionine.
39 | Peptides from proteome hydrolysis must be purified; this is often accomplished with reversed-phase liquid chromatography (RPLC) cartridges or tips to remove interfering molecules in the sample such as salts and buffers.
40 | The peptides are then almost always separated by reversed-phase LC before they are ionized and introduced into a mass spectrometer, although recent reports also describe LC-free proteomics by direct infusion [@DOI:10.1038/s41592-020-00999-z; @DOI:10.1021/acs.analchem.2c02249; @DOI:10.1101/2023.06.26.546628].
41 | The mass spectrometer then collects precursor and fragment ion data from those peptides.
42 | Peptides must be identified from the tandem mass spectra, protein groups are inferred from a proteome database, and then quantitative values are assigned.
43 | Changes in protein abundances across conditions are determined with statistical tests, and results must be interpreted in the context of the relevant biology.
44 | Data interpretation is the rate-limiting step; data collected in less than one week can take months or years to understand.
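The proteolysis step described above can be illustrated with a simplified in silico digest. The sketch assumes the common textbook trypsin rule, which cleaves after Lys or Arg unless the next residue is Pro. For cyanogen bromide, it assumes cleavage after Met. Real protease specificity is more nuanced. CNBr also converts the cleaved Met to homoserine lactone, which this sequence-only sketch ignores. The rule table and function names are hypothetical.

```python
from __future__ import annotations

import re

# Simplified cleavage rules as zero-width regular expressions (assumed rules):
# trypsin cuts after K or R unless the next residue is P; CNBr cuts after M.
CLEAVAGE_RULES = {
    "trypsin": r"(?<=[KR])(?!P)",
    "cnbr": r"(?<=M)",
}


def digest(sequence: str, rule: str = "trypsin", max_missed: int = 0) -> list[str]:
    """Return peptides from an in silico digest, allowing up to max_missed missed cleavages."""
    # re.split accepts zero-width patterns (Python 3.7+); drop empty trailing pieces.
    pieces = [p for p in re.split(CLEAVAGE_RULES[rule], sequence.upper()) if p]
    peptides = []
    for start in range(len(pieces)):
        # Joining consecutive pieces models cleavage sites the enzyme skipped.
        for end in range(start, min(start + max_missed + 1, len(pieces))):
            peptides.append("".join(pieces[start:end + 1]))
    return peptides


if __name__ == "__main__":
    print(digest("AKPGRSTKL"))
    print(digest("AKPGRSTKL", max_missed=1))
    print(digest("AMGMK", rule="cnbr"))
```

For example, `digest("AKPGRSTKL")` returns `['AKPGR', 'STK', 'L']`. The Lys at position 2 is not cleaved because Pro follows it. Real search engines apply similar rules to build the candidate peptide list from a protein database.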
45 |
46 | There are many variations to this workflow.
47 | The diversity of experimental goals that are achievable with proteomics technology drives an expansive array of workflows.
48 | Every choice matters, because every choice affects the results, from instrument procurement to the choice of data processing software and everything in between.
49 | In this tutorial, we detail all the required steps to serve as a comprehensive overview for new proteomics practitioners.
50 |
51 | There are 17 sections in total:
52 |
53 | 1. Biochemistry basics
54 | 2. Types of experiments
55 | 3. Protein extraction
56 | 4. Proteolysis
57 | 5. Peptide Quantification Methods
58 | 6. Enrichments
59 | 7. Peptide purification
60 | 8. Liquid Chromatography
61 | 9. Peptide Ionization
62 | 10. Mass Spectrometry
63 | 11. Peptide Fragmentation (MS/MS)
64 | 12. Data Acquisition
65 | 13. Raw Data Analysis
66 | 14. Protein Databases
67 | 15. Proteomics Knowledge Bases
68 | 16. Biological Interpretation
69 | 17. Orthogonal Validation Experiments
70 |
71 |
--------------------------------------------------------------------------------
/content/metadata.yaml:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Comprehensive Overview of Bottom-Up Proteomics using Mass Spectrometry"
3 | date: null # Defaults to date generated, but can specify like '2022-10-31'.
4 | keywords:
5 | - markdown
6 | - publishing
7 | - manubot
8 | lang: en-US
9 | authors:
10 |
11 | - github: jymbcrc
12 | name: Yuming Jiang
13 | initials: YJ
14 | orcid: 0000-0001-7444-3849
15 | twitter: yumingjiang94
16 | email: Yuming.Jiang@cshs.org
17 | affiliations:
18 | - Department of Computational Biomedicine, Cedars Sinai Medical Center
19 |
20 | - github: ArokiaRex
21 | name: Devasahayam Arokia Balaya Rex
22 | initials: DABR
23 | orcid: 0000-0002-9556-3150
24 | twitter: rexprem
25 | email: rexprem@yenepoya.edu.in
26 | affiliations:
27 | - Center for Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to be University), Mangalore 575018, India
28 |
29 | - github: dschust-r
30 | name: Dina Schuster
31 | initials: DS
32 | orcid: 0000-0001-6611-8237
33 | twitter: dina_sch
34 | email: dschuster@ethz.ch
35 | affiliations:
36 | - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich 8093, Switzerland
37 | - Department of Biology, Institute of Molecular Biology and Biophysics, ETH Zurich, Zurich 8093, Switzerland
38 | - Laboratory of Biomolecular Research, Division of Biology and Chemistry, Paul Scherrer Institute, Villigen 5232, Switzerland
39 |
40 | - github: neely
41 | name: Benjamin A. Neely
42 | initials: BAN
43 | orcid: 0000-0001-6120-7695
44 | twitter: neely615
45 | email: benjamin.neely@nist.gov
46 | affiliations:
47 | - Chemical Sciences Division, National Institute of Standards and Technology, NIST Charleston
48 | funders:
49 | - NIST
50 |
51 | - github: ger225
52 | name: Germán L. Rosano
53 | initials: GLR
54 | orcid: 0000-0002-8313-6813
55 | twitter: GermanRosano
56 | email: rosano@ibr-conicet.gov.ar
57 | affiliations:
58 | - Mass Spectrometry Unit, Institute of Molecular and Cellular Biology of Rosario, Rosario, Argentina
59 | funders:
60 | - Grant PICT 2019-02971 (Agencia I+D+i)
61 |
62 | - github: norbertvolkmar
63 | name: Norbert Volkmar
64 | initials: NV
65 | orcid: 0000-0003-0766-5606
66 | twitter: NorbertVolkmar
67 | email: norbertvolkmar22@gmail.com
68 | affiliations:
69 | - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich 8093, Switzerland
70 |
71 | - github: amandamomenzadeh
72 | name: Amanda Momenzadeh
73 | initials: AM
74 | orcid: 0000-0002-8614-0690
75 | email: amanda.momenzadeh@cshs.org
76 | affiliations:
77 | - Department of Computational Biomedicine, Cedars Sinai Medical Center, Los Angeles, California, USA
78 |
79 | - github: petersclarke
80 | name: Trenton M. Peters-Clarke
81 | initials: TMPC
82 | orcid: 0000-0002-9153-2525
83 | twitter: trentmpc
84 | email: trenton.petersclarke@ucsf.edu
85 | affiliations:
86 | - Department of Pharmaceutical Chemistry, University of California-San Francisco
87 |
88 | - github: lichenlady94
89 | name: Susan B. Egbert
90 | initials: SBE
91 | orcid: 0000-0001-5458-1099
92 | twitter: lichenlady94
93 | email: sbegbert80@gmail.com
94 | affiliations:
95 | - Department of Chemistry, University of Manitoba, Winnipeg, Canada
96 |
97 | - name: Simion Kreimer
98 | initials: SK
99 | orcid: 0000-0001-6627-3771
100 | twitter: KreimerSimion
101 | email: Simion.Kreimer@cshs.org
102 | affiliations:
103 | - Smidt Heart Institute, Cedars Sinai Medical Center
104 | - Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center
105 |
106 | - github: edoud1
107 | name: Emma H. Doud
108 | initials: EHD
109 | orcid: 0000-0003-0049-0073
110 | twitter: fireinlab
111 | email: edoud@iu.edu
112 | affiliations:
113 | - Center for Proteome Analysis, Indiana University School of Medicine, Indianapolis, Indiana, USA
114 |
115 | - github: ococrook
116 | name: Oliver M. Crook
117 | initials: OMC
118 | orcid: 0000-0001-5669-8506
119 | twitter: OllyMCrook
120 | email: Oliver.Crook@stats.ox.ac.uk
121 | affiliations:
122 | - Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
123 |
124 | - github: aky
125 | name: Amit Kumar Yadav
126 | initials: AKY
127 | orcid: 0000-0002-9445-8156
128 | twitter: theoneamit
129 | email: amit.yadav@thsti.res.in
130 | affiliations:
131 | - Translational Health Science and Technology Institute
132 | funders:
133 | - Grant BT/PR16456/BID/7/624/2016 (Department of Biotechnology, India)
134 | - Grant Translational Research Program (TRP) at THSTI funded by DBT
135 |
136 | - github: vanuopadathmurali
137 | name: Muralidharan Vanuopadath
138 | initials: MV
139 | orcid: 0000-0002-9364-917X
140 | twitter: V_MuraleeDhar
141 | email: muralidharanv@am.amrita.edu
142 | affiliations:
143 | - School of Biotechnology, Amrita Vishwa Vidyapeetham, Kollam-690 525, Kerala, India
144 | funders:
145 | - Department of Health Research, Indian Council of Medical Research, Government of India (File No.R.12014/31/2022-HR)
146 |
147 | - github: HegemanLab
148 | name: Adrian D. Hegeman
149 | initials: ADH
150 | orcid: 0000-0003-1008-6066
151 | email: hegem007@umn.edu
152 | affiliations:
153 | - Departments of Horticultural Science and Plant and Microbial Biology, University of Minnesota – Twin Cities, United States
154 | funders:
155 | - National Science Foundation MCB-2225057 and IOS-2025297
156 |
157 | - github: martinmayta
158 | name: Martín L. Mayta
159 | initials: MLM
160 | orcid: 0000-0002-7986-4551
161 | twitter: MartinMayta2
162 | email: martin.mayta@uap.edu.ar
163 | affiliations:
164 | - School of Medicine and Health Sciences, Center for Health Sciences Research, Universidad Adventista del Plata, Libertador San Martín 3103, Argentina
165 | - Molecular Biology Department, School of Pharmacy and Biochemistry, Universidad Nacional de Rosario, Rosario 2000, Argentina
166 |
167 | - github: agduboff
168 | name: Anna G. Duboff
169 | initials: AGD
170 | orcid: 0009-0002-7316-3831
171 | email: agduboff@uw.edu
172 | affiliations:
173 | - Department of Chemistry, University of Washington
174 | funders:
175 | - Summer Research Acceleration Fellowship, Department of Chemistry, University of Washington
176 |
177 | - github: rileynm
178 | name: Nicholas M. Riley
179 | initials: NMR
180 | orcid: 0000-0002-1536-2966
181 | twitter: riley_nm1
182 | email: nmriley@uw.edu
183 | affiliations:
184 | - Department of Chemistry, University of Washington
185 | funders:
186 | - National Institutes of Health Grant R00 GM147304
187 |
188 | - name: Robert L. Moritz
189 | initials: RLM
190 | orcid: 0000-0002-3216-9447
191 | twitter: r_l_moritz
192 | email: rmoritz@systemsbiology.org
193 | affiliations:
194 | - Institute for Systems Biology, Seattle, WA, USA, 98109
195 | funders:
196 | - National Institutes of Health Grants R01GM087221, R24GM127667, U19AG023122, S10OD026936
197 | - National Science Foundation Award 1920268
198 |
199 | - github: jessegmeyerlab
200 | name: Jesse G. Meyer
201 | initials: JGM
202 | orcid: 0000-0003-2753-3926
203 | twitter: j_my_sci
204 | email: jessegmeyer@gmail.com
205 | affiliations:
206 | - Department of Computational Biomedicine, Cedars Sinai Medical Center
207 | funders:
208 | - National Institutes of Health Grant R21 AG074234
209 | - National Institutes of Health Grant R35 GM142502
210 | coi:
211 | string: "None"
212 | lastapproved: !!str 2022-01-16
213 |
214 | # use this to randomize authors
215 | manubot-randomize-author-order: false
216 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # A Practical Beginner's Guide to Proteomics
2 |
3 |
4 |
5 | [](https://jessegmeyerlab.github.io/proteomics-tutorial/)
6 | [](https://jessegmeyerlab.github.io/proteomics-tutorial/manuscript.pdf)
7 | [](https://github.com/jessegmeyerlab/proteomics-tutorial/actions)
8 |
9 | ^^^ click the links above to see the current manuscript in HTML or PDF format!! ^^^
10 |
11 | ## Project information
12 |
13 | **Purpose.** We are collaboratively writing a broad, basic tutorial on proteomics invited by the ACS journal [Measurement Science Au](https://pubs.acs.org/journal/amachv). Anyone is welcome to contribute! An example of the scope and style is [this tutorial from Jillian Dempsey's group on cyclic voltammetry](https://pubs.acs.org/doi/full/10.1021/acs.jchemed.7b00361). We are using [Manubot](https://github.com/manubot/manubot) to write this manuscript and track contributions. See the remaining readme below this section or at the link above for more information about manubot. **We are aiming to submit by the end of 2022.**
14 |
15 | **Authorship.** I (Jesse Meyer) am leading this project as senior author, and we will decide author order for everyone else based on the contributions recorded via github. The minimum suggested contribution for authorship is five coherent and well referenced paragraphs. Authorship can also be awarded for helping with reviewing new contributions (pull requests), editing, or by making figures and tables.
16 |
17 | The author order in the pdf and html versions is currently set to random to indicate that the order is up in the air until the first draft of the paper is complete.
18 |
19 | **Rules.**
20 | See also [the code of conduct](https://github.com/jessegmeyerlab/proteomics-tutorial/blob/main/CODE_OF_CONDUCT.md) for participants.
21 |
22 | ## **How to contribute:**
23 | Edit sections in the /content directory and create a pull request describing your changes. You can practice contributing [here](https://github.com/manubot/try-manubot).
24 | See the [contributing section](https://github.com/jessegmeyerlab/proteomics-tutorial/blob/main/contributing.md) for guidance on how to contribute and what is expected of authors.
25 |
26 | [**PLEASE CHECK OUT THIS HELP VIDEO FIRST**](https://manubot.org/docs/getting-started.html) and also [this step-by-step guide](https://github.com/greenelab/covid19-review/blob/master/CONTRIBUTING.md) for the COVID-19 review by @rando2.
27 |
28 | See also this [additional information](https://github.com/greenelab/covid19-review/blob/master/INSTRUCTIONS.md) in the COVID review project.
29 |
30 | **Use Issues** to discuss organization, potential figures, and relevant papers.
31 |
32 | **Focus Areas** of this tutorial cover the full range of topics relevant to mass spectrometry-based proteomics. The current list of sections includes:
33 |
34 | 1. Introduction
35 | 2. Protein extraction
36 | 3. Proteolysis
37 | 4. Peptide and protein labeling
38 | 5. Enrichment of proteins or modifications
39 | 6. Peptide purification
40 | 7. Types of mass spectrometers used for proteomics
41 | 8. Peptide ionization
42 | 9. Data acquisition (targeted and untargeted DDA and DIA)
43 | 10. Basics of data analysis
44 | 11. Biological interpretation
45 | 12. Experiment design and considerations not discussed elsewhere
46 |
47 |
48 | ## Manubot
49 |
50 |
51 |
52 | Manubot is a system for writing scholarly manuscripts via GitHub.
53 | Manubot automates citations and references, versions manuscripts using git, and enables collaborative writing via GitHub.
54 | An [overview manuscript](https://greenelab.github.io/meta-review/ "Open collaborative writing with Manubot") presents the benefits of collaborative writing with Manubot and its unique features.
55 | The [rootstock repository](https://git.io/fhQH1) is a general purpose template for creating new Manubot instances, as detailed in [`SETUP.md`](SETUP.md).
56 | See [`USAGE.md`](USAGE.md) for documentation on how to write a manuscript.
57 |
58 | Please open [an issue](https://git.io/fhQHM) for questions related to Manubot usage, bug reports, or general inquiries.
59 |
60 | ### Repository directories & files
61 |
62 | The directories are as follows:
63 |
64 | + [`content`](content) contains the manuscript source, which includes markdown files as well as inputs for citations and references.
65 | See [`USAGE.md`](USAGE.md) for more information.
66 | + [`output`](output) contains the outputs (generated files) from Manubot including the resulting manuscripts.
67 | You should not edit these files manually, because they will get overwritten.
68 | + [`webpage`](webpage) is a directory meant to be rendered as a static webpage for viewing the HTML manuscript.
69 | + [`build`](build) contains commands and tools for building the manuscript.
70 | + [`ci`](ci) contains files necessary for deployment via continuous integration.
71 |
72 | ### Local execution
73 |
74 | The easiest way to run Manubot is to use [continuous integration](#continuous-integration) to rebuild the manuscript when the content changes.
75 | If you want to build a Manubot manuscript locally, install the [conda](https://conda.io) environment as described in [`build`](build).
76 | Then, you can build the manuscript on POSIX systems by running the following commands from this root directory.
77 |
78 | ```sh
79 | # Activate the manubot conda environment (assumes conda version >= 4.4)
80 | conda activate manubot
81 |
82 | # Build the manuscript, saving outputs to the output directory
83 | bash build/build.sh
84 |
85 | # At this point, the HTML & PDF outputs will have been created. The remaining
86 | # commands are for serving the webpage to view the HTML manuscript locally.
87 | # This is required to view local images in the HTML output.
88 |
89 | # Configure the webpage directory
90 | manubot webpage
91 |
92 | # You can now open the manuscript webpage/index.html in a web browser.
93 | # Alternatively, open a local webserver at http://localhost:8000/ with the
94 | # following commands.
95 | cd webpage
96 | python -m http.server
97 | ```
98 |
99 | Sometimes it's helpful to monitor the content directory and automatically rebuild the manuscript when a change is detected.
100 | The following command, while running, will trigger both the `build.sh` script and `manubot webpage` command upon content changes:
101 |
102 | ```sh
103 | bash build/autobuild.sh
104 | ```
105 |
106 | ### Continuous Integration
107 |
108 | Whenever a pull request is opened, CI (continuous integration) will test whether the changes break the build process to generate a formatted manuscript.
109 | The build process aims to detect common errors, such as invalid citations.
110 | If your pull request build fails, see the CI logs for the cause of failure and revise your pull request accordingly.
111 |
112 | When a commit to the `main` branch occurs (for example, when a pull request is merged), CI builds the manuscript and writes the results to the [`gh-pages`](https://github.com/jessegmeyerlab/proteomics-tutorial/tree/gh-pages) and [`output`](https://github.com/jessegmeyerlab/proteomics-tutorial/tree/output) branches.
113 | The `gh-pages` branch uses [GitHub Pages](https://pages.github.com/) to host the following URLs:
114 |
115 | + **HTML manuscript** at https://jessegmeyerlab.github.io/proteomics-tutorial/
116 | + **PDF manuscript** at https://jessegmeyerlab.github.io/proteomics-tutorial/manuscript.pdf
117 |
118 | For continuous integration configuration details, see [`.github/workflows/manubot.yaml`](.github/workflows/manubot.yaml).
119 |
120 | ## License
121 |
122 |
126 |
127 | [](http://creativecommons.org/licenses/by/4.0/)
128 | [](https://creativecommons.org/publicdomain/zero/1.0/)
129 |
130 | Except when noted otherwise, the entirety of this repository is licensed under a CC BY 4.0 License ([`LICENSE.md`](LICENSE.md)), which allows reuse with attribution.
131 | Please attribute by linking to https://github.com/jessegmeyerlab/proteomics-tutorial.
132 |
133 | Since CC BY is not ideal for code and data, certain repository components are also released under the CC0 1.0 public domain dedication ([`LICENSE-CC0.md`](LICENSE-CC0.md)).
134 | All files matched by the following glob patterns are dual licensed under CC BY 4.0 and CC0 1.0:
135 |
136 | + `*.sh`
137 | + `*.py`
138 | + `*.yml` / `*.yaml`
139 | + `*.json`
140 | + `*.bib`
141 | + `*.tsv`
142 | + `.gitignore`
143 |
144 | All other files are only available under CC BY 4.0, including:
145 |
146 | + `*.md`
147 | + `*.html`
148 | + `*.pdf`
149 | + `*.docx`
150 |
151 | Please open [an issue](https://github.com/jessegmeyerlab/proteomics-tutorial/issues) for any question related to licensing.
152 |
--------------------------------------------------------------------------------
/content/09.peptide-purification.md:
--------------------------------------------------------------------------------
1 | ## 7. Peptide Purification and Fractionation {.page_break_before}
2 |
3 | ### Peptide purification methods
4 | Before peptide analysis, interferences from sample preparation must be removed.
5 | There are several approaches to purify peptides.
6 |
7 | #### Solid phase extraction (SPE)
8 | Solid phase extraction (SPE) is a common sample preparation technique in MS-based proteomics.
9 | In this method, compounds are isolated based on their chemical and physical properties, which determine their distribution between a mobile phase (liquid) and a stationary phase (solid).
10 | After the molecules bind, they are washed, and then they are eluted from the stationary phase by replacing the mobile phase with an elution buffer.
11 | The material used for SPE is usually discarded after every sample and no gradient is applied for elution (single-step procedure of elution) [@PMID:14697044].
12 | Thus, SPE isolates only the group of analytes that is retained by the chosen stationary phase.
13 | Hence, SPE is primarily used for sample clean-up and for reducing complexity of the sample.
14 | For MS-based proteomic analysis, it is largely used to get rid of salts and other contaminants that might lead to ion suppression.
15 | The major drawback of this technique is that only a fraction of the sample is examined, because SPE captures only those compounds that bind to the sorbent.
16 | The material for SPE is available in various types, including (micro-) columns, cartridges, plates, micropipette tips, and functionalized magnetic beads (MBs) [@PMID:20606758; @PMID:20099258].
17 | Reversed-phase material is the most widely used SPE sorbent in proteomic studies for protein and peptide fractionation; ion-exchange material is used more rarely.
18 | For the separation of glycosylated proteins and peptides, the preferred material is a normal-phase sorbent such as HILIC [@PMID:22665312; @PMID:20536156].
19 | Less commonly used SPE materials are silica- or polystyrene-based [@PMID:17625912; @PMID:15317408].
20 | Other types of SPE include ion exchange (IEX), metal chelation, and affinity-based methods [@PMID:25692071].
21 |
22 | The basic idea behind the choice of binding and wash versus elution solutions for SPE is that the binding and wash solutions should favor the interaction between the analytes of interest and the solid phase, whereas the elution solution should favor the interaction of the analytes with the liquid phase (**Figure 5**).
23 | For example, with reversed phase SPE, the solid phase is C18 or some other hydrophobic chemistry.
24 | Binding of peptides to this solid phase is based on the hydrophobicity of peptides, mostly due to the presence of hydrophobic amino acid side chains; leucine is the most common amino acid in human proteins.
25 | To encourage peptides to ‘like’ the stationary phase more than the liquid phase, the peptides are loaded in aqueous solution.
26 | This enables hydrophilic contaminants such as salts, small polar buffer molecules, and polar denaturants like urea to be washed away.
27 | After the bound peptides are washed, they can be eluted by switching the liquid phase to a more hydrophobic solvent with a higher organic content, which allows the peptides to partition into the liquid phase and elute from the solid phase.
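To illustrate how residue composition drives hydrophobicity, the sketch below computes the grand average of hydropathy (GRAVY) using the Kyte-Doolittle scale. GRAVY is a general hydropathy index, not a model of retention on C18. Here it serves only as a rough proxy showing that leucine-rich peptides are more hydrophobic than peptides rich in Ser, Asp, or Lys. The example peptides are hypothetical.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: the mean Kyte-Doolittle value over all residues."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


if __name__ == "__main__":
    peptides = ["LLVLDEK", "SGSGSDK", "AFLLEGK"]  # hypothetical example peptides
    # Higher GRAVY loosely suggests stronger binding to a hydrophobic phase (rough proxy only).
    for pep in sorted(peptides, key=gravy, reverse=True):
        print(f"{pep}\t{gravy(pep):+.2f}")
```

A very polar peptide with a strongly negative score is the kind of analyte most at risk of washing through a reversed-phase sorbent during loading.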
28 |
29 | {#fig:SPE tag="5" width="100%"}
39 |
40 | #### Specific Types of peptide purification
41 | Many additional peptide purification methods are commonly used in proteomics.
42 | These methods include the following:
43 |
44 | 1. StageTips, in-stagetip (iST) [@DOI:10.1021/ac026117i; @DOI:10.1038/nmeth.2834]
45 | 2. SP2 or SP3 [@PMID:32129943]
46 | 3. Suspension trapping (S-trap) [@DOI:10.1002/pmic.201300553]
47 |
48 | ### Peptide fractionation methods
49 | The number of peptides produced from proteolysis of the whole proteome is immense.
50 | Thus, after peptides are cleaned from interferences, they are often fractionated into subsets to enable increased proteome coverage.
51 | Characterizing the whole proteome of higher organisms, along with rising interest in post-translational modifications, requires extensive coverage of protein sequences.
52 | Several methods are used for peptide fractionation:
53 |
54 | #### Ion-exchange chromatography (IEC)
55 | This method separates analytes based on differences in their electric charge [@PMID:27868236].
56 | In this approach, analyte retention is based on electrostatic attraction between the analytes and oppositely charged functional groups (FGs) on the stationary phase.
57 | IEC is classified into two types: cation-exchange and anion-exchange chromatography.
58 | In cation-exchange chromatography at acidic pH, negatively charged functional groups such as sulfonates attract positively charged peptides, whereas in anion-exchange chromatography at alkaline pH, positively charged FGs such as quaternary ammoniums attract negatively charged peptides.
59 | These techniques are further classified as strong exchangers (cation [SCX] and anion [SAX] exchange) or weak exchangers (cation [WCX] and anion [WAX] exchange), based on the type of FG attached [@PMID:35777803].
60 | These functional groups are most commonly supported on silica or synthetic polymer resins, although other inorganic materials are sometimes used [@PMID:27868236].
61 | In IEC, peptides are eluted using a mobile phase of higher ionic strength, which drives them into the liquid phase; SCX with a salt gradient or salt plug is a routinely used proteomics technique.
62 | In SCX, peptides are resolved according to their net positive charge, with the least positively charged peptides eluting first.
63 | Increasing the salt concentration decreases peptide retention time, because salt ions compete with the electrostatic interactions between the peptides and the solid phase.
64 | However, SCX resolution is limited compared to reversed-phase chromatography, which limits the suitability of this technique for complex mixtures [@PMID:15672457].
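The SCX elution order described above follows from each peptide's net charge at the loading pH. The sketch below estimates net charge with the Henderson-Hasselbalch equation. It assumes approximate textbook pKa values for free amino acids; the effective values inside peptides differ. Real SCX retention also depends on more than net charge. The function name and example peptides are hypothetical.

```python
# Approximate textbook pKa values for free amino acids (illustrative assumption;
# effective pKa values inside peptides differ).
PKA_POSITIVE = {"Nterm": 9.69, "K": 10.53, "R": 12.48, "H": 6.00}
PKA_NEGATIVE = {"Cterm": 2.34, "D": 3.65, "E": 4.25, "C": 8.18, "Y": 10.07}


def net_charge(sequence: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of a peptide's net charge at a given pH."""
    # Free N-terminal amine (basic) and C-terminal carboxyl (acidic).
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE["Nterm"]))
    charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE["Cterm"] - ph))
    for aa in sequence.upper():
        if aa in "KRH":    # basic side chains: fraction protonated
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in "DECY":  # acidic side chains: fraction deprotonated
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge


if __name__ == "__main__":
    peptides = ["GDLEGK", "ASHGLR", "KLHPKR"]  # hypothetical peptides
    # Under this model, SCX at pH ~3 elutes the least positive peptide first.
    for pep in sorted(peptides, key=lambda p: net_charge(p, 3.0)):
        print(f"{pep}\t{net_charge(pep, 3.0):+.2f}")
```

At pH 3, acidic side chains are mostly protonated and therefore neutral, so net positive charge tracks the number of basic sites. This is why peptides containing missed cleavages or His residues tend to elute later in SCX.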
65 |
66 | #### Reversed-phase chromatography (RPLC)
67 | Reversed-phase chromatography is the most commonly used chromatographic technique; it separates molecules in solution based on their hydrophobicity.
68 | The separation occurs on the basis of the partition coefficient of analytes between the mobile phase and the hydrophobic stationary phase.
69 | Highly polar peptides elute before less polar ones, because less polar peptides interact more strongly with the hydrophobic functional groups, which form a liquid-like layer around the silica resin [@PMID:20973639].
70 | RPLC has been widely used to separate peptides because of its compatibility with gradient elution and aqueous samples, and because its selectivity can be tuned by changing properties such as pH, additives, and the organic modifier [@PMID:20031138].
71 | Numerous factors influence the peak capacity of the separation, such as temperature, column length, stationary phase, particle size, mobile-phase ion-pairing reagent, mobile-phase modifier, and gradient slope [@PMID:16224963].
72 | Usually online RPLC is done at acidic pH to ensure peptide ionization, but it can be paired with offline high pH RPLC and multiple fraction concatenation to produce orthogonal separation due to altered ionization of amino acids changing peptide hydrophobicity [@PMID:22462785].
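
The fraction concatenation mentioned above can be sketched as follows, assuming 96 high-pH fractions are pooled into 24 (both numbers are illustrative, not taken from the cited study). Fractions collected far apart in the high-pH gradient are combined, so each pooled sample spans a wide hydrophobicity range and its peptides spread across the subsequent low-pH gradient instead of crowding into one part of it.

```python
from typing import List


def concatenate_fractions(n_fractions: int, n_pools: int) -> List[List[int]]:
    """Pool fractions 1..n_fractions into n_pools by stepping through the gradient.

    Pool j receives fractions j, j + n_pools, j + 2 * n_pools, ..., so every pool
    combines early-, middle- and late-eluting fractions of the high-pH separation.
    """
    if n_fractions % n_pools != 0:
        raise ValueError("this sketch assumes n_fractions is a multiple of n_pools")
    return [list(range(j, n_fractions + 1, n_pools)) for j in range(1, n_pools + 1)]


# Illustrative scheme: 96 collected fractions pooled into 24 samples.
pools = concatenate_fractions(96, 24)
print(pools[0])   # [1, 25, 49, 73]
print(pools[-1])  # [24, 48, 72, 96]
```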
73 |
74 | #### Hydrophilic interaction liquid chromatography (HILIC)
75 | Inverse-gradient chromatography was the forerunner of HILIC [@PMID:3569294].
76 | HILIC is similar in principle to normal-phase chromatography, in which the stationary phase is polar and the initial solvent conditions are nonpolar.
77 | Gradient elution in HILIC is accomplished by increasing the polarity of the mobile phase, that is, by decreasing the concentration of organic solvent, which is the opposite direction to RPLC gradients.
78 | With charged HILIC stationary phases, the salt or buffer concentration can also be increased during the gradient to disrupt electrostatic interactions with the solute [@PMID:18264818; @PMID:15459207].
79 | As a result, less polar peptides elute before more polar peptides.
80 | HILIC is used to separate hydrophilic peptides and other polar analytes [@PMID:21879300].
81 | This separation is achieved with a hydrophilic stationary phase, for example cyano-, diol-, or amino-bonded phases [@PMID:18428181], and a mobile phase rich in organic solvent [@PMID:18264818].
82 | HILIC can also be used for enrichment and targeted proteomic analysis of PTMs, such as glycosylation, N-acetylation and phosphorylation, which increase the polarity of peptides and therefore also their retention on HILIC [@PMID:20973639].
83 |
84 | #### Electrostatic repulsion-hydrophilic interaction chromatography (ERLIC)
85 | ERLIC uses a weak anion-exchange column operated at low pH with a high proportion of organic solvent, which enables isocratic elution [@DOI:10.1021/ac070997p].
86 | Acidic peptides are retained by electrostatic interaction, whereas basic and neutral peptides are retained through hydrophilic interaction, which the high organic content makes favorable.
87 | This improves retention of acidic peptides and reduces retention of basic peptides compared with conventional HILIC [@DOI:10.1021/pr100037h].
88 |
89 | #### Isoelectric focusing (IEF)
90 | IEF is a high-resolution electrophoretic technique that separates and concentrates amphoteric peptides according to their isoelectric point (pI), using either a buffer-free solution of carrier ampholytes or a gel with an immobilized pH gradient (IPG).
91 | After IEF, the separated peptides are recovered in the liquid phase for further analysis by RPLC-MS/MS [@PMID:16849286].
92 | Besides offering high resolution and capacity for peptide separation, IEF provides additional information on the physicochemical properties of peptides; for example, the measured peptide pI can be used to validate and filter peptide identifications from MS/MS database searches [@PMID:18851748].
93 | IEF has been used not only to increase proteome coverage but also in label-free [@PMID:17708596] and stable isotope isobaric labeling [@PMID:18851748] quantitative experiments.
94 | IEF and gel-based separations have fallen out of favor in the last decade due to improvements in liquid chromatography.
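
The pI-based filtering described above can be sketched by predicting each identified peptide's pI and keeping only identifications whose predicted pI falls within the pH range of the IEF fraction they came from. The predicted pI is found by bisection, because net charge decreases monotonically with pH. The pKa values, the fraction pH range, and the 0.5 pH-unit tolerance are illustrative assumptions, not values from the cited studies.

```python
# Approximate pKa values (assumption; published pKa scales differ).
PKA_BASIC = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(peptide: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of a peptide's net charge at a given pH."""
    basic = ["Nterm"] + [aa for aa in peptide if aa in PKA_BASIC]
    acidic = ["Cterm"] + [aa for aa in peptide if aa in PKA_ACIDIC]
    positive = sum(1.0 / (1.0 + 10 ** (ph - PKA_BASIC[g])) for g in basic)
    negative = sum(1.0 / (1.0 + 10 ** (PKA_ACIDIC[g] - ph)) for g in acidic)
    return positive - negative


def isoelectric_point(peptide: str, tol: float = 1e-4) -> float:
    """Bisect for the pH where net charge is zero (charge falls monotonically with pH)."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(peptide, mid) > 0:
            low = mid  # still positive, so the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2


def filter_by_fraction_pi(peptides, fraction_ph_range, tolerance=0.5):
    """Keep peptides whose predicted pI lies in the fraction's pH range (+/- tolerance)."""
    low, high = fraction_ph_range
    return [p for p in peptides if low - tolerance <= isoelectric_point(p) <= high + tolerance]


# Hypothetical identifications from a fraction assumed to span pH 3.5-4.5.
candidates = ["AEAEAEK", "HLVDEPQNLIK"]
print({p: round(isoelectric_point(p), 2) for p in candidates})
print(filter_by_fraction_pi(candidates, (3.5, 4.5)))
```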
95 |
--------------------------------------------------------------------------------
/content/images/MS_f103.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/content/10.liquid-chromatography.md:
--------------------------------------------------------------------------------
1 | ## 8. Liquid Chromatography (LC) {.page_break_before}
2 |
3 | Chromatography is the physical separation of a mixture of molecular species dissolved in a mobile phase according to their strength of binding, or affinity, to the stationary phase of a chromatographic column [@DOI:10.1016/j.aca.2009.03.041].
4 | The mobile phase is driven through the column by pressure; molecular species, or analytes, with a strong affinity for the stationary phase are retained, or slowed, whereas those with a weak affinity pass through quickly.
5 | Thus, the analytes are separated by their order of elution from the column.
6 | Chromatography can exploit most physical properties of the analytes, including ionic charge (anion/cation exchange chromatography), hydrogen bonding (hydrophilic interaction chromatography), and size (size exclusion chromatography); the related electrophoretic technique, capillary electrophoresis, separates analytes by their charge-to-size ratio.
7 | In some chromatographic separations, the mobile phase composition is adjusted by mixing two or more buffers at different ratios, which changes the affinity of individual analytes for the stationary phase and finely regulates retention.
8 |
9 | Mass spectrometers suffer from ion suppression, a phenomenon where the over-abundance of one or a few species within the ion population entering the mass spectrometer masks the presence of less abundant species [@DOI:10.1373/49.7.1041].
10 | Complex biological samples, such as tissue, cell lysate, or physiological fluids, contain molecules whose concentrations span many orders of magnitude.
11 | The physical separation of analytes from biological samples by LC reduces the complexity of the ion population presented to the mass spectrometer at a given time, thus allowing the instrument to carry out the necessary fragmentation scans to identify and quantify the detectable species.
12 | Therefore, a major benefit of LC is that low-abundance analytes can be detected when they elute in windows away from the most abundant species.
13 |
14 | In proteomics, peptides are predominantly separated by reversed-phase liquid chromatography [@DOI:10.1016/j.chroma.2012.06.098; @DOI:10.1016/j.aca.2010.02.001; @DOI:10.1016/j.chroma.2004.07.044].
15 | The reversed-phase stationary phase is most commonly composed of small (1-3 μm) silica particles with covalently bound long hydrophobic alkyl chains (e.g., C18).
16 | The hydrophobic side chains of certain residues and the peptide backbone bind to this stationary phase through non-polar interactions.
17 | These interactions are strong in an aqueous solvent but are disrupted when the organic composition of the solvent is increased.
18 | Thus, in a reversed phase separation the proportion of non-polar, or organic, solvent in the mobile phase is gradually increased to release analytes from the stationary phase based on the strength of hydrophobic binding: weakly bound hydrophilic analytes elute with a low organic level in the mobile phase and strongly bound hydrophobic analytes only elute when the organic composition reaches a higher percentage.
19 | By far the most popular combination of solvents for peptide analysis is water and acetonitrile with dilute acid modifier (such as 0.1% formic acid or 0.5% acetic acid).
20 | The programmed rate at which the proportion of organic solvent is increased in the mobile phase is called the “gradient”, which you will often find described in the methods sections for reversed phase separations.
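
A gradient is usually written as a table of time points and %B, where solvent A is aqueous and solvent B is organic (acetonitrile), each containing the acid modifier. The sketch below uses an entirely hypothetical 60-minute program, linearly interpolates it to give the programmed mobile-phase composition at any time, and reports each segment's slope in %B per minute.

```python
from bisect import bisect_right

# Hypothetical 60-min program: (time in min, %B). Solvent A is water and solvent B
# is acetonitrile, both with 0.1% formic acid (an illustrative choice).
GRADIENT = [(0.0, 2.0), (5.0, 5.0), (45.0, 30.0), (50.0, 80.0),
            (55.0, 80.0), (56.0, 2.0), (60.0, 2.0)]


def percent_b(t_min, program=GRADIENT):
    """Programmed %B at time t_min, linearly interpolated between time points."""
    times = [t for t, _ in program]
    if t_min <= times[0]:
        return program[0][1]
    if t_min >= times[-1]:
        return program[-1][1]
    i = bisect_right(times, t_min)  # times[i - 1] <= t_min < times[i]
    (t0, b0), (t1, b1) = program[i - 1], program[i]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def segment_slopes(program=GRADIENT):
    """Gradient slope of each programmed segment in %B per minute."""
    return [(b1 - b0) / (t1 - t0) for (t0, b0), (t1, b1) in zip(program, program[1:])]


print(percent_b(25.0))      # 17.5, halfway through the 5-45 min segment
print(segment_slopes()[1])  # 0.625 %B/min for the main separating segment
```

This returns the programmed composition; the composition actually reaching the column lags behind it by the system's dwell volume.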
21 |
22 | ### LC Considerations Related to Electrospray Ionization (ESI)
23 |
24 | LC is coupled to MS through ESI, and LC parameters strongly influence ESI.
25 | The analytes are eluted in a liquid mobile phase and must be released into the gas phase as charged ions for detection by mass spectrometry.
26 | This is achieved by spraying the eluent from the chromatographic separation through a narrow nozzle, or emitter, while a high voltage (1,000-4,000 V) is applied between the emitter and the mass spectrometer inlet.
27 | The eluent is sprayed as a mist of small charged droplets; as the solvent evaporates, the Coulombic repulsion between the charges increases until it overcomes surface tension (the Rayleigh limit) and the droplets explode into smaller droplets [@DOI:10.1080/14786448208628425].
28 | The droplets become progressively smaller until individual analyte molecules are ejected.
29 | The ejected analytes are ionized by the retained charge and can thus be manipulated by the electric fields in the mass spectrometer to measure their mass and perform the necessary fragmentations to elucidate structure.
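
The droplet fission step above can be quantified with the Rayleigh limit, the maximum charge q_R = 8π·sqrt(ε0·γ·r³) that a droplet of radius r and surface tension γ can carry before Coulombic repulsion overcomes surface tension. The sketch assumes pure water (γ ≈ 0.072 N/m) and arbitrary radii; real ESI droplets contain organic solvent and often undergo fission somewhat below this limit, so treat the numbers as orders of magnitude only.

```python
import math

EPSILON_0 = 8.8541878128e-12         # vacuum permittivity, F/m
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def rayleigh_limit_charges(radius_m: float, surface_tension_n_per_m: float) -> float:
    """Maximum number of elementary charges on a droplet: q_R = 8*pi*sqrt(eps0*gamma*r^3)."""
    q_r = 8 * math.pi * math.sqrt(EPSILON_0 * surface_tension_n_per_m * radius_m ** 3)
    return q_r / ELEMENTARY_CHARGE


# Assumed surface tension of water near room temperature (N/m).
WATER_GAMMA = 0.072
for radius_um in (1.0, 0.5, 0.1):
    n = rayleigh_limit_charges(radius_um * 1e-6, WATER_GAMMA)
    print(f"r = {radius_um} um -> at most ~{n:.2g} charges")
```

Because q_R falls as r^1.5, a droplet that loses solvent while keeping its charge eventually exceeds the limit, which is why evaporation drives repeated fission.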
30 |
31 | The chromatographic flowrate (the volume of mobile phase driven through the column per unit time, e.g., µL/min) dictates the efficiency of electrospray ionization (the proportion of analytes eluting from the column that are ionized and transferred into the gas phase) and is thus a key consideration for sensitivity [@DOI:10.1021/pr050424y].
32 | Lower flowrates generate smaller droplets that release charged analytes more rapidly, resulting in more detectable analytes and higher ionization efficiency.
33 | Electrospray ionization efficiency is also aided by an inert sheath gas, high temperature, and reduced pressure between the nozzle and ion lensing elements, thus decent sensitivity can still be achieved at high flowrates.
34 | For more detailed discussion of ionization, see the “Ionization” section.
35 |
36 | ### Quality Attributes of Chromatographic Separation
37 | The quality of chromatographic separation largely determines the number of analytes that are identified and quantified by LC-MS analysis.
38 | Chromatographic theory was developed when LC systems were coupled to spectrophotometric detectors, which only measure the combined signal intensity of all co-eluting analytes.
39 | The ability of MS to detect the masses of individual co-eluting components simultaneously redefines the significance of certain LC attributes.
40 | For those looking for mathematical descriptions of chromatographic quality, refer to the “Van Deemter equation”, which we do not cover here to maintain simplicity [@URL:https://www.sciencedirect.com/science/article/abs/pii/0009250956800031].
41 | The following attributes are the most important to consider in LC-MS.
42 |
43 | #### Chromatographic Resolution
44 | Chromatographic resolution is defined as the ability to fully resolve adjacent chromatographic peaks containing analytes with nearly equal affinities for the stationary phase.
45 | In LC-MS, however, analytes with different masses are distinguished by the mass spectrometer even if they are not resolved by LC.
46 | Thus, in LC-MS, a more relevant and closely related concept is the full width at half maximum (FWHM) of the peak.
47 | A low FWHM indicates a sharp elution peak, in which the entire analyte population is electrosprayed into the mass spectrometer within a short time, increasing the signal.
48 | A low FWHM for high-abundance species also confines the ion suppression they cause to narrow time windows, so fewer co-eluting analytes are masked.
49 | Conversely, high FWHM means that the analyte signal is spread out over time, thus reducing sensitivity.
50 | Furthermore, at a high FWHM, high abundance species mask analytes through ion suppression over a larger portion of the separation.
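
FWHM can be measured directly from an extracted ion chromatogram. The sketch below assumes a single, baseline-separated peak sampled as parallel lists of times and intensities; it locates the apex and linearly interpolates the two half-maximum crossings. It also includes the standard half-height resolution formula, R_s = 1.18 (t2 − t1) / (w½,1 + w½,2), for comparing two adjacent peaks. The Gaussian test peak is synthetic.

```python
import math


def fwhm(times, intensities):
    """Full width at half maximum of one sampled peak, by linear interpolation."""
    apex = max(range(len(intensities)), key=lambda k: intensities[k])
    half = intensities[apex] / 2.0

    # Walk left from the apex to the last point still at or above half maximum.
    i = apex
    while i > 0 and intensities[i - 1] >= half:
        i -= 1
    # Walk right from the apex in the same way.
    j = apex
    while j < len(intensities) - 1 and intensities[j + 1] >= half:
        j += 1
    if i == 0 or j == len(intensities) - 1:
        raise ValueError("peak must fall below half maximum on both sides")

    # Interpolate each half-maximum crossing between its two bracketing points.
    t_left = times[i - 1] + (half - intensities[i - 1]) * (times[i] - times[i - 1]) / (
        intensities[i] - intensities[i - 1])
    t_right = times[j] + (intensities[j] - half) * (times[j + 1] - times[j]) / (
        intensities[j] - intensities[j + 1])
    return t_right - t_left


def resolution_half_height(t1, w1, t2, w2):
    """Resolution of two peaks from retention times and half-height widths."""
    return 1.18 * (t2 - t1) / (w1 + w2)


# Synthetic Gaussian peak (sigma = 0.05 min) sampled every 0.001 min.
sigma = 0.05
ts = [9.5 + k * 0.001 for k in range(1001)]
ys = [math.exp(-((t - 10.0) ** 2) / (2 * sigma ** 2)) for t in ts]
print(fwhm(ts, ys))  # close to 2 * sqrt(2 ln 2) * sigma, about 0.118 min
```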
51 |
52 | #### Peak Capacity
53 | Peak capacity is defined as the maximum number of peaks that can, ideally, be completely resolved within a given separation time window.
54 | A long separation in which FWHM remains low would have a large peak capacity and thus allow identification of many species.
55 | Unfortunately, increasing the length of a reversed-phase gradient also increases the FWHM because of increased diffusion, which results in diminishing returns for longer analytical methods.
56 | A longer separation gives the mass spectrometer more time and more opportunities to acquire the fragmentation spectra required to identify each analyte, so the choice of gradient length should consider both the desired throughput and the speed of the MS data acquisition strategy.
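
A common approximation for gradient separations is n_c ≈ 1 + t_g / w, where t_g is the gradient time and w is the average peak width at base (about 4σ, or roughly 1.7 × FWHM for a Gaussian peak). This formula is a standard approximation rather than something stated above, and the FWHM values in the example are hypothetical; the sketch only illustrates how broader peaks in longer gradients give diminishing returns.

```python
import math

# For a Gaussian peak, the base width (4 sigma) is about 1.7 x FWHM.
BASE_WIDTH_PER_FWHM = 4 / (2 * math.sqrt(2 * math.log(2)))


def gradient_peak_capacity(gradient_s: float, mean_fwhm_s: float) -> float:
    """Approximate peak capacity n_c = 1 + t_g / w_base for a gradient separation."""
    return 1 + gradient_s / (BASE_WIDTH_PER_FWHM * mean_fwhm_s)


# Hypothetical example: quadrupling the gradient while peaks broaden from 6 s to 12 s FWHM.
short_run = gradient_peak_capacity(30 * 60, 6.0)
long_run = gradient_peak_capacity(120 * 60, 12.0)
print(round(short_run), round(long_run))  # 4x the gradient time, only about 2x the capacity
```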
57 |
58 | #### Reproducibility and Robustness
59 | Reproducibility is defined as the ability to repeatedly obtain the same measurement for the same analytes each time that the analysis is repeated.
60 | In liquid chromatography this means that each analyte should elute at nearly the same retention time (the time elapsed since the start of the analysis until the analyte’s elution from the chromatographic column) with the same peak width.
61 | Robustness is the ability of the system to maintain reproducible performance despite nonoptimal conditions.
62 | The most typical obstacles to robustness are mechanical wear of the system components and the analytical column, fouling of the system by contaminants introduced in the samples, and clogging due to accumulation of contaminants.
63 | High-flow methods tend to be more robust because they are less affected by pump and plumbing configurations and by changes in dwell volume, and because their wider-bore components are more resistant to clogging.
64 | However, higher flowrates come at the cost of reduced sensitivity, owing to lower ionization efficiency and larger peak volumes at constant sample loading; thus, nanoflow (100-300 nL/min) chromatography remains widely used in proteomics.
65 | For applications where sample is not limiting, loading somewhat more sample makes it possible to take advantage of the robustness of microflow rates with newer, optimized electrospray sources [@PMID:37294184].
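
Retention-time reproducibility is often summarized as the coefficient of variation (CV) of each analyte's retention time across replicate injections. The sketch computes a percent CV per peptide and flags peptides above a threshold; the peptide names, retention times, and the 1% threshold are hypothetical, and the same calculation could be applied to peak widths.

```python
from statistics import mean, stdev


def retention_time_cv(rt_by_peptide):
    """Percent CV of retention time per peptide across replicate runs (needs >= 2 runs)."""
    return {pep: 100.0 * stdev(rts) / mean(rts)
            for pep, rts in rt_by_peptide.items() if len(rts) >= 2}


def flag_irreproducible(cvs, max_cv_percent=1.0):
    """Peptides whose retention time CV exceeds a chosen (hypothetical) threshold."""
    return sorted(pep for pep, cv in cvs.items() if cv > max_cv_percent)


# Hypothetical retention times (min) for three peptides in three replicate injections.
rts = {
    "peptide_1": [22.10, 22.14, 22.08],
    "peptide_2": [35.60, 35.52, 35.66],
    "peptide_3": [48.20, 50.90, 47.80],
}
cvs = retention_time_cv(rts)
print(flag_irreproducible(cvs))
```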
66 |
67 | #### Throughput and Instrument Utilization
68 | Throughput is the number of samples that are analyzed in a given timeframe, for example samples per day.
69 | High throughput is required to analyze, in a timely manner, the thousands of samples needed to truly represent biological diversity.
70 | However, increasing throughput means that less data are collected for each sample.
71 | Furthermore, several steps required for each analysis, including sample injection, system cleaning, and equilibration, collect no useful data; these steps reduce the ratio of data collection time to instrument operation time, known as instrument utilization.
72 | Performing these steps while a different sample is being analyzed, known as parallelization, increases instrument utilization and recovers several minutes of data collection per sample, which is a significant gain when several samples are analyzed per hour.
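
The effect of parallelization on throughput and utilization can be estimated with simple arithmetic. The sketch assumes that useful data are collected only during the gradient and that, when parallelized, all overhead (injection, washing, re-equilibration) runs during the previous sample's gradient; the 15-minute gradient and 7 minutes of overhead are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def throughput_and_utilization(gradient_min, overhead_min, parallelized=False):
    """Return (samples per day, instrument utilization) for a simple LC-MS cycle.

    overhead_min covers injection, washing and re-equilibration, during which no
    useful data are collected. With parallelization the overhead is assumed to
    run during the previous sample's gradient, so only the longer step sets the cycle.
    """
    if parallelized:
        cycle = max(gradient_min, overhead_min)
    else:
        cycle = gradient_min + overhead_min
    return MINUTES_PER_DAY / cycle, gradient_min / cycle


# Hypothetical method: 15 min gradient with 7 min of overhead per injection.
print(throughput_and_utilization(15, 7))                     # serial
print(throughput_and_utilization(15, 7, parallelized=True))  # overhead hidden
```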
73 |
74 | ### Trapping and Pre-Columns
75 |
76 | Trapping columns and pre-columns are short chromatographic columns that are used to increase the robustness of an LC-MS system.
77 | A pre-column is connected directly to the front of the analytical column and is intended to be disposable and to absorb contaminants and protect the analytical column.
78 | The trapping column is connected to the analytical column indirectly through a valve, which can be switched to redirect the flow through the trapping column away from the analytical column.
79 | This allows analytes to be loaded onto the trapping column while hydrophilic, poorly retained compounds are washed away so that they do not contaminate the analytical column or the mass spectrometer.
80 | This process is referred to as desalting, and once it is complete, the valve configuration is changed to connect the trapping column to the analytical column, and analytes captured on the trapping column can be eluted off the trap and through the analytical column for analysis by MS.
81 | Certain trapping columns can be operated in both directions, which allows aggregates to be flushed away when the trapping column is cleaned in the reverse direction.
82 | Additionally, trapping columns are short and generate less backpressure, so samples can be loaded onto them quickly at a high flowrate.
83 | In contrast, loading the sample directly onto the analytical column requires a slower flowrate.
84 | Two trapping columns can be used in tandem to provide parallelization: while one trapping column is cleaned and loaded with the next sample, the other is in line with the analytical column, where the sample loaded on it during the previous run is analyzed [@DOI:10.1021/acs.analchem.2c02609; @DOI:10.1021/acs.analchem.3c00213].
85 |
86 | ### Multidimensional LC
87 |
88 | Depth of profiling has previously been increased by combining two or more orthogonal LC separations.
89 | Orthogonal in this context means that each separation sorts the analytes into different populations [@DOI:10.1016/j.chroma.2005.09.080].
90 | For example, strong cation exchange (SCX) separates analytes based on positive charge, and pairing it with reversed-phase chromatography results in a higher peak capacity and more identified analytes.
91 | The first highly popular method was multidimensional protein identification technology (MudPIT), which used online separation by SCX followed by C18 reversed phase [@DOI:10.1038/85686].
92 | However, the resolution of peptide separation by SCX is low, so many peptides appear in multiple fractions.
93 | Currently, the most popular two-dimensional separation combines reversed-phase separations at high pH and then at low pH, sorting analytes by the changes in hydrophobicity caused by differences in amino acid side chain ionization.
94 | Although the two separations are not entirely orthogonal, concatenating multiple fractions across the high-pH elution can produce nearly orthogonal peptide sets [@DOI:10.1002/pmic.201000722].
95 | In recent years, the focus of proteomics has shifted from deep profiling of fewer samples to rapid profiling of large cohorts.
96 | Thus, lengthy multidimensional methods have been replaced with single-shot experiments that use only one dimension of high-resolution reversed-phase separation [@DOI:10.1021/acs.jproteome.2c00023].
97 | However, part of the lost peak capacity can be regained by adding ion mobility spectrometry, which separates ionized peptides in the gas phase.
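
A common back-of-the-envelope estimate for combined separations is that, for fully orthogonal dimensions, the total peak capacity is the product of the individual peak capacities, and any correlation between the dimensions reduces it. The sketch applies that product rule with a single 0-1 orthogonality factor, which is a deliberate simplification (published orthogonality metrics are more involved); treating the number of concatenated fractions as the first dimension's capacity, and all numbers, are hypothetical.

```python
def multidimensional_peak_capacity(capacities, orthogonality=1.0):
    """Product-rule estimate of peak capacity for combined separations.

    capacities: peak capacity of each dimension (here, roughly the number of
    fractions, then the peak capacity of the online gradient). orthogonality is a
    crude 0-1 factor: 1.0 for fully orthogonal dimensions, lower when correlated.
    """
    if not 0.0 <= orthogonality <= 1.0:
        raise ValueError("orthogonality must be between 0 and 1")
    total = 1.0
    for n in capacities:
        total *= n
    return total * orthogonality


# Hypothetical comparison: single-shot gradient vs. 24 concatenated fractions.
print(multidimensional_peak_capacity([300]))
print(multidimensional_peak_capacity([24, 300], orthogonality=0.6))
```

The same arithmetic shows why a single-shot method needs many more experiments per day to reach a comparable number of resolved species, which motivates the extra gas-phase separation dimension mentioned above.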
98 |
--------------------------------------------------------------------------------
/content/14.Data-Acquisition.md:
--------------------------------------------------------------------------------
1 | ## 12. Data Acquisition {.page_break_before}
2 |
3 | Hybrid mass spectrometers used for modern proteome analysis offer the flexibility to collect data in many different ways.
4 | Data acquisition strategies differ in the sequence of precursor scans and fragment ion scans, and in how analytes are chosen for MS/MS.
5 | Constant innovation to develop better data collection methods improves our view of the proteome, but many method options may confuse newcomers.
6 | This section provides an overview of the general classes of data collection methods.
7 |
8 | Data acquisition strategies for proteomics fall into one of two groups.
9 |
10 | 1. Data dependent acquisition (DDA), in which the exact scan sequence in each analysis depends on the data that the mass spectrometer observes.
11 | 2. Data independent acquisition (DIA), in which the exact scan sequence in each analysis DOES NOT depend on the data; the collected scans are the same whether you inject yeast peptides, human peptides, or a solvent blank.
12 |
13 | DDA and DIA can both be further subdivided into targeted and untargeted methods.
14 |
15 | ### DDA
16 |
17 | In most cases, the peptide masses that will be observed are not known before doing the experiment.
18 | Data collection methods must account for this.
19 | DDA was invented in the early 1990s and enabled the collection of MS/MS spectra for observed peptides as they eluted from the LC column [@DOI:10.1006/meth.1994.1031; @DOI:10.1021/ac00104a020; @PMID:24203425].
20 |
21 | #### Untargeted DDA
22 | A common method currently used in modern proteomics is untargeted DDA.
23 | The MS collects precursor (MS1) scans iteratively until precursor mass envelopes meeting certain criteria are detected.
24 | Criteria for selection are usually specific charge states and a minimum signal intensity.
25 | When ions meet these criteria, the MS selects their masses for fragmentation.
26 |
27 | Because ions are selected as they are observed, repeated DDA analyses of the same sample will produce somewhat different sets of identifications.
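
The selection step can be sketched as a simple top-N rule: among the precursor features detected in one MS1 scan, keep those with an allowed charge state and at least a minimum intensity, then choose the N most intense for MS/MS. The value of N, the allowed charges, the intensity threshold, and the example features are hypothetical, and instrument software applies additional rules not modeled here.

```python
from dataclasses import dataclass


@dataclass
class PrecursorFeature:
    mz: float
    charge: int
    intensity: float


def select_for_msms(features, top_n=10, allowed_charges=(2, 3, 4, 5), min_intensity=5e3):
    """Return up to top_n of the most intense features that pass charge and intensity criteria."""
    eligible = [f for f in features
                if f.charge in allowed_charges and f.intensity >= min_intensity]
    return sorted(eligible, key=lambda f: f.intensity, reverse=True)[:top_n]


# Hypothetical features from one MS1 scan.
ms1 = [
    PrecursorFeature(500.27, 2, 1.0e6),
    PrecursorFeature(610.80, 1, 2.0e6),  # singly charged: rejected
    PrecursorFeature(705.35, 3, 1.0e3),  # below intensity threshold: rejected
    PrecursorFeature(812.41, 3, 5.0e5),
]
print([f.mz for f in select_for_msms(ms1, top_n=2)])
```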
28 | This stochasticity is the main drawback of DDA.
29 |
30 | Because DDA is required for quantification of proteins using isobaric tags like TMT, this stochasticity of DDA limits the ability to compare quantities across batches.
31 | For example, if you have 30 samples, you can use two sets of the 16-plex kit, labeling 15 samples in each set and using the remaining channel in each set for a pooled sample that enables comparison across the sets.
32 | When you collect DDA data from each of those sets, each set will have MS/MS data from an overlapping but different set of peptides.
33 | If one set has MS/MS from a peptide but the other set does not, then that peptide cannot be quantified in the whole sample group.
34 | This limits the number of quantified proteins in large TMT experiments with multiple batches.
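
The consequence of DDA stochasticity for multi-batch TMT experiments can be sketched with set operations: only peptides that received MS/MS in every batch can be compared across the whole cohort. The batch names and peptide sequences below are hypothetical placeholders.

```python
def quantifiable_in_all_batches(peptides_by_batch):
    """Peptides with MS/MS (and therefore reporter ions) in every TMT batch."""
    batches = [set(p) for p in peptides_by_batch.values()]
    if not batches:
        return set()
    return set.intersection(*batches)


# Hypothetical DDA results for two 16-plex batches.
identified = {
    "batch_1": {"PEPTIDEA", "PEPTIDEB", "PEPTIDEC", "PEPTIDED"},
    "batch_2": {"PEPTIDEB", "PEPTIDEC", "PEPTIDEE"},
}
common = quantifiable_in_all_batches(identified)
union = set().union(*identified.values())
print(f"{len(common)} of {len(union)} peptides can be compared across all samples")
```

Each additional batch can only shrink the intersection, which is why the loss grows with the number of batches.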
35 |
36 |
37 | #### Targeted DDA
38 | Targeted DDA is not common in modern proteomics.
39 | In targeted DDA, in addition to general criteria like a minimum intensity and a certain charge state, the mass spectrometer looks for specific masses.
40 | These masses might be signals observed in a previous analysis that were not selected for MS/MS [@DOI:10.1021/pr800828p; @DOI:10.1074/mcp.M700029-MCP200].
41 | In these studies, the sample is first analyzed by LC-MS to detect precursor ion features with feature-detection software, and subsequent analyses then target those masses for fragmentation using inclusion lists until all of them have been fragmented.
42 | This was shown to increase proteome coverage.
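
The iterative inclusion-list workflow can be sketched as matching fragmented precursors to list entries within a mass tolerance and carrying unmatched entries into the next run. The 10 ppm tolerance and all m/z values are hypothetical, and real inclusion lists usually also specify charge states and retention-time windows, which this sketch omits.

```python
def ppm_error(observed_mz, target_mz):
    """Mass error of an observed m/z relative to a target, in parts per million."""
    return (observed_mz - target_mz) / target_mz * 1e6


def fragmented_targets(fragmented_mzs, inclusion_list, tol_ppm=10.0):
    """Inclusion-list m/z values matched by at least one fragmented precursor."""
    return {target for target in inclusion_list
            if any(abs(ppm_error(mz, target)) <= tol_ppm for mz in fragmented_mzs)}


# Hypothetical inclusion list from a first LC-MS run (precursor m/z values).
inclusion = {523.2774, 611.3301, 745.8912}
# Hypothetical precursors fragmented in the next targeted run.
run_2 = [523.2790, 745.8800]
remaining = inclusion - fragmented_targets(run_2, inclusion)
print(sorted(remaining))  # targets to carry into the next inclusion list
```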
43 |
44 | ### DIA
45 | The simplest method to operate a mass spectrometer is to have predefined scans that are collected for each sample analysis.
46 | This is data-independent acquisition (DIA); the scan sequence does not depend on the data that the instrument observes.
47 | Thus, the scan sequence is repetitive, looping through predetermined scans, most often quadrupole isolation of *m/z* ranges followed by fragmentation in a collision cell and fragment ion detection in a final MS stage.
48 | In DIA, the same scan sequence is performed whether we inject air, a blank, peptides, ammonia, or anything else.
49 | Like DDA, DIA can also be either targeted or untargeted [@DOI:10.1080/14789450.2017.1322904].
50 | The two targeted DIA methods are SRM/MRM and PRM.
51 | Untargeted DIA (uDIA) is often referred to simply as "DIA" or "SWATH" (Sequential Window Acquisition of All Theoretical Mass Spectra) (**Figure 15**).
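
Because the DIA scan sequence is fixed in advance, it can be written down before any sample is injected. The sketch below builds one simple, hypothetical SWATH-style scheme (fixed 25 *m/z* isolation windows with 1 *m/z* overlap tiling 400-1000 *m/z*) and estimates the cycle time from assumed scan durations. Cycle time matters because it sets how often each chromatographic peak is sampled; many published schemes use variable-width windows instead.

```python
def dia_windows(mz_start=400.0, mz_end=1000.0, width=25.0, overlap=1.0):
    """Fixed-width isolation windows tiling mz_start..mz_end, widened by overlap/2 per side."""
    windows = []
    low = mz_start
    while low < mz_end:
        high = min(low + width, mz_end)
        windows.append((max(mz_start, low - overlap / 2), min(mz_end, high + overlap / 2)))
        low = high
    return windows


def cycle_time_s(n_windows, ms1_s=0.1, ms2_s=0.04):
    """One DIA cycle: an MS1 scan plus one MS/MS scan per window (durations are assumed)."""
    return ms1_s + n_windows * ms2_s


windows = dia_windows()
print(len(windows), windows[:2])
print(f"cycle time: {cycle_time_s(len(windows)):.2f} s")
```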
52 |
53 | {#fig:DIA-types tag="15" width="100%"}
67 |
68 | #### Targeted DIA
69 | The first type of targeted DIA is called SRM or MRM [@DOI:10.1016/j.ymeth.2013.05.004].
70 | The popularity of this method in the literature peaked in 2014, with just under 1,500 documents on PubMed that year resulting from a search for "MRM".
71 | In this strategy, the QQQ MS is set so that the first quadrupole selects the precursor mass of the peptide(s) of interest, the second quadrupole fragments the peptide, and the third quadrupole monitors specific fragment ions of that peptide.
72 | This strategy is very sensitive and has the benefit of very low noise.
73 | The fragments monitored in Q3 are chosen such that it is unlikely these fragments could arise from another peptide.
74 | Usually several transitions (precursor-fragment ion pairs) are monitored for each peptide to obtain multiple measurements of that peptide.
75 |
76 | An early example of MRM applied to quantify C-reactive protein was published in 2004 [@DOI:10.1002/pmic.200300670].
77 | Around the same time, SRM was combined with antibody enrichment of peptides from target proteins [@DOI:10.1021/pr034086h].
78 | This approach was popular for analysis of plasma proteins [@DOI:10.1074/mcp.M500331-MCP200].
79 | These early examples led to many more studies that used QQQ MS instruments to get accurate quantitation of many proteins in one injection [@DOI:10.1016/j.aca.2017.01.059; @DOI:10.1586/erm.12.32].
80 | When chromatography is stable, scheduling each MRM measurement around the peptide's expected retention time additionally enabled better utilization of the instrument duty cycle and therefore monitoring of more peptides per injection [@DOI:10.1074/mcp.M700132-MCP200].
81 | Subsequent efforts even developed libraries of transitions that allow quantification of any protein in model organisms [@DOI:10.1038/nmeth1108-913].
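
The benefit of scheduling can be sketched by counting how many transitions must be monitored at the same time: the instrument cycle is shared among concurrent transitions, so fewer concurrent transitions means a longer dwell time for each. The retention times, 2-minute windows, three transitions per peptide, and 1-second cycle are hypothetical, and inter-scan delays are ignored.

```python
def schedule(expected_rts_min, transitions_per_peptide=3, window_min=2.0):
    """One acquisition window (start, end) per transition, centred on each peptide's RT."""
    half = window_min / 2
    return [(rt - half, rt + half)
            for rt in expected_rts_min for _ in range(transitions_per_peptide)]


def max_concurrent(windows):
    """Largest number of transitions that must be monitored at the same time."""
    events = [(start, 1) for start, _ in windows] + [(end, -1) for _, end in windows]
    events.sort()  # at equal times, window ends (-1) are processed before starts (+1)
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def dwell_time_ms(windows, cycle_time_ms=1000.0):
    """Dwell time per transition if the cycle time is shared equally."""
    return cycle_time_ms / max_concurrent(windows)


# Hypothetical assay: three peptides, three transitions each, in a 60 min run.
scheduled = schedule([10.0, 11.0, 20.0])
unscheduled = [(0.0, 60.0)] * 9  # every transition monitored for the whole run
print(dwell_time_ms(unscheduled), dwell_time_ms(scheduled))
```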
82 |
83 | Another similar targeted DIA method is called parallel reaction monitoring (PRM) [@DOI:10.1074/mcp.O112.020131].
84 | Instead of using a QQQ instrument, PRM uses a hybrid MS with a quadrupole and a high-resolution mass analyzer, such as a Q-TOF or a Q-Exactive.
85 | Instead of monitoring specific fragments in Q3, PRM records the fragments of each selected precursor with high mass accuracy, which is used to filter peptide fragments for high selectivity and accurate quantification.
86 | Studies have found that PRM and MRM/SRM have comparable dynamic range and linearity [@DOI:10.1016/j.jprot.2014.10.017].
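
Filtering fragments by mass accuracy can be sketched by computing theoretical fragment m/z values and accepting observed peaks within a ppm tolerance. The sketch considers only singly charged y ions of an unmodified peptide, uses standard monoisotopic residue masses, and assumes a 10 ppm tolerance; the peptide and observed peaks are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def y_ions(peptide):
    """Singly charged y-ion m/z values, y1 first."""
    ions, total = [], WATER + PROTON
    for residue in reversed(peptide):
        total += RESIDUE_MASS[residue]
        ions.append(total)
    return ions[:-1]  # the full-length ion is the intact precursor, so drop it


def ppm(observed, theoretical):
    return (observed - theoretical) / theoretical * 1e6


def match_fragments(peptide, observed_mzs, tol_ppm=10.0):
    """(label, theoretical, observed, ppm error) for y ions matched within tol_ppm."""
    matches = []
    for i, theo in enumerate(y_ions(peptide), start=1):
        best = min(observed_mzs, key=lambda mz: abs(mz - theo), default=None)
        if best is not None and abs(ppm(best, theo)) <= tol_ppm:
            matches.append((f"y{i}", theo, best, ppm(best, theo)))
    return matches


# Hypothetical peaks: y1 measured 3 ppm high, y2 measured 20 ppm high.
theo = y_ions("PEPTIDEK")
observed = [theo[0] * (1 + 3e-6), theo[1] * (1 + 20e-6)]
print([label for label, *_ in match_fragments("PEPTIDEK", observed)])
```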
87 |
88 |
89 | #### Untargeted DIA
90 |
91 | There have been many implementations of uDIA over the years, starting in 2003 with Purvine et al. from the Goodlett lab [@DOI:10.1002/pmic.200300362].
92 | In this first work, they demonstrated uDIA using a Q-TOF with in-source fragmentation and showed that the extracted ion chromatograms of precursor and fragment ions matched in shape, suggesting that this could be used to identify and quantify peptides.
93 | The following year, Venable et al. from the Yates lab introduced uDIA with an ion trap [@DOI:10.1038/nmeth705].
94 | Subsequent methods include MSE [@DOI:10.1002/rcm.2550], PAcIFIC [@DOI:10.1021/ac900888s], and all ion fragmentation (AIF) [@DOI:10.1074/mcp.M110.001537].
95 | Computational methods were also developed to automate interpretation of these data, such as DeMux [@DOI:10.1074/mcp.M110.001537], XDIA [@DOI:10.1093/bioinformatics/btq031], and ETISEQ [@DOI:10.1186/1471-2105-10-244].
96 |
97 | The paper often credited with the widespread adoption of uDIA was published by Gillet et al. from the Aebersold group in 2012 [@DOI:10.1074/mcp.O111.016717].
98 | In this paper they branded the idea as SWATH.
99 | Widespread adoption may have been facilitated by the co-marketing of this idea by ABSciex as a proteomics solution on their new 5600 Q-TOF (called "tripleTOF" despite containing only one TOF, likely a portmanteau of "triple quadrupole" and "Q-TOF").
100 | Importantly, in the Gillet et al. paper the authors described a computational method to extract information from SWATH where peptides of interest were queried against the data.
101 | They also demonstrated the application of SWATH to measure the proteomic changes that occur during the yeast diauxic shift, and showed that SWATH can reveal modified peptides, in this case a methionine oxidation.
102 |
103 | There are also many papers describing uDIA with orbitraps.
104 | One early example described combining random isolation windows together and then demultiplexing the chimeric spectra [@DOI:10.1038/nmeth.2528].
105 | In another landmark paper, over 6,000 proteins were identified from mouse tissue by at least 2 peptides [@DOI:10.1074/mcp.RA117.000314].
106 | In 2018, the then-new orbitrap model (HF-X) enabled the identification of nearly 6,000 human proteins in only 30 minutes.
107 | Currently orbitraps have all but replaced the Sciex Q-TOFs for DIA data collection.
108 |
109 | A new direction in uDIA is the addition of ion separation by ion mobility.
110 | This has appeared in two forms.
111 | On the timsTOF, diaPASEF makes use of the trapped ion mobility to increase speed and sensitivity of analysis [@DOI:10.1038/s41592-020-00998-0].
112 | On the orbitrap, the combination of FAIMS and DIA has enabled the identification of over 10,000 proteins from one sample, which is a major milestone [@DOI:10.1021/acs.jproteome.2c00023].
113 |
114 | #### Acquisition methods for PTMs
115 | ##### Phosphopeptides
116 | Resonant CID [@DOI:10.1074/mcp.m111.009910] and beam-type HCD [@DOI:10.1021/pr100637q] are the most popular dissociation methods for both unmodified and modified peptides due to their speed, accessibility, and efficiency.
117 | Because the phosphoester bond is weak relative to the peptide backbone, resonant CID usually produces phosphopeptide spectra dominated by the neutral loss of phosphoric acid (98 Da).
118 | For this reason, the optimal dissociation methods for phosphopeptide identification and phosphosite localization include HCD or ExD-based methods, discussed later in more depth [@DOI:10.1021/acs.analchem.7b00212; @DOI:10.1021/pr301130k].
119 | ExD methods generate phosphopeptide MS/MS spectra with many c- and z•-type fragment ions for peptide sequencing, and they preserve labile phosphate modifications that are typically lost with CID, enabling site localization [@DOI:10.1073/pnas.0402700101].
120 | Gas-phase phosphate rearrangement induced by collisional activation is a major challenge for the field, and several groups have explored site localization in the face of rearrangement [@DOI:10.1021/ac801768s; @DOI:10.1002/pmic.201200384; @DOI:10.1074/mcp.m900619-mcp200].
121 |
122 | Advanced data acquisition schemes trigger predetermined MS/MS events when a specific fragment ion or neutral loss is detected in a spectrum.
123 | Certain decision-tree strategies have arisen to increase data acquisition efficiency, including pseudo-MS3 scans which are triggered on detection of phosphate losses [@DOI:10.1021/ac0497104] and the use of site-specific x-type ions [@DOI:10.1021/pr200154t].
124 | For example, when linear ion traps were the main proteomics workhorses, resonant CID analysis of phosphopeptides would result in predominantly neutral loss of the phosphate with limited sequence ion information.
125 | To gain sequence ions in these experiments, instruments could be set to isolate the product ion corresponding to a 98 Da neutral loss for MS3 activation [@DOI:10.1073/pnas.0404720101; @DOI:10.1002/pmic.200800283].
126 | The newer collisional dissociation technique HCD, or beam-type collisional activation, significantly improves the detection of peptide fragments with the phosphorylation intact on fragment ions, and thus, this neutral loss scanning technique is no longer common.
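
The neutral-loss trigger described above relies on simple arithmetic: a loss of H3PO4 (monoisotopic 97.9769 Da) from a precursor of charge z appears 97.9769/z below the precursor *m/z*, for example about 49 *m/z* below a doubly charged precursor. The sketch checks whether such an ion is among the most intense CID fragments and, if so, signals that an MS3 scan should be triggered; the tolerance, the top-N rule, and the example spectra are hypothetical, and vendor implementations differ.

```python
H3PO4_DA = 97.976896  # monoisotopic mass of phosphoric acid


def neutral_loss_mz(precursor_mz, charge, loss_da=H3PO4_DA):
    """m/z of the precursor after losing a neutral of mass loss_da (charge unchanged)."""
    return precursor_mz - loss_da / charge


def trigger_ms3(precursor_mz, charge, fragments, tol_mz=0.5, top_n=3):
    """True if the neutral-loss ion is among the top_n most intense (m/z, intensity) peaks."""
    target = neutral_loss_mz(precursor_mz, charge)
    most_intense = sorted(fragments, key=lambda peak: peak[1], reverse=True)[:top_n]
    return any(abs(mz - target) <= tol_mz for mz, _ in most_intense)


# Hypothetical CID spectra of a doubly charged precursor at m/z 800.00.
phospho_like = [(751.02, 1.0e6), (650.30, 1.5e5), (520.25, 9.0e4)]
unmodified_like = [(650.30, 4.0e5), (520.25, 3.5e5), (433.20, 2.0e5)]
print(trigger_ms3(800.0, 2, phospho_like), trigger_ms3(800.0, 2, unmodified_like))
```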
127 |
128 | Recently developed approaches to phosphopeptide identification include DIA-based phosphoproteomics with Spectronaut [@DOI:10.1126/scisignal.aaa3139; @DOI:10.1007/978-1-0716-1641-3_6], “plug-and-play” high-resolution MS [@DOI:10.1038/nmeth.3811], SureQuant for phosphotyrosine [@DOI:10.1158/0008-5472.can-20-3804], PIQED for direct identification and quantification of phosphorylation from DIA without a prior spectral library [@DOI:10.1038/nmeth.4334], and FAIMS front-end separations which yield 15-20% more phosphosite identifications than non-FAIMS experiments [@DOI:10.1021/acs.analchem.8b02233].
129 | For quantification of phosphoproteins, Hogrebe et al. investigated several of the most common strategies and concluded that TMT-based MS2 strategies may be the current best approach [@DOI:10.1038/s41467-018-03309-6].
130 |
131 | ##### Glycopeptides
132 | A similar product-dependent MS/MS triggering strategy was introduced for N-linked glycopeptides [@DOI:10.1021/pr300257c].
133 | Collisional dissociation of glycosylated peptides produces oxonium ions, for example at *m/z* 204.09 (HexNAc) or *m/z* 366.14 (HexHexNAc).
134 | If oxonium ions from the fragmented glycan are detected among the most abundant fragment ions of the HCD spectra, then an ETD scan is triggered.
135 | This ETD scan provides information about the peptide sequence, while the original HCD scan provides glycan structure information.
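
The product-dependent trigger can be sketched as a check for the oxonium ions named above among the most intense HCD fragments. The oxonium *m/z* values are given to four decimals (monoisotopic, singly charged); the top-20 rule, the 15 ppm tolerance, and the example spectra are hypothetical.

```python
# Oxonium ions from the text, given to four decimals (monoisotopic, singly charged).
OXONIUM_IONS = {"HexNAc": 204.0867, "HexHexNAc": 366.1395}


def oxonium_ions_in_top(hcd_peaks, top_n=20, tol_ppm=15.0):
    """Names of oxonium ions found among the top_n most intense (m/z, intensity) peaks."""
    top = sorted(hcd_peaks, key=lambda peak: peak[1], reverse=True)[:top_n]
    return {name for name, ref in OXONIUM_IONS.items()
            if any(abs(mz - ref) / ref * 1e6 <= tol_ppm for mz, _ in top)}


def should_trigger_etd(hcd_peaks, **kwargs):
    """Trigger an ETD scan of the same precursor when any oxonium ion is detected."""
    return bool(oxonium_ions_in_top(hcd_peaks, **kwargs))


# Hypothetical HCD spectra.
glyco_like = [(204.0868, 5.0e5), (366.1390, 1.2e5), (1012.48, 8.0e5)]
plain = [(175.1190, 6.0e5), (502.2980, 3.0e5)]
print(should_trigger_etd(glyco_like), should_trigger_etd(plain))
```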
136 |
--------------------------------------------------------------------------------
/SETUP.md:
--------------------------------------------------------------------------------
1 | # Table of contents
2 |
3 | - [Creating a new manuscript](#creating-a-new-manuscript)
4 | * [Using setup script](#using-setup-script)
5 | * [Manual configuration](#manual-configuration)
6 | * [Create repository](#create-repository)
7 | * [Continuous integration](#continuous-integration)
8 | + [GitHub Actions](#github-actions)
9 | + [SSH Deploy Key](#ssh-deploy-key)
10 | - [Add the public key to GitHub](#add-the-public-key-to-github)
11 | - [Add the private key to GitHub](#add-the-private-key-to-github)
12 | + [Previewing pull request builds with AppVeyor](#previewing-pull-request-builds-with-appveyor)
13 | * [README updates](#readme-updates)
14 | * [Finalize](#finalize)
15 | - [Merging upstream rootstock changes](#merging-upstream-rootstock-changes)
16 | * [Default branch](#default-branch)
17 |
18 | _generated with [markdown-toc](https://ecotrust-canada.github.io/markdown-toc/)_
19 |
20 | # Creating a new manuscript
21 |
22 | These instructions detail how to create a new manuscript based on the [`manubot/rootstock`](https://github.com/manubot/rootstock/) repository.
23 | The process can be a bit challenging, because it requires a few steps that are difficult to automate.
24 | However, you will only have to perform these steps once for each manuscript.
25 |
26 | These steps should be performed in a command-line shell (terminal), starting in the directory where you want the manuscript folder to be created.
27 | Setup is supported on Linux, macOS, and Windows.
28 | Windows setup requires [Git Bash](https://gitforwindows.org/) or [Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/faq).
29 |
30 | ## Using setup script
31 | Creating a new manuscript using GitHub Actions, the recommended default CI service (see below), can be achieved easily using the [setup script](https://github.com/manubot/rootstock/blob/main/setup.bash).
32 | This simply runs the steps detailed below in the manual configuration.
33 |
34 | Use the command below to copy `setup.bash` and run it.
35 | You can check the code that will be executed [here](https://github.com/manubot/rootstock/blob/main/setup.bash).
36 |
37 | ````sh
38 | bash <( curl --location https://github.com/manubot/rootstock/raw/main/setup.bash )
39 | ````
40 | The script will then take you through cloning the rootstock repo, making the changes required to use GitHub Actions, editing the README to point to your repo, and committing the changes.
41 | Your new manuscript repo is then ready for you to start adding your own content.
42 |
43 | This script does not create the remote repo for you, so you will be prompted to manually create an empty repository on GitHub.
44 | Do not initialize the repository, other than optionally adding a description.
45 |
46 | ### CLI
47 | There is also a command line interface for users who want to create manuscripts at scale and in an automated way.
48 | See the help for details.
49 |
50 | ````sh
51 | bash setup.bash --help
52 | ````
53 |
54 | ## Manual configuration
55 |
56 | If you do not wish to use the above setup script to configure your new manuscript repository, you can instead execute the steps manually.
57 | First, you must configure two environment variables (`OWNER` and `REPO`).
58 | These variables specify the GitHub repository for the manuscript (i.e. `https://github.com/OWNER/REPO`).
59 | Make sure that the case of `OWNER` matches how your username is displayed on GitHub.
60 | In general, assume that all commands in this setup are case-sensitive.
61 | **Edit the following commands with your manuscript's information:**
62 |
63 | ```sh
64 | # GitHub username or organization name (change from manubot)
65 | OWNER=manubot
66 | # Repository name (change from rootstock)
67 | REPO=rootstock
68 | ```
69 |
70 | ## Create repository
71 |
72 | **Execute the remaining commands verbatim.**
73 | They do not need to be edited (if the setup works as intended).
74 |
75 | Next you must clone `manubot/rootstock` and reconfigure the remote repositories:
76 |
77 | ```sh
78 | # Clone manubot/rootstock
79 | git clone --single-branch https://github.com/manubot/rootstock.git $REPO
80 | cd $REPO
81 |
82 | # Configure remotes
83 | git remote add rootstock https://github.com/manubot/rootstock.git
84 |
85 | # Option A: Set origin URL using its web address
86 | git remote set-url origin https://github.com/$OWNER/$REPO.git
87 | # Option B: If GitHub SSH key access is enabled for OWNER, run the following command instead
88 | git remote set-url origin git@github.com:$OWNER/$REPO.git
89 | ```
90 |
91 | Then create an empty repository on GitHub.
92 | You can do this on the GitHub website or via the [GitHub command line interface](https://github.com/cli/cli) (if installed) with `gh repo create`.
93 | Make sure to use the same "Owner" and "Repository name" specified above.
94 | Do not initialize the repository, other than optionally adding a Description.
95 | Next, push your cloned manuscript:
96 |
97 | ```sh
98 | git push --set-upstream origin main
99 | ```
100 |
101 | ## Continuous integration
102 |
103 | Manubot integrates with cloud services to perform continuous integration (CI).
104 | For Manubot that means automatically building and deploying your manuscript.
105 | Manubot supports the following CI services:
106 |
107 | | Service | Default | Artifacts | Deployment | Config | Private Repos |
108 | |---------|---------|-----------|---------|--------|---------------|
109 | | [GitHub Actions](https://github.com/features/actions) | ✔️ | ✔️ | ✔️ | [`manubot.yaml`](.github/workflows/manubot.yaml) | 2,000 minutes per month |
110 | | [AppVeyor](https://www.appveyor.com/) | ❌ | ✔️ with PR comments | ❌ | [`.appveyor.yml`](.appveyor.yml) | 14 day trial |
111 |
112 | Notes on table fields:
113 |
114 | - **Default**: Whether the following uncollapsed setup instructions enable the service by default.
115 | - **Artifacts**: Manuscript outputs that are saved alongside the CI build logs.
116 | This is especially helpful for previewing changes that are under development in a pull request.
117 | Both GitHub Actions and AppVeyor upload the rendered manuscript as an artifact for pull request builds.
118 | However, only AppVeyor comments on pull requests with a download link to the artifacts ([example](https://github.com/manubot/rootstock/pull/262#issuecomment-519944731)).
119 | - **Deployment**: Whether the CI service can write outputs back to the GitHub repository (to the `output` and `gh-pages` branches).
120 | Deployment provides GitHub Pages with the latest manuscript version to serve to the manuscript's URL.
121 | GitHub Actions will deploy by default without any additional setup.
122 | - **Config**: File configuring what operations CI will perform.
123 | Removing this file is one method to disable the CI service.
124 | - **Private Repos**: Quota for private repos.
125 | Only GitHub Actions supports cost-free builds of private repositories beyond a trial period.
126 | All services are cost-free for public repos.
127 |
128 | Manubot was originally designed to use Travis CI,
129 | but later switched to primarily use GitHub Actions.
130 | Support for Travis was [removed](https://github.com/manubot/rootstock/issues/446) in 2021.
131 |
132 | ### GitHub Actions
133 |
134 | GitHub Actions is the recommended default CI service because it requires no additional setup.
135 | To use GitHub Actions only, remove configuration files for other CI services:
136 |
137 | ```shell
138 | # remove AppVeyor config
139 | git rm .appveyor.yml
140 | # remove ci/install.sh (only used by AppVeyor)
141 | git rm ci/install.sh
142 | ```
143 |
144 | GitHub Actions is _usually_ able to deploy without any additional setup using the [`GITHUB_TOKEN`](https://help.github.com/en/actions/configuring-and-managing-workflows/authenticating-with-the-github_token) for authentication.
145 | GitHub Pages deployment using `GITHUB_TOKEN` recently started working on GitHub without an official announcement.
146 | If it does not work for you after completing this setup, try reselecting "gh-pages branch" as the Source for GitHub Pages in the repository Settings.
147 | GitHub Pages should now trigger on the next commit.
148 | If not, [let us know](https://github.com/manubot/rootstock/issues/new).
149 |
150 | For an alternative deployment method on GitHub,
151 | you can use an SSH Deploy Key instead.
152 | However, the setup is more complex.
153 | The following sections, collapsed by default, detail how to generate an SSH Deploy Key.
154 |
155 |
156 |
157 |
158 | ### SSH Deploy Key
159 |
160 | Previously, GitHub Actions required an SSH Deploy Key,
161 | but now GitHub can deploy using the `GITHUB_TOKEN` secret.
162 | Therefore, users following the default configuration can skip these steps.
163 | Otherwise, generate a deploy key so CI can write to the repository.
164 |
165 | ```sh
166 | # Generate deploy.key.pub (public) and deploy.key (private)
167 | ssh-keygen \
168 | -t rsa -b 4096 -N "" \
169 | -C "deploy@manubot.org" \
170 | -f ci/deploy.key
171 |
172 | # Encode deploy.key to remove newlines, writing encoded text to deploy.key.txt.
173 | # This was required for entry into the Travis settings.
174 | openssl base64 -A -in ci/deploy.key > ci/deploy.key.txt
175 | ```
176 |
177 | #### Add the public key to GitHub
178 |
179 | ```sh
180 | # Print the URL for adding the public key to GitHub
181 | echo "https://github.com/$OWNER/$REPO/settings/keys/new"
182 |
183 | # Print the public key for copy-pasting to GitHub
184 | cat ci/deploy.key.pub
185 | ```
186 |
187 | Go to the GitHub settings URL echoed above in a browser, and click "Add deploy key".
188 | For "Title", add a description like "Manubot Deploy Key".
189 | Copy-paste the contents of the `ci/deploy.key.pub` text file (printed above by `cat`) into the "Key" text box.
190 | Check the "Allow write access" box below.
191 | Finally, click "Add key".
192 |
193 | #### Add the private key to GitHub
194 |
195 | If you would like GitHub Actions to use SSH for deployment, rather than via HTTPS using `GITHUB_TOKEN`, perform the steps in this section.
196 |
197 | ```sh
198 | # Print the URL for adding the private key to GitHub
199 | echo "https://github.com/$OWNER/$REPO/settings/secrets"
200 |
201 | # Print the encoded private key for copy-pasting to GitHub
202 | cat ci/deploy.key.txt && echo
203 | ```
204 |
205 | Next, go to the GitHub repository settings page (URL echoed above).
206 | Click "Add a new secret".
207 | For "Name", enter `MANUBOT_SSH_PRIVATE_KEY`.
208 | Next, copy-paste the content of `ci/deploy.key.txt` into "Value"
209 | (printed above by `cat`, including any trailing `=` characters if present).
210 |
211 |
212 |
213 |
214 |
215 |
216 | ### Previewing pull request builds with AppVeyor
217 |
218 | You can optionally enable AppVeyor continuous integration to view pull request builds.
219 | AppVeyor supports storing manuscripts generated during pull request builds as artifacts.
220 | These can be previewed to facilitate pull request review and ensure formatting and reference changes render as expected.
221 | When a pull request build runs successfully, **@AppVeyorBot** will comment on the pull request with a download link to the manuscript PDF.
222 |
223 | To enable AppVeyor, follow steps 1 and 2 of the [AppVeyor welcome](https://www.appveyor.com/docs/) to sign in to AppVeyor and add your manuscript repository as an AppVeyor project.
224 | The repository already contains an `.appveyor.yml` build configuration file, so no other setup is required.
225 | AppVeyor only runs when it detects changes that are likely to affect the manuscript.
226 |
227 |
228 | ## README updates
229 |
230 | The continuous integration configuration should now be complete.
231 | Now update `README.md` to reference your new repository:
232 |
233 | ```shell
234 | # Perform substitutions
235 | sed "s/manubot\/rootstock/$OWNER\/$REPO/g" README.md > tmp && mv -f tmp README.md
236 | sed "s/manubot\.github\.io\/rootstock/$OWNER\.github\.io\/$REPO/g" README.md > tmp && mv -f tmp README.md
237 | ```
238 |
239 | ## Finalize
240 |
241 | The `content/02.delete-me.md` file details the Markdown syntax and formatting options available with Manubot.
242 | Remove it to reduce the content to a blank manuscript:
243 |
244 | ```shell
245 | # Remove deletable content file
246 | git rm content/02.delete-me.md
247 | ```
248 |
249 | Run `git status` or `git diff --color-words` to double check the changes thus far.
250 | If the changes look okay, commit and push:
251 |
252 | ```shell
253 | git add --update
254 | git commit --message "Brand repo to $OWNER/$REPO"
255 | git push origin main
256 | ```
257 |
258 | You should be good to go now.
259 | A good first step is to modify [`content/metadata.yaml`](content/metadata.yaml) with the relevant information for your manuscript.
260 |
261 | # Merging upstream rootstock changes
262 |
263 | This section describes how to incorporate changes to rootstock that occurred since initializing your manuscript.
264 | You will want to do this if there are new enhancements or bugfixes that you want to incorporate.
265 | This process can be difficult, especially if conflicts have arisen, and is recommended only for advanced git users.
266 |
267 | It is recommended to do rootstock upgrades via a pull request to help you view the proposed changes and to ensure the build uses the updated environment.
268 | First, checkout a new branch to use as the pull request head branch:
269 |
270 | ```shell
271 | # checkout a new branch, named using the current date, e.g. rootstock-2018-11-16
272 | git checkout -b rootstock-$(date '+%Y-%m-%d')
273 | ```
274 |
275 | Second, pull the new commits from rootstock, but do not automerge:
276 |
277 | ```shell
278 | # if rootstock remote is not set, add it
279 | git config remote.rootstock.url || git remote add rootstock https://github.com/manubot/rootstock.git
280 |
281 | # pull the new commits from rootstock
282 | git pull --no-ff --no-rebase --no-commit rootstock main
283 | ```
284 |
285 | If all goes well, there won't be any conflicts.
286 | However, if there are conflicts, follow the suggested commands to resolve them.
287 |
288 | You can add the changes incrementally using `git add --patch`.
289 | This is helpful to see each upstream change.
290 | You may notice changes that affect how items in `content` are processed.
291 | If so, you should edit and stage `content` files as needed.
292 | When there are no longer any unstaged changes, then do `git commit`.
293 |
294 | If updating your default branch (i.e. `main` or `master`) via a pull request, proceed to push the commit to GitHub and open a pull request.
295 | Once the pull request is ready to merge, use GitHub's "Create a merge commit" option rather than "Squash and merge" or "Rebase and merge" to preserve the rootstock commit hashes.
296 |
297 | The environment for local builds does not automatically update when [`build/environment.yml`](build/environment.yml) changes.
298 | To update your local conda `manubot` environment with new changes, run:
299 |
300 | ```shell
301 | # update a local conda environment
302 | conda env update --file build/environment.yml
303 | ```
304 |
305 | ## Default branch
306 |
307 | On 2020-10-01, GitHub [changed](https://github.blog/changelog/2020-10-01-the-default-branch-for-newly-created-repositories-is-now-main/) the default branch name for new repositories from `master` to `main`.
308 | More information on GitHub's migration is available at [github/renaming](https://github.com/github/renaming).
309 |
310 | On 2020-12-10, Manubot [updated](https://github.com/manubot/rootstock/pull/399) the Rootstock default branch to `main`.
311 | For existing manuscripts, the default branch will remain `master`,
312 | unless manually switched to `main`.
313 | Rootstock has been configured to run continuous integration on both `main` and `master`,
314 | so existing manuscripts can switch their default branch to `main`, but are not required to.
315 |
316 | Upgrading to the latest Rootstock will change several README links to `main`.
317 | For manuscripts that do not plan to switch their default branch,
318 | do not include these changes in the upgrade merge commit.
319 |
--------------------------------------------------------------------------------
/content/06.proteolysis.md:
--------------------------------------------------------------------------------
1 | ## 4. Proteolysis {.page_break_before}
2 | Proteolysis is the defining step that differentiates bottom-up or shotgun proteomics from top-down proteomics.
3 | Hydrolysis of proteins is extremely important because it defines the population of potentially identifiable peptides.
4 | Generally, peptides 7 to 35 amino acids long are considered useful for mass spectrometry analysis.
5 | Peptides that are too long are difficult to identify by tandem mass spectrometry or may be lost during sample preparation due to irreversible binding with solid-phase extraction sorbents.
6 | Peptides that are too short are also not useful because they may match many proteins during protein inference.
7 | There are many choices of enzymes and chemicals that hydrolyze proteins into peptides.
8 | This section summarizes potential choices and their strengths and weaknesses.
9 |
10 | Before we get into details of various choices for proteolysis, we must discuss terminology.
11 | While it's true that "digestion" is commonly used in proteomics, it's important to note that "hydrolysis" is a more specific word choice to describe the chemical process because it refers to breaking peptide bonds within proteins using water.
12 | Although hydrolysis may be associated with the complete chemical hydrolysis of proteins into amino acids, for example using high temperature and acid, hydrolysis reactions catalyzed by enzymes such as pepsin and trypsin are specific for certain amino acid residues.
13 | In fact, all methods of protein cleavage to shorter peptides require a water molecule for their mechanism of action.
14 | In contrast, the definition of “digestion” relates to food breakdown into subunits usable by the body or any chemical process that breaks down substances.
15 | Therefore, while "digestion" is indeed a widely used term for the conversion of the proteome to peptides, "hydrolysis" more accurately describes the specific biochemical process that occurs.
16 | We believe that this terminology choice enhances clarity and precision in scientific communication within the field of proteomics.
17 |
18 | Trypsin is the most common choice of protease for proteome hydrolysis [@DOI:10.1002/mas.21376].
19 | Trypsin is favorable because of its specificity, availability, efficiency and low cost.
20 | Trypsin is a sufficient choice for most proteomics experiments.
21 | Trypsin cleaves at the C-terminus of basic amino acids, Arg and Lys, if not immediately followed by proline (although there is debate whether a small number of R/K-P sites are actually cleaved).
22 | Many of the peptides generated from trypsin are appropriate in length and hydrophobicity for chromatographic separation, MS-based peptide fragmentation and identification by database search.
23 | The main drawback of trypsin is that the majority (56%) of tryptic peptides are ≤ 6 amino acids long, and hence using trypsin alone limits the observable proteome [@PMID:20113005; @PMID:25823410; @PMID:30687733].
24 | This limits the number of identifiable protein isoforms and post-translational modifications.
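
The trypsin rule described above (cleave after K or R unless the next residue is P) is easy to apply in silico.
The sketch below is a simplified illustration: it allows no missed cleavages and ignores the debated R/K-P cleavages.
The toy sequence is hypothetical, and the 7-35 residue window comes from the length range given at the start of this section.

```python
import re

# Zero-width cleavage site: after K or R, but not when the next residue is P.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def trypsin_digest(sequence):
    """Split a protein sequence into fully cleaved tryptic peptides."""
    return [pep for pep in TRYPSIN_SITE.split(sequence) if pep]


def filter_by_length(peptides, min_len=7, max_len=35):
    """Keep peptides in the length window typically useful for MS analysis."""
    return [pep for pep in peptides if min_len <= len(pep) <= max_len]


def short_fraction(peptides, max_short=6):
    """Fraction of peptides with length <= max_short residues."""
    return sum(len(pep) <= max_short for pep in peptides) / len(peptides)


toy_protein = "MKRPLSEEGAVLTDKWHAIVRGSAPEFTLNQLDVYEK"  # hypothetical sequence
peptides = trypsin_digest(toy_protein)
print(peptides)
print(filter_by_length(peptides))
print(f"{short_fraction(peptides):.0%} of peptides are 6 residues or shorter")
```

Note that the K in `MKR` is not blocked because it is followed by R rather than P, while the R in `RPL` is blocked by the following proline.
Allowing missed cleavages would add concatenations of neighboring peptides (for example `MKRPLSEEGAVLTDK`) to the candidate list.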
25 |
26 | Although trypsin is the most common protease used for proteomics, in theory it can only cover a fraction of the proteome predicted from the genome [@DOI:10.1155/2014/960902].
27 | This is due to production of peptides that are too short to be unique, for example where R and K residues occur next to each other.
28 | Peptides below a certain length are likely to occur many times in the whole proteome, meaning that even if we identify them we cannot know their protein of origin.
29 | In protein regions devoid of R/K, trypsin may also result in very long peptides that are then lost due to irreversible binding to the solid phase extraction device, or that become difficult to identify due to complicated fragmentation patterns.
30 | Thus, parts of the true proteome sequences that are present are lost after trypsin digestion due to both production of very long and very short peptides.
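
The uniqueness problem can be illustrated by mapping every tryptic peptide back to the proteins that contain it.
A peptide that maps to more than one protein cannot reveal its protein of origin.
The two-protein proteome below is hypothetical, and the sketch ignores missed cleavages and the Ile/Leu ambiguity that real protein inference also has to handle.

```python
import re
from collections import defaultdict

TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def trypsin_digest(sequence):
    """Fully cleaved tryptic peptides (no missed cleavages)."""
    return [pep for pep in TRYPSIN_SITE.split(sequence) if pep]


def peptide_to_proteins(proteome):
    """Map each tryptic peptide to the set of protein IDs that produce it."""
    mapping = defaultdict(set)
    for protein_id, sequence in proteome.items():
        for pep in trypsin_digest(sequence):
            mapping[pep].add(protein_id)
    return mapping


def unique_peptides(proteome, min_len=7):
    """Peptides of at least min_len residues that map to exactly one protein."""
    return {
        pep: next(iter(ids))
        for pep, ids in peptide_to_proteins(proteome).items()
        if len(ids) == 1 and len(pep) >= min_len
    }


toy_proteome = {  # hypothetical sequences
    "protA": "GSKAVLEDTTGRK",
    "protB": "GSKFWNNSAPEIYR",
}
mapping = peptide_to_proteins(toy_proteome)
print(sorted(mapping["GSK"]))  # shared short peptide: ['protA', 'protB']
print(unique_peptides(toy_proteome))
```

The short peptide `GSK` is shared by both proteins, so only the two longer peptides can be assigned to a single protein.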
31 |
32 | Many alternative proteases are available with different specificities that complement trypsin to reveal different protein sequences [@PMID:12643544; @PMID:20113005], which can help distinguish protein isoforms [@PMID:27123950] (**Figure 2, Table 3**).
33 | The enzyme choice mostly depends on the application.
34 | In general, for protein identification alone, trypsin is often chosen for the reasons given above.
35 | However, alternative enzymes can facilitate _de novo_ assembly when genomic information in public database repositories is limited [@pmid:31615963; @pmid:30622160; @pmid:29990557; @doi:10.1016/j.actatropica.2022.106324; @DOI:10.1021/pr400173d].
36 | Use of multiple proteases for proteome digestion also can improve the sensitivity and accuracy of protein quantification [@PMID:30336047].
37 | Moreover, by providing an increased peptide diversity, the use of multiple proteases can expand sequence coverage and increase the probability of finding peptides which are unique to single proteins [@DOI:10.1021/acs.jproteome.9b00330; @DOI:10.1074/mcp.M113.034710; @DOI:10.1155/2014/960902].
38 | A multi-protease approach can also improve the identification of N-termini and signal peptides for small proteins [@DOI:10.1021/acs.jproteome.1c00115].
39 | Overall, integrating multiple-protease data can increase the number of proteins identified [@DOI:10.3390/ijms20225630; @DOI:10.1074/mcp.M113.035170], increase the identified post-translational modifications [@DOI:10.1021/acs.jproteome.9b00330; @DOI:10.1016/j.celrep.2015.05.029; @DOI:10.1074/mcp.M113.034710] and decrease the ambiguity of the inferred protein groups [@DOI:10.1021/acs.jproteome.9b00330].
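
The reason combining proteases increases sequence coverage can be sketched by digesting the same sequence with two simplified rules and counting which residues fall inside peptides of a usable length.
The rules are assumptions for illustration: trypsin without missed cleavages, and Glu-C cleaving only after Glu, which is its narrower specificity in some buffers.
The sequence is hypothetical.

```python
import re

# Simplified, hypothetical cleavage rules written as zero-width regex patterns.
RULES = {
    "trypsin": r"(?<=[KR])(?!P)",  # after K/R, not before P
    "gluc": r"(?<=E)",             # after E only (assumed buffer-dependent specificity)
}


def peptide_spans(sequence, pattern, min_len=7, max_len=35):
    """Half-open (start, end) spans of peptides whose length is in the window."""
    sites = {m.start() for m in re.finditer(pattern, sequence)}
    cuts = sorted(sites | {0, len(sequence)})
    return [(s, e) for s, e in zip(cuts, cuts[1:]) if min_len <= e - s <= max_len]


def coverage(sequence, spans):
    """Fraction of residues covered by at least one span."""
    covered = set()
    for start, end in spans:
        covered.update(range(start, end))
    return len(covered) / len(sequence)


protein = "MAGKRLEQWSTAVPGLFDKSEEMYNTRAFCGHPIDWLVSQEAAKR"  # hypothetical
spans = {name: peptide_spans(protein, pat) for name, pat in RULES.items()}
for name, s in spans.items():
    print(name, f"{coverage(protein, s):.0%}")
combined = spans["trypsin"] + spans["gluc"]
print("combined", f"{coverage(protein, combined):.0%}")
```

On this toy sequence, the Glu-C peptides cover the N-terminal region that trypsin leaves as a 4-residue fragment, so the combined coverage exceeds that of either enzyme alone.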
40 |
41 | There are, however, many challenges associated with using alternative proteases.
42 | Because peptides from many alternative proteases do not end in a positively charged residue (like the C-terminal R/K of tryptic peptides), they may acquire only a single precursor charge and fragment ineffectively.
43 | The lack of a C-terminal positive charge also leads to less consistent y-ion series.
44 | Other peptides may obtain too many charges and produce highly charged fragments that are not scored well by search engines.
45 | Another common issue with alternative proteases is the potential for producing "shredded" peptides where multiple peptides differ only by a few residues at either end, thus decreasing the quantity of each species and limiting sensitivity.
46 | This problem is worse with proteases that target uncharged residues, because ionic interactions are much stronger than dispersion forces used for binding aliphatic residues.
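
A rough way to see the charging argument is to count the sites on a peptide that can readily carry a proton in positive-mode electrospray: the N-terminal amine plus any Lys, Arg, or His side chains.
This count is only a crude heuristic for the maximum likely precursor charge, because it ignores sequence context, length, and instrument conditions.
The helper name and example peptides are hypothetical.

```python
# Crude heuristic (hypothetical helper): count readily protonated sites.
BASIC_RESIDUES = set("KRH")


def basic_site_count(peptide: str) -> int:
    """N-terminal amine plus K/R/H side chains, as a rough upper bound on charge."""
    return 1 + sum(residue in BASIC_RESIDUES for residue in peptide)


# A tryptic peptide ends in K or R, so it usually has at least two basic sites.
print(basic_site_count("LVNELTEFAK"))    # tryptic-style peptide
print(basic_site_count("GSAPVLTNDWAE"))  # Glu-C-style peptide with no basic residues
```

In this simple model, the tryptic-style peptide has a basic site at each terminus and favors a 2+ precursor, while the Glu-C-style peptide has only its N-terminal amine and is more likely to appear as 1+.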
47 |
48 | {#fig:Multiple-protease-proteolysis tag="2" width="100%"}
53 |
54 | Table 3: Common proteases used for proteomics.
55 |
56 | | Protease | source | class | specificity | optimal pH | notes |
57 | |:-----------:|:--------:|:--------------------------:|:-----------:|:-----------:|:-----------:|
58 | | Trypsin | mammal pancreas | serine protease | c-term of R/K, not before P | 7-9 | most common protease |
59 | | LysC | _Lysobacter enzymogenes_ | serine protease | c-term of K | 7-9 | high stability |
60 | | Alpha-lytic protease | _Lysobacter enzymogenes_ | serine protease | c-term of small side chains | 7-9 | high stability |
61 | | GluC | _Staphylococcus aureus_ | serine protease | c-term of D/E | 4-8 | specificity for Glu depends on buffer |
62 | | Asp-N | _Pseudomonas fragi_ | metalloprotease | n-term of D | 4-9 | avoid chelators |
63 | | chymotrypsin | mammal pancreas | serine protease | c-term of large hydrophobic residues | 7-9 | |
64 | | Arg-C | _Clostridium histolyticum_ | cysteine protease | c-term of R | 7.2-7.8 | avoid oxidation |
65 | | Ulilysin | _Methanosarcina acetivorans_ | metalloprotease | N-term of R/K | 6-9 | stable to 55℃ |
66 | | Lys-N | _Grifola frondosa_ | metalloprotease | N-term of K | 7-9 | stable to 70℃ |
67 | | Pepsin A | mammal stomach | aspartic protease | broad including W, F, Y, L | 1-4 | common for HDX |
68 | | Proteinase K | _Tritirachium album_ | serine protease | broadest | 4-12 | common for limited proteolysis |
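
Several specificities in Table 3 can be encoded as simple cleavage rules.
This makes explicit the difference between proteases that cleave after a residue (such as trypsin or Glu-C) and those that cleave before it (such as Asp-N or Lys-N).
The rule set below is a simplified, hypothetical paraphrase of the table.
It ignores secondary specificities, buffer effects, and missed cleavages, and only trypsin carries a proline restriction.

```python
import re

# (residues, side, blocking next residue); side "C" = cleave after, "N" = cleave before.
# Simplified paraphrase of Table 3; secondary specificities are ignored.
PROTEASE_RULES = {
    "trypsin": ("KR", "C", "P"),
    "lys-c": ("K", "C", None),
    "glu-c": ("DE", "C", None),
    "chymotrypsin": ("FWY", "C", None),
    "arg-c": ("R", "C", None),
    "asp-n": ("D", "N", None),
    "lysarginase": ("KR", "N", None),
    "lys-n": ("K", "N", None),
}


def cleavage_pattern(residues, side, blocked_by=None):
    """Build a zero-width regex matching the cleavage sites of one rule."""
    target = f"[{residues}]"
    if side == "C":
        # Cleave after the target residue, optionally not before `blocked_by`.
        return f"(?<={target})" + (f"(?![{blocked_by}])" if blocked_by else "")
    if side == "N":
        # Cleave before the target residue.
        return f"(?={target})"
    raise ValueError(f"unknown side: {side!r}")


def digest(sequence, protease):
    """Fully cleaved peptides for one protease rule (no missed cleavages)."""
    residues, side, blocked_by = PROTEASE_RULES[protease]
    pattern = cleavage_pattern(residues, side, blocked_by)
    return [pep for pep in re.split(pattern, sequence) if pep]


seq = "AKPGGKAADWR"  # hypothetical sequence
for name in ("trypsin", "lys-c", "lys-n", "asp-n"):
    print(name, digest(seq, name))
```

Comparing trypsin and Lys-C on this toy sequence shows the effect of the proline rule.
Lys-N places the same lysines at peptide N-termini instead of C-termini.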
69 |
70 | Lysyl endopeptidase (Lys-C) obtained from _Lysobacter enzymogenes_ is a serine protease that cleaves at the carboxyl side of Lys [@PMID:6359954; @PMID:25823410].
71 | Like trypsin, the optimum pH range required for its activity is from 7 to 9.
72 | A major advantage of Lys-C is its resistance to denaturing agents, including 8 M urea - a chaotrope commonly used to denature proteins _prior_ to digestion [@PMID:27123950].
73 | Trypsin is less efficient at cleaving Lys than Arg, which could limit the quality of quantitation from tryptic peptides.
74 | Hence, to achieve complete protein digestion with minimal missed cleavages, Lys-C is often used simultaneously with trypsin digestion [@PMID:23017020].
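
Missed cleavages can be counted directly from peptide sequences.
This offers one simple way to compare digestion completeness, for example between trypsin alone and Lys-C plus trypsin.
The helpers below are hypothetical and apply the simplified trypsin rule: an internal K or R not followed by P counts as a missed site.

```python
def missed_cleavages(peptide: str) -> int:
    """Count internal K/R residues not followed by P (simplified trypsin rule).

    The C-terminal residue is excluded because cleavage there produced the peptide.
    """
    return sum(
        1
        for i, residue in enumerate(peptide[:-1])
        if residue in "KR" and peptide[i + 1] != "P"
    )


def missed_cleavage_rate(peptides) -> float:
    """Fraction of identified peptides with at least one missed cleavage."""
    return sum(missed_cleavages(p) > 0 for p in peptides) / len(peptides)


identified = ["LVNELTEFAK", "AEFVEVTKLVTDLTK", "KRHPYFYAPELLYYANK"]  # hypothetical IDs
print([missed_cleavages(p) for p in identified])
print(f"{missed_cleavage_rate(identified):.0%} of peptides contain a missed cleavage")
```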
75 |
76 | Alpha-lytic protease (aLP) is another serine protease secreted by the soil bacterium _Lysobacter enzymogenes_ [@PMID:3053694].
77 | Wild-type aLP (WaLP) and an active site mutant of aLP, M190A (MaLP), have been used to expand proteome coverage [@DOI:10.1074/mcp.m113.034710].
78 | Based on observed peptide sequences from yeast proteome digestion, WaLP showed a specificity for small aliphatic amino acids like alanine, valine, and glycine, but also threonine and serine.
79 | MaLP showed specificity for slightly larger amino acids like methionine, phenylalanine, and surprisingly, a preference for leucine over isoleucine.
80 | The specificity of WaLP for threonine enabled the first method for mapping endogenous human SUMO sites [@PMID:29079793].
81 |
82 | Glutamyl peptidase I, commonly known as Glu-C or V8 protease, is a serine protease obtained from _Staphylococcus aureus_ [@PMID:4627743].
83 | Glu-C cleaves at the C-terminus of glutamate, but also after aspartate [@PMID:4627743; @PMID:26748652].
84 |
85 | Peptidyl-Asp metallopeptidase, commonly known as Asp-N, is a metalloprotease obtained from _Pseudomonas fragi_ [@PMID:2669754].
86 | Asp-N catalyzes the hydrolysis of peptide bonds on the N-terminal side of aspartate residues.
87 | The optimum activity of this enzyme occurs at a pH range between 4 and 9.
88 | As with any metalloprotease, chelators like EDTA should be avoided for digestion buffers when using Asp-N.
89 | Studies also suggest that Asp-N cleaves at the amino terminus of glutamate when a detergent is present in the proteolysis buffer [@PMID:2669754].
90 | Asp-N often leaves many missed cleavages [@PMID:27123950].
91 |
92 | Chymotrypsin, the active form of chymotrypsinogen A, is a serine protease obtained from porcine or bovine pancreas with an optimum pH range from 7-9 [@PMID:3555886].
93 | It cleaves at the C-terminus of the hydrophobic amino acids Phe, Trp and Tyr and, much less efficiently, after Met and Leu residues.
94 | Since the transmembrane region of membrane proteins commonly lacks tryptic cleavage sites, this enzyme works well with membrane proteins having more hydrophobic residues [@PMID:24870543; @PMID:24696503; @PMID:27123950].
95 | The chymotryptic peptides generated after proteolysis cover proteome space orthogonal to that of tryptic peptides, both quantitatively and qualitatively [@PMID:24290761; @PMID:22669647; @PMID:24696503].
96 |
97 | Clostripain, commonly known as Arg-C, is a cysteine protease obtained from _Clostridium histolyticum_ [@PMID:4332560].
98 | It cleaves mostly at the C-terminal side of Arg residues and sometimes of Lys residues, but with less efficiency.
99 | The peptides generated are generally longer than tryptic peptides.
100 | Arg-C is often used with other proteases for improving qualitative proteome data and also for investigating PTMs [@PMID:25823410].
101 |
102 | LysargiNase, also known as Ulilysin, is a recently discovered protease belonging to the metalloprotease family.
103 | It is a thermophilic protease derived from _Methanosarcina acetivorans_ that specifically cleaves at the N-terminus of Lys and Arg residues [@PMID:25419962].
104 | Hence, it enabled discovery of C-terminal peptides that were not observed using trypsin.
105 | In addition, it can also cleave modified amino acids such as methylated or dimethylated Arg and Lys [@PMID:25419962].
106 |
107 | Peptidyl-Lys metalloendopeptidase, or Lys-N, is a metalloprotease obtained from _Grifola frondosa_ [@PMID:19195997].
108 | It cleaves N-terminally of Lys and has an optimal activity at pH 9.0.
109 | Compared with trypsin, Lys-N is more resistant to denaturing agents and can be heated up to 70°C [@PMID:25823410].
110 | Peptides generated from Lys-N digestion produce more c-type ions using ETD fragmentation [@PMID:18425140].
111 | Hence this can be used for analysing PTMs, identification of C-terminal peptides and also for _de novo_ sequencing strategies [@PMID:18425140; @PMID:20953479].
112 |
113 | Pepsin A, commonly known as pepsin, is an aspartic protease obtained from bovine or porcine gastric mucosa [@PMID:12089768].
114 | Pepsin was one of several proteins crystallized by John Northrop, who shared the 1946 Nobel Prize in Chemistry for this work [@PMID:19872561; @PMID:19872562; @PMID:17758437; @URL:https://www.nobelprize.org/prizes/chemistry/1946/speedread].
115 | Pepsin works at an optimum pH range from 1 to 4 and preferentially cleaves at Trp, Phe, Tyr and Leu [@PMID:25823410].
116 | Since it possesses high enzyme activity and broad specificity at low pH, it is preferred over other proteases for MS-based disulfide mapping [@PMID:12476442; @PMID:24980484].
117 | Pepsin is also used extensively for structural mass spectrometry studies with hydrogen-deuterium exchange (HDX) because the rate of back exchange of the amide deuteron is minimized at low pH [@DOI:10.1021/ac902477u; @DOI:10.1002/mas.21565].
118 |
119 | Proteinase K was first isolated from the mold _Tritirachium album_ Limber [@PMID:4373242].
120 | The epithet 'K' is derived from its ability to efficiently hydrolyze keratin [@PMID:4373242].
121 | It is a member of the subtilisin family of proteases and is relatively nonspecific, with a preference for proteolysis at hydrophobic and aromatic amino acid residues [@DOI:10.1016/B978-0-12-382219-2.00714-6].
122 | The optimal enzyme activity is between pH 7.5 and 12.
123 | Proteinase K is used at low concentrations for limited proteolysis (LiP) and the detection of protein structural changes in the eponymous technique LiP-MS [@PMID:29072706].
124 |
125 | ### Peptide quantitation assays
126 | After peptide production from proteomes, it may be desirable to quantify the peptide yield.
127 | Quantitation of peptides is not as easy as quantitation of proteins in lysates.
128 | BCA protein assays perform poorly with peptide solutions and report erroneous values.
129 | A simple measurement is to use a NanoDrop device, but absorbance measurements from a drop of solution do not report accurate values either.
130 | Especially given that low amounts of peptides are often produced for proteomics, more sensitive methods based on fluorescence are preferred.
131 | One reliable approach is a fluorescamine-based assay, which gives higher accuracy for peptide solutions [@PMID:5085985; @PMID:11673879].
132 | This assay is based on the reaction between a labeling reagent and the N-terminal primary amine in the peptide(s); therefore, samples must be free of amine-containing buffers (e.g., Tris-based buffer and/or amino acids).
133 | This procedure has performance similar to the Pierce Quantitative Fluorometric Peptide Assay (Cat 23290).
134 | A second easy option is to quantify peptide yield by tryptophan fluorescence [@DOI:10.1021/ac504689z], which is useful because it relies on intrinsic fluorescence and therefore does not consume the sample.
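
Fluorescence readings like these are typically converted to concentrations with a standard curve of known peptide amounts.
The sketch below fits a straight line to such standards by ordinary least squares and inverts it to estimate an unknown.
The standard concentrations, signals, and dilution factor are made-up values, and the approach assumes all readings fall within the linear range of the assay.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_concentration(signal, slope, intercept, dilution_factor=1.0):
    """Invert the standard curve and correct for any dilution before reading."""
    return (signal - intercept) / slope * dilution_factor


# Hypothetical peptide standards (ug/mL) and fluorescence readings (arbitrary units)
standards = [0.0, 25.0, 50.0, 100.0, 200.0]
readings = [12.0, 108.0, 215.0, 410.0, 805.0]
slope, intercept = fit_line(standards, readings)
print(f"slope={slope:.3f}, intercept={intercept:.2f}")
print(f"sample: {estimate_concentration(300.0, slope, intercept, dilution_factor=10):.1f} ug/mL")
```

Reading a diluted aliquot and multiplying back by the dilution factor keeps the measured signal inside the calibrated range while conserving sample.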
135 |
136 |
137 |
--------------------------------------------------------------------------------
/content/13.Peptide-Fragmentation.md:
--------------------------------------------------------------------------------
1 | ## 11. Tandem Mass Spectrometry and Peptide Fragmentation {.page_break_before}
2 |
3 | ### Tandem Mass Spectrometry
4 |
5 | Tandem MS, where precursor ions are selected and fragmented to generate an MS/MS spectrum containing peptide-derived product ions, is a fundamental process in modern proteomics [@DOI:10.1021/acs.jproteome.2c00838; @DOI:10.1021/cr3003533].
6 | This is largely because intact peptide mass alone cannot unambiguously determine a peptide’s sequence [@DOI:10.1021/ac201624t], whereas MS/MS spectra add sequence information [@DOI:10.1073/pnas.83.17.6233; @DOI:10.1021/acs.jproteome.2c00838].
7 | Some more advanced proteomic acquisition methods use MS1-only feature detection in combination with retention time to maximize information used for downstream quantitation [@DOI:10.1021/acs.analchem.9b05095].
8 | In most of these, identifications are fundamentally based on MS/MS spectra, either acquired within the same LC-MS/MS analysis or drawn from a spectral library of previously acquired MS/MS spectra [@DOI:10.1021/ac052197p; @DOI:10.1038/s41592-018-0003-5].
9 | True MS1-only methods that use only accurate mass and retention time for identification have been discussed, but these have yet to be widely adopted [@DOI:10.1021/acs.analchem.9b05095].
10 |
11 | The value of MS/MS spectra for peptide identification comes from predictable fragmentation behavior of peptide ions to generate sequence-informative fragments [@DOI:10.1073/pnas.83.17.6233; @DOI:10.1021/acs.jproteome.2c00838].
12 | Multiple dissociation methods exist to generate product ions in MS/MS spectra through various mechanisms (**Figure 14**).
13 | In non-modified peptides, the most labile bonds are typically peptide bonds (i.e., amide bonds) between amino acids.
14 | Depending on where peptides dissociate along the peptide backbone, the fragments are assigned different ion types (**Figure 14A**).
15 | Fragment ion nomenclature was first developed by Roepstorff and Fohlman in 1984 [@DOI:10.1002/bms.1200111109] and then refined by Biemann in 1990 [@ISBN:978-0121820947].
16 | The main ion types are the fragments that contain the original peptide N-terminus (i.e., a-, b-, and c-type ions), or the original peptide C-terminus (i.e., x-, y-, and z-type ions).
17 | The number associated with each fragment ion indicates how many amino acids from the respective terminus it contains (e.g., b3 contains the first three N-terminal residues).
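The nomenclature translates directly into arithmetic: a singly protonated b-type ion containing i residues is the sum of the first i residue masses plus a proton, and the corresponding y-type ion is the sum of the last i residue masses plus water and a proton. The following Python sketch assumes unmodified peptides, singly charged fragments, and standard monoisotopic residue masses; the function names are illustrative.

```python
"""Minimal sketch: singly protonated b- and y-ion m/z values for an unmodified peptide."""

# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # proton mass (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def b_ions(peptide: str) -> list[float]:
    """b_i (1+): first i residues from the N-terminus plus a proton."""
    ions, total = [], 0.0
    for residue in peptide[:-1]:          # the full-length b_n is not usually listed
        total += RESIDUE_MASS[residue]
        ions.append(total + PROTON)
    return ions


def y_ions(peptide: str) -> list[float]:
    """y_i (1+): last i residues from the C-terminus plus water and a proton."""
    ions, total = [], WATER
    for residue in reversed(peptide[1:]):
        total += RESIDUE_MASS[residue]
        ions.append(total + PROTON)
    return ions


peptide = "PEPTIDE"
for i, (b, y) in enumerate(zip(b_ions(peptide), y_ions(peptide)), start=1):
    print(f"b{i} {b:9.4f}    y{i} {y:9.4f}")
```

A useful check on such a table is that each complementary pair (b with i residues and y with the remaining n - i residues) sums to the neutral peptide mass plus two protons.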
18 |
19 | {#fig:SPE tag="14" width="100%"}
28 |
29 | One of the earliest and most ubiquitous peptide fragmentation methods is collision-induced dissociation (CID, also called collisionally-activated dissociation, CAD) [@PMID:16401509] (**Figure 14B**).
30 | Here, collisions with inert gas molecules are used to increase the internal energy of peptide ions to reach bond dissociation energies that fragment them into products.
31 | Various inert gases can be used; helium, nitrogen, and argon are the most common.
32 | The choice of gas often depends on how much energy per collision is desired, as heavier gases can transfer more energy per collision.
33 | Two main versions of CID are used in proteomics, with the most common being beam-type CID (beamCID, sometimes called higher-energy collisional dissociation, HCD) [@DOI:10.1063/1.471812; @DOI:10.1038/nmeth1060].
34 | BeamCID typically uses nitrogen or argon as a collision gas, and peptide ions are accelerated into a collision cell filled with several mTorr of bath gas.
35 | The kinetic energy imparted to precursor ions as they are accelerated (often set by direct current voltage differentials between the source of the ions and the collision cell) determines the energy deposited through collisions with the bath gas, which in turn governs their fragmentation behavior.
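The dependence of energy transfer on gas mass can be made concrete with the center-of-mass energy, E_cm = E_lab × m_gas / (m_gas + m_ion), which bounds how much of an ion's laboratory-frame kinetic energy (E_lab = z × accelerating voltage) can be converted into internal energy in a single collision. This sketch assumes a single collision with a stationary gas molecule; the peptide mass and voltage are hypothetical, and vendor collision-energy settings (e.g., normalized collision energy) use their own scaling that is not modeled here.

```python
"""Minimal sketch: maximum energy convertible per collision (center-of-mass frame)."""

GAS_MASS = {"He": 4.0026, "N2": 28.0134, "Ar": 39.948}  # average masses (Da)


def lab_frame_energy(charge: int, accel_voltage: float) -> float:
    """Laboratory-frame kinetic energy (eV) of an ion accelerated through accel_voltage (V)."""
    return charge * accel_voltage


def center_of_mass_energy(e_lab: float, ion_mass: float, gas_mass: float) -> float:
    """Upper bound on kinetic energy converted to internal energy in one collision."""
    return e_lab * gas_mass / (gas_mass + ion_mass)


ion_mass = 1500.0                      # hypothetical 2+ peptide of ~1500 Da
e_lab = lab_frame_energy(2, 30.0)      # hypothetical 30 V offset, giving 60 eV
for gas, mass in GAS_MASS.items():
    print(f"{gas:>2}: E_cm = {center_of_mass_energy(e_lab, ion_mass, mass):.2f} eV")
```

Running the loop shows why helium is described as imparting less energy per collision than nitrogen or argon for the same accelerating voltage.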
36 |
37 | Because peptide bonds are typically the most labile bonds in non-modified peptides, the increase in internal energy from beamCID generates b- and y-type ions that represent this peptide bond cleavage, as shown in Biemann fragment ion nomenclature (**Figure 14A**).
38 | b-type ions provide sequence information for fragments that have an intact N-terminus, while y-type ions denote fragment ions with an intact C-terminus.
39 | Collisions in beamCID cause near-instantaneous generation of primary fragment ions.
40 | Because the increase in internal energy happens rapidly, before energy can be redistributed, beamCID can generate fragments that do not necessarily derive from cleavage of the most labile bonds (e.g., in PTM-modified peptides, discussed below), but spectra are often dominated by b/y-type ions from amide bond cleavage (**Figure 14B**).
41 | BeamCID can also generate secondary fragments, such as immonium ions that are diagnostic of individual amino acid residues [@DOI:10.1021/ac8006076] or a-type fragment ions that arise from loss of CO from b-type ions due to multiple collision events (note: a-type ions can form as primary fragmentation products in other dissociation methods).
42 | Because beamCID needs nothing more than an rf-only collision cell, it has been implemented on most instrument platforms used in modern proteomics.
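Both secondary fragment types can be computed from residue masses: an a-type ion lies 27.9949 Da (CO) below the corresponding b-type ion, and an immonium ion (H2N+=CH-R) equals the residue mass minus CO plus a proton. The sketch below uses a small, illustrative subset of monoisotopic residue masses and assumes singly charged ions; the function names are hypothetical.

```python
"""Minimal sketch: a-ions and immonium ions derived from residue masses (1+)."""

# Illustrative subset of monoisotopic residue masses (Da)
RESIDUE_MASS = {"F": 147.06841, "Y": 163.06333, "H": 137.05891, "L": 113.08406}
PROTON = 1.007276   # proton mass (Da)
CO = 27.994915      # monoisotopic mass of carbon monoxide (Da)


def a_from_b(b_mz: float) -> float:
    """Singly charged a-ion: the corresponding b-ion minus CO."""
    return b_mz - CO


def immonium(residue: str) -> float:
    """Immonium ion (H2N+=CH-R): residue mass minus CO, plus a proton."""
    return RESIDUE_MASS[residue] - CO + PROTON


for aa in RESIDUE_MASS:
    print(f"{aa} immonium: {immonium(aa):.4f}")
```

Because leucine and isoleucine share a residue mass, their immonium ions coincide, which is one reason these low-mass ions indicate residue presence rather than sequence position.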
43 |
44 | A second form of CID is called resonant CID (resCID), where the internal energy of peptide ions is slowly increased through multiple low-energy collisions.
45 | Here, helium gas is most often used, as it imparts less energy per collision, and activation typically happens in ion trap devices where supplemental frequencies can be used to excite ions.
46 | In other words, ions are confined by the trapping rf field, and an additional, supplemental rf frequency is applied to the electrodes of the ion trap [@DOI:10.1002/mas.21549].
47 | This supplemental rf is selected to be resonant with the secular (fundamental) frequency of the ions to be fragmented, as determined by the Mathieu equations, exciting the ions of interest and increasing their kinetic energy as they move in the ion trap [@DOI:10.1115/1.4039144; @DOI:10.1006/rwsp.2000.0143].
48 | The increased kinetic energy creates more collisions with the background helium gas to slowly build up the internal energy of the precursor ions until the dissociation energy of the most labile bond is reached, causing fragmentation.
49 | Once ions dissociate, the fragments have different *m/z* values than the precursor ions, meaning they fall out of resonance with the supplemental rf and are no longer activated.
50 | Thus, resCID typically fragments only the most labile bonds in precursor ions and does not have secondary fragmentation behavior.
51 | As above, for non-modified peptide ions, this typically generates sequence-informative b- and y-type product ions.
52 | For modified peptides where the bonds connecting the modification to an amino acid are more labile than peptide bonds (e.g., phosphopeptides and glycopeptides), resCID MS/MS spectra can be dominated by product ions arising from loss of the PTM rather than by sequence-informative fragment ions, although many factors govern this behavior [@DOI:10.1021/pr0705136; @DOI:10.1021/ac0497104].
53 | Because of this, and because this method requires an ion trap device with the ability to apply supplemental rfs, resCID is less prevalent than beamCID.
54 | For both beamCID and resCID, the mobile proton model has been widely accepted to explain fragmentation behavior [@PMID:11180630], and this largely predictable behavior has greatly helped in manual and algorithm-assisted spectral interpretation.
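The resonance condition described above can be sketched numerically. For a linear (2D) quadrupole trap with no DC component, the Mathieu parameter is q = 4eV / ((m/z)·r0²·Ω²), and for small q the secular frequency is approximately β·f_rf/2 with β ≈ q/√2. Because q scales with 1/(m/z), a product ion at a different m/z has a different secular frequency and falls out of resonance with the supplemental rf. The trap radius, rf frequency, and amplitude below are hypothetical, and real devices (including 3D traps with different geometry factors) will differ.

```python
"""Minimal sketch: Mathieu q and approximate secular frequency in a linear quadrupole trap.

Trap dimensions and rf settings are hypothetical; the approximation
beta ~= q / sqrt(2) holds only for a = 0 and small q (roughly q < 0.4).
"""
import math

E_CHARGE = 1.602176634e-19      # elementary charge (C)
DALTON = 1.66053906660e-27      # unified atomic mass unit (kg)


def mathieu_q(mz: float, rf_amplitude: float, r0: float, rf_freq_hz: float) -> float:
    """q = 4 e V / ((m/z) r0^2 Omega^2); V is the zero-to-peak rf amplitude."""
    omega = 2 * math.pi * rf_freq_hz
    return 4 * E_CHARGE * rf_amplitude / (mz * DALTON * r0**2 * omega**2)


def secular_frequency_hz(q: float, rf_freq_hz: float) -> float:
    """Fundamental secular frequency f = beta * f_rf / 2 with beta ~= q / sqrt(2)."""
    beta = q / math.sqrt(2)
    return beta * rf_freq_hz / 2


RF_FREQ = 1.2e6   # hypothetical drive frequency (Hz)
R0 = 4e-3         # hypothetical field radius (m)
V_RF = 500.0      # hypothetical rf amplitude (V)

for mz in (1000.0, 600.0):  # a precursor, then a product ion at a different m/z
    q = mathieu_q(mz, V_RF, R0, RF_FREQ)
    f_sec = secular_frequency_hz(q, RF_FREQ)
    print(f"m/z {mz:.0f}: q = {q:.3f}, secular frequency ~ {f_sec / 1e3:.1f} kHz")
```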
55 |
56 | Despite the utility and broad adoption of CID, alternative dissociation methods have been explored for a variety of uses, including applications where CID is inadequate for the experimental question [@DOI:10.1021/ac802330b; @DOI:10.1038/nprot.2008.159; @DOI:10.1007/s00726-014-1726-y].
57 | The most popular of these alternative dissociation methods are electron-based dissociation (ExD) approaches, which include electron capture dissociation (ECD) and electron transfer dissociation (ETD).
58 | In both of these, peptide cations capture thermal electrons (ECD [@DOI:10.1021/ja973478k]) or abstract an electron from a reagent anion (ETD [@DOI:10.1073/pnas.0402700101]), triggering radical-driven dissociation of the N-Cα bond that predominantly generates sequence-informative c- and z-type product ions (**Figure 14C**).
59 | The mechanisms of ExD methods have been widely explored [@DOI:10.1021/ja8019005; @DOI:10.1016/j.jasms.2004.11.001], and the preferential cleavage of N-Cα bonds along the peptide backbone has been particularly useful for PTM-modified species because the modifications remain largely intact even as the peptide backbone fragments.
60 | ExD methods have shown promise for analysis of numerous PTMs, including phosphorylation, glycosylation, ADP-ribosylation, and more [@DOI:10.1021/acs.analchem.7b04810; @DOI:10.1002/mas.21560].
61 | Electron-based dissociation is also better suited than collision-based dissociation for MS analyses of intact proteins [@DOI:https://doi.org/10.1021/ja011335z; @DOI:https://doi.org/10.1021/acs.analchem.5b00162] and larger oligonucleotides [@DOI:10.1002/mas.21442; @DOI:10.1021/acs.analchem.9b05388; @DOI:10.1021/jacs.1c10757; @DOI:10.1002/ange.201206232; @DOI:10.1021/acs.analchem.2c03030; @DOI:10.1021/acs.bioconjchem.3c00254].
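The c- and z-type ions from ExD are offset from the b/y series by fixed masses: a c-ion is the b-ion plus NH3 (+17.0265 Da), and the radical z•-ion usually observed in ExD spectra is the y-ion minus NH3 plus a hydrogen atom (net -16.0187 Da). This sketch assumes singly charged, unmodified fragments and monoisotopic masses; the function names are illustrative.

```python
"""Minimal sketch: c- and z•-ion m/z values (1+) derived from b- and y-ion values."""

# Monoisotopic residue masses (Da) for the residues in the example peptide
RESIDUE_MASS = {"P": 97.05276, "E": 129.04259, "T": 101.04768, "I": 113.08406, "D": 115.02694}
PROTON = 1.007276      # proton mass (Da)
WATER = 18.010565      # monoisotopic mass of H2O (Da)
NH3 = 17.026549        # monoisotopic mass of ammonia (Da)
H_ATOM = 1.007825      # monoisotopic mass of a hydrogen atom (Da)


def c_from_b(b_mz: float) -> float:
    """c-ion (1+): the b-ion plus NH3."""
    return b_mz + NH3


def z_dot_from_y(y_mz: float) -> float:
    """z•-ion (1+): the y-ion minus NH3 plus a hydrogen atom (net -16.0187 Da)."""
    return y_mz - NH3 + H_ATOM


peptide = "PEPTIDE"
for i in range(1, len(peptide)):
    b_i = sum(RESIDUE_MASS[r] for r in peptide[:i]) + PROTON
    y_i = sum(RESIDUE_MASS[r] for r in peptide[-i:]) + WATER + PROTON
    print(f"c{i} {c_from_b(b_i):9.4f}    z{i}• {z_dot_from_y(y_i):9.4f}")
```

ExD cleavage N-terminal to proline is generally not observed, because that N-Cα bond lies within the proline ring, so a real spectrum of PEPTIDE would lack c2 and z5• even though the sketch lists them.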
62 |
63 | Two fundamental challenges exist with ExD methods.
64 | First, ExD implementation requires instruments that can manipulate cations and anions (or free electrons) within the same scan sequence and can trap both simultaneously for electron capture/transfer events to occur.
65 | This has been successfully accomplished on a number of instruments, including FT-ICR systems, ion traps, ToFs with quadrupole ion traps, and hybrid Orbitrap instruments, but it is not a ubiquitous feature of all platforms.
66 | That said, several exciting advances in recent years have made ExD methods more accessible on numerous instrument configurations [@DOI:10.1021/acs.analchem.7b04810; @DOI:10.1002/mas.21560; @DOI:10.1021/acs.analchem.8b01901; @DOI:10.1021/acs.jproteome.7b00622; @DOI:10.1021/jasms.0c00425].
67 | A second challenge is the dependence of ExD dissociation efficiency on precursor ion charge density [@DOI:10.1074/mcp.M700073-MCP200].
68 | ExD methods generally produce robust fragmentation for charge-dense precursor ions (i.e., those with relatively low *m/z* values and higher z).
69 | In contrast, precursors with low charge density (i.e., higher *m/z* values) adopt more compact gas-phase structures stabilized by non-covalent interactions.
70 | Even when ExD methods drive peptide backbone cleavage, the product ions (i.e., c- and z-type fragments) are held together by these non-covalent interactions, so few (or no) sequence-informative product ions are released.
71 | This process is called non-dissociative electron-capture/transfer (ECnoD/ETnoD) [@DOI:10.1021/ac050666h].
72 | Several strategies to mitigate ECnoD/ETnoD have been successfully explored, including supplemental activation of product ions with resCID (ETcaD [@DOI:10.1021/ac061457f]) or beamCID (EThcD [@DOI:10.1016/j.jasms.2009.05.009; @DOI:10.1021/ac3025366]), supplemental activation with infrared photons (AI-ECD [@DOI:10.1016/j.jasms.2008.12.015; @DOI:10.1021/ac000494i] and AI-ETD [@DOI:10.1021/acs.analchem.5b00881; @DOI:10.1002/anie.200903557; @DOI:10.1021/acs.analchem.0c02087; @DOI:10.1021/jasms.1c00284]) or ultraviolet photons (ETuvPD [@DOI:10.1021/ac5036082]), and use of higher energy electrons [@URL:https://www.sciencedirect.com/science/article/abs/pii/S0009261402001495; @DOI:10.1021/ja8087407; @DOI:10.1021/jasms.0c00425].
73 | Despite their successes, these methods still require instrumentation capable of ExD in addition to extra hardware needed for a given strategy (e.g., a CO2 laser in AI-ETD [@DOI:10.1021/acs.analchem.7b00213]).
74 | As with ExD in general, recent advances in supplemental activation strategies for ExD are making these tools more accessible [@DOI:10.1021/acs.analchem.7b04810; @DOI:10.1002/mas.21560].
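Charge density and charge reduction follow from simple *m/z* arithmetic. Adding protons lowers *m/z* for a given peptide mass, and a non-dissociated ECnoD/ETnoD product keeps essentially the full precursor mass (plus one electron) while carrying one fewer charge, so it appears at higher *m/z* than the precursor. The neutral mass below is hypothetical, and the function names are illustrative.

```python
"""Minimal sketch: precursor m/z across charge states and the ETnoD charge-reduced species."""

PROTON = 1.007276     # proton mass (Da)
ELECTRON = 0.000549   # electron mass (Da)


def precursor_mz(neutral_mass: float, charge: int) -> float:
    """m/z of [M + nH]^n+."""
    return (neutral_mass + charge * PROTON) / charge


def charge_reduced_mz(neutral_mass: float, charge: int) -> float:
    """m/z of [M + nH]^(n-1)+• after one electron is captured/transferred without dissociation."""
    if charge < 2:
        raise ValueError("a 1+ precursor would be neutralized")
    return (neutral_mass + charge * PROTON + ELECTRON) / (charge - 1)


mass = 2500.0   # hypothetical neutral peptide mass (Da)
for z in (2, 3, 4, 5):
    print(f"{z}+ precursor m/z {precursor_mz(mass, z):8.3f} -> ETnoD m/z {charge_reduced_mz(mass, z):8.3f}")
```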
75 |
76 | Photoactivation is another family of alternative dissociation strategies that has been steadily gaining popularity [@DOI:10.1021/acs.analchem.9b04859; @DOI:10.1039/c3cs60444f].
77 | Infrared multi-photon dissociation (IRMPD) was the canonical photodissociation method in early proteomic applications [@DOI:10.1039/c3cs60444f], but ultraviolet photodissociation (UVPD) has been the more widely used approach over the past decade [@DOI:10.1021/acs.chemrev.9b00440].
78 | IRMPD functions similarly to resCID; it is a slow heating approach that causes vibrational excitation due to absorption of low energy photons, generally 10.6 μm photons from a CO2 laser [@DOI:10.1021/acs.chemrev.9b00395; @DOI:10.1016/j.jasms.2004.07.016].
79 | The predominant fragments are b- and y-type ions, although secondary fragmentation occurs because fragment ions remain in the photon path after the initial dissociation event (**Figure 14D**).
80 | Despite limited use in the past decade, recent work shows that IRMPD, or more generally activation with IR photons, may still have value in the proteomics toolkit [@DOI:10.1021/acs.analchem.1c05398; @DOI:10.1021/acs.analchem.0c02087].
81 | UVPD has been explored with a number of wavelengths, including 157 nm, 193 nm, 213 nm, 266 nm, and 355 nm [@DOI:10.1002/anie.200460788; @DOI:10.1021/pr100515x; @DOI:10.1074/mcp.TIR119.001638; @DOI:10.1016/j.jasms.2008.10.019; @DOI:10.1002/rcm.4184; @DOI:10.1021/ac071241t].
82 | Higher-energy UVPD approaches, like 193 and 213 nm photons, are typically used for underivatized peptide and protein ions [@DOI:10.1021/acs.chemrev.9b00440], while others, like 266 and 355 nm, can be used for directed fragmentation at specific residues with natural chromophores (e.g., tyrosine) or exogenously added chromophore tags [@DOI:10.1021/ja076535a; @DOI:10.1002/anie.200900613].
83 | UVPD at 193 and 213 nm generates multiple fragment types, including sequence-informative a-, b-, c-, x-, y-, and z-ions as well as products of other fragmentation pathways, through both vibrational and electronic excitation [@DOI:10.1007/s13361-017-1721-0].
84 | UVPD has been explored for bottom-up proteomic applications, but its more impactful utility, arguably, has been realized for intact protein characterization [@DOI:10.1016/j.cbpa.2022.102180].
85 | The choice of laser (i.e., the desired photon wavelength) largely determines how UVPD is implemented.
86 | 193 nm photons are typically generated using an excimer laser with ArF gas [@DOI:10.1021/ja4029654], while 213 nm photons can be generated with a solid-state laser that is easier to integrate into an instrument platform and to maintain [@DOI:10.1021/jasms.0c00106; @DOI:10.1074/mcp.TIR119.001638].
87 | That said, 213 nm photons tend to provide more directed, preferential cleavage pathways compared to 193 nm photons that cleave more broadly in non-directed fashion [@DOI:10.1021/jasms.2c00288].
88 | Outside of ExD and photoactivation approaches, other alternative dissociation methods have been explored for various proteomic applications, although they are not as widely adopted as ExD and UVPD methods [@DOI:10.1021/acs.analchem.9b04859].
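The contrast between low-energy IR photons and higher-energy UV photons follows from E = hc/λ. The sketch below converts the wavelengths named in this section into per-photon energies in electronvolts, using the exact SI values of the constants; the function name is illustrative.

```python
"""Minimal sketch: per-photon energy for the IR and UV wavelengths used in photoactivation."""

PLANCK = 6.62607015e-34        # Planck constant (J s)
LIGHT_SPEED = 2.99792458e8     # speed of light (m/s)
EV = 1.602176634e-19           # joules per electronvolt


def photon_energy_ev(wavelength_nm: float) -> float:
    """E = h c / lambda, returned in electronvolts."""
    return PLANCK * LIGHT_SPEED / (wavelength_nm * 1e-9) / EV


for label, wavelength in [("IRMPD (CO2 laser)", 10_600.0), ("UVPD", 157.0), ("UVPD", 193.0),
                          ("UVPD", 213.0), ("UVPD", 266.0), ("UVPD", 355.0)]:
    print(f"{label:<18} {wavelength:>8.0f} nm  {photon_energy_ev(wavelength):6.3f} eV")
```

The roughly fifty-fold gap between a 10.6 μm photon and a 193 nm photon is why IRMPD relies on absorbing many photons to heat ions slowly, while a single UV photon can drive electronic excitation.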
89 |
--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
1 | # Creative Commons Attribution 4.0 International
2 |
3 | Creative Commons Corporation (“Creative Commons”) is not a law firm and does not provide legal services or legal advice. Distribution of Creative Commons public licenses does not create a lawyer-client or other relationship. Creative Commons makes its licenses and related information available on an “as-is” basis. Creative Commons gives no warranties regarding its licenses, any material licensed under their terms and conditions, or any related information. Creative Commons disclaims all liability for damages resulting from their use to the fullest extent possible.
4 |
5 | ### Using Creative Commons Public Licenses
6 |
7 | Creative Commons public licenses provide a standard set of terms and conditions that creators and other rights holders may use to share original works of authorship and other material subject to copyright and certain other rights specified in the public license below. The following considerations are for informational purposes only, are not exhaustive, and do not form part of our licenses.
8 |
9 | * __Considerations for licensors:__ Our public licenses are intended for use by those authorized to give the public permission to use material in ways otherwise restricted by copyright and certain other rights. Our licenses are irrevocable. Licensors should read and understand the terms and conditions of the license they choose before applying it. Licensors should also secure all rights necessary before applying our licenses so that the public can reuse the material as expected. Licensors should clearly mark any material not subject to the license. This includes other CC-licensed material, or material used under an exception or limitation to copyright. [More considerations for licensors](http://wiki.creativecommons.org/Considerations_for_licensors_and_licensees#Considerations_for_licensors).
10 |
11 | * __Considerations for the public:__ By using one of our public licenses, a licensor grants the public permission to use the licensed material under specified terms and conditions. If the licensor’s permission is not necessary for any reason–for example, because of any applicable exception or limitation to copyright–then that use is not regulated by the license. Our licenses grant only permissions under copyright and certain other rights that a licensor has authority to grant. Use of the licensed material may still be restricted for other reasons, including because others have copyright or other rights in the material. A licensor may make special requests, such as asking that all changes be marked or described. Although not required by our licenses, you are encouraged to respect those requests where reasonable. [More considerations for the public](http://wiki.creativecommons.org/Considerations_for_licensors_and_licensees#Considerations_for_licensees).
12 |
13 | ## Creative Commons Attribution 4.0 International Public License
14 |
15 | By exercising the Licensed Rights (defined below), You accept and agree to be bound by the terms and conditions of this Creative Commons Attribution 4.0 International Public License ("Public License"). To the extent this Public License may be interpreted as a contract, You are granted the Licensed Rights in consideration of Your acceptance of these terms and conditions, and the Licensor grants You such rights in consideration of benefits the Licensor receives from making the Licensed Material available under these terms and conditions.
16 |
17 | ### Section 1 – Definitions.
18 |
19 | a. __Adapted Material__ means material subject to Copyright and Similar Rights that is derived from or based upon the Licensed Material and in which the Licensed Material is translated, altered, arranged, transformed, or otherwise modified in a manner requiring permission under the Copyright and Similar Rights held by the Licensor. For purposes of this Public License, where the Licensed Material is a musical work, performance, or sound recording, Adapted Material is always produced where the Licensed Material is synched in timed relation with a moving image.
20 |
21 | b. __Adapter's License__ means the license You apply to Your Copyright and Similar Rights in Your contributions to Adapted Material in accordance with the terms and conditions of this Public License.
22 |
23 | c. __Copyright and Similar Rights__ means copyright and/or similar rights closely related to copyright including, without limitation, performance, broadcast, sound recording, and Sui Generis Database Rights, without regard to how the rights are labeled or categorized. For purposes of this Public License, the rights specified in Section 2(b)(1)-(2) are not Copyright and Similar Rights.
24 |
25 | d. __Effective Technological Measures__ means those measures that, in the absence of proper authority, may not be circumvented under laws fulfilling obligations under Article 11 of the WIPO Copyright Treaty adopted on December 20, 1996, and/or similar international agreements.
26 |
27 | e. __Exceptions and Limitations__ means fair use, fair dealing, and/or any other exception or limitation to Copyright and Similar Rights that applies to Your use of the Licensed Material.
28 |
29 | f. __Licensed Material__ means the artistic or literary work, database, or other material to which the Licensor applied this Public License.
30 |
31 | g. __Licensed Rights__ means the rights granted to You subject to the terms and conditions of this Public License, which are limited to all Copyright and Similar Rights that apply to Your use of the Licensed Material and that the Licensor has authority to license.
32 |
33 | h. __Licensor__ means the individual(s) or entity(ies) granting rights under this Public License.
34 |
35 | i. __Share__ means to provide material to the public by any means or process that requires permission under the Licensed Rights, such as reproduction, public display, public performance, distribution, dissemination, communication, or importation, and to make material available to the public including in ways that members of the public may access the material from a place and at a time individually chosen by them.
36 |
37 | j. __Sui Generis Database Rights__ means rights other than copyright resulting from Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, as amended and/or succeeded, as well as other essentially equivalent rights anywhere in the world.
38 |
39 | k. __You__ means the individual or entity exercising the Licensed Rights under this Public License. Your has a corresponding meaning.
40 |
41 | ### Section 2 – Scope.
42 |
43 | a. ___License grant.___
44 |
45 | 1. Subject to the terms and conditions of this Public License, the Licensor hereby grants You a worldwide, royalty-free, non-sublicensable, non-exclusive, irrevocable license to exercise the Licensed Rights in the Licensed Material to:
46 |
47 | A. reproduce and Share the Licensed Material, in whole or in part; and
48 |
49 | B. produce, reproduce, and Share Adapted Material.
50 |
51 | 2. __Exceptions and Limitations.__ For the avoidance of doubt, where Exceptions and Limitations apply to Your use, this Public License does not apply, and You do not need to comply with its terms and conditions.
52 |
53 | 3. __Term.__ The term of this Public License is specified in Section 6(a).
54 |
55 | 4. __Media and formats; technical modifications allowed.__ The Licensor authorizes You to exercise the Licensed Rights in all media and formats whether now known or hereafter created, and to make technical modifications necessary to do so. The Licensor waives and/or agrees not to assert any right or authority to forbid You from making technical modifications necessary to exercise the Licensed Rights, including technical modifications necessary to circumvent Effective Technological Measures. For purposes of this Public License, simply making modifications authorized by this Section 2(a)(4) never produces Adapted Material.
56 |
57 | 5. __Downstream recipients.__
58 |
59 | A. __Offer from the Licensor – Licensed Material.__ Every recipient of the Licensed Material automatically receives an offer from the Licensor to exercise the Licensed Rights under the terms and conditions of this Public License.
60 |
61 | B. __No downstream restrictions.__ You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, the Licensed Material if doing so restricts exercise of the Licensed Rights by any recipient of the Licensed Material.
62 |
63 | 6. __No endorsement.__ Nothing in this Public License constitutes or may be construed as permission to assert or imply that You are, or that Your use of the Licensed Material is, connected with, or sponsored, endorsed, or granted official status by, the Licensor or others designated to receive attribution as provided in Section 3(a)(1)(A)(i).
64 |
65 | b. ___Other rights.___
66 |
67 | 1. Moral rights, such as the right of integrity, are not licensed under this Public License, nor are publicity, privacy, and/or other similar personality rights; however, to the extent possible, the Licensor waives and/or agrees not to assert any such rights held by the Licensor to the limited extent necessary to allow You to exercise the Licensed Rights, but not otherwise.
68 |
69 | 2. Patent and trademark rights are not licensed under this Public License.
70 |
71 | 3. To the extent possible, the Licensor waives any right to collect royalties from You for the exercise of the Licensed Rights, whether directly or through a collecting society under any voluntary or waivable statutory or compulsory licensing scheme. In all other cases the Licensor expressly reserves any right to collect such royalties.
72 |
73 | ### Section 3 – License Conditions.
74 |
75 | Your exercise of the Licensed Rights is expressly made subject to the following conditions.
76 |
77 | a. ___Attribution.___
78 |
79 | 1. If You Share the Licensed Material (including in modified form), You must:
80 |
81 | A. retain the following if it is supplied by the Licensor with the Licensed Material:
82 |
83 | i. identification of the creator(s) of the Licensed Material and any others designated to receive attribution, in any reasonable manner requested by the Licensor (including by pseudonym if designated);
84 |
85 | ii. a copyright notice;
86 |
87 | iii. a notice that refers to this Public License;
88 |
89 | iv. a notice that refers to the disclaimer of warranties;
90 |
91 | v. a URI or hyperlink to the Licensed Material to the extent reasonably practicable;
92 |
93 | B. indicate if You modified the Licensed Material and retain an indication of any previous modifications; and
94 |
95 | C. indicate the Licensed Material is licensed under this Public License, and include the text of, or the URI or hyperlink to, this Public License.
96 |
97 | 2. You may satisfy the conditions in Section 3(a)(1) in any reasonable manner based on the medium, means, and context in which You Share the Licensed Material. For example, it may be reasonable to satisfy the conditions by providing a URI or hyperlink to a resource that includes the required information.
98 |
99 | 3. If requested by the Licensor, You must remove any of the information required by Section 3(a)(1)(A) to the extent reasonably practicable.
100 |
101 | 4. If You Share Adapted Material You produce, the Adapter's License You apply must not prevent recipients of the Adapted Material from complying with this Public License.
102 |
103 | ### Section 4 – Sui Generis Database Rights.
104 |
105 | Where the Licensed Rights include Sui Generis Database Rights that apply to Your use of the Licensed Material:
106 |
107 | a. for the avoidance of doubt, Section 2(a)(1) grants You the right to extract, reuse, reproduce, and Share all or a substantial portion of the contents of the database;
108 |
109 | b. if You include all or a substantial portion of the database contents in a database in which You have Sui Generis Database Rights, then the database in which You have Sui Generis Database Rights (but not its individual contents) is Adapted Material; and
110 |
111 | c. You must comply with the conditions in Section 3(a) if You Share all or a substantial portion of the contents of the database.
112 |
113 | For the avoidance of doubt, this Section 4 supplements and does not replace Your obligations under this Public License where the Licensed Rights include other Copyright and Similar Rights.
114 |
115 | ### Section 5 – Disclaimer of Warranties and Limitation of Liability.
116 |
117 | a. __Unless otherwise separately undertaken by the Licensor, to the extent possible, the Licensor offers the Licensed Material as-is and as-available, and makes no representations or warranties of any kind concerning the Licensed Material, whether express, implied, statutory, or other. This includes, without limitation, warranties of title, merchantability, fitness for a particular purpose, non-infringement, absence of latent or other defects, accuracy, or the presence or absence of errors, whether or not known or discoverable. Where disclaimers of warranties are not allowed in full or in part, this disclaimer may not apply to You.__
118 |
119 | b. __To the extent possible, in no event will the Licensor be liable to You on any legal theory (including, without limitation, negligence) or otherwise for any direct, special, indirect, incidental, consequential, punitive, exemplary, or other losses, costs, expenses, or damages arising out of this Public License or use of the Licensed Material, even if the Licensor has been advised of the possibility of such losses, costs, expenses, or damages. Where a limitation of liability is not allowed in full or in part, this limitation may not apply to You.__
120 |
121 | c. The disclaimer of warranties and limitation of liability provided above shall be interpreted in a manner that, to the extent possible, most closely approximates an absolute disclaimer and waiver of all liability.
122 |
123 | ### Section 6 – Term and Termination.
124 |
125 | a. This Public License applies for the term of the Copyright and Similar Rights licensed here. However, if You fail to comply with this Public License, then Your rights under this Public License terminate automatically.
126 |
127 | b. Where Your right to use the Licensed Material has terminated under Section 6(a), it reinstates:
128 |
129 | 1. automatically as of the date the violation is cured, provided it is cured within 30 days of Your discovery of the violation; or
130 |
131 | 2. upon express reinstatement by the Licensor.
132 |
133 | For the avoidance of doubt, this Section 6(b) does not affect any right the Licensor may have to seek remedies for Your violations of this Public License.
134 |
135 | c. For the avoidance of doubt, the Licensor may also offer the Licensed Material under separate terms or conditions or stop distributing the Licensed Material at any time; however, doing so will not terminate this Public License.
136 |
137 | d. Sections 1, 5, 6, 7, and 8 survive termination of this Public License.
138 |
139 | ### Section 7 – Other Terms and Conditions.
140 |
141 | a. The Licensor shall not be bound by any additional or different terms or conditions communicated by You unless expressly agreed.
142 |
143 | b. Any arrangements, understandings, or agreements regarding the Licensed Material not stated herein are separate from and independent of the terms and conditions of this Public License.
144 |
145 | ### Section 8 – Interpretation.
146 |
147 | a. For the avoidance of doubt, this Public License does not, and shall not be interpreted to, reduce, limit, restrict, or impose conditions on any use of the Licensed Material that could lawfully be made without permission under this Public License.
148 |
149 | b. To the extent possible, if any provision of this Public License is deemed unenforceable, it shall be automatically reformed to the minimum extent necessary to make it enforceable. If the provision cannot be reformed, it shall be severed from this Public License without affecting the enforceability of the remaining terms and conditions.
150 |
151 | c. No term or condition of this Public License will be waived and no failure to comply consented to unless expressly agreed to by the Licensor.
152 |
153 | d. Nothing in this Public License constitutes or may be interpreted as a limitation upon, or waiver of, any privileges and immunities that apply to the Licensor or You, including from the legal processes of any jurisdiction or authority.
154 |
155 | ```
156 | Creative Commons is not a party to its public licenses. Notwithstanding, Creative Commons may elect to apply one of its public licenses to material it publishes and in those instances will be considered the “Licensor.” Except for the limited purpose of indicating that material is shared under a Creative Commons public license or as otherwise permitted by the Creative Commons policies published at [creativecommons.org/policies](http://creativecommons.org/policies), Creative Commons does not authorize the use of the trademark “Creative Commons” or any other trademark or logo of Creative Commons without its prior written consent including, without limitation, in connection with any unauthorized modifications to any of its public licenses or any other arrangements, understandings, or agreements concerning use of licensed material. For the avoidance of doubt, this paragraph does not form part of the public licenses.
157 |
158 | Creative Commons may be contacted at creativecommons.org
159 | ```
160 |
--------------------------------------------------------------------------------