├── .gitignore ├── Future Module - Adding to Plots └── FilePlotter2.py ├── LICENSE ├── Module 0 - Plotting a Spectrum ├── FilePlotter.py ├── Lesson 0 Notes.md ├── SPECTRUM - MS.txt ├── scratch0.py └── test.py ├── Module 1 - Calculating Masses ├── __init__.py ├── calc_masses.py ├── calc_masses_chem.py ├── homework1.py └── scratch1.py ├── Module 2 - Larger Databases └── calc_masses_fasta.py └── README.md /.gitignore: -------------------------------------------------------------------------------- 1 | .idea/ 2 | *.png 3 | *_key.py -------------------------------------------------------------------------------- /Future Module - Adding to Plots/FilePlotter2.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import matplotlib.pyplot as plt 3 | 4 | 5 | def predicted_mz(mass, zrange=[15, 30], adductmass=1): 6 | zarray = np.arange(zrange[0], zrange[1] + 1) 7 | mz = (mass + zarray * adductmass) / zarray 8 | mz = np.round(mz, 0) 9 | return mz, zarray 10 | 11 | 12 | path = "C:\\Users\\Teaching\\Desktop\\SPECTRUM - MS.txt" 13 | path = "C:\\Python\\MassSpecCodingClub\\Module 0 - Plotting a Spectrum\\SPECTRUM - MS.txt" 14 | 15 | data = np.loadtxt(path, skiprows=8) 16 | 17 | x = data[:, 0] 18 | y = data[:, 1] 19 | 20 | ymax = np.amax(y) 21 | 22 | y = y / ymax * 100 23 | 24 | mzrange = [5000, 7000] 25 | 26 | mz_vals, zvals = predicted_mz(147500, zrange=[1, 1000]) 27 | 28 | plt.plot(x, y) 29 | plt.xlim(mzrange[0], mzrange[1]) 30 | # plt.plot(mz_vals, np.ones_like(mz_vals)*100, marker="o", linestyle=" ") 31 | for i, mz in enumerate(mz_vals): 32 | if mz < mzrange[1] and mz > mzrange[0]: 33 | plt.vlines(mz, 0, 100, linestyles="--", color="black") 34 | z = zvals[i] 35 | print(i, mz, z) 36 | plt.text(mz - 75, 101, "+" + str(z)) 37 | # print(z) 38 | 39 | plt.xlabel("m/z") 40 | plt.ylabel("Intensity (%)") 41 | 42 | # plt.text(2000, 20, "Monomer") 43 | 44 | plt.savefig("testspectrum.png") 45 | plt.show() 46 | 
-------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | BSD 3-Clause License 2 | 3 | Copyright (c) 2023, michaelmarty 4 | 5 | Redistribution and use in source and binary forms, with or without 6 | modification, are permitted provided that the following conditions are met: 7 | 8 | 1. Redistributions of source code must retain the above copyright notice, this 9 | list of conditions and the following disclaimer. 10 | 11 | 2. Redistributions in binary form must reproduce the above copyright notice, 12 | this list of conditions and the following disclaimer in the documentation 13 | and/or other materials provided with the distribution. 14 | 15 | 3. Neither the name of the copyright holder nor the names of its 16 | contributors may be used to endorse or promote products derived from 17 | this software without specific prior written permission. 18 | 19 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 20 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 21 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 23 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 24 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 25 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 26 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 27 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 28 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | -------------------------------------------------------------------------------- /Module 0 - Plotting a Spectrum/FilePlotter.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import matplotlib.pyplot as plt 3 | 4 | path = "C:\\Users\\Teaching\\Desktop\\SPECTRUM - MS.txt" 5 | 6 | data = np.loadtxt(path, skiprows=8) 7 | 8 | x = data[:,0] 9 | y = data[:,1] 10 | 11 | ymax = np.amax(y) 12 | 13 | y = y / ymax * 100 14 | 15 | plt.plot(x, y) 16 | plt.xlabel("m/z") 17 | plt.ylabel("Intensity (%)") 18 | plt.savefig("testspectrum.png") 19 | plt.show() 20 | -------------------------------------------------------------------------------- /Module 0 - Plotting a Spectrum/Lesson 0 Notes.md: -------------------------------------------------------------------------------- 1 | # Lesson 0 Notes 2 | 3 | ## The 3 key parts of the computer: 4 | 5 | * Hard Drive: Permanent file storage 6 | * Memory: Temporary storage of information 7 | * Processor: Carries out operations on data 8 | 9 | ## Basic Processor Operations 10 | 11 | * Input: Read data from memory or hard drive 12 | * Process: Arithmetic or logic operations 13 | * Output: Display or output data in other forms 14 | * Store: Write data to memory or hard drive 15 | 16 | ## Computer code 17 | 18 | A (typically linear) set of operations for the computer to perform. 
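The four basic operations listed above can be sketched in a few lines of Python. This is only an illustration: the intensity values are made up and are not taken from the spectrum file, and a hard-coded list stands in for reading from the hard drive.

```python
# Input: bring data into memory (a hard-coded list stands in for a file here).
intensities = [12.0, 48.0, 96.0, 24.0]

# Process: arithmetic on the data, scaling so the tallest peak is 100%.
tallest = max(intensities)
normalized = [value / tallest * 100 for value in intensities]

# Store: keep a result in memory under a new variable name.
base_peak_percent = max(normalized)

# Output: display the results.
print(normalized)
print(base_peak_percent)
```

Each step runs in order from top to bottom, which is what "typically linear" means in the definition above.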
19 | 20 | ## Types of objects in Python 21 | 22 | * Variables: Data stored in memory 23 | * float: 3.0 24 | * int: 3 25 | * string: "Hello World" 26 | * bool: True 27 | * list/array: [0, 1, 2] 28 | * dict: {"Name": "Python Champion"} 29 | * Functions: Perform a series of operations based on inputs and return outputs 30 | * add(a, b) 31 | * plot(x, y) 32 | * Classes: Collections of variables and/or functions grouped together 33 | * Library/Package: Importable collection of variables, functions, and classes -------------------------------------------------------------------------------- /Module 0 - Plotting a Spectrum/scratch0.py: -------------------------------------------------------------------------------- 1 | list1 = [] 2 | list2 = [] 3 | 4 | for penguin in range(0, 10): 5 | a = 0 6 | b = 1 7 | if penguin >= 2: 8 | a = 1 9 | 10 | if penguin < 3: 11 | b = 2 12 | 13 | list1.append(a) 14 | list2.append(b) 15 | 16 | print(list1) 17 | print(list2) 18 | -------------------------------------------------------------------------------- /Module 0 - Plotting a Spectrum/test.py: -------------------------------------------------------------------------------- 1 | a = 2 2 | b = 3 3 | c = a + b 4 | 5 | def add(x, y): 6 | z = x + y 7 | return z 8 | 9 | 10 | d = add(a+1, b) 11 | 12 | print("The answer is:", d) -------------------------------------------------------------------------------- /Module 1 - Calculating Masses/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/michaelmarty/MassSpecCodingClub/15fea90bdb7b27c357801378795e6556871f73b4/Module 1 - Calculating Masses/__init__.py -------------------------------------------------------------------------------- /Module 1 - Calculating Masses/calc_masses.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | aa_masses = {'A': 71.0788, 'C': 103.1388, 'D': 115.0886, 'E': 129.1155, 'F': 147.1766, 
4 | 'G': 57.0519, 'H': 137.1411, 'I': 113.1594, 'K': 128.1741, 'L': 113.1594, 5 | 'M': 131.1926, 'N': 114.1038, 'P': 97.1167, 'Q': 128.1307, 'R': 156.1875, 6 | 'S': 87.0782, 'T': 101.1051, 'V': 99.1326, 'W': 186.2132, 'Y': 163.1760} 7 | 8 | aa_masses_monoisotopic = {'A': 71.03711, 'C': 103.00919, 'D': 115.02694, 'E': 129.04259, 'F': 147.06841, 9 | 'G': 57.02146, 'H': 137.05891, 'I': 113.08406, 'K': 128.09496, 'L': 113.08406, 10 | 'M': 131.04049, 'N': 114.04293, 'P': 97.05276, 'Q': 128.05858, 'R': 156.10111, 11 | 'S': 87.03203, 'T': 101.04768, 'V': 99.06841, 'W': 186.07931, 'Y': 163.06333} 12 | 13 | rna_masses = {'A': 329.2, 'U': 306.2, 'C': 305.2, 'G': 345.2, 'T': 306.2} 14 | 15 | dna_masses = {'A': 313.2, 'T': 304.2, 'C': 289.2, 'G': 329.2, 'U': 304.2} 16 | 17 | mass_water = 18.0153 18 | mass_water_monoisotopic = 18.0105 19 | 20 | test_seq = ("GSTFSKLREQLGPVTQEFWDNLEKETEGLRQEMSKDLEEVKAKVQPYLDDFQKKWQEEMELYRQKVEPLRAELQEGARQKLHELQEKLSPLGEEMRD" 21 | "RARAHVDALRTHLAPYSDELRQRLAARLEALKENGGARLAEYHAKATEHLSTLSEKAKPALEDLRQGLLPVLESFKVSFLSALEEYTKKLNTQ") 22 | 23 | 24 | def calc_aa_mass(a, monoisotopic=False): 25 | a = a.upper() 26 | 27 | if a in aa_masses: 28 | if monoisotopic: 29 | return aa_masses_monoisotopic[a] 30 | else: 31 | return aa_masses[a] 32 | else: 33 | print("Unrecognized Character:", a) 34 | return 0 35 | 36 | 37 | def calc_prot_mass(seq, monoisotopic=False): 38 | if monoisotopic: 39 | water = mass_water_monoisotopic 40 | else: 41 | water = mass_water 42 | return np.sum([calc_aa_mass(s, monoisotopic=monoisotopic) for s in seq]) + water 43 | 44 | 45 | tot_mass = calc_prot_mass(test_seq, monoisotopic=True) 46 | 47 | print(tot_mass) 48 | -------------------------------------------------------------------------------- /Module 1 - Calculating Masses/calc_masses_chem.py: -------------------------------------------------------------------------------- 1 | from glypy.io import glycoct 2 | from molmass import Formula 3 | from rdkit.Chem.Descriptors import * 4
| 5 | glycan = """ 6 | RES 7 | 1b:a-dman-HEX-1:5 8 | 2b:b-dglc-HEX-1:5 9 | 3s:n-acetyl 10 | 4b:b-dgal-HEX-1:5 11 | 5b:a-dgro-dgal-NON-2:6|1:a|2:keto|3:d 12 | 6s:n-acetyl 13 | 7b:b-dglc-HEX-1:5 14 | 8s:n-acetyl 15 | 9b:b-dgal-HEX-1:5 16 | 10b:a-dgro-dgal-NON-2:6|1:a|2:keto|3:d 17 | 11s:n-acetyl 18 | LIN 19 | 1:1o(2+1)2d 20 | 2:2d(2+1)3n 21 | 3:2o(4+1)4d 22 | 4:4o(6+2)5d 23 | 5:5d(5+1)6n 24 | 6:1o(6+1)7d 25 | 7:7d(2+1)8n 26 | 8:7o(4+1)9d 27 | 9:9o(3+2)10d 28 | 10:10d(5+1)11n""" 29 | 30 | 31 | def calc_glycan_mass(glycan_string, monoisotopic=False): 32 | gc = glycoct.loads(glycan_string) 33 | return gc.mass(average=(not monoisotopic)) 34 | 35 | 36 | glycan_mass = calc_glycan_mass(glycan, monoisotopic=False) 37 | # print(glycan_mass) 38 | 39 | formula = 'C8H10N4O2' 40 | 41 | 42 | # exit() 43 | 44 | 45 | def calc_formula_mass(formula, monoisotopic=False): 46 | f = Formula(formula) 47 | if monoisotopic: 48 | return f.monoisotopic_mass 49 | else: 50 | return f.mass 51 | 52 | 53 | # print(calc_formula_mass(formula, monoisotopic=False)) 54 | 55 | 56 | 57 | smiles = r"[C@](COP(=O)([O-])OCC[N+](C)(C)C)([H])(OC(CCCCCCC/C=C\CCCCCCCC)=O)COC(CCCCCCCCCCCCCCC)=O" 58 | 59 | 60 | def calc_smiles_mass(smiles, monoisotopic=False): 61 | mol = Chem.MolFromSmiles(smiles) 62 | if monoisotopic: 63 | mass = ExactMolWt(mol) 64 | else: 65 | mass = MolWt(mol) 66 | return mass 67 | 68 | 69 | # print(calc_smiles_mass(smiles, monoisotopic=False)) 70 | -------------------------------------------------------------------------------- /Module 1 - Calculating Masses/homework1.py: -------------------------------------------------------------------------------- 1 | # Homework 1 for Module 1 2 | # 3 | # Problem 1 of 2: Calculate the neutral monoisotopic mass of this RNA oligonucleotide 4 | # 5 | 6 | oligo = "aacauucaACgcugucggugAgu" 7 | 8 | # 9 | # I will give you the atomic formulas of the nucleobases 10 | # You will need to add ribose and phosphate to each nucleobase 11 | # and take away 3 waters, one
for the nucleobase-ribose bond, one for the phosphate-ribose bond, and 12 | # one for the second phosphate-ribose bond. 13 | # 14 | 15 | A = "C5H5N5" 16 | C = "C4H5N3O" 17 | G = "C5H5N5O" 18 | U = "C4H4N2O2" 19 | ribose = "C5H10O5" 20 | phos = "H3PO4" 21 | water = "H2O" 22 | 23 | # 24 | # I will help you out with a hint here 25 | # This is what you may need to do to get the calc_formula_mass function working 26 | # Note, it is importing this from the file we wrote in Module 1.2. 27 | # If something isn't working, you can write these from scratch 28 | # 29 | 30 | from calc_masses_chem import calc_formula_mass 31 | 32 | mono = True 33 | mass_water = calc_formula_mass(water, monoisotopic=mono) 34 | 35 | form_a = A + ribose + phos # This just stacks the strings of formulas back to back 36 | mass_a = calc_formula_mass(form_a, monoisotopic=mono) - 3 * mass_water 37 | 38 | # Don't forget, this mass is just for a single unterminated nucleotide fragment. 39 | # You will then need to add back a water for the terminal phosphate. 40 | # 41 | 42 | # [ Your code here] 43 | 44 | 45 | 46 | # Print the final answer: 47 | 48 | print("The oligo mass is:") 49 | 50 | # If you are curious about checking your answer, try: http://mass.rega.kuleuven.be/mass/mongo.htm.
51 | # Select monoisotopic, RNA, -OH terminal on 5', and 3' phosphate terminal 52 | 53 | # 54 | # 55 | # Problem 2 of 2: Calculate the difference between the monoisotopic mass and the average mass for this oligo 56 | # 57 | # 58 | 59 | 60 | # [Your code here] 61 | 62 | 63 | # Print the final answer: 64 | 65 | print("The difference between average and monoisotopic:") 66 | -------------------------------------------------------------------------------- /Module 1 - Calculating Masses/scratch1.py: -------------------------------------------------------------------------------- 1 | print(list(range(5))) 2 | 3 | for i in range(5): 4 | if i % 2 == 0: 5 | print("Even:", i) 6 | else: 7 | print("Odd:", i) 8 | 9 | ''' 10 | print(2==2) 11 | print(1==2) 12 | print(1>2) 13 | print(2<=1) 14 | print(2!=1) 15 | 16 | if 1 == 2: 17 | print("TRUE") 18 | elif 2 == 2: 19 | print("ELIF True") 20 | else: 21 | print("FALSE")''' 22 | 23 | 24 | def add(a, b, s=0, m=1, d=1): 25 | return m*(a + b - s)/d 26 | 27 | c = add(1, 2, d=2) 28 | print(c) 29 | -------------------------------------------------------------------------------- /Module 2 - Larger Databases/calc_masses_fasta.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/michaelmarty/MassSpecCodingClub/15fea90bdb7b27c357801378795e6556871f73b4/Module 2 - Larger Databases/calc_masses_fasta.py -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Mass Spec Coding Club 2 | The Mass Spec Coding Club (MSCC) is a community dedicated to teaching computer coding for mass spectrometry applications. Our goal is to make coding accessible to mass spectrometry researchers and provide free resources and open-source examples. 3 | 4 | As the community develops, we will continue to post more content, and we welcome contributions from anyone.
5 | 6 | # Discord Server 7 | 8 | Want to chat with community members and join meetings? Join us on the [Mass Spec Coding Club Discord Server](https://discord.gg/24GupxGn3d). It's easy to set up, and you can run it from a browser if you'd like. We will pick a time soon and start hosting meetings/office hours there. 9 | 10 | In the meantime, feel free to post questions to the text channels there, and people can answer. 11 | 12 | # Learning Modules 13 | 14 | ## Module 0: Setting Up Python and Plotting A Spectrum 15 | 16 | This series of lessons will cover how to set up Python from scratch and write a simple script to plot a mass spectrum. Skills and learning outcomes are outlined below each video. 17 | 18 | * [Lesson 0.0: Setting up Python from Scratch](https://youtu.be/BLaoo1S3ImU) 19 | * How to set up and run Python 20 | * Setting variables 21 | * Printing to the terminal 22 | * [Lesson 0.1: Loading Data Into Python](https://youtu.be/vpbdUQp8m0U) 23 | * Importing libraries 24 | * Reading from text files into NumPy arrays 25 | * Intro to array slicing 26 | * [Lesson 0.2: Plotting a Spectrum](https://youtu.be/88m4a9CEeBY) 27 | * Plotting a spectrum with Matplotlib 28 | * Normalizing the y-axis 29 | * [Lesson 0.3: Too Fast, Go Back - Review and Background from Module 0](https://youtu.be/V6alRhace2A) 30 | * Fundamentals of how computers work 31 | * Basics of code concepts 32 | * Discussion of variables, functions, and classes 33 | * How to define functions 34 | 35 | The data files, Python code, and notes used in this module are available in the "Module 0" folder. 36 | 37 | ## Module 1: Calculating Masses 38 | 39 | The goal of module 1 is to show how Python can be used to predict masses of various molecules, starting with proteins.
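As a taste of what this module builds toward, here is a minimal sketch of a residue-mass lookup. The three average residue masses and the water mass are copied from `calc_masses.py` in the Module 1 folder; the function name `peptide_mass` and the tripeptide `GAS` are illustrative assumptions, not code from the lessons.

```python
# Average residue masses for three amino acids, copied from calc_masses.py.
aa_masses = {'G': 57.0519, 'A': 71.0788, 'S': 87.0782}
mass_water = 18.0153


def peptide_mass(seq):
    """Sum the residue masses and add one water for the free termini.

    Hypothetical helper for illustration; raises KeyError for residues
    that are not in the small dictionary above.
    """
    return sum(aa_masses[residue] for residue in seq.upper()) + mass_water


print(peptide_mass("GAS"))
```

The lessons listed below build up the same idea step by step, with a full 20-residue dictionary and an option for monoisotopic masses.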
40 | 41 | * [Lesson 1.0: Calculating Protein Masses](https://youtu.be/FFR1gg2cA6E) 42 | * Using a Dictionary 43 | * Creating a function 44 | * Looping through protein sequence to calculate the protein mass 45 | * [Lesson 1.1: Improving Our Protein Mass Calculator](https://youtu.be/lxTrA_EPeNg) 46 | * String manipulation 47 | * Passing variables through functions 48 | * If/then statements 49 | * Monoisotopic mass calculation for protein 50 | * [Lesson 1.2: Calculating Masses from Glycans, SMILES, and Formulas](https://youtu.be/XSgA7SODmSg) 51 | * Using Glypy to calculate masses from GlycoCT strings 52 | * Using molmass to calculate masses from formulas 53 | * Using RDKit to calculate masses from SMILES strings 54 | * [Lesson 1.3: Too Fast, Go Back - For Loops, If/Then, and Function Options](https://youtu.be/xg1QxAzznkg) 55 | * Writing For loops 56 | * If/Then statements and Boolean tests 57 | * Passing options to functions 58 | * Homework 1 59 | * For those who want to test their skills and calculate some RNA masses, check out homework1.py in Module 1. 60 | 61 | Check back for more videos, and reach out if you like these [mtmarty@arizona.edu](mailto:mtmarty@arizona.edu). 62 | 63 | # Ideas for Future Tutorials 64 | 65 | Here are some ideas that users have suggested. If you have other suggestions, please enter them in the ["What Projects Would You Like to See?"](https://github.com/michaelmarty/MassSpecCodingClub/discussions/3) discussion. If you would like to volunteer to make a module on one of these topics, please add your name here. 66 | 67 | * Plotting multiple spectra with for loops and string parsing (Michael Marty) 68 | * Reading vendor files 69 | * Writing to different output files 70 | * Exploring other Python MS packages 71 | * How to use public databases (Ming?) 72 | * Applications to polymers and oligonucleotides 73 | * Ion mobility 74 | * Using Git and GitHub 75 | * Gasp, R! 
76 | * There are a lot of great R resources for MS already, so maybe we could organize and link those here too. 77 | 78 | # Funding 79 | 80 | Funding is provided by the National Science Foundation: CHE-1845230. 81 | --------------------------------------------------------------------------------