├── .gitignore
├── LICENSE
├── README.md
├── _config.yml
├── env
│   ├── .bashrc
│   └── README.md
├── hpc
│   ├── README.md
│   └── af3_alpine_array-example.sh
└── sequence-analysis
    ├── blastdb_list.md
    └── update_blastdb.sh
/.gitignore: -------------------------------------------------------------------------------- 1 | # History files 2 | .Rhistory 3 | .Rapp.history 4 | 5 | # Session Data files 6 | .RData 7 | 8 | # Example code in package build process 9 | *-Ex.R 10 | 11 | # Output files from R CMD build 12 | /*.tar.gz 13 | 14 | # Output files from R CMD check 15 | /*.Rcheck/ 16 | 17 | # RStudio files 18 | .Rproj.user/ 19 | 20 | # produced vignettes 21 | vignettes/*.html 22 | vignettes/*.pdf 23 | 24 | # OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3 25 | .httr-oauth 26 | 27 | # knitr and R markdown default cache directories 28 | /*_cache/ 29 | /cache/ 30 | 31 | # Temporary files created by R markdown 32 | *.utf8.md 33 | *.knit.md 34 | 35 | # Shiny token, see https://shiny.rstudio.com/articles/shinyapps.html 36 | rsconnect/ 37 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 Janani Ravi 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 
14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Computational Biology & Bioinformatics Resources 2 | _With programming resources on R, Python, Unix, Git, and Stats._ 3 | _Other non-compbio gists will be [here](https://gist.github.com/jananiravi)!_ 4 | > NOTE: When the recommendation is an online course, we recommend the *FREE* version. 5 | 6 | ## Contributors 7 | [Janani Ravi](https://github.com/jananiravi) & [Arjun Krishnan](https://github.com/krishnanlab) 8 | 9 | > NOTE: _You can request a gist on a particular topic by adding an [issue](https://github.com/jananiravi/compbio-gists/issues) outlining the details of the problem. Keywords of interest are in the repo description above._ 10 | 11 | ## Table of Contents 12 | * [Cheatsheets](#cheatsheets) 13 | * [Unix](#unix) 14 | * [R](#r) 15 | * [Python](#python) 16 | * [Probability & Statistics](#probability-and-statistics) 17 | * [Biology](#biology) 18 | 19 | ## Cheatsheets 20 | For R/RStudio, Git/GitHub, Markdown, Unix/vi, Slack, …
21 | https://github.com/jananiravi/cheatsheets 22 | 23 | ## Unix 24 | * [Command-line Bootcamp](http://rik.smith-unna.com/command_line_bootcamp/) 25 | * [Command-line Guide](http://commandline.guide/) | Also interactive, just like the bootcamp. 26 | * [Linux Journey](https://linuxjourney.com) 27 | * A Unix workshop: [course materials](https://www.dropbox.com/s/1ltlyhtdbccymep/w1-files.zip?dl=0) 28 | * Day1 - [Video](https://www.youtube.com/watch?v=liC5uM8czyo) & [Slides](https://www.dropbox.com/s/ggv7ijwateim7zt/day1_Unix.pdf?dl=0) 29 | * Day2 - [Video](https://www.youtube.com/watch?v=ArbOG6YpakU) & [Slides](https://www.dropbox.com/s/xorsuvk1cugiyw8/day2_Unix.pdf?dl=0) 30 | * Day3 - [Video](https://www.youtube.com/watch?v=PHmfgIuOMFQ) & [Slides](https://www.dropbox.com/s/88wu7svvfur8upw/day3_Unix.pdf?dl=0) 31 | * Command-line refresher from [Software Carpentry](http://swcarpentry.github.io/shell-novice/) 32 | 33 | ## R 34 | ### General introduction to R 35 | * [Swirl](http://swirlstats.com) ('R Programming' & 'Data Analysis' lessons) 36 | * [Programming with R](http://swcarpentry.github.io/r-novice-inflammation/) 37 | * [RStudio Education](https://education.rstudio.com/) 38 | * [Finding Your Way To R](https://education.rstudio.com/learn/) | [Beginners](https://education.rstudio.com/learn/beginner/) 39 | * [RStudio Essentials](https://resources.rstudio.com/) 40 | * [R Cheatsheets](https://www.rstudio.com/resources/cheatsheets/) 41 | 42 | #### Data Visualization 43 | A few useful resources to share alongside tidyverse/ggplot: 44 | 1. To pick the right kind of visualization, given your data type: 45 | https://www.data-to-viz.com/ 46 | 2. Graph galleries w/ sample code for R/python-newbies:
[R Graph Gallery](https://www.r-graph-gallery.com/) | [Python Graph Gallery](https://python-graph-gallery.com/) 48 | 3. [ggplot extension gallery](https://exts.ggplot2.tidyverse.org/gallery/) | https://github.com/ggplot2-exts/gallery 49 | 50 | ### R for data science and machine learning 51 | * [Data Science Course in a Box](https://datasciencebox.org/) - Introductory data science course covering data acquisition and wrangling, exploratory data analysis, data visualization, inference, modeling, and effective communication of results (with tidyverse, R Markdown, and version control). The course also introduces interactive visualization and reporting, text analysis, and Bayesian inference. 52 | * [RStudio | The Essentials of Data Science](https://resources.rstudio.com/the-essentials-of-data-science) 53 | * [R for Reproducible Scientific Analysis](http://swcarpentry.github.io/r-novice-gapminder/) 54 | 55 | ### eBooks for R 56 | * R for Data Science | R4DS | Hadley Wickham, Garrett Grolemund | 57 | [eBook](https://r4ds.had.co.nz/) 58 | * Hands-On Programming with R | HOPR | Garrett Grolemund | 59 | [eBook](https://rstudio-education.github.io/hopr/) 60 | * Happy Git and GitHub for the useR | Jenny Bryan | 61 | [eBook](https://happygitwithr.com/) 62 | * [Learning Statistics with R](https://learningstatisticswithr.com/) | Danielle Navarro | 63 | [eBook](https://learningstatisticswithr.com/book/) 64 | * Computational Genomics with R | Altuna Akalin | 65 | [eBook](http://compgenomr.github.io/book/) | _Work in progress_ 66 | * R Programming for Data Science | Roger Peng | 67 | [eBook](https://leanpub.com/rprogramming) 68 | * R Graphics Cookbook | Winston Chang | 69 | [eBook](https://r-graphics.org/) 70 | 71 | ## Python 72 | 73 | ### General introduction to Python 74 | * [Learn Python the Hard Way](https://learnpythonthehardway.org/book/) 75 | * [Google Python Class](https://developers.google.com/edu/python/) 76 | * [Videos to follow 
along](https://www.youtube.com/playlist?list=PLfZeRfzhgQzTMgwFVezQbnpc1ck0I6CQl) 77 | * Introduction to Interactive Programming in Python 78 | * [Part 1](https://www.coursera.org/learn/interactive-python-1) 79 | * [Part 2](https://www.coursera.org/learn/interactive-python-2) 80 | 81 | ### Python for data science and machine learning 82 | * Courses to learn introductory computer science, programming, computational thinking, and data science (video lectures + notes + assignments): 83 | * [Introduction to Computer Science and Programming in Python](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-0001-introduction-to-computer-science-and-programming-in-python-fall-2016/) 84 | * [Introduction to Computational Thinking and Data Science](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-0002-introduction-to-computational-thinking-and-data-science-fall-2016/) 85 | * [A Whirlwind Tour of Python](https://jakevdp.github.io/WhirlwindTourOfPython/): [PDF](http://www.oreilly.com/programming/free/files/a-whirlwind-tour-of-python.pdf) and [Jupyter Notebooks](https://github.com/jakevdp/WhirlwindTourOfPython) 86 | * [Scipy Lecture Notes](http://www.scipy-lectures.org/) – Awesome document to learn numerics, science, and data with Python 87 | * Data Wrangling: 88 | * [Data Wrangling in Python with Pandas - Kaggle](https://www.kaggle.com/learn/pandas) 89 | * [Video series on data analysis with Pandas](https://www.dataschool.io/easier-data-analysis-with-pandas/) – Excellent set of short videos 90 | * Visualization: 91 | * [Data Visualization with Python - Kaggle](https://www.kaggle.com/learn/data-visualisation) 92 | * [Python Plotting for Exploratory Data Analysis](http://pythonplot.com/) 93 | * Machine Learning: 94 | * [Introduction to ML in Python - Kaggle](https://www.kaggle.com/learn/machine-learning) (Check out both Levels 1 & 2) 95 | * [Another intro to ML with 
scikit-learn](https://www.dataschool.io/machine-learning-with-scikit-learn/) – This one contains videos and accompanying Jupyter notebooks + blog posts. 96 | * [A Quick Demo to ML with Scikit Learn Python Package](https://github.com/mmmayo13/scikit-learn-classifiers/blob/master/sklearn-classifiers-tutorial.ipynb) – A nice demo+tour of scikit-learn. 97 | * [Deep Learning with Python and TensorFlow - Kaggle](https://www.kaggle.com/learn/deep-learning) 98 | * [Embeddings with Python and TensorFlow - Kaggle](https://www.kaggle.com/learn/embeddings) – Build deep learning models that handle sparse categorical variables 99 | * [Machine Learning Explainability](https://www.kaggle.com/learn/machine-learning-explainability) 100 | * General multi-topic resources: 101 | * [A Step-by-step Guide to Python for Data Science](http://www.dataschool.io/launch-your-data-science-career-with-python/) 102 | * Always check out the latest PyCon Conference tutorials and talks, almost all of which are available online. [E.g., here's a list from PyCon 2017](https://krishnanlab.slack.com/files/arjunkrish/F5MEK7GAK/Python_Videos_of_Interest_to_Lab). 103 | 104 | ## Probability and statistics 105 | * [Think Stats](https://greenteapress.com/wp/think-stats-2e/) (book + code + solutions; for Python programmers). 106 | * [Learning statistics with R](https://learningstatisticswithr.com/book/) (book + code + solutions; for R programmers). 107 | * [Points of Significance](https://www.nature.com/collections/qghhqm/pointsofsignificance) - an awesome collection of short articles on a variety of topics in statistical data analysis. 108 | * [OpenIntro to Probability and Statistics](https://www.openintro.org/stat/textbook.php?stat_book=os) 109 | 110 | ### Statistical learning 111 | > A great resource (book + videos + slides + exercises + example code + solutions) for simultaneously learning both statistical learning and R. 
[_Statistical learning_ is just another term for _machine learning_ done from a slightly statistical-modeling point-of-view.] 112 | * An Introduction to Statistical Learning with Applications in R | Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani 113 | http://www-bcf.usc.edu/~gareth/ISL/index.html 114 | * You can download the latest version of the book as a PDF on that site: http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Seventh%20Printing.pdf 115 | * I would encourage watching these excellent course lecture videos (by the authors, who are world-class scientists) that follow the book closely: http://www.dataschool.io/15-hours-of-expert-machine-learning-videos/ 116 | * There are additional slides & videos from another good course based on this book: https://www.alsharif.info/iom530 117 | 118 | ## Biology 119 | * [Learn genetics](https://learn.genetics.utah.edu/) 120 | * [IBiology](https://www.ibiology.org/biology-videos/) 121 | * [DNA seen through the eyes of a coder](https://ds9a.nl/amazing-dna/) - If you have a computational/quantitative background, you'll esp. love this! 122 | -------------------------------------------------------------------------------- /_config.yml: -------------------------------------------------------------------------------- 1 | theme: jekyll-theme-midnight 2 | title: Computational Biology Gists 3 | -------------------------------------------------------------------------------- /env/.bashrc: -------------------------------------------------------------------------------- 1 | # .bashrc 2 | 3 | ########################## 4 | #### .bashrc #### 5 | ########################## 6 | 7 | test -f /etc/profile.dos && . /etc/profile.dos 8 | 9 | 10 | # Some applications read the EDITOR variable to determine your favourite text 11 | # editor. 
So uncomment the line below and enter the editor of your choice :-) 12 | #export EDITOR=/usr/bin/vim 13 | #export EDITOR=/usr/bin/mcedit 14 | 15 | 16 | # add aliases if there is a ~/.alias file 17 | test -s ~/.alias && . ~/.alias 18 | 19 | 20 | # Source global definitions 21 | if [ -f /etc/bashrc ]; then 22 | . /etc/bashrc 23 | fi 24 | 25 | # Uncomment the following line if you don't like systemctl's auto-paging feature: 26 | # export SYSTEMD_PAGER= 27 | 28 | PS1='${PWD#"${PWD%/*/*}/"} \$ ' 29 | # export PS1="$(basename $(dirname $PWD))/$(basename $PWD)" # static alternative, evaluated only once at sourcing time 30 | 31 | eval "$(/opt/homebrew/bin/brew shellenv)" 32 | export gfortran=/usr/local/gfortran/bin 33 | export PATH=$PATH:$gfortran 34 | export PATH=$PATH:/opt/R/arm64/bin:/opt/R/arm64/bin/gfortran 35 | export PATH=$PATH:/opt/homebrew:/usr/local/bin:/opt/homebrew/sbin:/opt/homebrew/bin 36 | 37 | ## Connecting to servers 38 | ## For example... 39 | ## CU 40 | alias jrcu='ssh USERNAME@XYZLAB.ucdenver.pvt' 41 | ## MSU 42 | alias jrmsu='ssh -A USERNAME@compute.cvm.msu.edu -p 55411' #-i ~/.ssh/id_cvmcompute' 43 | alias mhpc='ssh USERNAME@hpcc.msu.edu' 44 | 45 | ##################### 46 | ## General aliases ## 47 | ##################### 48 | alias vi='vim' 49 | alias c='clear' 50 | alias e='exit' 51 | 52 | alias rm='rm -i' 53 | alias mv='mv -i' 54 | alias ls='ls -lthG --color=auto' 55 | alias ll='ls -lth --color=auto' 56 | alias llh='ls -lthG --color=auto | head' 57 | alias grep='grep --color=auto' 58 | 59 | ## git 60 | alias gs='git status ' 61 | alias ga='git add ' 62 | alias gaa='git add -A .' 
63 | alias gb='git branch ' 64 | alias gc='git commit' 65 | alias gcm='git commit -m' 66 | alias gd='git diff' 67 | alias go='git checkout ' 68 | alias gk='gitk --all&' 69 | alias gx='gitx --all' 70 | 71 | alias got='git ' 72 | alias get='git ' 73 | 74 | 75 | # bash noclobber; the vim-only settings below belong in .vimrc 76 | set -o noclobber 77 | # set nowrap 78 | # set number 79 | # set syntax=on 80 | #export CLICOLOR=1 81 | 82 | # bind TAB:menu-complete 83 | bind TAB:complete 84 | # bind ESC:complete 85 | 86 | ## Local (Mac) aliases 87 | alias gh='cd /Users/USERNAME/GitHub' 88 | alias pastweek='find /Users/USERNAME -mtime -7' 89 | alias pastten='find /Users/USERNAME -mtime -10' 90 | alias pastmonth='find /Users/USERNAME -mtime -30' 91 | ### Other examples 92 | #alias molevolvr='cd /Users/USERNAME/GitHub/molevolvr' 93 | #alias amr='cd /Users/USERNAME/GitHub/amR' 94 | #alias microgenomer='cd /Users/USERNAME/GitHub/microgenomer' 95 | #alias drugrep='cd /Users/USERNAME/GitHub/drugrep' 96 | 97 | ##################### 98 | #### MSU compute #### 99 | ##################### 100 | # Compute.cvm.msu.edu 101 | alias usermsu='ssh -A USERNAME@compute.cvm.msu.edu -p 55411' #-i ~/.ssh/id_rsa_cvmcompute' 102 | 103 | ## MSU CVM compute server paths 104 | PATH=$PATH:/bin:/usr/bin:/home 105 | PATH=$PATH:/data/research/XYZLAB:/data/research/XYZLAB/common-data 106 | PATH=$PATH:/data/run/USERNAME:/data/scratch/USERNAME 107 | export PATH 108 | 109 | BLASTDB=/data/blastdb 110 | export BLASTDB 111 | 112 | BLASTMAT=/opt/software/BLAST/2.2.26/data 113 | export BLASTMAT 114 | 115 | INTERPRO=/opt/software/iprscan/5.47.82.0-Python3/data 116 | export INTERPRO 117 | 118 | ## Server aliases 119 | alias jrlab='cd /data/research/XYZLAB' 120 | alias cdata='cd /data/research/XYZLAB/common_data' 121 | alias shiny='cd /srv/shiny-server' 122 | alias logs='cd /var/log/shiny-server' 123 | 124 | ####################### 125 | #### e.g., MSU HPC #### 126 | ####################### 127 | ## MSU HPC PATHS 128 | 
PATH=/usr/bin:/bin:/usr/sbin:/sbin:/opt/software/:/mnt/research/XYZLAB/software 129 | #/mnt/research/XYZLAB/software/anaconda3/bin:/mnt/research/XYZLAB/software/anaconda3: 130 | PATH=$PATH:/mnt/research/common-data:/mnt/research/common-data/Bio:/mnt/research/common-data/Bio/blastdb:/mnt/research/common-data/Bio/blastdb/v5 131 | PATH=$PATH:/mnt/home/johnj/software/modulefiles 132 | #PATH=$PATH:/mnt/research/XYZLAB/software/sanger-pathogens-Roary-225d24f/bin:/mnt/research/XYZLAB/software/perlmods 133 | export PATH=/usr/local/bin:$PATH 134 | export PATH=/Library/TeX/texbin:$PATH 135 | export PATH 136 | 137 | BLASTDB=/opt/software/:/mnt/research/XYZLAB/molevol/data:/mnt/research/common-data:/mnt/research/common-data/Bio:/mnt/research/common-data/Bio/blastdb:/mnt/research/common-data/Bio/blastdb/v5 138 | export BLASTDB 139 | 140 | BLASTMAT=/mnt/research/XYZLAB:/mnt/research/XYZLAB/molevol/data:/mnt/research/XYZLAB/evolvr_hpc/data 141 | export BLASTMAT=/mnt/home/USERNAME:/mnt/home/USERNAME/testspace:$BLASTMAT 142 | 143 | export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:/usr/lib/pkgconfig:/opt/X11/lib/pkgconfig 144 | 145 | ## Google drive download | unused? 146 | function gdrive_download () { 147 | CONFIRM=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate "https://docs.google.com/uc?export=download&id=$1" -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p') 148 | wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$CONFIRM&id=$1" -O "$2" 149 | rm -f /tmp/cookies.txt 150 | } 151 | -------------------------------------------------------------------------------- /env/README.md: -------------------------------------------------------------------------------- 1 | Houses environment setup snippets, incl. 2 | - `.bashrc`, `.vimrc`, `.bash_profile` 3 | - `renv` 4 | - ... 
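The two-component prompt set in `env/.bashrc` above (`PS1='${PWD#"${PWD%/*/*}/"} \$ '`) relies on nested shell parameter expansion; a minimal check of what it displays (`demo_pwd` is a hypothetical stand-in for `$PWD`):

```shell
# Sketch of the PS1 path-trimming from env/.bashrc.
# ${demo_pwd%/*/*} strips the last two path components;
# stripping that result (plus a slash) as a prefix then
# leaves just the last two components, which the prompt shows.
demo_pwd=/Users/alice/GitHub/compbio-gists   # hypothetical path
echo "${demo_pwd#"${demo_pwd%/*/*}/"}"       # prints: GitHub/compbio-gists
```

Unlike the commented-out `basename`/`dirname` alternative, this form is re-evaluated at every prompt because the expansion sits inside single quotes in `PS1`.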
5 | -------------------------------------------------------------------------------- /hpc/README.md: -------------------------------------------------------------------------------- 1 | Houses HPC, slurm/pbs, and related code snippets 2 | -------------------------------------------------------------------------------- /hpc/af3_alpine_array-example.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | ########################### 4 | # Created by: Sarah Mentzer, JRaviLab (https://jravilab.github.io) | Summer 2025 5 | # In collaboration with: Abhirupa Ghosh, Janani Ravi (JRaviLab), with inputs from Kevin Fotso, CU Alpine 6 | # This script runs AlphaFold3 PPI predictions 7 | # It uses an array job to process 2 PPI IDs at a time until all 383 are done. 8 | # 9 | # The SLURM options request GPUs, set runtime limits, and handle job logging. 10 | # For each job, it: 11 | # - Loads the AlphaFold3 environment and sets paths 12 | # - Picks the next PPI ID and JSON input file from the input 13 | # - Runs the AlphaFold3 scoring command 14 | # - Logs how long it took and checks for success/failure based on 15 | # the presence of an AlphaFold3 summary_confidences.json output for the PPI ID 16 | ########################### 17 | 18 | #SBATCH --nodes=1 19 | #SBATCH --ntasks=10 20 | #SBATCH --gres=gpu 21 | #SBATCH --time=05:00:00 22 | #SBATCH --partition=aa100 23 | #SBATCH --qos=normal 24 | #SBATCH --output=logs/scoring_ppi-%j.out 25 | #SBATCH --error=logs/scoring_ppi-%j.err 26 | #SBATCH --array=1-383%2 27 | 28 | ### SET UP ENVIRONMENT ### 29 | # Set working directory 30 | cd /projects/$USER/ppi_network 31 | 32 | # Create a log to capture the success of each job 33 | TIMESTAMP=$(date '+%Y-%m-%d_%H-%M-%S') 34 | MAIN_LOG="$PWD/scripts/logs/run_main-remaining.log" 35 | 36 | echo "Main log file: $MAIN_LOG" 37 | 38 | ### SET UP ENVIRONMENT FOR GIVEN JOB ### 39 | # Capture start of given job 40 | JOB_START=$(date +%s) 41 | 42 | # 
Set path to parameters 43 | export AF3_MODEL_PARAMETERS_DIR=$PWD/data/alphafold3/parameters 44 | # Set path to AF3 directory 45 | export AF3_DIR=$PWD/results/alphafold3 46 | # Set path to input directory 47 | export AF3_INPUT_DIR=$AF3_DIR/json_inputs 48 | # Set path to input PPI ID list 49 | export AF3_INPUT_LIST=$AF3_DIR/scoring_ppi.tab 50 | # Set path to output 51 | export AF3_OUTPUT_DIR=$AF3_DIR/raw_outputs 52 | 53 | # Get the line from the .tab file corresponding to this task ID (skipping header) 54 | LINE=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" "$AF3_INPUT_LIST") 55 | 56 | # Extract PPI ID (col 1) and JSON path (col 4) 57 | PPI_ID=$(echo "$LINE" | awk -F'\t' '{print $1}') 58 | # Create a variable for the given JSON input 59 | AF3_INPUT_FILE=$(echo "$LINE" | awk -F'\t' '{print $4}') 60 | 61 | echo "[$(date)] START Job ${SLURM_JOB_ID}_${SLURM_ARRAY_TASK_ID} for $PPI_ID (input: $AF3_INPUT_FILE)" >> "$MAIN_LOG" 62 | 63 | # Load AlphaFold 3 module 64 | module purge 65 | module load alphafold/3.0.0 66 | 67 | # Set temp directory paths 68 | export TMPDIR=/scratch/alpine/$USER/tmp-alphafold 69 | mkdir -pv $TMPDIR 70 | export TEMP=$TMPDIR 71 | export TMP_DIR=$TMPDIR 72 | export TEMP_DIR=$TMPDIR 73 | export TEMPDIR=$TMPDIR 74 | 75 | ### BEGIN ALPHAFOLD SCORING ### 76 | run_alphafold --json_path=$AF3_INPUT_FILE --output_dir=$AF3_OUTPUT_DIR --model_dir=$AF3_MODEL_PARAMETERS_DIR 77 | 78 | # Capture end time of job 79 | JOB_END=$(date +%s) 80 | # Calculate the elapsed time of the job by subtracting start from end 81 | JOB_ELAPSED=$((JOB_END - JOB_START)) 82 | 83 | # Format elapsed time into minutes and seconds 84 | JOB_ELAPSED_FMT=$(printf "%02dm%02ds" $((JOB_ELAPSED/60)) $((JOB_ELAPSED%60))) 85 | 86 | ### CHECK FOR JOB SUCCESS 87 | # Find the summary confidence file ignoring the case of ppi id 88 | PPI_OUTPUT_DIR=$(find "$AF3_OUTPUT_DIR" -maxdepth 1 -type d -iname "$PPI_ID" | head -n 1) 89 | 90 | # Construct the expected file path and check for it 91 | if [ -n 
"$PPI_OUTPUT_DIR" ]; then 92 | OUTPUT_FILE=$(find "$PPI_OUTPUT_DIR" -maxdepth 1 -type f -iname "${PPI_ID}_summary_confidences.json" | head -n 1) 93 | if [ -n "$OUTPUT_FILE" ]; then 94 | echo "[$(date '+%Y-%m-%d %H:%M:%S')] SUCCESS for $PPI_ID (Job ${SLURM_JOB_ID}_${SLURM_ARRAY_TASK_ID}) — Elapsed: $JOB_ELAPSED_FMT" >> "$MAIN_LOG" 95 | else 96 | echo "[$(date '+%Y-%m-%d %H:%M:%S')] FAILURE for $PPI_ID — File missing inside matching dir: ${PPI_ID}_summary_confidences.json" >> "$MAIN_LOG" 97 | fi 98 | else 99 | echo "[$(date '+%Y-%m-%d %H:%M:%S')] FAILURE for $PPI_ID — Output subdirectory not found under $AF3_OUTPUT_DIR" >> "$MAIN_LOG" 100 | fi 101 | -------------------------------------------------------------------------------- /sequence-analysis/blastdb_list.md: -------------------------------------------------------------------------------- 1 | **Date Created**: April 17, 2019
2 | **Updated** by @jananiravi 3 | 4 | **Sources**:
5 | [ftp://ftp.ncbi.nlm.nih.gov/blast/db/](ftp://ftp.ncbi.nlm.nih.gov/blast/db/)
6 | https://www.ncbi.nlm.nih.gov/books/NBK62345/ 7 | 8 | 9 | Result of `update_blastdb.pl --blastdb_version 5 --showall` 10 | ``` 11 | nr_v5 12 | nt_v5 13 | pdb_v5 14 | refseq_rna_v5 15 | swissprot_v5 16 | taxdb 17 | ``` 18 | 19 | Result of `update_blastdb.pl --passive --showall` 20 | 21 | ``` 22 | 16SMicrobial 23 | cdd_delta 24 | env_nr 25 | env_nt 26 | est 27 | est_human 28 | est_human_blob 29 | est_mouse 30 | est_mouse_blob 31 | est_others 32 | gss 33 | gss_annot 34 | htgs 35 | human_genomic 36 | landmark 37 | nr 38 | nt 39 | other_genomic 40 | pataa 41 | patnt 42 | pdbaa 43 | pdbnt 44 | ref_prok_rep_genomes 45 | ref_viroids_rep_genomes 46 | ref_viruses_rep_genomes 47 | refseq_genomic 48 | refseq_protein 49 | refseq_rna 50 | refseqgene 51 | sts 52 | swissprot 53 | taxdb 54 | tsa_nr 55 | tsa_nt 56 | vector 57 | ``` 58 | -------------------------------------------------------------------------------- /sequence-analysis/update_blastdb.sh: -------------------------------------------------------------------------------- 1 | ## Date Created: April 17, 2019 2 | ## Updated by: Janani Ravi 3 | ## Purpose: To update to BLASTDB v5 (& likely all BLASTDBs) 4 | ## The BLASTDB (all, v5) files available as of April 17, 2019: 5 | ## https://github.com/jananiravi/compbio-gists/blob/master/sequence-analysis/blastdb_list.md 6 | 7 | ## This chunk below would work ONLY on MSU's HPC 8 | # Start w/ ssh username@hpcc.msu.edu | for users w/ write access to blastdb or subfolders 9 | cd /mnt/research/common-data/Bio/blastdb 10 | module purge 11 | # Loading Blast+; https://wiki.hpcc.msu.edu/pages/viewpage.action?pageId=11896703 12 | module load icc/2017.4.196-GCC-6.4.0-2.28 impi/2017.3.196 BLAST+/2.8.1-Python-2.7.14 13 | 14 | ## Checking latest updates for version 5 15 | # https://ftp.ncbi.nlm.nih.gov/blast/db/v5/blastdbv5.pdf 16 | printf "\nHere are the BLASTDB v5 files to be downloaded:\n" 17 | update_blastdb.pl --blastdb_version 5 --showall 18 | update_blastdb.pl --blastdb_version 5 
--showall > blastdb_v5_update_list.txt 19 | 20 | # Downloading the updates for v5 21 | printf "\nSTARTING the downloads of BLASTDB v5 databases\n" 22 | while read f; do 23 | printf "\nDownloading $f...\n" 24 | # update_blastdb.pl --blastdb_version 5 $f 25 | update_blastdb.pl --blastdb_version 5 $f --decompress 26 | done < blastdb_v5_update_list.txt 27 | 28 | ## Checking ALL the latest BLASTDBs 29 | printf "\nHere are ALL the BLASTDB files to be downloaded:\n" 30 | update_blastdb.pl --passive --showall 31 | update_blastdb.pl --passive --showall > blastdb_update_list.txt 32 | 33 | # Downloading and decompressing all BLASTDB files 34 | printf "\nDownloading ALL the latest BLASTDBs\n" 35 | while read f; do 36 | printf "\nDownloading $f\n" 37 | update_blastdb.pl $f --passive --decompress 38 | done < blastdb_update_list.txt
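The download loops above can be smoke-tested before touching NCBI by stubbing out `update_blastdb.pl`; a minimal sketch (the stub function and the temp list file are illustrative, not part of the original script):

```shell
#!/usr/bin/env bash
# Dry run of the while/read download loop used in update_blastdb.sh.
# update_blastdb_stub stands in for update_blastdb.pl so the list-file
# plumbing can be checked without network access or BLAST+ installed.
update_blastdb_stub() { printf 'would fetch: %s\n' "$1"; }

list=$(mktemp)   # stand-in for blastdb_v5_update_list.txt
printf 'nr_v5\nnt_v5\nswissprot_v5\n' > "$list"

while read -r f; do
  update_blastdb_stub "$f"   # real script: update_blastdb.pl --blastdb_version 5 $f --decompress
done < "$list"
# prints:
#   would fetch: nr_v5
#   would fetch: nt_v5
#   would fetch: swissprot_v5

rm -f "$list"
```

Using `read -r` and quoting `"$list"` also guards against backslashes and spaces in database names, which an unquoted loop would mangle.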