├── .gitignore
├── LICENSE
├── README.md
├── _config.yml
├── env
│   ├── .bashrc
│   └── README.md
├── hpc
│   ├── README.md
│   └── af3_alpine_array-example.sh
└── sequence-analysis
    ├── blastdb_list.md
    └── update_blastdb.sh
/.gitignore:
--------------------------------------------------------------------------------
1 | # History files
2 | .Rhistory
3 | .Rapp.history
4 |
5 | # Session Data files
6 | .RData
7 |
8 | # Example code in package build process
9 | *-Ex.R
10 |
11 | # Output files from R CMD build
12 | /*.tar.gz
13 |
14 | # Output files from R CMD check
15 | /*.Rcheck/
16 |
17 | # RStudio files
18 | .Rproj.user/
19 |
20 | # produced vignettes
21 | vignettes/*.html
22 | vignettes/*.pdf
23 |
24 | # OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3
25 | .httr-oauth
26 |
27 | # knitr and R markdown default cache directories
28 | /*_cache/
29 | /cache/
30 |
31 | # Temporary files created by R markdown
32 | *.utf8.md
33 | *.knit.md
34 |
35 | # Shiny token, see https://shiny.rstudio.com/articles/shinyapps.html
36 | rsconnect/
37 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2019 Janani Ravi
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Computational Biology & Bioinformatics Resources
2 | _With programming resources on R, Python, Unix, Git, and Stats._
3 | _Other non-compbio gists will be [here](https://gist.github.com/jananiravi)!_
4 | > NOTE: When the recommendation is an online course, we recommend the *FREE* version.
5 |
6 | ## Contributors
7 | [Janani Ravi](https://github.com/jananiravi) & [Arjun Krishnan](https://github.com/krishnanlab)
8 |
9 | > NOTE: _You can request a gist on a particular topic by adding an [issue](https://github.com/jananiravi/compbio-gists/issues) outlining the details of the problem. Keywords of interest are in the repo description above._
10 |
11 | ## Table of Contents
12 | * [Cheatsheets](#cheatsheets)
13 | * [Unix](#unix)
14 | * [R](#r)
15 | * [Python](#python)
16 | * [Probability & Statistics](#probability-and-statistics)
17 | * [Biology](#biology)
18 |
19 | ## Cheatsheets
20 | For R/RStudio, Git/GitHub, Markdown, Unix/vi, Slack, …
21 | https://github.com/jananiravi/cheatsheets
22 |
23 | ## Unix
24 | * [Command-line Bootcamp](http://rik.smith-unna.com/command_line_bootcamp/)
25 | * [Command-line Guide](http://commandline.guide/) | Also interactive, just like the bootcamp.
26 | * [Linux Journey](https://linuxjourney.com)
27 | * A Unix workshop: [course materials](https://www.dropbox.com/s/1ltlyhtdbccymep/w1-files.zip?dl=0)
28 | * Day1 - [Video](https://www.youtube.com/watch?v=liC5uM8czyo) & [Slides](https://www.dropbox.com/s/ggv7ijwateim7zt/day1_Unix.pdf?dl=0)
29 | * Day2 - [Video](https://www.youtube.com/watch?v=ArbOG6YpakU) & [Slides](https://www.dropbox.com/s/xorsuvk1cugiyw8/day2_Unix.pdf?dl=0)
30 | * Day3 - [Video](https://www.youtube.com/watch?v=PHmfgIuOMFQ) & [Slides](https://www.dropbox.com/s/88wu7svvfur8upw/day3_Unix.pdf?dl=0)
31 | * Command-line refresher from [Software Carpentry](http://swcarpentry.github.io/shell-novice/)
32 |
33 | ## R
34 | ### General introduction to R
35 | * [Swirl](http://swirlstats.com) ('R Programming' & 'Data Analysis' lessons)
36 | * [Programming with R](http://swcarpentry.github.io/r-novice-inflammation/)
37 | * [RStudio Education](https://education.rstudio.com/)
38 | * [Finding Your Way To R](https://education.rstudio.com/learn/) | [Beginners](https://education.rstudio.com/learn/beginner/)
39 | * [RStudio Essentials](https://resources.rstudio.com/)
40 | * [R Cheatsheets](https://www.rstudio.com/resources/cheatsheets/)
41 |
42 | #### Data Visualization
43 | A few useful resources to share along with the tidyverse/ggplot
44 | 1. To pick the right kind of visualization, given your data type:
45 | https://www.data-to-viz.com/
46 | 2. Graph galleries w/ sample codes for R/python-newbies:
47 | [R Graph Gallery](https://www.r-graph-gallery.com/) | [Python Graph Gallery](https://python-graph-gallery.com/)
48 | 3. [ggplot extension gallery](https://exts.ggplot2.tidyverse.org/gallery/) | https://github.com/ggplot2-exts/gallery
49 |
50 | ### R for data science and machine learning
51 | * [Data Science Course in a Box](https://datasciencebox.org/) - Introductory data science course covering data acquisition and wrangling, exploratory data analysis, data visualization, inference, modeling, and effective communication of results (with tidyverse, R Markdown, and version control). The course also introduces interactive visualization and reporting, text analysis, and Bayesian inference.
52 | * [RStudio | The Essentials of Data Science](https://resources.rstudio.com/the-essentials-of-data-science)
53 | * [R for Reproducible Scientific Analysis](http://swcarpentry.github.io/r-novice-gapminder/)
54 |
55 | ### eBooks for R
56 | * R for Data Science | R4DS | Hadley Wickham, Garrett Grolemund |
57 | [eBook](https://r4ds.had.co.nz/)
58 | * Hands-On Programming with R | HOPR | Garrett Grolemund |
59 | [eBook](https://rstudio-education.github.io/hopr/)
60 | * Happy Git and GitHub for the useR | Jenny Bryan |
61 | [eBook](https://happygitwithr.com/)
62 | * [Learning Statistics with R](https://learningstatisticswithr.com/) | Danielle Navarro |
63 | [eBook](https://learningstatisticswithr.com/book/)
64 | * Computational Genomics with R | Altuna Akalin |
65 | [eBook](http://compgenomr.github.io/book/) | _Work in progress_
66 | * R Programming for Data Science | Roger Peng |
67 | [eBook](https://leanpub.com/rprogramming)
68 | * R Graphics Cookbook | Winston Chang |
69 | [eBook](https://r-graphics.org/)
70 |
71 | ## Python
72 |
73 | ### General introduction to Python
74 | * [Learn Python the Hard Way](https://learnpythonthehardway.org/book/)
75 | * [Google Python Class](https://developers.google.com/edu/python/)
76 | * [Videos to follow along](https://www.youtube.com/playlist?list=PLfZeRfzhgQzTMgwFVezQbnpc1ck0I6CQl)
77 | * Introduction to Interactive Programming in Python
78 | * [Part 1](https://www.coursera.org/learn/interactive-python-1)
79 | * [Part 2](https://www.coursera.org/learn/interactive-python-2)
80 |
81 | ### Python for data science and machine learning
82 | * Courses to learn introductory computer science, programming, computational thinking, and data science (video lectures + notes + assignments):
83 | * [Introduction to Computer Science and Programming in Python](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-0001-introduction-to-computer-science-and-programming-in-python-fall-2016/)
84 | * [Introduction to Computational Thinking and Data Science](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-0002-introduction-to-computational-thinking-and-data-science-fall-2016/)
85 | * [A Whirlwind Tour of Python](https://jakevdp.github.io/WhirlwindTourOfPython/): [PDF](http://www.oreilly.com/programming/free/files/a-whirlwind-tour-of-python.pdf) and [Jupyter Notebooks](https://github.com/jakevdp/WhirlwindTourOfPython)
86 | * [Scipy Lecture Notes](http://www.scipy-lectures.org/) – Awesome document to learn numerics, science, and data with Python
87 | * Data Wrangling:
88 | * [Data Wrangling in Python with Pandas - Kaggle](https://www.kaggle.com/learn/pandas)
89 | * [Video series on data analysis with Pandas](https://www.dataschool.io/easier-data-analysis-with-pandas/) – Excellent set of short videos
90 | * Visualization:
91 | * [Data Visualization with Python - Kaggle](https://www.kaggle.com/learn/data-visualisation)
92 | * [Python Plotting for Exploratory Data Analysis](http://pythonplot.com/)
93 | * Machine Learning:
94 | * [Introduction to ML in Python - Kaggle](https://www.kaggle.com/learn/machine-learning) (Check out both Levels 1 & 2)
95 | * [Another intro to ML with scikit-learn](https://www.dataschool.io/machine-learning-with-scikit-learn/) – This one contains videos and accompanying Jupyter notebooks + blog posts.
96 | * [A Quick Demo of ML with the Scikit-learn Python Package](https://github.com/mmmayo13/scikit-learn-classifiers/blob/master/sklearn-classifiers-tutorial.ipynb) – A nice demo + tour of scikit-learn.
97 | * [Deep Learning with Python and TensorFlow - Kaggle](https://www.kaggle.com/learn/deep-learning)
98 | * [Embeddings with Python and TensorFlow - Kaggle](https://www.kaggle.com/learn/embeddings) – Build deep learning models that handle sparse categorical variables
99 | * [Machine Learning Explainability](https://www.kaggle.com/learn/machine-learning-explainability)
100 | * General multi-topic resources:
101 | * [A Step-by-step Guide to Python for Data Science](http://www.dataschool.io/launch-your-data-science-career-with-python/)
102 | * Always check out the latest PyCon Conference tutorials and talks, almost all of which are available online. [E.g., here's a list from PyCon 2017](https://krishnanlab.slack.com/files/arjunkrish/F5MEK7GAK/Python_Videos_of_Interest_to_Lab).
103 |
104 | ## Probability and statistics
105 | * [Think Stats](https://greenteapress.com/wp/think-stats-2e/) (book + code + solutions; for Python programmers).
106 | * [Learning statistics with R](https://learningstatisticswithr.com/book/) (book + code + solutions; for R programmers).
107 | * [Points of Significance](https://www.nature.com/collections/qghhqm/pointsofsignificance) - an awesome collection of short articles on a variety of topics in statistical data analysis.
108 | * [OpenIntro to Probability and Statistics](https://www.openintro.org/stat/textbook.php?stat_book=os)
109 | 
110 | ### Statistical learning
111 | > A great resource (book + videos + slides + exercises + example code + solutions) for simultaneously learning both statistical learning and R. [_Statistical learning_ is just another term for _machine learning_ done from a slightly statistical-modeling point-of-view.]
112 | * An Introduction to Statistical Learning with Applications in R | Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani
113 | http://www-bcf.usc.edu/~gareth/ISL/index.html
114 | * You can download the latest version of the book as a PDF on that site: http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Seventh%20Printing.pdf
115 | * I would encourage watching these excellent course lecture videos (by the authors, who’re world-class scientists) that follow the book closely: http://www.dataschool.io/15-hours-of-expert-machine-learning-videos/
116 | * There are additional slides & videos from another good course taught based on this book: https://www.alsharif.info/iom530
117 |
118 | ## Biology
119 | * [Learn genetics](https://learn.genetics.utah.edu/)
120 | * [IBiology](https://www.ibiology.org/biology-videos/)
121 | * [DNA seen through the eyes of a coder](https://ds9a.nl/amazing-dna/) - If you have a computational/quantitative background, you'll especially love this!
122 |
--------------------------------------------------------------------------------
/_config.yml:
--------------------------------------------------------------------------------
1 | theme: jekyll-theme-midnight
2 | title: Computational Biology Gists
3 |
--------------------------------------------------------------------------------
/env/.bashrc:
--------------------------------------------------------------------------------
1 | # .bashrc
2 |
3 | ##########################
4 | #### .bashrc ####
5 | ##########################
6 |
7 | test -f /etc/profile.dos && . /etc/profile.dos
8 |
9 |
10 | # Some applications read the EDITOR variable to determine your favourite text
11 | # editor. So uncomment the line below and enter the editor of your choice :-)
12 | #export EDITOR=/usr/bin/vim
13 | #export EDITOR=/usr/bin/mcedit
14 |
15 |
16 | # add aliases if there is a ~/.alias file
17 | test -s ~/.alias && . ~/.alias
18 |
19 |
20 | # Source global definitions
21 | if [ -f /etc/bashrc ]; then
22 | . /etc/bashrc
23 | fi
24 |
25 | # Uncomment the following line if you don't like systemctl's auto-paging feature:
26 | # export SYSTEMD_PAGER=
27 |
28 | PS1='${PWD#"${PWD%/*/*}/"} \$ '
29 | # export PS1="$(basename $(dirname $PWD))/$(basename $PWD)"  # static alternative: set once at shell start, not updated on cd
30 |
31 | eval "$(/opt/homebrew/bin/brew shellenv)"
32 | export gfortran=/usr/local/gfortran/bin
33 | export PATH=$PATH:$gfortran
34 | export PATH=$PATH:/opt/R/arm64/bin:/opt/R/arm64/bin/gfortran
35 | export PATH=$PATH:/opt/homebrew:/usr/local/bin:/opt/homebrew/sbin:/opt/homebrew/bin
36 |
37 | ## Connecting to servers
38 | ## For example...
39 | ## CU
40 | alias jrcu='ssh USERNAME@XYZLAB.ucdenver.pvt'
41 | ## MSU
42 | alias jrmsu='ssh -A USERNAME@compute.cvm.msu.edu -p 55411' #-i ~/.ssh/id_cvmcompute'
43 | alias mhpc='ssh USERNAME@hpcc.msu.edu'
44 |
45 | #####################
46 | ## General aliases ##
47 | #####################
48 | alias vi='vim'
49 | alias c='clear'
50 | alias e='exit'
51 |
52 | alias rm='rm -i'
53 | alias mv='mv -i'
54 | alias ls='ls -lthG --color=auto'
55 | alias ll='ls -lth --color=auto'
56 | alias llh='ls -lthG --color=auto | head'
57 | alias grep='grep --color=auto'
58 |
59 | ## git
60 | alias gs='git status '
61 | alias ga='git add '
62 | alias gaa='git add -A .'
63 | alias gb='git branch '
64 | alias gc='git commit'
65 | alias gcm='git commit -m'
66 | alias gd='git diff'
67 | alias go='git checkout '
68 | alias gk='gitk --all&'
69 | alias gx='gitx --all'
70 |
71 | alias got='git '
72 | alias get='git '
73 |
74 |
75 | # noclobber is a shell option; the vi settings below belong in ~/.vimrc
76 | set -o noclobber
77 | # set nowrap      # vim setting -- move to ~/.vimrc
78 | # set number      # vim setting -- move to ~/.vimrc
79 | # set syntax=on   # vim setting -- move to ~/.vimrc
80 | #export CLICOLOR=1
81 |
82 | # bind TAB:menu-complete
83 | bind TAB:complete
84 | # bind ESC:complete
85 |
86 | ## Local (Mac) aliases
87 | alias gh='cd /Users/USERNAME/GitHub'
88 | alias pastweek='find /Users/USERNAME -mtime -7'
89 | alias pastten='find /Users/USERNAME -mtime -10'
90 | alias pastmonth='find /Users/USERNAME -mtime -30'
91 | ### Other examples
92 | #alias molevolvr='cd /Users/USERNAME/GitHub/molevolvr'
93 | #alias amr='cd /Users/USERNAME/GitHub/amR'
94 | #alias microgenomer='cd /Users/USERNAME/GitHub/microgenomer'
95 | #alias drugrep='cd /Users/USERNAME/GitHub/drugrep'
96 |
97 | #####################
98 | #### MSU compute ####
99 | #####################
100 | # Compute.cvm.msu.edu
101 | alias usermsu='ssh -A USERNAME@compute.cvm.msu.edu -p 55411' #-i ~/.ssh/id_rsa_cvmcompute'
102 |
103 | ## MSU CVM compute server paths
104 | PATH=$PATH:/bin:/usr/bin:/home
105 | PATH=$PATH:/data/research/XYZLAB:/data/research/XYZLAB/common-data
106 | PATH=$PATH:/data/run/USERNAME:/data/scratch/USERNAME
107 | export PATH
108 |
109 | BLASTDB=/data/blastdb
110 | export BLASTDB
111 |
112 | BLASTMAT=/opt/software/BLAST/2.2.26/data
113 | export BLASTMAT
114 |
115 | INTERPRO=/opt/software/iprscan/5.47.82.0-Python3/data
116 | export INTERPRO
117 |
118 | ## Server aliases
119 | alias jrlab='cd /data/research/XYZLAB'
120 | alias cdata='cd /data/research/XYZLAB/common_data'
121 | alias shiny='cd /srv/shiny-server'
122 | alias logs='cd /var/log/shiny-server'
123 |
124 | #######################
125 | #### e.g., MSU HPC ####
126 | #######################
127 | ## MSU HPC PATHS
128 | PATH=/usr/bin:/bin:/usr/sbin:/sbin:/opt/software/:/mnt/research/XYZLAB/software
129 | #/mnt/research/XYZLAB/software/anaconda3/bin:/mnt/research/XYZLAB/software/anaconda3:
130 | PATH=$PATH:/mnt/research/common-data:/mnt/research/common-data/Bio:/mnt/research/common-data/Bio/blastdb:/mnt/research/common-data/Bio/blastdb/v5
131 | PATH=$PATH:/mnt/home/johnj/software/modulefiles
132 | #PATH=$PATH:/mnt/research/XYZLAB/software/sanger-pathogens-Roary-225d24f/bin:/mnt/research/XYZLAB/software/perlmods
133 | export PATH=/usr/local/bin:$PATH
134 | export PATH=/Library/TeX/texbin:$PATH
135 | export PATH
136 |
137 | BLASTDB=/opt/software/:/mnt/research/XYZLAB/molevol/data:/mnt/research/common-data:/mnt/research/common-data/Bio:/mnt/research/common-data/Bio/blastdb:/mnt/research/common-data/Bio/blastdb/v5
138 | export BLASTDB
139 |
140 | BLASTMAT=/mnt/research/XYZLAB:/mnt/research/XYZLAB/molevol/data:/mnt/research/XYZLAB/evolvr_hpc/data
141 | export BLASTMAT=/mnt/home/USERNAME:/mnt/home/USERNAME/testspace:$BLASTMAT
142 |
143 | export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:/usr/lib/pkgconfig:/opt/X11/lib/pkgconfig
144 |
145 | ## Google Drive download | unused?
146 | function gdrive_download () {
147 |   CONFIRM=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate "https://docs.google.com/uc?export=download&id=$1" -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')
148 |   wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$CONFIRM&id=$1" -O "$2"
149 |   rm -f /tmp/cookies.txt
150 | }
151 |
--------------------------------------------------------------------------------
/env/README.md:
--------------------------------------------------------------------------------
1 | Houses environment setup snippets, incl.
2 | - `.bashrc`, `.vimrc`, `.bash_profile`
3 | - `renv`
4 | - ...
5 |
--------------------------------------------------------------------------------
/hpc/README.md:
--------------------------------------------------------------------------------
1 | Houses HPC, slurm/pbs, and related code snippets
2 |
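The `af3_alpine_array-example.sh` script in this folder assigns one input row per SLURM array task. A minimal, self-contained sketch of that line-picking pattern (the `/tmp` list, its column layout, and the hard-coded task ID are stand-ins for illustration; SLURM sets `SLURM_ARRAY_TASK_ID` itself in a real array job):

```sh
# Stand-in for the tab-separated input list (col 1: PPI ID, col 4: JSON path)
printf 'ppi_id\tc2\tc3\tjson\nppi1\tx\ty\ta.json\nppi2\tx\ty\tb.json\n' > /tmp/scoring_ppi.tab

# SLURM sets this per array task; hard-coded here for illustration
SLURM_ARRAY_TASK_ID=2

# +1 skips the header row; each task reads exactly one line
LINE=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" /tmp/scoring_ppi.tab)
PPI_ID=$(echo "$LINE" | awk -F'\t' '{print $1}')
AF3_INPUT_FILE=$(echo "$LINE" | awk -F'\t' '{print $4}')
echo "$PPI_ID $AF3_INPUT_FILE"   # → ppi2 b.json
```

The `%2` throttle in `#SBATCH --array=1-383%2` then caps how many of these tasks run concurrently.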
--------------------------------------------------------------------------------
/hpc/af3_alpine_array-example.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | ###########################
4 | # Created by: Sarah Mentzer, JRaviLab (https://jravilab.github.io) | Summer 2025
5 | # In collaboration with: Abhirupa Ghosh, Janani Ravi (JRaviLab), with inputs from Kevin Fotso, CU Alpine
6 | # This script runs AlphaFold3 PPI predictions
7 | # It uses an array job to process 2 PPI IDs at a time until all 383 are done.
8 | #
9 | # The SLURM options request GPUs, set runtime limits, and handle job logging.
10 | # For each job, it:
11 | # - Loads the AlphaFold3 environment and sets paths
12 | # - Picks the next PPI ID and JSON input file from the input
13 | # - Runs the AlphaFold3 scoring command
14 | # - Logs how long it took and checks for success/failure based on
15 | # the presence of an AlphaFold3 summary_confidences.json output for the PPI ID
16 | ###########################
17 |
18 | #SBATCH --nodes=1
19 | #SBATCH --ntasks=10
20 | #SBATCH --gres=gpu
21 | #SBATCH --time=05:00:00
22 | #SBATCH --partition=aa100
23 | #SBATCH --qos=normal
24 | #SBATCH --output=logs/scoring_ppi-%j.out
25 | #SBATCH --error=logs/scoring_ppi-%j.err
26 | #SBATCH --array=1-383%2
27 |
28 | ### SET UP ENVIRONMENT ###
29 | # Set working directory
30 | cd /projects/$USER/ppi_network || exit 1
31 |
32 | # Create a log to capture the success of each job
33 | TIMESTAMP=$(date '+%Y-%m-%d_%H-%M-%S')
34 | MAIN_LOG="$PWD/scripts/logs/run_main-remaining.log"
35 |
36 | echo "Main log file: $MAIN_LOG"
37 |
38 | ### SET UP ENVIRONMENT FOR GIVEN JOB ###
39 | # Capture start of given job
40 | JOB_START=$(date +%s)
41 |
42 | # Set path to parameters
43 | export AF3_MODEL_PARAMETERS_DIR=$PWD/data/alphafold3/parameters
44 | # Set path to AF3 directory
45 | export AF3_DIR=$PWD/results/alphafold3
46 | # Set path to input directory
47 | export AF3_INPUT_DIR=$AF3_DIR/json_inputs
48 | # Set path to input PPI ID list
49 | export AF3_INPUT_LIST=$AF3_DIR/scoring_ppi.tab
50 | # Set path to output
51 | export AF3_OUTPUT_DIR=$AF3_DIR/raw_outputs
52 |
53 | # Get the line from the .tab file corresponding to this task ID (skipping header)
54 | LINE=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" "$AF3_INPUT_LIST")
55 |
56 | # Extract PPI ID (col 1) and JSON path (col 4)
57 | PPI_ID=$(echo "$LINE" | awk -F'\t' '{print $1}')
58 | # Create a variable for the given JSON input
59 | AF3_INPUT_FILE=$(echo "$LINE" | awk -F'\t' '{print $4}')
60 |
61 | echo "[$(date)] START Job ${SLURM_JOB_ID}_${SLURM_ARRAY_TASK_ID} for $PPI_ID (input: $AF3_INPUT_FILE)" >> "$MAIN_LOG"
62 |
63 | # Load AlphaFold 3 module
64 | module purge
65 | module load alphafold/3.0.0
66 |
67 | # Set temp directory paths
68 | export TMPDIR=/scratch/alpine/$USER/tmp-alphafold
69 | mkdir -pv $TMPDIR
70 | export TEMP=$TMPDIR
71 | export TMP_DIR=$TMPDIR
72 | export TEMP_DIR=$TMPDIR
73 | export TEMPDIR=$TMPDIR
74 |
75 | ### BEGIN ALPHAFOLD SCORING ###
76 | run_alphafold --json_path="$AF3_INPUT_FILE" --output_dir="$AF3_OUTPUT_DIR" --model_dir="$AF3_MODEL_PARAMETERS_DIR"
77 |
78 | # Capture end time of job
79 | JOB_END=$(date +%s)
80 | # Calculate the elapsed time of the job by subtracting start from end
81 | JOB_ELAPSED=$((JOB_END - JOB_START))
82 |
83 | # Format elapsed time into minutes and seconds
84 | JOB_ELAPSED_FMT=$(printf "%02dm%02ds" $((JOB_ELAPSED/60)) $((JOB_ELAPSED%60)))
85 |
86 | ### CHECK FOR JOB SUCCESS
87 | # Find the summary confidence file ignoring the case of ppi id
88 | PPI_OUTPUT_DIR=$(find "$AF3_OUTPUT_DIR" -maxdepth 1 -type d -iname "$PPI_ID" | head -n 1)
89 |
90 | # Construct the expected file path and check for it
91 | if [ -n "$PPI_OUTPUT_DIR" ]; then
92 | OUTPUT_FILE=$(find "$PPI_OUTPUT_DIR" -maxdepth 1 -type f -iname "${PPI_ID}_summary_confidences.json" | head -n 1)
93 | if [ -n "$OUTPUT_FILE" ]; then
94 | echo "[$(date '+%Y-%m-%d %H:%M:%S')] SUCCESS for $PPI_ID (Job ${SLURM_JOB_ID}_${SLURM_ARRAY_TASK_ID}) — Elapsed: $JOB_ELAPSED_FMT" >> "$MAIN_LOG"
95 | else
96 | echo "[$(date '+%Y-%m-%d %H:%M:%S')] FAILURE for $PPI_ID — File missing inside matching dir: ${PPI_ID}_summary_confidences.json" >> "$MAIN_LOG"
97 | fi
98 | else
99 | echo "[$(date '+%Y-%m-%d %H:%M:%S')] FAILURE for $PPI_ID — Output subdirectory not found under $AF3_OUTPUT_DIR" >> "$MAIN_LOG"
100 | fi
101 |
--------------------------------------------------------------------------------
/sequence-analysis/blastdb_list.md:
--------------------------------------------------------------------------------
1 | **Date Created**: April 17, 2019
2 | **Updated** by @jananiravi
3 |
4 | **Sources**:
5 | [ftp://ftp.ncbi.nlm.nih.gov/blast/db/](ftp://ftp.ncbi.nlm.nih.gov/blast/db/)
6 | https://www.ncbi.nlm.nih.gov/books/NBK62345/
7 |
8 |
9 | Result of `update_blastdb.pl --blastdb_version 5 --showall`
10 | ```
11 | nr_v5
12 | nt_v5
13 | pdb_v5
14 | refseq_rna_v5
15 | swissprot_v5
16 | taxdb
17 | ```
18 |
19 | Result of `update_blastdb.pl --passive --showall`
20 |
21 | ```
22 | 16SMicrobial
23 | cdd_delta
24 | env_nr
25 | env_nt
26 | est
27 | est_human
28 | est_human_blob
29 | est_mouse
30 | est_mouse_blob
31 | est_others
32 | gss
33 | gss_annot
34 | htgs
35 | human_genomic
36 | landmark
37 | nr
38 | nt
39 | other_genomic
40 | pataa
41 | patnt
42 | pdbaa
43 | pdbnt
44 | ref_prok_rep_genomes
45 | ref_viroids_rep_genomes
46 | ref_viruses_rep_genomes
47 | refseq_genomic
48 | refseq_protein
49 | refseq_rna
50 | refseqgene
51 | sts
52 | swissprot
53 | taxdb
54 | tsa_nr
55 | tsa_nt
56 | vector
57 | ```
58 |
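To fetch any of the databases listed above, the same `update_blastdb.pl` tool (shipped with BLAST+) downloads and optionally decompresses them; `update_blastdb.sh` in this folder loops over a saved list. A minimal sketch of that loop (the two-entry `/tmp` list and the leading `echo` are stand-ins so the snippet runs anywhere; drop the `echo` on a machine with BLAST+ installed):

```sh
# Stand-in list; in practice: update_blastdb.pl --blastdb_version 5 --showall > blastdb_v5_update_list.txt
printf 'swissprot_v5\npdb_v5\n' > /tmp/blastdb_v5_update_list.txt

# One download (+ decompress) per listed database
while read -r f; do
    echo update_blastdb.pl --blastdb_version 5 "$f" --decompress
done < /tmp/blastdb_v5_update_list.txt
```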
--------------------------------------------------------------------------------
/sequence-analysis/update_blastdb.sh:
--------------------------------------------------------------------------------
1 | ## Date Created: April 17, 2019
2 | ## Updated by: Janani Ravi
3 | ## Purpose: To update to BLASTDB v5 (& likely all BLASTDBs)
4 | ## The BLASTDB (all, v5) files available as of April 17, 2019:
5 | ## https://github.com/jananiravi/compbio-gists/blob/master/sequence-analysis/blastdb_list.md
6 |
7 | ## This chunk below would work ONLY on MSU's HPC
8 | # Start w/ ssh username@hpcc.msu.edu | for users w/ write access to blastdb or subfolders
9 | cd /mnt/research/common-data/Bio/blastdb
10 | module purge
11 | # Loading Blast+; https://wiki.hpcc.msu.edu/pages/viewpage.action?pageId=11896703
12 | module load icc/2017.4.196-GCC-6.4.0-2.28 impi/2017.3.196 BLAST+/2.8.1-Python-2.7.14
13 |
14 | ## Checking latest updates for version 5
15 | # https://ftp.ncbi.nlm.nih.gov/blast/db/v5/blastdbv5.pdf
16 | printf "\nHere are the BLASTDB v5 files to be downloaded:\n"
17 | update_blastdb.pl --blastdb_version 5 --showall
18 | update_blastdb.pl --blastdb_version 5 --showall > blastdb_v5_update_list.txt
19 |
20 | # Downloading the updates for v5
21 | printf "\nSTARTING the downloads of BLASTDB v5 databases\n"
22 | while read -r f; do
23 | printf "\nDownloading $f...\n"
24 | # update_blastdb.pl --blastdb_version 5 "$f"
25 | update_blastdb.pl --blastdb_version 5 "$f" --decompress
26 | done < blastdb_v5_update_list.txt
27 | 
28 | ## Listing ALL the latest BLASTDBs
29 | printf "\nHere are ALL the BLASTDB files to be downloaded:\n"
30 | update_blastdb.pl --passive --showall
31 | update_blastdb.pl --passive --showall > blastdb_update_list.txt
32 | 
33 | # Downloading and decompressing all BLASTDB files
34 | printf "\nDownloading ALL the latest BLASTDBs\n"
35 | while read -r f; do
36 | printf "\nDownloading $f\n"
37 | update_blastdb.pl "$f" --passive --decompress
38 | done < blastdb_update_list.txt