├── 2013 ├── 0903-bacterial_transcriptome_analysis │ └── README.md ├── 0903-knitr_reproducible_research │ ├── README.html │ ├── README.md │ ├── README.pdf │ ├── README.rmd │ ├── figure │ │ └── plot_example.png │ └── images │ │ └── 0.jpg ├── 0910-de_novo_transcriptome_assembly │ └── README.md ├── 0917-genome_browsers │ ├── BYOB_genome_browsers.pdf │ └── README.md ├── 0924-RNAseq-SNPs │ ├── FST_example │ │ ├── FST_from_transcript_pileups_count_fixsites_rmtri.pl │ │ ├── drive.shared.ex.pu │ │ ├── nondrive.shared.ex.pu │ │ ├── out.raw.txt │ │ └── out.summary.txt │ ├── README.md │ ├── R_example │ │ ├── Acounts.txt │ │ ├── Adata.txt.gz │ │ ├── Rcode_for_fixdiffsim_Aug27.R │ │ ├── Xcounts.txt │ │ ├── Xdata.txt.gz │ │ └── transcrlens.txt │ └── figure │ │ ├── WGS_FST.pdf │ │ ├── fsteq.png │ │ ├── simulation3.jpg │ │ ├── table2.png │ │ └── table3.png ├── 1001-molecular_evolution_using_PAML │ ├── PAML_BYOB.pdf │ └── README.md ├── 1008-lincRNA_ID_from_RNA-Seq │ ├── BYOB.fasta │ ├── BYOB_10-8-13.pptx │ ├── README.md │ ├── RNAcode_BYOB.sh │ ├── TCONS_00000047_4species.aln │ ├── TCONS_00001351_4species.aln │ ├── TCONS_00001358_4species.aln │ ├── TCONS_00001369_4species.aln │ ├── cpat.py │ ├── fly_Hexame.tab │ └── fly_train.RData ├── 1015-PBS │ ├── 01-setting_up │ │ ├── dotbash_aliases │ │ ├── dotbash_profile │ │ ├── dotbashrc │ │ ├── dotscreenrc │ │ ├── ssh_keygen.log │ │ └── ssh_keygen.time │ ├── 02-first_submission │ │ ├── 01-first_submission.log │ │ ├── 01-first_submission.time │ │ ├── 02-first_submission.log │ │ ├── 02-first_submission.time │ │ ├── 03-first_submission.log │ │ ├── 03-first_submission.time │ │ ├── first_submission-v2.sh │ │ ├── first_submission-v3.sh │ │ ├── first_submission.sh │ │ ├── first_submission_v2.e46373 │ │ ├── first_submission_v2.o46373 │ │ ├── submissionv2.e46372 │ │ └── submissionv2.o46372 │ ├── 03-real_examples │ │ ├── 01-wdarocha_bwa2.sh │ │ ├── 02-tophat.sh │ │ ├── 03-rRNA_filter.sh │ │ ├── 04-CountTable_qsub_WDR.sh │ │ ├── 05-align_concat.sh │ │ ├── 06-tnseq_align.sh │ │ ├── 07-split_align.sh │ │ ├── 08-read_fasta.pl │ │ ├── 08-split_align-fasta.pl │ │ ├── 09-read_blast.pl │ │ └── 09-split_align-blast.pl │ ├── 04-gotcha │ │ ├── 46366.cbcbtorque.umiacs.umd.edu.ER │ │ └── 46366.cbcbtorque.umiacs.umd.edu.OU │ ├── 05-advanced │ │ ├── 01 │ │ ├── queue.sh │ │ ├── render.py │ │ └── submit.sh │ └── README.md ├── 1022-quality_assurance_wgs_rnaseq │ └── README.md ├── 1029-RNA-Seq_normalization_assumptions │ └── rnaseq_normalization_assumptions.pptx ├── 1105-batch_effects_in_RNA-Seq │ └── BYOB_batch_20131105_v1.pptx ├── 1203_piplining_with_python │ ├── README.md │ ├── example1_sys.py │ ├── pipeline.png │ ├── presentation.html │ ├── remark-0.5.9.min.js │ ├── ruffus_example.py │ ├── ruffus_example2.py │ ├── ruffus_example_utils.py │ └── styles.css └── README.md ├── 2014 ├── 0128-into_to_git │ └── README.md ├── 0218-rna-seq_normalization_issues │ └── BYOB-14b18-Sailfish-final.pdf ├── 0311-git_with_confidence │ ├── LICENSE │ ├── README.md │ ├── img │ │ ├── git_after_ff_merge.png │ │ ├── git_after_rebase.png │ │ ├── git_after_recursive_merge.png │ │ ├── git_before_ff_merge.png │ │ ├── git_before_rebase.png │ │ ├── git_before_recursive_merge.png │ │ ├── git_branch.png │ │ ├── git_commit.png │ │ ├── git_commit_parents.png │ │ ├── git_tree.png │ │ └── git_workflow.png │ └── references │ │ └── git.from.bottom.up.pdf ├── 0408-sequence-clustering │ ├── 2014-04-08_BYOB_Sequence_Clustering.pdf │ └── README.md ├── 0422-functional_variants_snpeff_snpsift │ ├── BYOB_SnpEff_SnpSift.pptx │ └── 
README.md ├── 0429-tools-for-cnv-detection │ ├── BYOB_April29.pptx │ ├── DND.svdet.cnv.conf │ ├── README.md │ ├── bash_scripts │ │ ├── 140424_CNVnator_all1.sh │ │ ├── 140424_CNVnator_extract1.sh │ │ ├── 140424_CNVnator_extract2.sh │ │ ├── 140424_breakdancer.sh │ │ ├── 140425_Hydra_novoalign1.sh │ │ ├── 140425_Hydra_novoalign2.sh │ │ ├── 140425_Hydra_samtools1_try2.sh │ │ ├── 140425_Hydra_samtools2_try2.sh │ │ ├── 140425_SVdetect_preprocess1.sh │ │ ├── 140425_SVdetect_preprocess2.sh │ │ ├── 140425_picard_ins_stats1.sh │ │ ├── 140425_picard_ins_stats2.sh │ │ ├── 140425_svdetect_DND_script.sh │ │ ├── 140427_CNVnator_histstatpartcall1.sh │ │ ├── 140427_CNVnator_histstatpartcall2.sh │ │ ├── 140428_svdetect_componly.sh │ │ ├── drive.sv.conf │ │ └── nondrive.sv.conf │ ├── install │ │ ├── 2014_April24 │ │ ├── 2014_April25 │ │ ├── breakdancer_README │ │ └── svdetect_README │ └── results │ │ ├── Breakdancer_SVout.xlsx │ │ ├── SV_CNV_hist.jpg │ │ ├── SV_CNV_hist.pdf │ │ ├── SV_detect_CNVout.xlsx │ │ ├── SV_detect_SVout.xlsx │ │ └── SV_detect_details.xlsx ├── 0506-r-graphics │ ├── Graphics.png │ ├── README.md │ ├── cool.jpg │ ├── example.r │ ├── hiveF1.jpg │ ├── r_graphics_byob.Rnw │ ├── r_graphics_byob.pdf │ └── tsi_fig.pdf ├── 0908-unix_tools │ ├── README.md │ ├── test.sam.gz │ └── test_insert.sam.gz ├── 0916-ncgas-and-sda │ └── BYOB_NCGAS_SDA.pptx ├── 0923-conserved-protein-domains │ └── BYOB_presentation_Thomas_Peterson_09-23-14.pptx ├── 0930-extreme-motif-detection │ ├── README.md │ ├── images │ │ ├── Trypanosoma_parasiteblood_cells_ger.jpg │ │ ├── cluster_motif_profiles.png │ │ ├── clustering_example.png │ │ ├── embj7594407-fig-0002-m.jpg │ │ ├── example_motif.png │ │ └── motivation_1.png │ └── input │ │ └── cluster1.fasta ├── 1007-phylotranscriptomics-using-orthograph │ ├── BYOB_2014-10-07-files.tar │ └── BYOB_2014-10-07.pptx ├── 1021-dont-fear-the-reapr │ └── BYOB_10_21_14_Reapr.pdf └── README.md ├── 2015 ├── 0210-RNA-Seq_expression_threshold │ ├── BYOB_2-10-15.pptx │ └── README.md ├── 0224-Provean-mutation-effects │ ├── PROVEAN_BYOB.pptx │ └── README.md ├── 0303-shiny-interactive-data │ ├── README.md │ ├── demo1 │ │ ├── README.md │ │ ├── server.R │ │ └── ui.R │ ├── demo2 │ │ ├── README.md │ │ ├── server.R │ │ └── ui.R │ └── images │ │ ├── 1280px-Iris_versicolor_3.jpg │ │ ├── Fission_yeast.jpg │ │ ├── demo1.png │ │ ├── demo2.png │ │ └── shiny-hello-world.png ├── 0310-Hombrew-at-BYOB │ ├── README.md │ ├── homebrew_install.png │ └── unix_timeline.png ├── 0324-RepeatMasker-RepeatModeler │ └── README.md ├── 0331-Amplicon-sequencing-microbial-communities │ ├── Allard_BYOB.pptx │ └── README.md ├── 0414-Annovar-annotate-genetic-variants │ ├── ANNOVAR_Tutorial_BYOB_04-13-15_Peterson.pdf │ └── README.md ├── 0421-DELLY-annotate-CNVs │ ├── 150216_bwamem.sh │ ├── 150217_delly.sh │ ├── 150217_dellydup.sh │ ├── 150217_dellyinv.sh │ ├── CNV_parse_dellyVCF_PASSVAR.pl │ ├── DELLY_BYOB.pptx │ ├── DELLY_stdout.txt │ ├── Delly_results_sm.xlsx │ └── README.md ├── 0910-reproducible-manuscripts-with-rmarkdown │ ├── README.md │ ├── examples │ │ ├── 01-simple-markdown-document.md │ │ ├── 01-simple-markdown-document.pdf │ │ ├── 02-markdown-and-latex.md │ │ ├── 02-markdown-and-latex.pdf │ │ ├── 03-including-a-bibliography.md │ │ ├── 03-including-a-bibliography.pdf │ │ ├── 04-rmarkdown-manuscript-example.Rmd │ │ ├── 04-rmarkdown-manuscript-example.bib │ │ ├── 04-rmarkdown-manuscript-example.pdf │ │ ├── images │ │ │ ├── iris_svm.png │ │ │ ├── placeholder.png │ │ │ └── ponyo.png │ │ ├── nucleic-acids-research.csl │ │ └── 
references.bib │ └── images │ │ └── markdown_pdf_example.png ├── 0924-trees-with-ete2 │ ├── Basics of tree manipulation and visualization in Python using ETE2.ipynb │ ├── README.md │ └── named.fasttree ├── 1008-snakemake │ ├── README.md │ ├── Snakefile │ ├── prepare-environment.sh │ ├── rnaseq-dag.png │ ├── serve-slides.sh │ ├── slides.html │ ├── slides2.html │ └── source.markdown └── 1022-allele-specific-expression │ ├── 1022-AlleleSpecificExpression.pptx │ └── README.md ├── 2016 ├── 0301-kallisto-ASE │ ├── 0301-kallisto-ASE.pptx │ └── README.md ├── 0315-labnote │ ├── README.md │ └── example_notebook.png ├── 0412-chromonomer │ └── BYOB_Chromonomer_April-2016.pptx └── 0420-hpgltools │ └── index.html ├── .gitignore ├── .gitmodules └── README.md /.gitignore: -------------------------------------------------------------------------------- 1 | *.DS_Store 2 | [Tt]humbs.db 3 | -------------------------------------------------------------------------------- /.gitmodules: -------------------------------------------------------------------------------- 1 | [submodule "2013/0903-introduction"] 2 | path = 2013/0903-introduction 3 | url = https://github.com/khughitt/slidify-byob-intro 4 | -------------------------------------------------------------------------------- /2013/0903-bacterial_transcriptome_analysis/README.md: -------------------------------------------------------------------------------- 1 | Bacterial Transcriptomics 2 | ========================= 3 | 4 | An overview of handling of data received from sequencing facilities, from quality control and trimming to alignment of reads and parsing of alignments into data files for analysis. 5 | 6 | The presentation as it was given at the first meeting is available on the [Prezi website](http://prezi.com/mtiw5z9i0gfu/copy-of-analysis-of-bacterial-transcriptome-data/). 7 | 8 | I may update this presentation in the future with more or newer information. This will also be [available](http://prezi.com/mtiw5z9i0gfu/copy-of-analysis-of-bacterial-transcriptome-data/). 
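As a rough illustration of the steps listed above (QC, trimming, alignment, and parsing alignments into count files), here is a minimal shell sketch of one way such a workflow can look. The README does not name specific tools, so the choices below (FastQC, Trimmomatic, Bowtie2, samtools >= 1.3, htseq-count) and all file names are assumptions for illustration only, not necessarily the pipeline used in the talk.

# QC the raw reads delivered by the sequencing facility
fastqc sample.fastq.gz

# Trim adapters / low-quality ends (single-end example)
java -jar trimmomatic.jar SE -phred33 sample.fastq.gz sample.trimmed.fastq.gz SLIDINGWINDOW:4:20 MINLEN:36

# Align the trimmed reads to a reference index
bowtie2 -x ref_index -U sample.trimmed.fastq.gz -S sample.sam

# Sort and index the alignments (samtools >= 1.3 syntax)
samtools sort -o sample.sorted.bam sample.sam
samtools index sample.sorted.bam

# Parse alignments into a per-gene count table for downstream analysis
htseq-count -f bam -s no sample.sorted.bam annotation.gff > counts.txt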
9 | 10 | 11 | 12 | -------------------------------------------------------------------------------- /2013/0903-knitr_reproducible_research/README.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2013/0903-knitr_reproducible_research/README.pdf -------------------------------------------------------------------------------- /2013/0903-knitr_reproducible_research/figure/plot_example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2013/0903-knitr_reproducible_research/figure/plot_example.png -------------------------------------------------------------------------------- /2013/0903-knitr_reproducible_research/images/0.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2013/0903-knitr_reproducible_research/images/0.jpg -------------------------------------------------------------------------------- /2013/0917-genome_browsers/BYOB_genome_browsers.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2013/0917-genome_browsers/BYOB_genome_browsers.pdf -------------------------------------------------------------------------------- /2013/0917-genome_browsers/README.md: -------------------------------------------------------------------------------- 1 | A (not so) Complete Guide to Genome Browsers 2 | ============================================ 3 | 4 | A brief overview of various software and web-based genome browsers. Discussion 5 | includes GBrowse, JBrowse, IGV, and Savant. 
6 | 7 | -------------------------------------------------------------------------------- /2013/0924-RNAseq-SNPs/FST_example/out.raw.txt: -------------------------------------------------------------------------------- 1 | namef posf seq1 seq2 ptseq pit piw fst type 2 | comp115911_c0_seq1 66 CCCCCCCCCCcCT CCCCCCCCCCCCCCCCcTT CCCCCCCCCCcCTCCCCCCCCCCCCCCCCcTT 0.15645371577575 0.202200416294975 0.176877202493901 0.18361581920904 -0.0380977119726387 BOTH no 0 0 0 0 0 3 | comp115911_c0_seq1 71 AAAAAAAAAAaAAAa GGGGGGGGGGGGGGGGgGGG AAAAAAAAAAaAAAaGGGGGGGGGGGGGGGGgGGG 0 0 0.508438669585481 0 1 FIX no 1 1 0 0 1 4 | comp115911_c0_seq1 116 AAaAAAAAAAAA TaAAAAAAAAaT AAaAAAAAAAAATaAAAAAAAAaT 0 0.308166409861325 0.160759956156376 0.154083204930663 0.0415324275108557 POLY2 no 1 1 0 0 1 5 | comp115911_c0_seq1 168 CcCCccCCcC CCcCcCcCCTT CcCCccCCcCCCcCcCcCCTT 0 0.332819722650231 0.182472989195678 0.174334140435835 0.0446030330062441 POLY2 no 1 1 0 1 1 6 | comp115911_c0_seq1 207 AAaAAaaaAaaaaaaaaaa gGAGGGGGGGAGaGgggggggGag AAaAAaaaAaaaaaaaaaagGAGGGGGGGAGaGgggggggGag 0 0.294767870302137 0.513693849632876 0.1645216020291 0.679728300919548 POLY2 no 1 1 0 2 1 7 | comp115911_c0_seq1 219 AaaAaaaaaaaaaaAaAT AAAAAAAATAAAaAaaaaaaaAaaaaaAa AaaAaaaaaaaaaaAaATAAAAAAAATAAAaAaaaaaaaAaaaaaAa 0.112994350282486 0.0701344243132671 0.083955876522672 0.0865488640461594 -0.030885122410546 BOTH no 1 1 0 3 1 8 | comp115911_c0_seq1 233 AaaaaaaaaaaAaAAaaA AAaAaaaataaAaaaaaAaAaaT AaaaaaaaaaaAaAAaaAAAaAaaaataaAaaaaaAaAaaT 0 0.168821598445769 0.0959212953474072 0.0947047991281146 0.0126822330212158 POLY2 yes 1 1 0 3 2 9 | comp115911_c0_seq1 234 GggggggggggGgGGggGT GGgGggtggggGgggggGgGggGG GggggggggggGgGGggGTGGgGggtggggGgggggGgGggGG 0.107047279214987 0.0847457627118644 0.0915715123258606 0.0945999211667324 -0.0330715171558495 BOTH yes 1 1 0 4 2 10 | comp115911_c0_seq1 237 CccccccccccCcCCccCC GgGgggggggGgggggGgGacaGGT CccccccccccCcCCccCCGgGgggggggGgggggGgGacaGGT 0 0 NA NV 11 | comp115911_c0_seq1 249 ggggGgGGggGGGGG aggggGgggggGgGggaGGGGGg ggggGgGGggGGGGGaggggGgggggGgGggaGGGGGg 0 0.168821598445769 0.103278864888772 0.102181493796124 0.0106253210066768 POLY2 no 1 1 0 4 3 12 | comp115911_c0_seq1 259 AaAaaAAAAAaaaaAaaaa aaaAaAaaaAAAAAaAAaAAATTa AaAaaAAAAAaaaaAaaaaaaaAaAaaaAAAAAaAAaAAATTa 0 0.162122328666175 0.0915715123258606 0.0904868811160049 0.01184463576397 POLY2 no 1 1 0 5 3 13 | comp115911_c0_seq1 275 cCCCCCccccCccccccccCCCCCCCc TTTTTtTTtTTTTTtTtttttttttttttTTT cCCCCCccccCccccccccCCCCCCCcTTTTTtTTtTTTTTtTtttttttttttttTTT 0 0 0.509211282408931 0 1 FIX no 2 2 0 6 3 14 | comp115911_c0_seq1 307 aaaCccccccCCCCCCCcCCC AAAAAaAaaaaaaaaaaaaaAAAAAA aaaCccccccCCCCCCCcCCCAAAAAaAaaaaaaaaaaaaaAAAAAA 0.261501210653753 0 0.486944083831498 0.116840966462315 0.760052600818235 POLY1 no 2 2 0 6 3 15 | comp115911_c0_seq1 314 tttTTTTTTTtTTT aatttttttttttTTTTTT tttTTTTTTTtTTTaatttttttttttTTTTTT 0 0.202200416294975 0.118411000763942 0.116418421503167 0.016827653240994 POLY2 no 2 2 1 6 3 16 | comp115911_c0_seq1 320 cCCCCCCCcCCC aacccccCCCCCCcc cCCCCCCCcCCCaacccccCCCCCCcc 0 0.25181598062954 0.143647202470732 0.139897767016411 0.0261016949152542 POLY2 no 2 2 1 7 3 17 | comp115911_c0_seq1 412 aaaaaaaaaaa gggggggggggggg aaaaaaaaaaagggggggggggggg 0 0 0.517647058823529 0 1 FIX no 3 3 1 8 3 18 | -------------------------------------------------------------------------------- /2013/0924-RNAseq-SNPs/FST_example/out.summary.txt: -------------------------------------------------------------------------------- 1 | 
/Users/jerrywilkinson/presentations/2013/0924-RNAseq-SNPs/example/drive.shared.ex.pu.txt /Users/jerrywilkinson/presentations/2013/0924-RNAseq-SNPs/example/nondrive.shared.ex.pu.txt 10 60 60 /Users/jerrywilkinson/presentations/2013/0924-RNAseq-SNPs/example/out.raw.txt 2 | name cnt totcnt mean median stdev fixdiffs fixdiffs2 poly1 poly2 both 3 | comp115911_c0_seq1 12 267 0.383666491683583 0.0430677302585499 0.455112740313141 3 3 1 8 3 4 | -------------------------------------------------------------------------------- /2013/0924-RNAseq-SNPs/R_example/Adata.txt.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2013/0924-RNAseq-SNPs/R_example/Adata.txt.gz -------------------------------------------------------------------------------- /2013/0924-RNAseq-SNPs/R_example/Rcode_for_fixdiffsim_Aug27.R: -------------------------------------------------------------------------------- 1 | # performs 10000 draws from the binomial distribution for each transcript, given 2 | # the lengths of the transcripts (transcrlens input) 3 | 4 | # create an empty matrix with a row for each transcript and 10000 columns for the 5 | # trials. One for the Autosomes and one for the X 6 | 7 | setwd("yourworkingdir") 8 | 9 | infile <- read.table("transcrlens.txt", header = FALSE) 10 | transcrlens <- infile$V1 11 | 12 | outmatA <- matrix(data=0,nrow=1163,ncol=10000) 13 | outmat <- matrix(data=0,nrow=1163,ncol=10000) 14 | 15 | j=0 16 | # 0.000707563 is the per basepair rate of fixed differences on X linked genes 17 | for (i in transcrlens){ 18 | outmat[j,] <- rbinom(10000,i,0.000707563) 19 | j <- j+1 20 | } 21 | # 0.000707563 is the per basepair rate of fixed differences on Autosomal genes 22 | 23 | for (i in transcrlens){ 24 | outmatA[j,] <- rbinom(10000,i,1.75518E-06) 25 | j <- j+1 26 | } 27 | 28 | # create arrays that will average across the trials for each gene 29 | 30 | Xavgs <- seq(length=10000,from=0,by=0) 31 | 32 | for (i in 1:1218) { 33 | Xavgs[i] <- mean(outmat[i,]) 34 | } 35 | 36 | Aavgs <- seq(length=1218,from=0,by=0) 37 | 38 | for (i in 1:1218) { 39 | Aavgs[i] <- mean(outmatA[i,]) 40 | } 41 | 42 | # create matrixes that will hold count data - we are deriving the number of 43 | # times that 0, 1, 2, 3, etc fixed differences were found in a single gene 44 | # The ncol = 17 was set because looking at the resutls of my X simulation, there 45 | # was one trial where 17 fixed differences fell into a single gene. In the A 46 | # simulation the highest was 3. 47 | 48 | countsmat <- matrix(data=0,nrow=10000,ncol=17) 49 | countsmat[1,] 50 | countsmatA <- matrix(data=0,nrow=10000,ncol=3) 51 | 52 | #example... use the hist function to find the distribution of counts for each 53 | # of the 10000 simulations... 
then put those into the output matrix 54 | 55 | histtemp <- hist(outmat[,3412],seq(0,max(outmat)+1)) 56 | histtemp$counts 57 | 58 | 59 | for (i in 1:10000) { 60 | histtemp <- hist(outmat[,i],seq(-0.5,max(outmat)+1,1),plot = FALSE) 61 | countsmat[i,] <- histtemp$counts 62 | } 63 | 64 | for (i in 1:10000) { 65 | histtemp <- hist(outmatA[,i],seq(-0.5,max(outmatA)+1,1),plot = FALSE) 66 | countsmatA[i,] <- histtemp$counts 67 | } 68 | 69 | write.table(outmat,file="Xdata.txt") 70 | write.table(outmatA,file="Adata.txt") # all of the raw data 71 | 72 | write.table(countsmat,file="Xcounts.txt",sep="\t") 73 | write.table(countsmatA,file="Acounts.txt",sep="\t") # the count data -------------------------------------------------------------------------------- /2013/0924-RNAseq-SNPs/R_example/Xdata.txt.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2013/0924-RNAseq-SNPs/R_example/Xdata.txt.gz -------------------------------------------------------------------------------- /2013/0924-RNAseq-SNPs/figure/WGS_FST.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2013/0924-RNAseq-SNPs/figure/WGS_FST.pdf -------------------------------------------------------------------------------- /2013/0924-RNAseq-SNPs/figure/fsteq.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2013/0924-RNAseq-SNPs/figure/fsteq.png -------------------------------------------------------------------------------- /2013/0924-RNAseq-SNPs/figure/simulation3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2013/0924-RNAseq-SNPs/figure/simulation3.jpg -------------------------------------------------------------------------------- /2013/0924-RNAseq-SNPs/figure/table2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2013/0924-RNAseq-SNPs/figure/table2.png -------------------------------------------------------------------------------- /2013/0924-RNAseq-SNPs/figure/table3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2013/0924-RNAseq-SNPs/figure/table3.png -------------------------------------------------------------------------------- /2013/1001-molecular_evolution_using_PAML/PAML_BYOB.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2013/1001-molecular_evolution_using_PAML/PAML_BYOB.pdf -------------------------------------------------------------------------------- /2013/1001-molecular_evolution_using_PAML/README.md: -------------------------------------------------------------------------------- 1 | Molecular Evolutionary Analyses using PAML 2 | ============================================= 3 | An overview of the types of analyses that can be conducted using the molecular evolutionary program, PAML. 
This ppt includes molecular evolution review and the types of statistical models that PAML uses. -------------------------------------------------------------------------------- /2013/1008-lincRNA_ID_from_RNA-Seq/BYOB.fasta: -------------------------------------------------------------------------------- 1 | >TCONS_00001351 gene=Dpse\GA27082 2 | ATGGGACAAGCGACGGCATTTGTGGTTCTCGCACTCTGTGCTTTGGCTCTGACCGAGGCAGCGGTGTACA 3 | TTGGCGGCGGTTGCTATGACTGTAACCCCCCCGGCGGCCAGGGACCCGGCATCTATACAGGCGGCGGACG 4 | AGGAGGTGGCGGTGCACCCAACAACTACTATAATCAAGGAGGTGGCCGTGGGGGTGGAGGCGGCAGTGGT 5 | CGTCCAGTTTACTCCGGAAACTTTGGTCCAAATGGATATAGCGGCGGTGGCGGCGGTCGAGGAGGTGGTG 6 | GGGGCTATAGGAATGGAGGAGGCGGCGGCGGTTATGACGATGGTGGTCTGACGCAGATTATACGATATGA 7 | TTAA 8 | >TCONS_00001358 gene=Dpse\GA12297 9 | ATGTGTGGCATTTTCGCGTACCTCAACTACCTGACGCCCAAGTCGCGCCAGCAGGTGCTGGAGCTCCTGC 10 | TGCAGGGCTTGAAACGTCTGGAGTATCGCGGCTACGACTCCACGGGCGTGGCCATTGATGGGCCCAGCGC 11 | GGGCGACGACATCCTTCTGGTGAAACGCACCGGCAAGGTCAAGGTGCTGGAGGATGCCATAGCGGAGGTG 12 | TGTCGCGGCGAGGGCTACAGCCAGCCCGTGGACATCCATATTGGCATCGCGCACACGCGCTGGGCCACGC 13 | ACGGCGTTCCCTCGGAACTGAATTCGCATCCCCAGCGATCGGATGTGGAGAACAGCTTTGTGGTGGTCCA 14 | CAATGGCATCATCACCAACTACAAGGACGTGAAGACGCTGCTGGAGAAGCGCGGCTATGTCTTCGAGTCG 15 | GAAACGGACACGGAGGTGATTGCCAAGCTCGTGCATCACCTCTGGCAAACGCACCCCGGCTACACCTTTG 16 | GCGAGCTGGTGGAGCAGGCCATTCAGCAGCTGGAGGGCGCCTTTGCCATTGCCTTCAAGTCGAAGCACTT 17 | TCCCGGCGAGTGCGTCGCCTCGCGTCGCGGCTCTCCGCTTCTGGTGGGCATCAAGGCCAAGACAAAGCTG 18 | GCCACGGACCATGTGCCCATCCTGTACGCCAAGGCCCATCGTCCGAATGGCCTGCCGTTCCCAGTGCTGC 19 | ACCCTGGCGGCGATGGCACTGCCGAATTCCAGCCCCTGGAGCACAAAGAGGTGGAGTACTTCTTTGCCTC 20 | GGACGCATCGGCGGTGATTGAGCACACGAATCGGGTCATCTACCTGGAGGACGATGACGTGGCCGCCGTC 21 | AAGCGGGACGGTACCCTGAGCATCCACCGGCTGAACAAATCGTCGGACGATCCGCATATACGGGAGATCA 22 | TCACTCTGAAGATGGAGATCCAGCAGATCATGAAGGGCAACTACGACTACTTTATGCTCAAGGAGATCTT 23 | CGAGCAGCCCGAGTCGGTGGTCAACACGATGCGCGGACGAGTACGCTTCGACACGCAGACCATTGTGCTG 24 | GGCGGAATCAAGGAGTACATCCCAGAGATCAAGCGGTGTCGCCGCCTGATGCTGATTGCCTGCGGCACCT 25 | CCTATCACAGTGCCGTGGCCACCCGACAGCTGCTGGAGGAGCTCACCGAACTGCCCGTCATGGTGGAACT 26 | GGCCTCCGACTTCCTCGACCGCAACACGCCCATCTTCCGCGACGACGTGTGCTTCTTCATCTCGCAGTCT 27 | GGCGAGACGGCCGACACACTGATGTCGCTGCGGTACTGCAAGCAGCGGGGAGCCCTGATTGTCGGGATCA 28 | CCAACACCGTGGGCAGCAGCATCTGCCGGGAGTCAAACTGCGGCGTGCACATCAATGCAGGGCCGGAGAT 29 | TGGGGTGGCCTCCACCAAGGCCTACACCTCGCAGTTCATTTCGCTGGTGATGTTCGCTTTGGTGATGTCC 30 | GAGGACCGACTGTCGCTGCTACAGCGGCGGCAGGAGATCATCGCGGGACTGTCGCAGCTGGACGAGCACA 31 | TCCGATCGGTGCTGAAGCTGAACTCGCAGGTCCAGGAGCTGGCCAAGGAGCTGTACAAGCACAAGTCGCT 32 | GCTGATCATGGGCAGGGGCTTCAATTTCGCCACCTGCCTGGAGGGTGCACTCAAGGTCAAGGAGCTGACG 33 | TACATGCACAGCGAGGGTATACTCGCGGGGGAGCTGAAGCACGGACCCCTTGCTCTGGTCGACGATGAGA 34 | TGCCAGTGCTGATGATCGTCACGCGGGATCCCGTCTACACAAAGTGTATGAATGCCCTGCAACAGGTGAC 35 | CTCGCGGAAGGGGCGACCCATCCTGATCTGCGAGGAGAACGACACCGAGACCATGTCCTTCTCCACGCGT 36 | TCGTTGCAGATTCCCCGCACGGTGGACTGCCTGCAGGGCATCCTTACCGTCATACCGCTGCAGCTCCTCT 37 | CCTACCACATTGCCGTGCTGCGCGGCTGCGATGTGGACTGTCCCCGGAACCTGGCCAAGTCGGTGACCGT 38 | GGAGTAG 39 | >TCONS_00000047 gene=Dpse\RNaseP:RNA:GA29345 40 | CTAGCATCTGGGGCACACACAACGAGTATCTGATTACTCTTACTTTGCCCCGGGAAGGTCTGAGATTTGG 41 | CCAAGCCTCCTTGTGTTGTTACAACATGCAGTGCATCGAGAGGTGCCTGTGTGTCGGAGGCTTAATTTCC 42 | TAGCAGGAAACCTGTGTGATTGCAGGACGAAAGTACCAGAAATACTGCGGCTTGGCGTTGCCATCGGCGC 43 | TCGCGTCTGACTGCCGCCTTGTTGCATTGAAAACTTTCGTGGCCGGCTGTTTTAGTGCAATGTGCTTGCT 44 | GCAGAACCTCCGGGCAACGCAGAACTCAATTCAGACTAATCT 45 | >TCONS_00001369 gene=Dpse\GA11672 46 | ATGGTGGTGCGCAACTTACCCTACGAGAAGGAGGATAAGGTTCGATTGCCTGATGTCTGGGACGATGGCA 47 | 
GGGGCATACCAGGAGAGTCTTCAAAGGATACAAAGACAGAAGCCATAGAACCAACGGGGGAGGGAGAGCA 48 | GGAGGAGAACCTCGAAGCTCAGTTGGAAGTGAAGACTGAAGTGGATGATGTGGAGGAGACTTCTGCTAAA 49 | ACCGAAACCGAAGAAGAGCCTCTGGCAGCTGATGTGAAATATGCCTGGAATTCGCCTTGTATGATGATTA 50 | TTTTCGAAGACGGCGATGTCAATAGTGCAATCCATCATTTGGTTGGTTCGATGCAGAATCCGTTTGCCCT 51 | GGATGCCGTGGCCACGGTGTTGGTGCAGGAGAGCATAGCGGAGGATCTCGCCAGCCGAATAGTCGACCTT 52 | CTGCATCCATTGGATCCCCGGGTCGCCAATCACCCGAACTATCTTCGCACTCTCAGCCATTTGGCCCAGC 53 | TGAAGGCAAAGATCGTGGCTGGGAATCCGGACAATGTGCCGGCCAATGCCACGCCCATGATCGTACGCGA 54 | TGTACCCCACCACTATCTGGGCGATGGTCCCACGGGCGTCGTAACCATGCACATCTTCCGCACTCCCCTC 55 | GAGGCCACCCTGGTCTATCAGAAGGAAGCCCTGCCCATTGCCTCGGTCAGCATTTGGAACGAGAAGATCG 56 | CCAGCATGTACGACATGGTGGCCGTACTCAATGCCGACACATTCAAAATCAACTGTTTCGACGTGGACCT 57 | GGAACCCATCAAACGGACCTTCGAGTGTCGTCGCTATAGTGCCCATATGAGTCGTGGTTATCACTACGAG 58 | ACCTTGCTGTTGAATGAGCAGCGAAAGGTTGTGATCTTTCCCGTGGGAGCTATCTATGCAAATTGA 59 | -------------------------------------------------------------------------------- /2013/1008-lincRNA_ID_from_RNA-Seq/BYOB_10-8-13.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2013/1008-lincRNA_ID_from_RNA-Seq/BYOB_10-8-13.pptx -------------------------------------------------------------------------------- /2013/1008-lincRNA_ID_from_RNA-Seq/README.md: -------------------------------------------------------------------------------- 1 | Bring Your Own Bioinformatics 10-08-13 2 | Unannotated transcripts and long intergenic noncoding RNA identification using RNAcode and CPAT 3 | by Kevin Nyberg 4 | 5 | See BYOB_10-8-13.pptx for info behind RNAcode and CPAT, including my test runs on annotated Drosophila pseudoobscura transcripts. 6 | 7 | <<>> 8 | Get RNAcode here: http://wash.github.io/rnacode/ 9 | 10 | To run a test, make sure you have the following files: 11 | RNAcode_BYOB.sh 12 | TCONS_00000047_4species.aln 13 | TCONS_00001351_4species.aln 14 | TCONS_00001358_4species.aln 15 | TCONS_00001369_4species.aln 16 | 17 | Run 18 | > bash RNAcode_BYOB.sh 19 | Output file is RNAcode_BYOB_out.txt 20 | If p-value < 0.05, then assumed to be protein-coding. 21 | If not info is given (and no error), assume to be noncoding. 22 | 23 | 24 | <<>> 25 | Get CPAT here: http://rna-cpat.sourceforge.net/ 26 | 27 | To run a test, make sure you have the following files: 28 | cpat.py 29 | BYOB.fasta 30 | fly_Hexame.tab 31 | fly_train.RData 32 | 33 | Run 34 | > python cpat.py --gene BYOB.fasta --outfile BYOB_CPAT_out --hex fly_Hexame.tab --logitModel fly_train.Rdata 35 | Protein coding probabilities are in the right-most column. Cutoff for D. melanogaster training set is 0.39. Any probability above 0.39 can be assumed to be protein coding. Any probability below 0.39 can be assumed to be noncoding. 
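A quick way to apply that cutoff is to split the CPAT output on its right-most column. This is only a sketch: it assumes the results land in a whitespace/tab-delimited table named BYOB_CPAT_out with a header row and the coding probability in the last column (some CPAT versions append an extension to the output name, so adjust the file name if needed).

> awk 'NR > 1 && $NF + 0 >= 0.39' BYOB_CPAT_out > BYOB_coding.txt
> awk 'NR > 1 && $NF + 0 <  0.39' BYOB_CPAT_out > BYOB_noncoding.txt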
-------------------------------------------------------------------------------- /2013/1008-lincRNA_ID_from_RNA-Seq/RNAcode_BYOB.sh: -------------------------------------------------------------------------------- 1 | echo TCONS_00001351 >> RNAcode_BYOB_out.txt; RNAcode -s -g -p 0.05 -n 500 TCONS_00001351_4species.aln >> RNAcode_BYOB_out.txt 2 | echo TCONS_00001358 >> RNAcode_BYOB_out.txt; RNAcode -s -g -p 0.05 -n 500 TCONS_00001358_4species.aln >> RNAcode_BYOB_out.txt 3 | echo TCONS_00000047 >> RNAcode_BYOB_out.txt; RNAcode -s -g -p 0.05 -n 500 TCONS_00000047_4species.aln >> RNAcode_BYOB_out.txt 4 | echo TCONS_00001369 >> RNAcode_BYOB_out.txt; RNAcode -s -g -p 0.05 -n 500 TCONS_00001369_4species.aln >> RNAcode_BYOB_out.txt 5 | -------------------------------------------------------------------------------- /2013/1008-lincRNA_ID_from_RNA-Seq/TCONS_00000047_4species.aln: -------------------------------------------------------------------------------- 1 | CLUSTAL W(1.81) multiple sequence alignment 2 | 3 | 4 | Dpse_MV225/1-322 CTAGCATCTGGGGCACACACAACGAGTATCTGATTACTCTTACTTTGCCCCGGGAAGGTC 5 | Dper_MSH1993/1-322 CTAGCATCTGGGGCACACACAACGAGTATCTGATTACTCTTACTTTGCCCCGGGAAGGTC 6 | Dmir_MAO/1-322 CTAGCATCTGGGGCACACACAACGAGTATCTGATTACTCTTACTTTGCCCCGGGAAGGTC 7 | Dlow_Lowei/1-322 CTAGCATCTGGGGCACACACAACGAGTATCTGATTACTCTTACTTTGCCCCGGGAAGGTC 8 | ************************************************************ 9 | 10 | 11 | Dpse_MV225/1-322 TGAGATTTGGCCAAGCCTCCTTGTGTTGTTACAACATGCAGTGCATCGAGAGGTGCCTGT 12 | Dper_MSH1993/1-322 TGAGATTTGGCCAAGCCTCCTTGTGTTGTTACAACATGCAGTGCATCGAGAGGTGCCTGT 13 | Dmir_MAO/1-322 TGAGATTTGGCCAAGCCTCCTTGTGTTGTTACAACATGCAGTGCATCGAGAGGTGCCTGT 14 | Dlow_Lowei/1-322 TGAGATTTGGCCAAGCCTCCTTGTGTTGTTACAACATGCAGTGCATCGAGAGGTGCCTGT 15 | ************************************************************ 16 | 17 | 18 | Dpse_MV225/1-322 GTGTCGGAGGCTTAATTTCCTAGCAGGAAACCTGTGTGATTGCAGGACGAAAGTACCAGA 19 | Dper_MSH1993/1-322 GTGTCGGAGGCTTAATTTCCTAGCAGGAAACCTGTGTGATTGCAGGACGAAAGTACCAGA 20 | Dmir_MAO/1-322 GTGTCGGAGGCTTAATTTCCTAGCAGGAAACCTGTGTGATTGCAGGACGAAAGTACCAGA 21 | Dlow_Lowei/1-322 GTGTCGGAGGCTTAATTTCCTAGCAGGAAACCTGTGTGATTGCAGGACGAAAGTACCAGA 22 | ************************************************************ 23 | 24 | 25 | Dpse_MV225/1-322 AATACTGCGGCTTGGCGTTGCCATCGGCGCTCGCGTCTGACTGCCGCCTTGTTGCATTGA 26 | Dper_MSH1993/1-322 AATACTGCGGCTTGGCGTTGCCATCGGCGCTCGCGTCTGACTGCCGCCTTGTTGCATTGA 27 | Dmir_MAO/1-322 AATACTGCGGCTTGGCGTTGCCATCGGCGCTCGCGTCTGACTGCCGCCTTGTTGCATTGA 28 | Dlow_Lowei/1-322 AATCCTGCGGCTTGGCGTTGCCNTCGACGCTCGCGTCTGACCGCCGCCTTGTCGCATTGA 29 | *** ****************** *** ************** ********** ******* 30 | 31 | 32 | Dpse_MV225/1-322 AAACTTTCGTGGCCGGCTGTTTTAGTGCAATGTGCTTGCTGCAGAACCTCCGGGCAACGC 33 | Dper_MSH1993/1-322 AAACTTTCGTGGCCGGCTGTTTTAGTGCAATGTGCTTGCTGCAGAGCCTCCGGGCAACGC 34 | Dmir_MAO/1-322 AAACTTTCGTGGCCGGCTGTTTTAGTGCAATGTGCTTGCTGCAGAGCCTCCGGGCAACGC 35 | Dlow_Lowei/1-322 AAACTTTCGTGGCCGGCTTTTTTAGTGCAATGTGCTTGCTGCAGAGCCTCCGGGCAACGC 36 | ****************** ************************** ************** 37 | 38 | 39 | Dpse_MV225/1-322 AGAACTCAATTCAGACTAATCT 40 | Dper_MSH1993/1-322 AGAACTCAATTCAGACTAATCT 41 | Dmir_MAO/1-322 AGAACTCAATTCAGACTAATCT 42 | Dlow_Lowei/1-322 AGAACTCAATTCAGACTAATCT 43 | ********************** 44 | 45 | 46 | -------------------------------------------------------------------------------- /2013/1008-lincRNA_ID_from_RNA-Seq/TCONS_00001351_4species.aln: -------------------------------------------------------------------------------- 1 | CLUSTAL W(1.81) multiple sequence alignment 2 | 
3 | 4 | Dpse_MV225/1-354 ATGGGACAAGCGACGGCATTTGTGGTTCTCGCACTCTGTGCTTTGGCTCTGACCGAGGCA 5 | Dper_MSH1993/1-354 ATGGGACAAGCGACGGCATTTGTGGTTCTCGCACTCTGTGCTTTAGCTGTGACCGAGGCA 6 | Dmir_MAO/1-351 ATGCGACAAGCGACGGCATTTGTGGTTCTCGCACTCTGTGCTTTAGCTCTGACCGAGGCG 7 | Dlow_Lowei/1-354 ATGRGACARCCGACGACATTTGTGGTTCTCGCACTCTGTGCTTTAGCTCTGACCGAGGCA 8 | *** **** ***** **************************** *** ********** 9 | 10 | 11 | Dpse_MV225/1-354 GCGGTGTACATTGGCGGCGGTTGCTATGACTGTAACCCCCCCGGCGGCCAGGGACCCGGC 12 | Dper_MSH1993/1-354 GCGGTGTACATTGGCGGCGGTTGCTATGACTGTAACCCGCCCGGCGGCCAGGGACCCGGC 13 | Dmir_MAO/1-351 GCGGTGTACATTGGCGGCGGTTGCTATGACTGTAANNNNNCCGGCGGCCAGGGGCCCGGC 14 | Dlow_Lowei/1-354 GCTGTGTACATCGGCGGCGGTTGCTATGACTGTAACCCGCCCGGCGGCCAGGGACCCGGC 15 | ** ******** *********************** ************* ****** 16 | 17 | 18 | Dpse_MV225/1-354 ATCTATACAGGCGGCGGACGAGGAGGTGGCGGTGCACCCAACAACTACTATAATCAAGGA 19 | Dper_MSH1993/1-354 ATCTATACAGGCGGCGGACGAGGAGGTGGCGGTGCACCCAACAACTACTATAATCAAGGA 20 | Dmir_MAO/1-351 ATCTATACAGGCGGCGGTCGAGGAGGTGGCGGTTCACCCAACAACTACTATAATCAAGGA 21 | Dlow_Lowei/1-354 ATCTATACAGGCGGTGGACGAGGAGGWGGNGGTGCACCCAACAACTACTATAATCAAGGA 22 | ************** ** ******** ** *** ************************** 23 | 24 | 25 | Dpse_MV225/1-354 GGTGGCCGTGGGGGTGGAGGCGGCAGTGGTCGTCCAGTTTACTCCGGAAACTTTGGTCCA 26 | Dper_MSH1993/1-354 GGTGGCCGTGGGGGTGGAGGCGGCAGTGGTCGTCCAGTTTACTCCGGAAACTTTGGCCCA 27 | Dmir_MAO/1-351 GGTGGCCGTGGAGGTGGAAGCGGCAGTGGTCGTCCAGTTTACTCCGGAAACTTTGGTCCA 28 | Dlow_Lowei/1-354 GGTGGCCGAGGCTCTGCAGGCGGCAGTGGTCGTCCAGTTTACTCCGGAAACTTTGGTCCA 29 | ******** ** ** * ************************************* *** 30 | 31 | 32 | Dpse_MV225/1-354 AATGGATATAGCGGCGGTGGCGGCGGTCGAGGAGGTGGTGGGGGCTATAGGAATGGAGGA 33 | Dper_MSH1993/1-354 AATGGATATAGCGGCGGTGGCGGCGGTCGAGGAGGTGGTGGGGGCTATAGAAATGGAGGA 34 | Dmir_MAO/1-351 AATGGATATAGCGGC---GGCGGCGGTCGAGGAGGTGGTGGGGGCTATAGCAGTGGAGGA 35 | Dlow_Lowei/1-354 AATGGATANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN 36 | ******** 37 | 38 | 39 | Dpse_MV225/1-354 GGCGGCGGCGGTTATGACGATGGTGGTCTGACGCAGATTATACGATATGATTAA 40 | Dper_MSH1993/1-354 GGCGGCGGCGGTTATGACGATGGTGGTCTGACGCAGATTATACGATATGATTAA 41 | Dmir_MAO/1-351 GGCGGCGGCGGTTATGACGATGGTGGTCTGACGCAGATTATACGATATGATTAA 42 | Dlow_Lowei/1-354 NNNNNNNNNNNNNATGACGATGGTGGNCTGACGCAGATTATACGATATGATTAA 43 | ************* *************************** 44 | 45 | 46 | -------------------------------------------------------------------------------- /2013/1008-lincRNA_ID_from_RNA-Seq/fly_train.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2013/1008-lincRNA_ID_from_RNA-Seq/fly_train.RData -------------------------------------------------------------------------------- /2013/1015-PBS/01-setting_up/dotbash_profile: -------------------------------------------------------------------------------- 1 | ## Any commands which print text to the user can be executed 2 | ## here in a ~/.profile or ~/.bash_profile 3 | ## rsync, sftp, scp, etc should not see this file. 4 | 5 | 6 | ## Check to make sure there is no execute loop 7 | ## If SHELL_LEVELS is undefined, set it to 0 8 | ## Otherwise leave it alone. 9 | export SHELL_LEVELS=${SHELL_LEVELS-0} 10 | ## When doing math tests, [[ ]] is the way to go, otherwise it will attempt to execute 11 | ## a command named (in this case) $SHELL_LEVELS -- which evaluates to something like 12 | ## 0 or 1 or 2... 
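## One caveat with the test below: inside [[ ]], ">" and "<" compare strings
## lexicographically rather than numerically, so [[ "$SHELL_LEVELS" > 10 ]] is
## also true for values such as 2..9. For a numeric comparison use
## [[ "$SHELL_LEVELS" -gt 10 ]] or (( SHELL_LEVELS > 10 )); the same applies
## to the [[ "$SHELL_LEVELS" < 2 ]] test further down in this file.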
13 | if [[ "$SHELL_LEVELS" > 10 ]]; then 14 | echo "You appear to have an execute loop." 15 | echo "This may happen if your .bashrc calls your .bash_profile" 16 | echo "and the .bash_profile calls .bashrc; similarly, having" 17 | echo "newgrp(1) in the .bashrc could cause this." 18 | echo "exiting now." 19 | exit 20 | else 21 | export SHELL_LEVELS=$(( $SHELL_LEVELS + 1 )) 22 | fi 23 | 24 | 25 | ## Print the current number of processes running. 26 | PROCESSES=$(ps aux | wc -l) 27 | echo "There are ${PROCESSES} processes running." 28 | 29 | ## Check for full filesystems 30 | FULL=$(df | grep 9[0-9]%) 31 | if [ -n "$FULL" ]; then 32 | echo "The following filesystems are >= 90% full." 33 | for fs in $(df | grep 9[0-9]% | awk '{print $5}'); do 34 | echo $fs 35 | done 36 | fi 37 | 38 | ## Print the uptime and processor status 39 | uptime 40 | 41 | ## Check that we are in the hpgl group. 42 | CURRENT_GROUP=$(groups | awk '{print $1}') 43 | if [ "$CURRENT_GROUP" != "hpgl" ]; then 44 | ## Only run newgrop if we are on a toplevel shell 45 | ## In an attempt to avoid shell loops 46 | if [[ "$SHELL_LEVELS" < 2 ]]; then 47 | newgrp hpgl 48 | fi 49 | fi 50 | 51 | ## Add any commands the user wants in his/her home directory 52 | if [ -e "${HOME}/.profile_local" ]; then 53 | . ${HOME}/.profile_local 54 | fi 55 | 56 | if [ -r "/cbcb/lab/nelsayed/scripts/dotbashrc" ]; then 57 | . /cbcb/lab/nelsayed/scripts/dotbashrc 58 | fi 59 | ## Add any environment variables etc he/she wants 60 | ## If s/he accidently calls the .profile, the above 61 | ## test will stop it. 62 | if [ -e "${HOME}/.bashrc" ]; then 63 | . ${HOME}/.bashrc 64 | fi 65 | 66 | 67 | set -P ## pwd returns the actual current location without symlinks 68 | set auto_resume=1 69 | set bell-style visible 70 | set history=1000 71 | set notify 72 | set savehist=10000 73 | set show-all-if-ambiguous on 74 | set visible-stats on 75 | 76 | shopt -s huponexit 77 | shopt -s histappend #makes bash append to history rather than overwrite 78 | shopt -s cmdhist 79 | shopt -s histreedit 80 | shopt -s cdspell 81 | shopt -s cdable_vars 82 | # tab-completion of hostnames after @ 83 | shopt -s hostcomplete 84 | -------------------------------------------------------------------------------- /2013/1015-PBS/01-setting_up/dotbashrc: -------------------------------------------------------------------------------- 1 | ## Keep in mind that this file should never print anything to the terminal. 2 | ## It is executed on every new shell, including those started by things like 3 | ## rsync, scp, sftp etc. As a result, any additional shells it starts or 4 | ## statements it prints could prove drastically bad. 
5 | ## Some lab specific paths and variables that may prove useful 6 | export LAB="/cbcb/lab/nelsayed" 7 | export LAB_BASE="/cbcb/lab/nelsayed" 8 | export LAB_BASHRC=1 9 | export LOGS="${LAB}/log" 10 | export PBS_WORKSTATION=${PBS_WORKSTATION-" -V -S /cbcb/lab/nelsayed/local/bin/bash -q workstation -l walltime=12:00:00 "} 11 | export PBS_THROUGHPUT=${PBS_THROUGHPUT-" -V -S /cbcb/lab/nelsayed/local/bin/bash -q throughput -l walltime=12:00:00 "} 12 | export PBS_LARGE=${PBS_LARGS-" -V -S /cbcb/lab/nelsayed/local/bin/bash -q large -l walltime=72:00:00"} 13 | export PBS_LOG=${PBS_LOG-" -j eo -e ibissub00.umiacs.umd.edu:${HOME}/outputs/pbs.out -m n"} 14 | export PBS_ARGS=${PBS_ARGS-"$PBS_THROUGHPUT $PBS_LOG"} 15 | export PREFIX="${LAB}/local" 16 | export PROG="${LAB}/programs" 17 | export PROG_BIN="${PROG}/bin" 18 | export PROG_LIB="${PROG}/lib" 19 | export RAW="${LAB}/raw_data" 20 | export REF="${LAB}/ref_data" 21 | export SCRATCH=${SCRATCH-"/cbcb/personal-scratch/${USER}"} 22 | export WD=${PWD} 23 | export TRINITY_HOME="${PROG}/trinityrnaseq_r2013-02-25" 24 | 25 | umask 0002 26 | 27 | case $TERM in 28 | gnome|screen|rxvt|xterm*) 29 | if (($UID == 0)) 30 | then 31 | PS1="\[\033]0;SUPER USER@\h: \w\007\]<\t>SUPER USER@\h:\W# " 32 | # PS1="\[\033]0;SUPER USER@\h:\w\007\]\[\033[36;40m<\t>\[\033[32;40mSUPER USER@\h:#\[\033[m " 33 | else 34 | # PS1="\[\033]0;\u@\h: \w\007\]<\t>\u@\h:\W>" 35 | PS1="\[\033]0;\u@\h: \w\007\]\u@\h:\w>" 36 | # PS1="\[\033]0;\u@\h:\w\007\]\[\033[36;40m<\t>\[\033[32;40m\u@\h:>\[\033[m " 37 | fi 38 | ;; 39 | *) 40 | if (($UID == 0)) 41 | then 42 | PS1="<\t>SUPER USER@\h:# " 43 | else 44 | PS1="<\t>\u@\h:> " 45 | fi 46 | ;; 47 | esac 48 | export PS1 49 | export PS2='and>' 50 | 51 | export BOOST_INCLUDEDIR=$PREFIX/include/boost/ 52 | export BOOST_ROOT=$PREFIX/include 53 | #export BROWSER="firefox %s" 54 | export CMAKE_INSTALL_PREFIX=$PREFIX 55 | export CVSEDITOR="vi" 56 | export CVSIGNORE="*.o *~ #* *.emacs*" 57 | export CVS_RSH="ssh" 58 | export EDITOR=vim 59 | export FTP_PASSIVE=1 60 | export GREP_COLOR=32 61 | export GREP_OPTIONS=--color=auto 62 | export HACKPAGER=more 63 | #export HISTCONTROL=ignoreboth 64 | export HISTSIZE=10000 65 | export HISTFILESIZE=2000000 66 | export LSCOLORS=ExDxHxAxCxegedabagacad 67 | export MANPATH=${MANPATH}:/usr/share/man:/usr/local/share/man:${PREFIX}/share/man 68 | export MYCFLAGS="-O3 -I$PREFIX/include -L$PREFIX/lib" 69 | export PAGER=more 70 | export PERLIO=stdio 71 | #export PERL_LOCAL_LIB_ROOT="/cbcb/lab/nelsayed/programs/perl_lib/perl5"; 72 | #export PERL_MB_OPT="--install_base=/cbcb/lab/nelsayed/programs/perl_lib/perl5"; 73 | #export PERL_MM_OPT="INSTALL_BASE=/cbcb/lab/nelsayed/programs/perl_lib/perl5"; 74 | #export PERL5LIB="/cbcb/lab/nelsayed/programs/perl_lib/perl5/x86_64-linux-thread-multi:/cbcb/lab/nelsayed/programs/perl_lib/perl5/lib/perl5"; 75 | #export PROMPT_COMMAND="history -a; history -n; $PROMPT_COMMAND" 76 | export R_LIBS=$PREFIX/R 77 | 78 | ## Potentially long statements below 79 | # LIBRARY_PATH is for the compiler -- a list of directories to search for needed libraries at compile time 80 | # Setting this makes the required list of -L directories shorter 81 | export LIBRARY_PATH="$PREFIX/lib:$PREFIX/lib64:/usr/lib64" 82 | # LD_LIBRARY_PATH is for the dynamic linker -- a list of directories for programs to search at runtime 83 | export 
LD_LIBRARY_PATH="$PREFIX/lib:$PREFIX/lib64:/usr/local/stow/gcc-4.7.2/lib:/usr/local/stow/gcc-4.7.2/lib64:/lib64:/lib:/usr/lib64:/usr/lib:/usr/local/lib:/usr/local/lib64:/cbcb/lab/nelsayed/programs/lib:/cbcb/lab/nelsayed/programs/lib:$PREFIX/qt/lib" 84 | ## CPATH is a list of directories which will be added as per -I for the C compiler. 85 | export CPATH=${CPATH-"."} 86 | export CPATH="$PREFIX/include:/usr/include:/usr/local/include:$CPATH" 87 | ## C_INCLUDE is as above but C specific 88 | #C_INCLUDE_PATH 89 | ## CPLUS_INCLUDE is as above but C++ specific 90 | #CPLUS_INCLUDE_PATH 91 | ## Generate a path statement using the many entries in $LAB/scripts/dotbash_paths 92 | ## Order of usage is first to last in that file. 93 | PATH_STMT="export PATH=$PREFIX/bin" 94 | for mypath in $(/bin/cat $LAB/scripts/dotbash_paths); do 95 | PATH_STMT="$PATH_STMT:$mypath" 96 | done 97 | eval $(echo "$PATH_STMT") 98 | 99 | export JAR_DIR="$PREFIX/jar" 100 | CLASSPATH_STMT="export CLASSPATH=$JAR_DIR" 101 | for jarfile in $(/bin/ls $JAR_DIR/*.jar); do 102 | CLASSPATH_STMT="$CLASSPATH_STMT:$jarfile" 103 | ALIAS_STMT="alias $(basename $jarfile .jar)=\"java -jar $jarfile\"" 104 | eval $(echo $ALIAS_STMT) 105 | done 106 | eval $(echo $CLASSPATH_STMT) 107 | 108 | ## The following wraps the shopt calls in case zsh is used 109 | if [ $(basename $SHELL) == "bash" ]; then 110 | shopt -s dotglob 111 | shopt -s extglob 112 | shopt -s histappend 113 | shopt -s histreedit 114 | shopt -s hostcomplete 115 | shopt -s lithist 116 | shopt -s cdable_vars 117 | shopt -s cdspell 118 | fi 119 | ## Stop the xbell 120 | #if [ -n "${DISPLAY}" ]; then 121 | # xset b off 2>/dev/null 1>&2 122 | #fi 123 | 124 | if [ -e "${LAB_BASE}/scripts/dotbash_aliases" ]; then 125 | source ${LAB_BASE}/scripts/dotbash_aliases 126 | fi 127 | 128 | # >>>>>>>>>>>>>>>>>>>>>>> Enabling Biopieces <<<<<<<<<<<<<<<<<<<<<<< 129 | 130 | export BP_DIR="/cbcb/lab/nelsayed/local/biopieces" 131 | export BP_DATA="/cbcb/lab/nelsayed/local/biopieces/data" 132 | export BP_TMP="/cbcb/lab/nelsayed/local/biopieces/tmp" 133 | export BP_LOG="/cbcb/lab/nelsayed/local/biopieces/log" 134 | 135 | source "$BP_DIR/bp_conf/bashrc" 136 | 137 | # >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<< 138 | 139 | 140 | -------------------------------------------------------------------------------- /2013/1015-PBS/01-setting_up/dotscreenrc: -------------------------------------------------------------------------------- 1 | # Set the caption on the bottom line (works on later screens) 2 | caption always "%{= kw}%-w%{= BW}%n %t%{-}%+w %-= %H %{gk} |%c %{yk}%d.%m.%Y | %72=Load: %l %{wk}" 3 | 4 | # Set the hardstatus on later screen versions (this is displayed in the 5 | # title bar in X for example) 6 | #hardstatus string "%n %t (screen on %H) %{= Ck}%H:%L>%{= dd} %h" 7 | hardstatus always "%n %t (screen on %H) %{= Ck}%H:%L>%{= dd} %h" 8 | 9 | #hardstatus alwayslastline 10 | hardstatus lastline 11 | 12 | # xterm understands both im/ic and doesn't have a status line. 13 | termcap xterm hs@:cs=\E[%i%d;%dr:im=\E[4h:ei=\E[4l 14 | terminfo xterm hs@:cs=\E[%i%p1%d;%p2%dr:im=\E[4h:ei=\E[4l 15 | 16 | # 80/132 column switching must be enabled for ^AW to work 17 | # change init sequence to not switch width 18 | termcapinfo xterm Z0=\E[?3h:Z1=\E[?3l:is=\E[r\E[m\E[2J\E[H\E[?7h\E[?1;4;6l 19 | 20 | # Make the output buffer large for (fast) xterms. 21 | # termcapinfo xterm* OL=10000 22 | termcapinfo xterm* OL=100 23 | 24 | # tell screen that xterm can switch to dark background and has function 25 | # keys. 
26 | termcapinfo xterm 'VR=\E[?5h:VN=\E[?5l' 27 | termcapinfo xterm 'k1=\E[11~:k2=\E[12~:k3=\E[13~:k4=\E[14~' 28 | termcapinfo xterm 'kh=\EOH:kI=\E[2~:kD=\E[3~:kH=\EOF:kP=\E[5~:kN=\E[6~' 29 | 30 | # special xterm hardstatus: use the window title. 31 | termcapinfo xterm 'hs:ts=\E]2;:fs=\007:ds=\E]2;screen\007' 32 | 33 | #terminfo xterm 'vb=\E[?5h$<200/>\E[?5l' 34 | termcapinfo xterm 'vi=\E[?25l:ve=\E[34h\E[?25h:vs=\E[34l' 35 | 36 | # emulate part of the 'K' charset 37 | termcapinfo xterm 'XC=K%,%\E(B,[\304,\\\\\326,]\334,{\344,|\366,}\374,~\337' 38 | 39 | # xterm-52 tweaks: 40 | # - uses background color for delete operations 41 | termcapinfo xterm* be 42 | 43 | # Do not use xterm's alternative window buffer, it breaks scrollback (see bug #61195) 44 | termcapinfo xterm|xterms|xs ti@:te=\E[2J 45 | 46 | 47 | 48 | startup_message off # Turn off the splash screen 49 | escape ^pp ## No more ctrl-a but instead ctrl-p 50 | defscrollback 3000 # default: 100 51 | screen -t EDIT 0 emacs -nw 52 | screen -t EDITibis 1 ssh -t ${USER}@ibis emacs -nw 53 | screen -t hpgl 2 bash 54 | screen -t sub00 3 ssh ${USER}@ibissub00.umiacs.umd.edu 55 | screen -t sub01 4 ssh ${USER}@ibissub01.umiacs.umd.edu 56 | screen -t sub02 5 ssh ${USER}@ibissub02.umiacs.umd.edu 57 | screen -t walnut 6 ssh ${USER}@walnut.umiacs.umd.edud 58 | 59 | bind V screen -t 'vim' 0 vim 60 | #bindkey "^[Od" prev # change window with ctrl-left 61 | #bindkey "^[Oc" next # change window with ctrl-right 62 | bindkey ^[O3D prev 63 | bindkey ^[O3C next 64 | bind = resize = 65 | bind + resize +1 66 | bind - resize -1 67 | bind _ resize max 68 | bind f eval "caption splitonly" "hardstatus ignore" 69 | bind F eval "caption always" "hardstatus alwayslastline" 70 | acladd najib 71 | acladd abelew 72 | -------------------------------------------------------------------------------- /2013/1015-PBS/01-setting_up/ssh_keygen.log: -------------------------------------------------------------------------------- 1 | Script started on Tue 15 Oct 2013 10:24:03 AM EDT 2 | ]0;trey@chirp: ~trey@chirp:~>hostname 3 | chirp 4 | ]0;trey@chirp: ~trey@chirp:~>ssh abelew@ibissub001.umiacs.umd.edu 5 | Enter passphrase for key '/home/trey/.ssh/id_dsa': 6 | abelew@ibissub01.umiacs.umd.edu's password: 7 | Last login: Wed Oct 2 11:11:06 2013 from hpgl.umd.edu 8 | There are 340 processes running. 9 | The following filesystems are >= 90% full. 10 | /szfs/szattic-spare1 11 | /szfs/szattic-genefind1 12 | /szfs/szattic-asmg4 13 | /szfs/szattic-data1 14 | /szfs/szattic-asmg1 15 | /szfs/szattic-root 16 | /szfs/szattic-misc1 17 | /szfs/szsubmit 18 | /szfs/szdevel 19 | /szfs/szattic-asmg2 20 | /fs/szdevel 21 | 10:24:17 up 17 days, 17:49, 16 users, load average: 0.97, 0.83, 0.56 22 | ]0;abelew@ibissub01: ~abelew@ibissub01:~>hostname 23 | ibissub01.umiacs.umd.edu 24 | ]0;abelew@ibissub01: ~abelew@ibissub01:~>ssh-kels -ld .ssh 25 | ls: cannot access .ssh: No such file or directory 26 | ]0;abelew@ibissub01: ~abelew@ibissub01:~>ssh-gkeygen 27 | Generating public/private rsa key pair. 28 | Enter file in which to save the key (/cbcbhomes/abelew/.ssh/id_rsa): 29 | Created directory '/cbcbhomes/abelew/.ssh'. 30 | Enter passphrase (empty for no passphrase): 31 | Enter same passphrase again: 32 | Your identification has been saved in /cbcbhomes/abelew/.ssh/id_rsa. 33 | Your public key has been saved in /cbcbhomes/abelew/.ssh/id_rsa.pub. 
34 | The key fingerprint is: 35 | 96:9a:9c:ee:5d:9b:89:0c:d0:e1:d8:ae:c5:9c:bf:72 abelew@ibissub01.umiacs.umd.edu 36 | ]0;abelew@ibissub01: ~abelew@ibissub01:~>cd .ssh 37 | ]0;abelew@ibissub01: ~/.sshabelew@ibissub01:~/.ssh>ls -al 38 | total 20 39 | drwx--S--- 1 abelew hpgl 108 Oct 15 10:24 ./ 40 | drwxr-sr-x 1 abelew hpgl 8192 Oct 15 10:24 ../ 41 | -rw------- 1 abelew hpgl 1675 Oct 15 10:24 id_rsa 42 | -rw-r--r-- 1 abelew hpgl 413 Oct 15 10:24 id_rsa.pub 43 | ]0;abelew@ibissub01: ~/.sshabelew@ibissub01:~/.ssh>cat id_rsa.pub > authorized_keys 44 | ]0;abelew@ibissub01: ~/.sshabelew@ibissub01:~/.ssh>ssh abelew@ibissub02.umiacs.umd.edu 45 | abelew@ibissub02.umiacs.umd.edu's password: 46 | 47 | ]0;abelew@ibissub01: ~/.sshabelew@ibissub01:~/.ssh>ls -al 48 | total 24 49 | drwx--S--- 1 abelew hpgl 140 Oct 15 10:24 ./ 50 | drwxr-sr-x 1 abelew hpgl 8192 Oct 15 10:24 ../ 51 | -rw-rw-r-- 1 abelew hpgl 413 Oct 15 10:24 authorized_keys 52 | -rw------- 1 abelew hpgl 1675 Oct 15 10:24 id_rsa 53 | -rw-r--r-- 1 abelew hpgl 413 Oct 15 10:24 id_rsa.pub 54 | ]0;abelew@ibissub01: ~/.sshabelew@ibissub01:~/.ssh>chmod 600 authorized_keys 55 | ]0;abelew@ibissub01: ~/.sshabelew@ibissub01:~/.ssh>!ss 56 | ssh abelew@ibissub02.umiacs.umd.edu 57 | Last login: Thu Sep 26 13:34:33 2013 from hpgl.umd.edu 58 | There are 154 processes running. 59 | The following filesystems are >= 90% full. 60 | /szfs/szattic-genefind1 61 | /szfs/szdevel 62 | /szfs/szattic-asmg1 63 | /szfs/szattic-data1 64 | /szfs/szattic-asmg2 65 | /szfs/szsubmit 66 | /szfs/szattic-misc1 67 | /szfs/szattic-asmg4 68 | /szfs/szattic-spare1 69 | /szfs/szattic-root 70 | 10:25:03 up 25 days, 13:59, 4 users, load average: 0.00, 0.00, 0.00 71 | ]0;abelew@ibissub02: ~abelew@ibissub02:~>ssh abelew@walnut 72 | Last login: Tue Oct 15 10:23:52 2013 from 129-2-129-19.wireless.umd.edu 73 | There are 422 processes running. 74 | The following filesystems are >= 90% full. 75 | /fs/szattic-asmg6 76 | /fs/szdevel 77 | /fs/www-users 78 | 10:25:07 up 3 days, 18:29, 14 users, load average: 0.08, 0.09, 0.03 79 | ]0;abelew@walnut: ~abelew@walnut:~>exit 80 | ]0;abelew@walnut: ~abelew@walnut:~>logout 81 | Connection to walnut closed. 82 | ]0;abelew@ibissub02: ~abelew@ibissub02:~>exit 83 | ]0;abelew@ibissub02: ~abelew@ibissub02:~>logout 84 | Connection to ibissub02.umiacs.umd.edu closed. 85 | ]0;abelew@ibissub01: ~/.sshabelew@ibissub01:~/.ssh>exit 86 | ]0;abelew@ibissub01: ~abelew@ibissub01:~>logout 87 | Connection to ibissub01.umiacs.umd.edu closed. 
88 | ]0;trey@chirp: ~trey@chirp:~>exit 89 | 90 | Script done on Tue 15 Oct 2013 10:25:19 AM EDT 91 | -------------------------------------------------------------------------------- /2013/1015-PBS/01-setting_up/ssh_keygen.time: -------------------------------------------------------------------------------- 1 | 0.258316 31 2 | 0.006067 1 3 | 2.185456 1 4 | 0.069826 1 5 | 0.059827 1 6 | 0.199634 1 7 | 0.089802 1 8 | 0.109802 1 9 | 0.069841 1 10 | 0.059988 2 11 | 0.169900 7 12 | 0.000496 31 13 | 0.000321 1 14 | 1.565886 1 15 | 0.089641 1 16 | 0.119879 1 17 | 0.069795 1 18 | 0.019988 1 19 | 0.129679 1 20 | 0.069918 1 21 | 0.109754 1 22 | 0.049966 1 23 | 0.049992 1 24 | 0.229571 1 25 | 0.219550 1 26 | 0.119802 1 27 | 0.109861 1 28 | 0.139672 1 29 | 0.129662 1 30 | 0.159740 1 31 | 0.219759 1 32 | 0.259477 1 33 | 0.139706 4 34 | 0.678804 1 35 | 0.099732 1 36 | 0.489201 1 37 | 0.199479 1 38 | 0.189635 1 39 | 0.042799 1 40 | 0.096967 1 41 | 0.109815 1 42 | 0.079855 1 43 | 0.050048 1 44 | 0.139530 1 45 | 0.159907 1 46 | 0.049839 2 47 | 0.170103 1 48 | 0.169224 1 49 | 0.080019 2 50 | 0.319606 51 51 | 0.346450 46 52 | 1.090468 2 53 | 2.086011 57 54 | 0.078806 34 55 | 0.261778 44 56 | 0.032624 22 57 | 0.013848 46 58 | 0.000184 42 59 | 0.000362 106 60 | 0.001007 72 61 | 0.001585 43 62 | 0.580943 1 63 | 2.813138 1 64 | 0.059623 1 65 | 0.080065 1 66 | 0.160044 1 67 | 0.059153 1 68 | 0.129874 1 69 | 0.050090 1 70 | 0.209715 2 71 | 0.339074 26 72 | 0.060169 43 73 | 0.000735 1 74 | 2.184857 1 75 | 0.139490 1 76 | 0.099928 1 77 | 0.210340 1 78 | 0.538491 1 79 | 0.069562 6 80 | 0.459135 3 81 | 0.049957 1 82 | 0.399323 1 83 | 0.079759 1 84 | 0.060037 1 85 | 0.050208 1 86 | 0.079935 1 87 | 0.079295 1 88 | 0.139787 1 89 | 0.059867 1 90 | 0.119906 1 91 | 0.119579 1 92 | 0.050137 2 93 | 0.249482 22 94 | 0.008599 29 95 | 0.000091 43 96 | 0.000813 1 97 | 1.607281 1 98 | 0.109687 1 99 | 0.039862 1 100 | 0.169881 1 101 | 0.179445 4 102 | 0.539022 1 103 | 0.199730 1 104 | 0.059690 1 105 | 0.129892 1 106 | 0.209584 1 107 | 0.131049 1 108 | 0.078899 2 109 | 0.448817 41 110 | 0.051817 69 111 | 0.397594 2 112 | 1.197267 45 113 | 0.002313 44 114 | 0.000641 2 115 | 0.785551 29 116 | 0.000155 2 117 | 1.376857 70 118 | 0.017016 70 119 | 0.001528 106 120 | 0.001162 43 121 | 0.000349 1 122 | 1.047817 1 123 | 0.052773 1 124 | 0.116821 1 125 | 0.219704 1 126 | 0.179545 1 127 | 0.199654 1 128 | 0.129861 2 129 | 0.219710 53 130 | 0.000868 1 131 | 0.168907 1 132 | 0.109658 1 133 | 0.059881 1 134 | 0.039644 1 135 | 0.100193 1 136 | 0.099712 2 137 | 0.049817 10 138 | 0.009259 63 139 | 0.000192 60 140 | 0.000385 124 141 | 0.000859 53 142 | 0.000608 1 143 | 1.466016 1 144 | 0.059654 1 145 | 0.259418 1 146 | 0.099834 1 147 | 0.049922 1 148 | 0.119808 1 149 | 0.339569 1 150 | 0.179573 1 151 | 0.189531 1 152 | 0.329954 1 153 | 0.000102 1 154 | 0.348636 4 155 | 0.211030 1 156 | 0.518331 1 157 | 0.418771 1 158 | 0.109906 1 159 | 0.079959 1 160 | 0.179407 1 161 | 0.079789 1 162 | 0.099866 1 163 | 0.179897 1 164 | 0.099827 1 165 | 0.319454 1 166 | 0.110595 1 167 | 0.218965 1 168 | 0.149503 1 169 | 0.229537 1 170 | 0.040009 1 171 | 0.189725 1 172 | 0.149785 2 173 | 0.369365 53 174 | 0.012935 1 175 | 3.310855 1 176 | 0.119944 1 177 | 0.039784 1 178 | 0.109749 1 179 | 0.060067 1 180 | 0.060186 1 181 | 0.118999 1 182 | 0.069938 1 183 | 0.109908 1 184 | 0.019847 1 185 | 0.319594 1 186 | 0.459571 1 187 | 0.090496 1 188 | 0.148736 1 189 | 0.099784 1 190 | 0.130121 1 191 | 0.129632 1 192 | 0.159786 1 193 | 0.089610 1 194 | 0.279739 1 195 | 1.138099 1 
196 | 0.181704 1 197 | 0.187878 1 198 | 0.039613 1 199 | 0.091127 1 200 | 0.128401 1 201 | 0.049796 1 202 | 0.050256 1 203 | 0.129545 1 204 | 0.159668 1 205 | 0.079836 1 206 | 0.099656 1 207 | 0.050405 1 208 | 0.169337 1 209 | 0.029945 2 210 | 0.369569 44 211 | 0.204739 2 212 | 2.322297 2 213 | 0.000358 53 214 | 0.000114 1 215 | 1.814669 1 216 | 0.030003 1 217 | 0.069836 1 218 | 0.109790 1 219 | 0.129562 1 220 | 0.069945 2 221 | 0.060041 10 222 | 0.008912 63 223 | 0.000460 60 224 | 0.000149 193 225 | 0.000853 53 226 | 0.000746 1 227 | 0.967049 1 228 | 0.039668 1 229 | 0.249629 1 230 | 0.029891 1 231 | 0.099886 1 232 | 0.119706 1 233 | 0.749003 1 234 | 0.120235 1 235 | 0.149039 1 236 | 0.089965 1 237 | 0.229713 1 238 | 0.060003 14 239 | 0.140150 2 240 | 0.329000 53 241 | 0.005896 1 242 | 0.503175 1 243 | 0.209653 1 244 | 0.119633 2 245 | 0.099970 37 246 | 0.007825 57 247 | 0.206344 34 248 | 0.302896 44 249 | 0.038669 119 250 | 0.010165 84 251 | 0.000170 72 252 | 0.002058 43 253 | 0.524784 1 254 | 0.554525 1 255 | 0.089589 1 256 | 0.080095 1 257 | 0.089779 1 258 | 0.060296 1 259 | 0.079445 1 260 | 0.089671 1 261 | 0.079983 1 262 | 0.069946 1 263 | 0.039859 1 264 | 0.226642 1 265 | 0.192966 1 266 | 0.069641 1 267 | 0.059982 1 268 | 0.079905 1 269 | 0.419810 1 270 | 0.089363 2 271 | 0.209631 74 272 | 0.321568 34 273 | 0.329943 44 274 | 0.023163 19 275 | 0.010763 13 276 | 0.000174 15 277 | 0.000396 71 278 | 0.004004 37 279 | 0.673782 6 280 | 1.173992 37 281 | 1.132104 8 282 | 0.563899 31 283 | 0.002914 43 284 | 0.001170 6 285 | 1.193439 43 286 | 0.545291 8 287 | 2.837324 49 288 | 0.003276 53 289 | 0.001006 6 290 | 1.864504 43 291 | 0.992063 8 292 | 0.585735 49 293 | 0.002998 31 294 | 0.000408 6 295 | -------------------------------------------------------------------------------- /2013/1015-PBS/02-first_submission/01-first_submission.time: -------------------------------------------------------------------------------- 1 | 0.332915 109 2 | 0.658449 1 3 | 1.114697 1 4 | 0.021945 4 5 | 0.467698 4 6 | 0.179490 2 7 | 3.412968 109 8 | 0.001893 1 9 | 0.043671 1 10 | 0.045579 2 11 | 0.535718 197 12 | 0.018259 109 13 | 0.002785 1 14 | 0.497540 1 15 | 0.079012 1 16 | 0.148550 1 17 | 1.075554 1 18 | 0.378937 1 19 | 0.072127 1 20 | 0.206449 1 21 | 0.199359 1 22 | 0.050167 1 23 | 0.054704 1 24 | 0.214423 1 25 | 0.177306 4 26 | 0.518669 1 27 | 0.607045 1 28 | 0.441124 1 29 | 0.792492 1 30 | 0.110109 4 31 | 0.379033 4 32 | 0.148908 1 33 | 0.298088 1 34 | 0.448385 4 35 | 0.268542 4 36 | 0.139436 1 37 | 0.319207 1 38 | 0.120120 1 39 | 0.299875 1 40 | 0.430007 2 41 | 1.499800 174 42 | 0.031995 299 43 | 0.000036 106 44 | 0.000306 109 45 | 0.002376 1 46 | 7.477690 1 47 | 0.061973 1 48 | 0.097943 1 49 | 0.068039 1 50 | 0.092966 18 51 | 0.187898 2 52 | 0.338191 149 53 | 0.003773 1 54 | 0.085329 1 55 | 0.131602 2 56 | 0.066744 73 57 | 0.008055 149 58 | 0.004155 1 59 | 0.569002 1 60 | 0.017293 1 61 | 0.109976 2 62 | 0.170144 72 63 | 0.000213 149 64 | 0.003043 1 65 | 2.032141 1 66 | 0.242940 1 67 | 0.062192 1 68 | 0.189781 1 69 | 0.000124 1 70 | 0.452854 1 71 | 0.120030 1 72 | 0.060311 1 73 | 0.119789 17 74 | 0.000174 2 75 | 0.515722 33 76 | 0.075684 149 77 | 0.004044 1 78 | 1.992161 1 79 | 0.088616 1 80 | 0.097149 1 81 | 0.062914 1 82 | 0.070229 1 83 | 0.089009 1 84 | 0.071441 2 85 | 0.475025 9 86 | 0.008312 81 87 | 0.000323 76 88 | 0.000184 149 89 | 0.003923 1 90 | 1.449213 1 91 | 0.067391 1 92 | 0.026653 1 93 | 0.054478 1 94 | 0.084219 1 95 | 0.540305 1 96 | 0.553940 1 97 | 0.288057 1 98 | 0.160871 1 99 | 0.077395 
2 100 | 0.262960 1 101 | 1.489097 1 102 | 0.031000 1 103 | 0.258989 18 104 | 0.196199 2 105 | 0.770746 34 106 | 0.012551 71 107 | 0.000031 44 108 | 0.000310 30 109 | 0.000153 39 110 | 0.000135 39 111 | 0.000137 60 112 | 0.000145 53 113 | 0.000157 44 114 | 0.000200 30 115 | 0.000128 101 116 | 0.000153 41 117 | 0.000176 19 118 | 0.000116 49 119 | 0.000136 27 120 | 0.000029 30 121 | 0.000210 27 122 | 0.000033 86 123 | 0.000031 133 124 | 0.000072 30 125 | 0.000053 15 126 | 0.000028 30 127 | 0.000012 82 128 | 0.000037 83 129 | 0.000063 36 130 | 0.000013 71 131 | 0.000045 94 132 | 0.000039 29 133 | 0.000076 46 134 | 0.000013 48 135 | 0.000023 15 136 | 0.000011 22 137 | 0.000407 18 138 | 0.978246 89 139 | 0.000053 53 140 | 0.000022 27 141 | 0.000038 45 142 | 0.000029 15 143 | 0.000042 92 144 | 0.000070 69 145 | 0.000022 133 146 | 0.000060 133 147 | 0.000062 133 148 | 0.000060 133 149 | 0.000061 133 150 | 0.000060 133 151 | 0.000061 133 152 | 0.000122 133 153 | 0.000047 133 154 | 0.000062 133 155 | 0.000060 133 156 | 0.000061 133 157 | 0.000060 133 158 | 0.000061 129 159 | 0.000070 149 160 | 0.003441 1 161 | 0.731760 4 162 | 1.332388 6 163 | -------------------------------------------------------------------------------- /2013/1015-PBS/02-first_submission/02-first_submission.time: -------------------------------------------------------------------------------- 1 | 0.817962 149 2 | 0.512272 1 3 | 1.882153 1 4 | 0.205499 1 5 | 0.085064 1 6 | 0.200745 1 7 | 0.000109 1 8 | 0.472444 1 9 | 0.092828 1 10 | 0.714289 1 11 | 0.386124 1 12 | 0.712167 1 13 | 0.109607 1 14 | 0.090376 1 15 | 0.063548 1 16 | 0.193105 1 17 | 0.105335 1 18 | 0.274014 1 19 | 0.184442 1 20 | 0.198196 1 21 | 0.252445 1 22 | 0.053293 1 23 | 0.120202 1 24 | 0.148902 1 25 | 0.065921 1 26 | 0.053217 1 27 | 0.080533 1 28 | 0.465320 1 29 | 0.160518 1 30 | 0.263556 1 31 | 0.904697 1 32 | 1.241647 1 33 | 2.446130 1 34 | 0.323668 1 35 | 0.177180 1 36 | 0.052612 1 37 | 0.173148 1 38 | 0.074798 1 39 | 0.508694 1 40 | 0.327975 1 41 | 0.072643 1 42 | 0.129588 1 43 | 0.103200 1 44 | 0.085934 1 45 | 0.186137 1 46 | 0.086131 1 47 | 0.245007 1 48 | 0.081040 1 49 | 0.042791 1 50 | 0.530887 1 51 | 0.561336 1 52 | 0.451059 1 53 | 0.271719 1 54 | 0.196552 4 55 | 0.304173 4 56 | 0.098557 1 57 | 0.123619 1 58 | 0.205278 1 59 | 2.515486 1 60 | 0.656197 1 61 | 0.032198 1 62 | 0.186820 16 63 | 0.000139 1 64 | 1.492179 1 65 | 0.065539 1 66 | 0.247281 1 67 | 0.154594 3 68 | 0.000030 2 69 | 0.512351 33 70 | 0.066164 149 71 | 0.004021 1 72 | 2.243752 1 73 | 0.180839 1 74 | 0.040453 1 75 | 0.061700 1 76 | 0.111000 1 77 | 0.206629 1 78 | 2.478458 1 79 | 0.100293 1 80 | 0.099153 1 81 | 0.059952 1 82 | 0.132695 1 83 | 0.155352 1 84 | 0.092358 1 85 | 0.035418 1 86 | 0.170246 1 87 | 0.078812 1 88 | 0.058214 1 89 | 0.148921 1 90 | 0.125346 1 91 | 0.068737 1 92 | 0.109663 1 93 | 0.089371 1 94 | 0.187261 1 95 | 0.057745 1 96 | 0.071779 1 97 | 0.062209 1 98 | 0.091453 1 99 | 0.451225 1 100 | 0.227104 1 101 | 0.201372 1 102 | 0.046614 1 103 | 0.054833 1 104 | 0.075092 1 105 | 0.136664 1 106 | 0.139312 1 107 | 0.144009 1 108 | 0.079729 1 109 | 0.066407 1 110 | 0.093880 1 111 | 0.227992 1 112 | 0.066714 1 113 | 0.123022 1 114 | 0.098904 1 115 | 0.131450 1 116 | 0.185630 1 117 | 0.022204 1 118 | 0.150797 1 119 | 0.112097 1 120 | 0.052444 1 121 | 0.153640 1 122 | 0.026801 1 123 | 0.249278 1 124 | 0.121998 1 125 | 0.396320 1 126 | 0.142447 1 127 | 0.431054 2 128 | 0.360723 51 129 | 0.000440 149 130 | 0.003474 1 131 | 0.302694 1 132 | 0.126469 2 133 | 0.168196 101 134 | 0.035935 
149 135 | 0.003996 1 136 | 1.406419 1 137 | 0.207015 1 138 | 0.079892 1 139 | 0.072227 1 140 | 0.093773 1 141 | 0.220387 1 142 | 0.331027 1 143 | 0.125746 1 144 | 0.135464 1 145 | 0.108042 1 146 | 0.125441 1 147 | 0.068201 1 148 | 0.063058 1 149 | 0.105020 1 150 | 0.080404 1 151 | 0.093929 1 152 | 0.136442 1 153 | 0.109858 1 154 | 0.106688 1 155 | 0.023009 1 156 | 0.167471 1 157 | 0.072984 1 158 | 0.097892 1 159 | 0.098884 1 160 | 0.233027 1 161 | 0.061788 1 162 | 0.079448 1 163 | 0.225450 1 164 | 0.053454 1 165 | 0.144021 1 166 | 0.177028 1 167 | 0.486971 1 168 | 0.212150 2 169 | 0.234400 27 170 | 0.000894 182 171 | 0.003341 1 172 | 0.787987 1 173 | 0.079005 4 174 | 3.197040 4 175 | 0.167570 1 176 | 2.326296 7 177 | 0.422019 2 178 | 0.554317 27 179 | 0.000332 149 180 | 0.003249 1 181 | 1.165733 1 182 | 0.080446 2 183 | 0.091483 149 184 | 0.036930 1 185 | 1.161484 1 186 | 0.103992 1 187 | 0.093667 1 188 | 0.004446 1 189 | 0.056590 1 190 | 0.092430 1 191 | 0.042806 2 192 | 0.110522 10 193 | 0.012797 82 194 | 0.000321 77 195 | 0.000145 77 196 | 0.000027 77 197 | 0.000022 77 198 | 0.000021 73 199 | 0.000086 73 200 | 0.000022 80 201 | 0.000009 77 202 | 0.000019 77 203 | 0.000020 80 204 | 0.000020 80 205 | 0.000020 149 206 | 0.003740 1 207 | 1.152288 1 208 | 0.044704 1 209 | 0.114692 1 210 | 0.044986 1 211 | 0.097398 1 212 | 0.367210 1 213 | 0.124953 1 214 | 0.069329 1 215 | 0.209285 13 216 | 0.000139 1 217 | 2.535162 1 218 | 0.153579 1 219 | 0.250697 1 220 | 0.239324 1 221 | 0.000146 1 222 | 1.673659 6 223 | 0.098081 2 224 | 1.239299 34 225 | 0.016238 72 226 | 0.000014 62 227 | 0.000212 76 228 | 0.000156 60 229 | 0.000141 30 230 | 0.000127 101 231 | 0.000147 41 232 | 0.000552 72 233 | 0.000130 49 234 | 0.000133 27 235 | 0.000094 33 236 | 0.000135 27 237 | 0.000163 86 238 | 0.000153 133 239 | 0.000165 30 240 | 0.000119 15 241 | 0.000108 30 242 | 0.000120 82 243 | 0.000137 103 244 | 0.000198 71 245 | 0.000153 19 246 | 0.000011 95 247 | 0.000181 29 248 | 0.000116 22 249 | 0.000122 24 250 | 0.000095 48 251 | 0.000135 15 252 | 0.000010 17 253 | 0.000185 23 254 | 0.000109 23 255 | 0.000114 43 256 | 0.000117 22 257 | 0.000417 38 258 | 0.588936 43 259 | 0.000017 35 260 | 0.000010 10 261 | 0.000018 15 262 | 0.000275 93 263 | 0.000059 69 264 | 0.000036 133 265 | 0.000070 133 266 | 0.000052 133 267 | 0.000070 133 268 | 0.000061 133 269 | 0.000074 133 270 | 0.000050 133 271 | 0.000055 133 272 | 0.000061 133 273 | 0.000061 133 274 | 0.000060 133 275 | 0.000061 133 276 | 0.000060 133 277 | 0.000060 129 278 | 0.000081 149 279 | 0.003530 6 280 | -------------------------------------------------------------------------------- /2013/1015-PBS/02-first_submission/03-first_submission.time: -------------------------------------------------------------------------------- 1 | 0.596119 149 2 | 0.516825 1 3 | 2.390344 1 4 | 0.076705 1 5 | 1.128477 1 6 | 0.074303 1 7 | 0.144025 1 8 | 0.062012 1 9 | 0.211337 12 10 | 0.000138 1 11 | 0.832368 1 12 | 0.036345 1 13 | 0.249647 4 14 | 0.197333 2 15 | 1.432332 62 16 | 0.009869 60 17 | 0.000269 61 18 | 0.000234 69 19 | 0.000261 25 20 | 0.000172 68 21 | 0.000201 81 22 | 0.000212 66 23 | 0.000267 2 24 | 0.000162 18 25 | 0.000147 152 26 | 0.000195 33 27 | 0.103599 52 28 | 0.000982 74 29 | 0.000232 1 30 | 19.285354 1 31 | 0.093720 1 32 | 0.145872 1 33 | 0.146528 1 34 | 0.112498 1 35 | 0.079231 1 36 | 0.118548 1 37 | 0.124679 1 38 | 0.167361 1 39 | 0.154241 2 40 | 0.596655 1 41 | 6.842308 1 42 | 0.166079 1 43 | 0.151355 1 44 | 0.364772 1 45 | 0.261718 1 46 | 0.199186 1 47 | 0.054768 
1 48 | 0.082262 1 49 | 0.143964 1 50 | 0.117842 1 51 | 0.025937 1 52 | 0.122318 1 53 | 0.004159 1 54 | 0.148558 1 55 | 0.072287 1 56 | 0.098885 1 57 | 0.079903 4095 58 | 0.078213 4095 59 | 0.000072 102 60 | 0.000393 2067 61 | 0.000203 1639 62 | 0.000019 25 63 | 0.008441 149 64 | 0.003767 1 65 | 0.000263 1 66 | 0.000241 1 67 | 0.000278 1 68 | 0.000221 1 69 | 0.000237 1 70 | 0.000229 1 71 | 0.000310 1 72 | 0.000242 1 73 | 0.000243 1 74 | 0.000248 2 75 | 0.000184 37 76 | 0.157000 149 77 | 0.003766 1 78 | 0.000202 1 79 | 0.000155 1 80 | 0.000149 1 81 | 0.000170 1 82 | 0.000156 1 83 | 0.000157 1 84 | 0.000207 1 85 | 0.000160 1 86 | 0.000161 1 87 | 0.000158 1 88 | 0.000153 1 89 | 0.000171 1 90 | 0.000195 1 91 | 0.000173 1 92 | 0.000175 1 93 | 0.000176 1 94 | 0.000177 17 95 | 1.374628 3 96 | 0.065028 1 97 | 1.638728 1 98 | 0.187398 1 99 | 0.047258 1 100 | 0.072341 1 101 | 0.112766 1 102 | 0.274008 1 103 | 0.464967 1 104 | 0.192185 1 105 | 0.069532 1 106 | 0.240331 1 107 | 0.187000 2 108 | 0.311782 27 109 | 0.000939 160 110 | 0.003630 1 111 | 1.816962 7 112 | 0.366945 2 113 | 0.262341 5 114 | 0.000309 149 115 | 0.003678 1 116 | 2.667555 1 117 | 0.218042 1 118 | 0.142551 1 119 | 0.091299 1 120 | 0.122099 1 121 | 0.174928 1 122 | 0.444331 1 123 | 0.146473 1 124 | 0.055010 1 125 | 0.481029 1 126 | 0.296983 2 127 | 0.228071 6 128 | 0.000303 149 129 | 0.003203 6 130 | -------------------------------------------------------------------------------- /2013/1015-PBS/02-first_submission/first_submission-v2.sh: -------------------------------------------------------------------------------- 1 | #!/bin/env bash 2 | echo "This is essentially the same submission script as last time." 3 | echo "Except all those parameters will need to be specified on the command line." 4 | echo "In addition, this time I will source my bashrc explicitly." 5 | 6 | ## Keep in mind the bashrc should never print _anything_ to screen 7 | source ${HOME}/.bashrc 8 | 9 | if [ -n "$PBS_O_WORKDIR" ]; then 10 | cd $PBS_O_WORKDIR 11 | fi 12 | 13 | echo "Find out the executing host:" 14 | uname -a 15 | echo "Find out the current working directory:" 16 | pwd 17 | echo "Find out if any PBS specific variables are set:" 18 | env | grep "^PBS" 19 | -------------------------------------------------------------------------------- /2013/1015-PBS/02-first_submission/first_submission-v3.sh: -------------------------------------------------------------------------------- 1 | #!/bin/env bash 2 | echo "This is essentially the same submission script as last time." 3 | echo "Except this time I will call qsub from within this script." 4 | echo "This makes use of a variable I have in my bashrc 'PBS_ARGS'" 5 | echo "Ok, a big caveat: cat < $DONE 30 | EOF 31 | 32 | echo "At this time the script has been submitted to pbs." 33 | echo "I can tell this script to wait until the script is finished executing..." 34 | while [ /bin/true ]; do 35 | sleep 5 36 | if [ -r "$DONE" ]; then 37 | cat $DONE 38 | rm $DONE 39 | break 40 | fi 41 | done 42 | echo "The script is finished." 
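## A minimal sketch (not the original lines) of the submit-and-wait pattern
## this script narrates: export a sentinel path, pipe a heredoc job into qsub,
## have the job write the sentinel file when it ends, and let the while/sleep
## loop above poll for that file. The sentinel filename below is hypothetical,
## and the qsub invocation just reuses this repo's own PBS_ARGS / stdin ("-")
## style; the -V inside PBS_ARGS is what lets the job inherit $DONE.
#
# export DONE="${HOME}/logs/first_submission_v3.done"   # hypothetical path
# cat <<"EOF" | qsub ${PBS_ARGS} -
# source ${HOME}/.bashrc
# cd ${PBS_O_WORKDIR}
# uname -a
# echo "Finished on $(hostname) at $(date)" > ${DONE}
# EOF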
43 | -------------------------------------------------------------------------------- /2013/1015-PBS/02-first_submission/first_submission.sh: -------------------------------------------------------------------------------- 1 | #!/bin/env bash 2 | #PBS -N first_submission 3 | #PBS -j eo -e ${HOME}/logs/first_submission.log 4 | #PBS -V 5 | #PBS -S /bin/bash 6 | #PBS -l mem=400MB,walltime=00:01:00 7 | #PBS -q high_throughput 8 | #PBS -m n 9 | 10 | echo "This is a simple submission script for PBS" 11 | echo "It is requesting, in order: " 12 | echo "1. To get the name 'first_submission'" 13 | echo "2. To join the output and error logs." 14 | echo "4. To receive the shell environment from the calling host." 15 | echo "5. To use 400 megs of ram and 1 minute of CPU time." 16 | echo "6. And to enter the high_throughput queue." 17 | echo "7. I don't want any emails from pbs." 18 | 19 | echo "Find out the executing host:" 20 | uname -a 21 | echo "Find out the current working directory:" 22 | pwd 23 | echo "Find out if any PBS specific variables are set:" 24 | env | grep "^PBS" 25 | -------------------------------------------------------------------------------- /2013/1015-PBS/02-first_submission/first_submission_v2.e46373: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2013/1015-PBS/02-first_submission/first_submission_v2.e46373 -------------------------------------------------------------------------------- /2013/1015-PBS/02-first_submission/first_submission_v2.o46373: -------------------------------------------------------------------------------- 1 | There are 117 processes running. 2 | 19:37:26 up 24 days, 23:11, 0 users, load average: 0.02, 0.01, 0.00 3 | This is essentially the same submission script as last time. 4 | Except all those parameters will need to be specified on the command line. 5 | In addition, this time I will source my bashrc explicitly. 
6 | Find out the executing host: 7 | Linux beech.umiacs.umd.edu 2.6.18-348.16.1.el5 #1 SMP Sat Jul 27 01:05:23 EDT 2013 x86_64 GNU/Linux 8 | Find out the current working directory: 9 | /cbcbhomes/abelew/byob/presentations/2013/1015-PBS/02-first_submission 10 | Find out if any PBS specific variables are set: 11 | PBS_VERSION=TORQUE-2.5.12 12 | PBS_JOBNAME=first_submission_v2 13 | PBS_ENVIRONMENT=PBS_BATCH 14 | PBS_O_WORKDIR=/cbcbhomes/abelew/byob/presentations/2013/1015-PBS/02-first_submission 15 | PBS_ARGS= -V -S /cbcb/lab/nelsayed/local/bin/bash -q throughput -l walltime=12:00:00 -j eo -e ibissub00.umiacs.umd.edu:/cbcbhomes/abelew/outputs/pbs.out -m n 16 | PBS_TASKNUM=1 17 | PBS_O_HOME=/cbcbhomes/abelew 18 | PBS_LARGE= -V -S /cbcb/lab/nelsayed/local/bin/bash -q large -l walltime=72:00:00 19 | PBS_LOG= -j eo -e ibissub00.umiacs.umd.edu:/cbcbhomes/abelew/outputs/pbs.out -m n 20 | PBS_WALLTIME=10800 21 | PBS_GPUFILE=/var/spool/torque/aux//46373.cbcbtorque.umiacs.umd.edugpu 22 | PBS_MOMPORT=15003 23 | PBS_WORKSTATION= -V -S /cbcb/lab/nelsayed/local/bin/bash -q workstation -l walltime=12:00:00 24 | PBS_O_QUEUE=high_throughput 25 | PBS_O_LOGNAME=abelew 26 | PBS_O_LANG=en_US.UTF-8 27 | PBS_JOBCOOKIE=DDE6F12BB3044ABABDB9AD35F0E0B404 28 | PBS_NODENUM=0 29 | PBS_NUM_NODES=1 30 | PBS_O_SHELL=/bin/bash 31 | PBS_SERVER=cbcbtorque 32 | PBS_JOBID=46373.cbcbtorque.umiacs.umd.edu 33 | PBS_O_HOST=ibissub00.umiacs.umd.edu 34 | PBS_VNODENUM=0 35 | PBS_QUEUE=high_throughput 36 | PBS_O_MAIL=/var/spool/mail/abelew 37 | PBS_NP=1 38 | PBS_NUM_PPN=1 39 | PBS_THROUGHPUT= -V -S /cbcb/lab/nelsayed/local/bin/bash -q throughput -l walltime=12:00:00 40 | PBS_NODEFILE=/var/spool/torque/aux//46373.cbcbtorque.umiacs.umd.edu 41 | PBS_O_PATH=/cbcb/lab/nelsayed/local/bin:/sbin:/usr/sbin:/usr/local/sbin:/opt/UMmaui/bin:/opt/UMtorque/bin:/usr/local/stow/gcc-4.6.0/bin:/cbcb/lab/nelsayed/local/texlive/2013/bin/x86_64-linux:/cbcb/lab/nelsayed/programs/epd-7.2-2-rh5-x86_64/bin:/usr/local/stow/autoconf-2.65/bin:/cbcb/lab/nelsayed/programs/allpaths/bin:/cbcb/lab/nelsayed/programs/ncbi-blast-2.2.27+/bin:/cbcb/lab/nelsayed/programs/blast-2.2.26/bin:/cbcb/lab/nelsayed/programs/bedtools-2.17.0/bin:/cbcb/lab/nelsayed/programs/bwa-0.6.2:/cbcb/lab/nelsayed/programs/cd-hit-v4.6.1-2012-08-27:/cbcb/lab/nelsayed/programs/cufflinks-2.0.0.Linux_x86_64:/cbcb/lab/nelsayed/programs/FastQC:/cbcb/lab/nelsayed/programs/gmap-2012-04-04/bin:/cbcb/lab/nelsayed/programs/Grinder-0.4.5/script:/cbcb/lab/nelsayed/programs/hmmer-3.0-linux-intel-x86_64/binaries:/cbcb/lab/nelsayed/programs/IGV_EA:/cbcb/lab/nelsayed/programs/mauve_snapshot_2012-06-07:/cbcb/lab/nelsayed/programs/prinseq-lite-0.20.3:/cbcb/lab/nelsayed/programs/rsem-1.1.18-modified:/cbcb/lab/nelsayed/programs/samtools-0.1.18:/cbcb/lab/nelsayed/programs/signalp-4.1:/cbcb/lab/nelsayed/programs/SOAPsplice-v1.9/bin:/cbcb/lab/nelsayed/programs/SOAPdenovo-Trans:/cbcb/lab/nelsayed/programs/subread-1.2.0/bin:/cbcb/lab/nelsayed/programs/tmhmm-2.0c/bin:/cbcb/lab/nelsayed/programs/tophat-2.0.7.Linux_x86_64:/cbcb/lab/nelsayed/programs/tools:/cbcb/lab/nelsayed/programs/trinityrnaseq_r2013-02-25:/cbcb/lab/nelsayed/programs/trinityrnaseq_r2013-02-25/util:/cbcb/lab/nelsayed/programs/trinityrnaseq_r2013-02-25/util/RSEM_util:/cbcb/lab/nelsayed/programs/velvet_1.2.03:/cbcb/lab/nelsayed/programs/bin:/cbcb/lab/nelsayed/local/haskell/bin:/cbcb/lab/nelsayed/scripts:/cbcb/lab/nelsayed/scripts/PBS:/usr/local/stow/xfce-4.4.3/bin:/cbcb/lab/nelsayed/local/bin:/bin:/usr/local/bin:/usr/bin:/cbcb/lab/nelsayed/local/biopieces/bp_bin
 42 | -------------------------------------------------------------------------------- /2013/1015-PBS/02-first_submission/submissionv2.e46372: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2013/1015-PBS/02-first_submission/submissionv2.e46372 -------------------------------------------------------------------------------- /2013/1015-PBS/02-first_submission/submissionv2.o46372: -------------------------------------------------------------------------------- 1 | There are 117 processes running. 2 | 19:35pm up 24 days 23:10, 0 users, load average: 0.08, 0.02, 0.01 3 | This is essentially the same submission script as last time. 4 | Except all those parameters will need to be specified on the command line. 5 | In addition, this time I will source my bashrc explicitly. 6 | You appear to have an execute loop. 7 | This may happen if your .bashrc calls your .bash_profile 8 | and the .bash_profile calls .bashrc; similarly, having 9 | newgrp(1) in the .bashrc could cause this. 10 | exiting now. 11 | -------------------------------------------------------------------------------- /2013/1015-PBS/03-real_examples/01-wdarocha_bwa2.sh: -------------------------------------------------------------------------------- 1 | #PBS -S /bin/bash 2 | #PBS -l mem=24GB,walltime=72:00:00 3 | #PBS -q large 4 | #PBS -N BWA_JC 5 | #PBS -j eo -e ibissub01.umiacs.umd.edu:/cbcb/personal-scratch/wdarocha/Trinity/Jaccard_Clip/BWA 6 | 7 | source /cbcb/lab/nelsayed/scripts/dotbashrc 8 | 9 | bwa bwasw /cbcb/lab/nelsayed/ref_data/tcruzi_clbrener/genome/tc_combined/nonemser_with_unassigned_contigs/TcruziNonEsmeraldo_UnassignedContigs-4.1.fasta /cbcb/personal-scratch/wdarocha/Trinity/Jaccard_Clip/Gstrain_TranscriptomeAssembly_Trinity_JC_SingleLine_HigherThan450bp_SLplusminusSameStrand.fasta > bwa_trinity_output_Gstrain_TranscriptomeAssembly_NonEsmPlusUnassigned.sam 10 | 11 | -------------------------------------------------------------------------------- /2013/1015-PBS/03-real_examples/02-tophat.sh: -------------------------------------------------------------------------------- 1 | #PBS -S /bin/bash 2 | #PBS -l mem=6GB,walltime=72:00:00 3 | #PBS -N wdr_gfp 4 | #PBS -j eo -e /cbcb/personal-scratch/wdarocha/gfp/tophat.out 5 | set -x 6 | base=/cbcb/personal-scratch/wdarocha/HPGL0120 7 | 8 | cd $base 9 | 10 | export TOPHAT="/cbcb/lab/nelsayed/programs/tophat-2.0.5.Linux_x86_64/tophat" 11 | export OPTIONS="-p 4 -o $base/processed/tophat_HPGL0120_Esm/tophat_HPGL0120_EL.out" 12 | export REFLIB="-G /cbcb/lab/nelsayed/ref_data/tcruzi/tcruzi_clbrener/annotation_tcruzi_clbrener/tc_esmer/TcruziEsmeraldo-Like_TriTrypDB-4.1_CDS_exon_gene_psu.gtf " 13 | export JUNCS="--no-novel-juncs /cbcb/lab/nelsayed/ref_data/tcruzi/tcruzi_clbrener/genome_tcruzi_clbrener/tc_esmer/TcruziEsmeraldo-LikeGenomic_TriTrypDB-4.1 " 14 | export INPUTS="$base/processed/tophat_HPGL0120_Esm/HPGL0120_EL.fastq " 15 | 16 | $TOPHAT $OPTIONS $REFLIB $JUNCS $INPUTS 17 | 18 | -------------------------------------------------------------------------------- /2013/1015-PBS/03-real_examples/03-rRNA_filter.sh: -------------------------------------------------------------------------------- 1 | #!/bin/env bash 2 | source ${LAB}/scripts/dotbashrc 3 | 4 | export MYWD=${PWD} 5 | export QSUB_ARGS="$PBS_ARGS -" 6 | export RNA_LIB="../libraries/rRNA" 7 | export IN="rp.fasta" 8 | export MYTEST=0 9 | unset MYTEST 10 | export 
MYLOG="/cbcbhomes/abelew/project_scratch/riboseq/014/test.out" 11 | 12 | 13 | cat <<"EOF" | qsub $QSUB_ARGS 14 | source ${LAB}/scripts/dotbashrc 15 | cd $MYWD 16 | start_log 17 | 18 | coproc dstat -t -c -d -n -g -m -y -p -r --fs --tcp --vm --proc-count --output=${MYWD}/${PBS_JOBNAME}.csv 2>>${MYWD}/${PBS_JOBNAME}_dstat.txt 1>&2 19 | 20 | export BOB="testme" 21 | echo "" 22 | env > $MYLOG 23 | ls >> $MYLOG 24 | 25 | echo "echoing bob $BOB" >> $MYLOG 26 | 27 | CORES=" -p $(cpus) " 28 | MULTALIGN=${MULTIALIGN-" -k 1 "} 29 | TRIM3p=${TRIM3p-" -3 0 "} 30 | SEEDARGS=${SEEDARGS-" -D 5 -R 1 -N 0 -L 10 -i S,0,2.50 "} 31 | FILETYPE=${FILETYPE-" -f "} 32 | REPORTARGS=${REPORTARGS-" --un ${MYWD}/unsorted.fasta --al-gz ${MYWD}/ribosomal.fasta.gz --no-head --no-sq "} 33 | 34 | 35 | CMD="bowtie2 -x ${RNA_LIB} ${FILETYPE} -U ${MYWD}/${IN} -S ${MYWD}/${IN}.sam ${SEEDARGS} ${TRIM3p} ${MULTALIGN} ${REPORTARGS} ${CORES} 2>${MYWD}/${IN}.stderr 1>${MYWD}/${IN}.stdout" 36 | 37 | if [ -n "$MYTEST" ]; then 38 | echo "Would run:" >> $MYLOG 39 | echo "$CMD" >> $MYLOG 40 | echo "This can be modified by exporting following variables:" >> $MYLOG 41 | echo "MULTIALIGN, TRIM3p, SEEDARGS, FILETYPE, REPORTARGS" >> $MYLOG 42 | else 43 | eval $CMD 44 | fi 45 | 46 | if [[ "$?" = "0" ]]; then 47 | echo "success $? in ${SECONDS}" >> ${MYWD}/status.txt 48 | else 49 | echo "fail $? in ${SECONDS}" >> ${MYWD}/status.txt 50 | fi 51 | 52 | EOJ 53 | -------------------------------------------------------------------------------- /2013/1015-PBS/03-real_examples/04-CountTable_qsub_WDR.sh: -------------------------------------------------------------------------------- 1 | #!/cbcb/lab/nelsayed/local/bin/bash 2 | source /cbcb/lab/nelsayed/scripts/dotbashrc 3 | MYHIST=$HOME/.readline_inputs 4 | history -r $MYHIST 5 | history -a $MYHIST 6 | history -n $MYHIST 7 | echo "Please provide an accepted_hits_sorted.sam file:" 8 | read -e -i ${WD} 9 | #export INPUT=$REPLY 10 | ## I think this will remove tailing spaces 11 | #export INPUT={INPUT%%*( )} 12 | export INPUT=${REPLY%%*( )} 13 | history -s ${INPUT} 14 | echo "Please provide a fasta.gff file:" 15 | read -e -i ${REF} 16 | #export ANN=$REPLY 17 | export ANN=${REPLY%%*( )} 18 | history -s ${ANN} 19 | echo "Please provide the output file:" 20 | read -e -i ${WD} 21 | #export OUTPUT=$REPLY 22 | export OUTPUT=${REPLY%%*( )} 23 | history -s ${OUTPUT} 24 | 25 | if [ -r "$INPUT" ]; then 26 | echo "Input: $INPUT is readable." 27 | else 28 | echo "Input: $INPUT is not readable, check the path." 29 | exit 1 30 | fi 31 | if [ -r "$ANN" ]; then 32 | echo "Annotations: $ANN is readable." 33 | else 34 | echo "Annotations: $ANN is not readable, check the path." 35 | exit 1 36 | fi 37 | if [ -z "$OUTPUT" ]; then 38 | echo "No output was provided." 
39 | exit 1 40 | fi 41 | 42 | history -w 43 | 44 | #echo "HTSEQ options:" 45 | #read -e -i "htseq-count -q --stranded=no -t gene -i ID - " 46 | #export HTSEQ=$REPLY 47 | #export MYHTSEQ=${HTSEQ-"htseq-count -q --stranded=no -t gene -i ID - "} 48 | #export HTSEQCMD="samtools view ${INPUT} | ${MYHTSEQ} ${ANN} > ${OUTPUT}" 49 | 50 | 51 | #export PBS_ARGS=${PBS_ARGS-"-V -S ${LAB}/local/bin/bash -q workstation -l walltime=12:00:00 "} 52 | #export PBS_LOG=${PBS_LOG-" -j eo -e ibissub00.umiacs.umd.edu:${WD}/CountTable.out -m n"} 53 | 54 | #echo "Please double-check the PBS arguments:" 55 | #read -e -i "${PBS_ARGS}" 56 | #export PBS_ARGS=$REPLY 57 | #echo "And the logging arguments:" 58 | #read -e -i "${PBS_LOG}" 59 | #export PBS_LOG=$REPLY 60 | #QSUB_ARGS=" $PBS_ARGS $PBS_LOG -" 61 | #cat <<"EOF" | qsub $QSUB_ARGS 62 | ## If you wish to have changeable qsub arguments, look above. 63 | cat <<"EOF" | qsub - 64 | #!/usr/bin/bash 65 | #PBS -V 66 | #PBS -S /bin/bash 67 | #PBS -l mem=24GB,walltime=72:00:00 68 | #PBS -q large 69 | #PBS -N Comb_One_144_Count 70 | #PBS -j eo -e ${WD}/ErrorOutput.Count 71 | 72 | source ${LAB}/scripts/dotbashrc 73 | start_log ${WD}/CountTable.log 74 | 75 | CMD="samtools view ${INPUT} | htseq-count -q --stranded=no -t gene -i ID - ${ANN} > ${OUTPUT}" 76 | eval $CMD 77 | 78 | end_log 79 | 80 | EOF 81 | -------------------------------------------------------------------------------- /2013/1015-PBS/03-real_examples/06-tnseq_align.sh: -------------------------------------------------------------------------------- 1 | #!/bin/env bash 2 | . /cbcb/lab/nelsayed/scripts/dotbashrc 3 | . ~abelew/.bash_aliases 4 | 5 | export BASELIB_DIR=~abelew/project_scratch/libraries 6 | export MYWD=$(pwd) 7 | 8 | function Submit_Bowtie1 { 9 | export QUERY=$1 10 | export LIB=$2 11 | export OUTDIR=$3 12 | 13 | export INNAME=$(basename $QUERY .fasta.gz) 14 | ## Note that I am setting default alignment parameters to a single match which is _not_ randomly placed 15 | ## and no mismatches 16 | export BT_ARGS=${BTARGS-" -m 1 -k 1 -n 0 "} 17 | export SAMFILE="${INNAME}.sam" 18 | export OPTIONS="$BT_ARGS -f -S ${OUTDIR}/${SAMFILE}" 19 | 20 | export BT_LIB="$BASELIB_DIR/$LIB/$LIB" 21 | if [ ! -e "${BT_LIB}.1.ebwt" ]; then 22 | echo "The reference library: $BASELIB_DIR/$LIB/$LIB does not exist." 23 | return 1 24 | fi 25 | export BT_CMD="bowtie $OPTIONS $BT_LIB -1 $QUERY" 26 | export MYTEST=1 27 | 28 | if [ "$MYTEST" = "1" ]; then 29 | echo "Would run $BT_CMD" 30 | else 31 | cat <<"EOF" | qsub $QSUB_ARGS 32 | ## Body of the script to run bowtie goes here. 33 | source ${LAB}/scripts/dotbashrc 34 | echo "Changing directory to: ${MYWD}" 35 | cd ${MYWD} 36 | 37 | CORES=" -p $(cpus) " 38 | 39 | CMD="bowtie 40 | 41 | cd ${REFDIR} && ${BINDIR}/bowtie2 -x ${RNA_LIB} ${FILETYPE} -U ${MYWD}/${IN}.${NODE} -S ${MYWD}/${IN}.sam.${NODE} ${SEEDARGS} ${TRIM3p} ${MULTALIGN} ${REPORTARGS} ${CORES} 2>${MYWD}/${IN}.stderr.${NODE} 1>${MYWD}/${IN}.stdout.${NODE}" 42 | if [ -n "$MYTEST" ]; then 43 | echo "Would run:" 44 | echo "$CMD" 45 | echo "This can be modified by exporting following variables:" 46 | echo "MULTIALIGN, TRIM3p, SEEDARGS, FILETYPE, REPORTARGS" 47 | else 48 | eval $CMD 49 | fi 50 | 51 | if [[ "$?" = "0" ]]; then 52 | echo "$NODE success $? in ${SECONDS}" >> ${MYWD}/status.txt 53 | else 54 | echo "$NODE fail $? 
in ${SECONDS}" >> ${MYWD}/status.txt 55 | fi 56 | end_log 57 | 58 | EOF 59 | 60 | fi 61 | 62 | } 63 | 64 | for lib in $(/bin/ls *.fasta.gz); do 65 | for condition in random1_nomismatch random1_1mismatch unique_nomismatch unique_1mismatch; do 66 | Submit_Bowtie1 $lib gas "$MYWD/$condition" 67 | done 68 | done 69 | -------------------------------------------------------------------------------- /2013/1015-PBS/03-real_examples/08-read_fasta.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env perl 2 | use strict; 3 | use warnings; 4 | use POSIX; 5 | use Bio::Seq; 6 | use Bio::SearchIO::fasta; 7 | use Bio::SeqIO; 8 | use autodie; 9 | 10 | my %sequences = (); 11 | Make_Seq(); 12 | sub Make_Seq { 13 | my $assembly = new Bio::SeqIO(-format => 'fasta', -file => $ARGV[1]); 14 | while (my $query_seq = $assembly->next_seq()) { 15 | my $id = $query_seq->id; 16 | my $desc = $query_seq->desc; 17 | my $sequence = $query_seq->seq; 18 | $sequences{$id} = $sequence; 19 | } 20 | } 21 | 22 | 23 | #my $searchio = new Bio::SearchIO(-format => 'fasta', -file => 'compare_trinity_to_clbrener.out', -best => 1,); 24 | my $searchio = new Bio::SearchIO(-format => 'fasta', -file => $ARGV[0], -best => 1,); 25 | print "QUERYNAME\tChromosome\tStart\tEnd\t%ID\tScore\tSig\tCompLength\tHit_Ident\n"; 26 | 27 | my %handles = (); 28 | my @pcts = (50,55,60,65,70,75,80,85,90,95,100); 29 | for my $pct (@pcts) { 30 | my $filename = "${pct}_hits.txt"; 31 | open(my $tmp, ">$filename"); 32 | my $handle = $tmp; 33 | $handles{$pct} = $handle; 34 | } 35 | 36 | 37 | my @histogram = (); 38 | foreach my $c (0 .. 100) { 39 | $histogram[$c] = 0; 40 | } 41 | RESULT: while(my $result = $searchio->next_result()) { 42 | my $hit_count = 0; 43 | while(my $hit = $result->next_hit) { 44 | $hit_count++; 45 | my $query_name = $result->query_name(); 46 | my $query_length = $result->query_length(); 47 | my $accession = $hit->accession(); 48 | my $acc2 = $hit->name(); 49 | my $length = $hit->length(); 50 | my $score = $hit->raw_score(); 51 | my $sig = $hit->significance(); 52 | my $ident = ($hit->frac_identical() * 100); 53 | my ($start, $end, $hsp_identical, $hsp_cons); 54 | 55 | HSP: while (my $hsp = $hit->next_hsp) { 56 | $start = $hsp->start('subject'); 57 | ## Maybe want $hsp->start('subject'); 58 | $end = $hsp->end('subject'); 59 | $hsp_identical = $hsp->frac_identical('total'); 60 | $hsp_identical = sprintf("%.4f", $hsp_identical); 61 | ## $fun = sprintf("%2d %", $something, $somethingelse); 62 | $hsp_identical = $hsp_identical * 100; 63 | 64 | ## The following 6 lines define the histogram of how many components are x% identical to the genome. 65 | my $hsp_hist = floor($hsp_identical); 66 | if (defined($histogram[$hsp_hist])) { 67 | $histogram[$hsp_hist] = $histogram[$hsp_hist] + 1; 68 | } else { 69 | $histogram[$hsp_hist] = 1; 70 | } 71 | 72 | # $hsp_cons = $hsp->frac_cons('total'); 73 | last HSP; ## The 'HSP: ' is a label given to a logical loop 74 | ## If you then say next HSP; it will immediately stop processing the loop and move to the next iteration of the loop 75 | ## If instead you say last HSP; it will immediately break out of the loop. 76 | } 77 | 78 | ## This prints %identity with respect to the overlap of the component 79 | ## Then the score of the hit, its Evalue, and the component length. 
80 | print STDOUT "${query_name}\t${acc2}\t${start}\t${end}\t${ident}\t${score}\t${sig}\t${query_length}\t${hsp_identical}\t\n"; 81 | PCT: for my $pct (@pcts) { 82 | if ($hsp_identical <= $pct) { 83 | select $handles{$pct}; 84 | print ">$query_name matches $acc2 at $start $end with ${ident}% identity 85 | $sequences{$query_name} 86 | "; 87 | next RESULT; 88 | } 89 | } 90 | } ## End each hit for a single result 91 | # print "$result had $hit_count hits\n"; 92 | 93 | } ## Finish looking at each sequence 94 | 95 | open(HIST, ">hist.txt"); 96 | foreach my $c (0 .. $#histogram) { 97 | print HIST "$c $histogram[$c]\n"; 98 | } 99 | close(HIST); 100 | -------------------------------------------------------------------------------- /2013/1015-PBS/03-real_examples/08-split_align-fasta.pl: -------------------------------------------------------------------------------- 1 | #!/bin/env perl 2 | use strict; 3 | use warnings; 4 | use autodie; 5 | use Getopt::Long; 6 | use Bio::SeqIO; 7 | use POSIX; 8 | use Pod::Usage; 9 | 10 | my %conf = ( 11 | number => 200, 12 | input => undef, 13 | lib => undef, 14 | ); 15 | 16 | my %options = ( 17 | 'num|n:s' => \$conf{number}, 18 | 'input|i:s' => \$conf{input}, 19 | 'lib|l:s' => \$conf{lib}, 20 | ); 21 | my $argv_result = GetOptions(%options); 22 | 23 | pod2usage({-message => q{Mandatory argument '--input or -i' is missing, containing the input fasta}, 24 | -exitval => 1, 25 | -verbose => 1,} 26 | ) unless $conf{input}; 27 | pod2usage({-message => q{Mandatory argument '--lib or -l' is missing, containing the fasta library}, 28 | -exitval => 1, 29 | -verbose => 1,} 30 | ) unless $conf{lib}; 31 | 32 | 33 | ## Number entries 23119 34 | ## Thus 116 entries in each fasta 35 | 36 | ## To arrive at this number, I just rounded up (/ 23119 200) 37 | ## However, we can count the number of fasta entries in the input file 38 | my $num_per_split = Get_Split(); 39 | print "Going to make $conf{num} directories with $num_per_split files each.\n"; 40 | Make_Directories($num_per_split); 41 | Make_Align(); 42 | 43 | sub Get_Split { 44 | my $input = $conf{input}; 45 | my $splits = $conf{number}; 46 | my $in = new Bio::SeqIO(-file => $input,); 47 | my $seqs = 0; 48 | while (my $in_seq = $in->next_seq()) { 49 | $seqs++; 50 | } 51 | my $ret = ceil($seqs / $splits); 52 | return($ret); 53 | } 54 | 55 | sub Make_Directories { 56 | my $num_per_split = shift; 57 | my $splits = $conf{number}; 58 | ## I am choosing to make directories starting at 1000 59 | ## This way I don't have to think about the difference from 60 | ## 99 to 100 (2 characters to 3) as long as no one splits more than 9000 ways... 61 | my $dir = 1000; 62 | 63 | for my $c ($dir .. ($dir + $splits)) { 64 | print "Making directory: split/$c\n"; 65 | system("mkdir -p split/$c"); 66 | } 67 | 68 | my $in = new Bio::SeqIO(-file => $conf{input},); 69 | my $count = 0; 70 | while (my $in_seq = $in->next_seq()) { 71 | my $id = $in_seq->id(); 72 | my $seq = $in_seq->seq(); 73 | open(OUT, ">>split/$dir/in.fasta"); 74 | print OUT ">$id 75 | $seq 76 | "; 77 | close(OUT); 78 | $count++; 79 | if ($count >= $num_per_split) { 80 | $count = 0; 81 | $dir++; 82 | my $last = $dir; 83 | $last--; 84 | print "Wrote $num_per_split entries to $last\n"; 85 | } ## End for each iteration of $num_per_split files 86 | } ## End while reading the fasta 87 | } 88 | 89 | sub Make_Align { 90 | my $array_end = 1000 + $conf{number}; 91 | my $array_string = qq"1000-${array_end}"; 92 | my $string = qq? 
93 | cat <<"EOF" | qsub -t ${array_string} -V -S /cbcb/lab/nelsayed/local/bin/bash -q throughput -l walltime=18:00:00,mem=8Gb -j eo -e ibissub00.umiacs.umd.edu:/cbcbhomes/abelew/outputs/pbs.out -m n 94 | source /cbcb/lab/nelsayed/scripts/dotbashrc 95 | CMD="glsearch36 -m 1 -b 1 -d 1 \ 96 | $ENV{PWD}/split/\${PBS_ARRAYID}/in.fasta \ 97 | $conf{lib} \ 98 | 2> $ENV{PWD}/split/\${PBS_ARRAYID}/split.err \ 99 | 1> $ENV{PWD}/split/\${PBS_ARRAYID}/split.out" 100 | echo \$CMD 101 | eval \$CMD 102 | EOF 103 | ?; 104 | print "TESTME: 105 | $string\n"; 106 | 107 | open(CMD, "$string |"); 108 | while(my $line = ) { 109 | chomp $line; 110 | print "$line\n"; 111 | } 112 | close(CMD); 113 | print "Submission finished.\n"; 114 | } 115 | 116 | -------------------------------------------------------------------------------- /2013/1015-PBS/03-real_examples/09-read_blast.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env perl 2 | use strict; 3 | use warnings; 4 | use POSIX; 5 | use Bio::Seq; 6 | use Bio::SearchIO::blast; 7 | use Bio::SeqIO; 8 | use autodie; 9 | 10 | #my $searchio = new Bio::SearchIO(-format => 'fasta', -file => 'compare_trinity_to_clbrener.out', -best => 1,); 11 | my $searchio = new Bio::SearchIO(-format => 'blast', -file => $ARGV[0], -best => 1,); 12 | print "QUERYNAME\tChromosome\tStart\tEnd\t%ID\tScore\tSig\tCompLength\tHit_Ident\n"; 13 | RESULT: while(my $result = $searchio->next_result()) { 14 | my $hit_count = 0; 15 | while(my $hit = $result->next_hit) { 16 | $hit_count++; 17 | my $query_name = $result->query_name(); 18 | my $query_length = $result->query_length(); 19 | my $accession = $hit->accession(); 20 | my $acc2 = $hit->name(); 21 | my $acc3 = $hit->description(); 22 | my $length = $hit->length(); 23 | my $score = $hit->raw_score(); 24 | my $sig = $hit->significance(); 25 | my $ident = ($hit->frac_identical() * 100); 26 | my ($start, $end, $hsp_identical, $hsp_cons); 27 | 28 | HSP: while (my $hsp = $hit->next_hsp) { 29 | $start = $hsp->start('subject'); 30 | ## Maybe want $hsp->start('subject'); 31 | $end = $hsp->end('subject'); 32 | $hsp_identical = $hsp->frac_identical('total'); 33 | $hsp_identical = sprintf("%.4f", $hsp_identical); 34 | ## $fun = sprintf("%2d %", $something, $somethingelse); 35 | $hsp_identical = $hsp_identical * 100; 36 | 37 | # $hsp_cons = $hsp->frac_cons('total'); 38 | last HSP; ## The 'HSP: ' is a label given to a logical loop 39 | ## If you then say next HSP; it will immediately stop processing the loop and move to the next iteration of the loop 40 | ## If instead you say last HSP; it will immediately break out of the loop. 41 | } 42 | 43 | ## This prints %identity with respect to the overlap of the component 44 | ## Then the score of the hit, its Evalue, and the component length. 
45 | print STDOUT "${query_name}\t${acc2}\t${acc3}\t${start}\t${end}\t${ident}\t${score}\t${sig}\t${query_length}\t${hsp_identical}\t\n"; 46 | next RESULT; 47 | } ## End each hit for a single result 48 | # print "$result had $hit_count hits\n"; 49 | 50 | } ## Finish looking at each sequence 51 | -------------------------------------------------------------------------------- /2013/1015-PBS/03-real_examples/09-split_align-blast.pl: -------------------------------------------------------------------------------- 1 | #!/bin/env perl 2 | use strict; 3 | use warnings; 4 | use autodie; 5 | use Getopt::Long; 6 | use Bio::SeqIO; 7 | use POSIX; 8 | use Pod::Usage; 9 | 10 | my %conf = ( 11 | number => 200, 12 | input => undef, 13 | lib => undef, 14 | ); 15 | 16 | my %options = ( 17 | 'num|n:s' => \$conf{number}, 18 | 'input|i:s' => \$conf{input}, 19 | 'lib|l:s' => \$conf{lib}, 20 | ); 21 | my $argv_result = GetOptions(%options); 22 | 23 | pod2usage({-message => q{Mandatory argument '--input or -i' is missing, containing the input fasta}, 24 | -exitval => 1, 25 | -verbose => 1,} 26 | ) unless $conf{input}; 27 | pod2usage({-message => q{Mandatory argument '--lib or -l' is missing, containing the fasta library}, 28 | -exitval => 1, 29 | -verbose => 1,} 30 | ) unless $conf{lib}; 31 | 32 | 33 | ## Number entries 23119 34 | ## Thus 116 entries in each fasta 35 | 36 | ## To arrive at this number, I just rounded up (/ 23119 200) 37 | ## However, we can count the number of fasta entries in the input file 38 | my $num_per_split = Get_Split(); 39 | print "Going to make $conf{num} directories with $num_per_split files each.\n"; 40 | Make_Directories($num_per_split); 41 | Make_Align(); 42 | 43 | sub Get_Split { 44 | my $input = $conf{input}; 45 | my $splits = $conf{number}; 46 | my $in = new Bio::SeqIO(-file => $input,); 47 | my $seqs = 0; 48 | while (my $in_seq = $in->next_seq()) { 49 | $seqs++; 50 | } 51 | my $ret = ceil($seqs / $splits); 52 | return($ret); 53 | } 54 | 55 | sub Make_Directories { 56 | my $num_per_split = shift; 57 | my $splits = $conf{number}; 58 | ## I am choosing to make directories starting at 1000 59 | ## This way I don't have to think about the difference from 60 | ## 99 to 100 (2 characters to 3) as long as no one splits more than 9000 ways... 61 | my $dir = 1000; 62 | 63 | for my $c ($dir .. ($dir + $splits)) { 64 | print "Making directory: split/$c\n"; 65 | system("mkdir -p split/$c"); 66 | } 67 | 68 | my $in = new Bio::SeqIO(-file => $conf{input},); 69 | my $count = 0; 70 | while (my $in_seq = $in->next_seq()) { 71 | my $id = $in_seq->id(); 72 | my $seq = $in_seq->seq(); 73 | open(OUT, ">>split/$dir/in.fasta"); 74 | print OUT ">$id 75 | $seq 76 | "; 77 | close(OUT); 78 | $count++; 79 | if ($count >= $num_per_split) { 80 | $count = 0; 81 | $dir++; 82 | my $last = $dir; 83 | $last--; 84 | print "Wrote $num_per_split entries to $last\n"; 85 | } ## End for each iteration of $num_per_split files 86 | } ## End while reading the fasta 87 | } 88 | 89 | sub Make_Align { 90 | my $array_end = 1000 + $conf{number}; 91 | my $array_string = qq"1000-${array_end}"; 92 | my $string = qq? 
93 | cat <<"EOF" | qsub -t ${array_string} -V -S /cbcb/lab/nelsayed/local/bin/bash -q throughput -l walltime=18:00:00,mem=8Gb -j eo -e ibissub00.umiacs.umd.edu:/cbcbhomes/abelew/outputs/pbs.out -m n 94 | source /cbcb/lab/nelsayed/scripts/dotbashrc 95 | CMD="blastall -e 1000 -p blastx -d nr \ 96 | -i $ENV{PWD}/split/\${PBS_ARRAYID}/in.fasta" 97 | echo \$CMD 98 | eval \$CMD 99 | EOF 100 | ?; 101 | print "TESTME: 102 | $string\n"; 103 | 104 | open(CMD, "$string |"); 105 | while(my $line = ) { 106 | chomp $line; 107 | print "$line\n"; 108 | } 109 | close(CMD); 110 | print "Submission finished.\n"; 111 | } 112 | 113 | -------------------------------------------------------------------------------- /2013/1015-PBS/04-gotcha/46366.cbcbtorque.umiacs.umd.edu.ER: -------------------------------------------------------------------------------- 1 | ++ env 2 | +++ which bowtie2 3 | ++ BOWTIE=/cbcb/lab/nelsayed/programs/testbin/bowtie2 4 | ++ echo 'bowtie lies at /cbcb/lab/nelsayed/programs/testbin/bowtie2' 5 | +++ cpus 6 | ++++ /bin/cat /proc/cpuinfo 7 | ++++ /bin/grep '^processor' 8 | ++++ /usr/bin/wc -l 9 | +++ CPUS=24 10 | +++ echo 24 11 | +++ return 24 12 | ++ CPUs=24 13 | ++ export CPUs=23 14 | ++ CPUs=23 15 | ++ TOPHAT=/cbcb/lab/nelsayed/local/bin/tophat 16 | ++ CORES=' -p 23 ' 17 | ++ GAP=' -r 170 ' 18 | ++ OUT=/cbcb/project-scratch/gingerhl/HPGL0282-C24/tophat_282_hg19 19 | ++ REF=/cbcb/lab/nelsayed/ref_data/hsapiens/genome/hg19/hg19 20 | ++ INPUT1=/cbcb/lab/nelsayed/raw_data/sylvio/HPGL0282/processed/HPGL0282_R1_combined.filtered.fastq 21 | ++ INPUT2=/cbcb/lab/nelsayed/raw_data/sylvio/HPGL0282/processed/HPGL0282_R2_combined.filtered.fastq 22 | ++ for file in '"$OUT $GTF $REF $INPUT1 $INPUT2"' 23 | ++ '[' -r /cbcb/project-scratch/gingerhl/HPGL0282-C24/tophat_282_hg19 ']' 24 | ++ echo '/cbcb/project-scratch/gingerhl/HPGL0282-C24/tophat_282_hg19 /cbcb/lab/nelsayed/ref_data/hsapiens/genome/hg19/hg19 /cbcb/lab/nelsayed/raw_data/sylvio/HPGL0282/processed/HPGL0282_R1_combined.filtered.fastq /cbcb/lab/nelsayed/raw_data/sylvio/HPGL0282/processed/HPGL0282_R2_combined.filtered.fastq is not readable.' 
25 | ++ CMD='/cbcb/lab/nelsayed/local/bin/tophat -p 23 -r 170 -o /cbcb/project-scratch/gingerhl/HPGL0282-C24/tophat_282_hg19 /cbcb/lab/nelsayed/ref_data/hsapiens/genome/hg19/hg19 /cbcb/lab/nelsayed/raw_data/sylvio/HPGL0282/processed/HPGL0282_R1_combined.filtered.fastq /cbcb/lab/nelsayed/raw_data/sylvio/HPGL0282/processed/HPGL0282_R2_combined.filtered.fastq' 26 | ++ date 27 | ++ echo 'Going to run: /cbcb/lab/nelsayed/local/bin/tophat -p 23 -r 170 -o /cbcb/project-scratch/gingerhl/HPGL0282-C24/tophat_282_hg19 /cbcb/lab/nelsayed/ref_data/hsapiens/genome/hg19/hg19 /cbcb/lab/nelsayed/raw_data/sylvio/HPGL0282/processed/HPGL0282_R1_combined.filtered.fastq /cbcb/lab/nelsayed/raw_data/sylvio/HPGL0282/processed/HPGL0282_R2_combined.filtered.fastq' 28 | ++ eval /cbcb/lab/nelsayed/local/bin/tophat -p 23 -r 170 -o /cbcb/project-scratch/gingerhl/HPGL0282-C24/tophat_282_hg19 /cbcb/lab/nelsayed/ref_data/hsapiens/genome/hg19/hg19 /cbcb/lab/nelsayed/raw_data/sylvio/HPGL0282/processed/HPGL0282_R1_combined.filtered.fastq /cbcb/lab/nelsayed/raw_data/sylvio/HPGL0282/processed/HPGL0282_R2_combined.filtered.fastq 29 | +++ /cbcb/lab/nelsayed/local/bin/tophat -p 23 -r 170 -o /cbcb/project-scratch/gingerhl/HPGL0282-C24/tophat_282_hg19 /cbcb/lab/nelsayed/ref_data/hsapiens/genome/hg19/hg19 /cbcb/lab/nelsayed/raw_data/sylvio/HPGL0282/processed/HPGL0282_R1_combined.filtered.fastq /cbcb/lab/nelsayed/raw_data/sylvio/HPGL0282/processed/HPGL0282_R2_combined.filtered.fastq 30 | 31 | [2013-10-14 11:41:38] Beginning TopHat run (v2.0.8) 32 | ----------------------------------------------- 33 | [2013-10-14 11:41:38] Checking for Bowtie 34 | Bowtie version: 2.1.0.0 35 | [2013-10-14 11:41:38] Checking for Samtools 36 | Samtools version: 0.1.19.0 37 | [2013-10-14 11:41:38] Checking for Bowtie index files 38 | [2013-10-14 11:41:38] Checking for reference FASTA file 39 | [2013-10-14 11:41:38] Generating SAM header for /cbcb/lab/nelsayed/ref_data/hsapiens/genome/hg19/hg19 40 | format: fastq 41 | quality scale: phred33 (default) 42 | [2013-10-14 11:42:14] Preparing reads 43 | left reads: min. length=101, max. length=101, 38126777 kept reads (626 discarded) 44 | right reads: min. length=101, max. 
length=101, 38125369 kept reads (2034 discarded) 45 | [2013-10-14 12:16:28] Mapping left_kept_reads to genome hg19 with Bowtie2 46 | [2013-10-14 12:59:52] Mapping left_kept_reads_seg1 to genome hg19 with Bowtie2 (1/4) 47 | [2013-10-14 13:03:58] Mapping left_kept_reads_seg2 to genome hg19 with Bowtie2 (2/4) 48 | [2013-10-14 13:08:20] Mapping left_kept_reads_seg3 to genome hg19 with Bowtie2 (3/4) 49 | [2013-10-14 13:12:52] Mapping left_kept_reads_seg4 to genome hg19 with Bowtie2 (4/4) 50 | [2013-10-14 13:19:04] Mapping right_kept_reads to genome hg19 with Bowtie2 51 | [2013-10-14 14:02:44] Mapping right_kept_reads_seg1 to genome hg19 with Bowtie2 (1/4) 52 | [2013-10-14 14:06:55] Mapping right_kept_reads_seg2 to genome hg19 with Bowtie2 (2/4) 53 | [2013-10-14 14:11:54] Mapping right_kept_reads_seg3 to genome hg19 with Bowtie2 (3/4) 54 | [2013-10-14 14:17:08] Mapping right_kept_reads_seg4 to genome hg19 with Bowtie2 (4/4) 55 | [2013-10-14 14:23:10] Searching for junctions via segment mapping 56 | [2013-10-14 14:35:35] Retrieving sequences for splices 57 | [2013-10-14 14:38:53] Indexing splices 58 | [2013-10-14 14:40:31] Mapping left_kept_reads_seg1 to genome segment_juncs with Bowtie2 (1/4) 59 | [2013-10-14 14:43:01] Mapping left_kept_reads_seg2 to genome segment_juncs with Bowtie2 (2/4) 60 | [2013-10-14 14:45:54] Mapping left_kept_reads_seg3 to genome segment_juncs with Bowtie2 (3/4) 61 | [2013-10-14 14:48:40] Mapping left_kept_reads_seg4 to genome segment_juncs with Bowtie2 (4/4) 62 | [2013-10-14 14:51:36] Joining segment hits 63 | [2013-10-14 15:04:06] Mapping right_kept_reads_seg1 to genome segment_juncs with Bowtie2 (1/4) 64 | [2013-10-14 15:06:33] Mapping right_kept_reads_seg2 to genome segment_juncs with Bowtie2 (2/4) 65 | [2013-10-14 15:09:23] Mapping right_kept_reads_seg3 to genome segment_juncs with Bowtie2 (3/4) 66 | [2013-10-14 15:12:08] Mapping right_kept_reads_seg4 to genome segment_juncs with Bowtie2 (4/4) 67 | [2013-10-14 15:15:08] Joining segment hits 68 | [2013-10-14 15:28:05] Reporting output tracks 69 | ----------------------------------------------- 70 | [2013-10-14 16:18:14] Run complete: 04:36:35 elapsed 71 | ++ echo 'Completed at:' 72 | ++ date 73 | ++ echo 'with return: 0' 74 | -------------------------------------------------------------------------------- /2013/1015-PBS/05-advanced/01: -------------------------------------------------------------------------------- 1 | #PBS -S /bin/bash -l walltime=48:00:00,nodes=1 -N fold_lin_01 2 | . ~/.bash_environment 3 | cd 4 | /usr/local/bin/perl -w -I /usr/local/prfdb/prfdb_test/usr/perl.linux/lib /prf_daemon.pl 01 2>>/outputs/linux.err 1>>/outputs/linux. 5 | out 6 | 7 | -------------------------------------------------------------------------------- /2013/1015-PBS/05-advanced/queue.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | . 
~/.bashrc 3 | cd $PRFDB_HOME 4 | ./prf_daemon --make_jobs 5 | QSTAT=/usr/local/torque/bin/qstat 6 | USERID=`id | awk -F'(' '{print $2}' | awk -F ')' '{print $1}'` 7 | PARTIAL=`grep pbs_partialname prfdb.conf | awk -F= '{print $2}' | sed 's/'\''//g'` 8 | DAEMONS=`grep pbs_num_daemons prfdb.conf | awk -F= '{print $2}'` 9 | for arch in lin 10 | do 11 | for num in $(eval echo {01..`echo -n $DAEMONS`}) 12 | do 13 | num=`echo $num | awk '{printf "%02d", $num}'` 14 | EXIST=`$QSTAT | grep $USERID | grep $PARTIAL | awk '{print $2}' | grep $arch | awk -F'_' '{print $3}' | grep $num` 15 | if [ "$EXIST" = "" ]; then 16 | if [ $arch = "lin" ]; then 17 | /usr/local/bin/qsub -j eo -e outputs/qsub.err -m n jobs/linux/$num 2>/dev/null 1>&2 18 | elif [ $arch = "aix" ]; then 19 | /usr/local/bin/qsub -j eo -e outputs/qsub.err -m n jobs/aix/$num 2>/dev/null 1>&2 20 | elif [ $arch = "iri" ]; then 21 | /usr/local/bin/qsub -j eo -e outputs/qsub.err -m n jobs/irix/$num 2>/dev/null 1>&2 22 | fi 23 | fi 24 | done 25 | done 26 | at "now + 4 hours" < queue.sh 2>/dev/null 1>&2 27 | -------------------------------------------------------------------------------- /2013/1015-PBS/05-advanced/render.py: -------------------------------------------------------------------------------- 1 | from pymol import stored, cmd, selector, util 2 | session = str(os.environ["SESSION"]) 3 | name = str(os.environ["SESSIONNAME"]) 4 | sessiondir = str(os.environ["SESSIONDIR"]) 5 | frames = int(os.environ["FRAMES"]) 6 | renderers = int(os.environ["RENDERERS"]) 7 | renderer_num = int(os.environ["PBS_ARRAYID"]) 8 | renderer_num = renderer_num - 1 9 | frames_per_renderer = frames / renderers 10 | start_frame = 1 + (renderer_num * frames_per_renderer) 11 | end_frame = start_frame + frames_per_renderer 12 | print "render.py: session:" + session + " name: " + name 13 | print "Session directory:" + sessiondir 14 | print "This is renderer: " + str(renderer_num) + " Responsible for frames: " + str(start_frame) + "-" + str(end_frame) 15 | os.chdir(sessiondir) 16 | cmd.load(session) 17 | 18 | ### Movie commands go here. 19 | cmd.set("cartoon_ring_mode", 3) 20 | cmd.mset("1 x%i" %(frames)) 21 | util.mroll(1, frames, 1, 'y') 22 | ## End movie commands. 23 | 24 | cmd.viewport("800","800") 25 | cmd.set("hash_max", 255) 26 | cmd.set("cache_frames", 0) 27 | cmd.set("ray_trace_fog", 0) 28 | cmd.set("ray_trace_frames", 1) 29 | cmd.set("ray_shadows", 0) 30 | cmd.set("antialias", 0) 31 | cmd.set("auto_zoom", 0) 32 | cmd.mclear 33 | for frame in range(start_frame, end_frame): 34 | print "Rendering frame: " + str(frame) 35 | filename = name + str("%04d" % frame) + ".png" 36 | print "FILENAME: " + filename 37 | if not os.path.exists(filename): 38 | cmd.mpng(name, frame, frame) 39 | else: 40 | print str(filename) + " already exists." 41 | cmd.mclear 42 | -------------------------------------------------------------------------------- /2013/1015-PBS/05-advanced/submit.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | echo "This script will submit a ribosome movie session for encoding" 3 | echo "It requires a couple environment variables to be set." 4 | echo "1. SESSIONDIR : a directory in which the pymol session file should live." 5 | echo "2. MOVIE_SCRIPT : A python script used to direct the movie." 
6 | #export MYBASE=/a/f20-fs1/data/dt-vol6/abelew 7 | export MYBASE=/export/lustre_1/abelew 8 | export LD_LIBRARY_PATH=${MYBASE}/bin:${LD_LIBRARY_PATH} 9 | export PATH=${MYBASE}/bin:${PATH} 10 | export PYM=pymol1.3 11 | export FREEMOL=${MYBASE}/bin/freemol 12 | echo "Type the name of the directory with your pymol session here." 13 | echo "It should contain a single file named 'session.pse' inside it." 14 | read -e NAME 15 | export SESSIONNAME=$NAME 16 | export SESSIONDIR=$MYBASE/$SESSIONNAME 17 | export SESSION=${SESSIONDIR}/session.pse 18 | 19 | if [ ! -f "$SESSION" ]; then 20 | echo "The file $SESSION does not exist." 21 | exit 1 22 | fi 23 | 24 | if [ "$FRAMES" = "" ]; then 25 | echo "The default number of frames in the movie is 480." 26 | echo "Change this by export FRAMES=###" 27 | export FRAMES=480 28 | fi 29 | 30 | if [ "$RENDERERS" = "" ]; then 31 | echo "The default number of compute nodes is 30." 32 | echo "Change this with export RENDERERS=##" 33 | export RENDERERS=40 34 | fi 35 | 36 | if [ "$MOVIE_SCRIPT" = "" ]; then 37 | echo "The default movie script is render.py" 38 | echo "Change this by export MOVIE_SCRIPT=\"newscript.py\"" 39 | export MOVIE_SCRIPT=$MYBASE/bin/render.py 40 | else 41 | export MOVIE_SCRIPT=$MYBASE/bin/$MOVIE_SCRIPT 42 | fi 43 | 44 | if [ "$PBS_QUEUE" = "" ]; then 45 | echo "The default PBS queue is 'serial'" 46 | echo "To go faster, do:" 47 | echo "export PBS_QUEUE=\"narrow-long -A clfshpc-hi\"" 48 | export PBS_QUEUE="narrow-long -A clfshpc-hi" 49 | # export PBS_QUEUE="serial" 50 | fi 51 | #export MEM=$(expr ${RENDERERS} \* 12000) 52 | export MEM=7168 53 | export QSUB_COMMAND="/usr/local/torque/bin/qsub\ 54 | -q $PBS_QUEUE -l pmem=${MEM}mb\ 55 | -N $SESSIONNAME\ 56 | -S /bin/sh -t 1-$RENDERERS -V -o ${MYBASE}/logs/${SESSIONNAME}_out \ 57 | -e ${MYBASE}/logs/${SESSIONNAME}_err ${MYBASE}/bin/pymol_pbs" 58 | 59 | echo "Waiting for 5 seconds if you wish to change anything before starting the submission." 60 | echo "This script will run:" 61 | echo "$QSUB_COMMAND" 62 | echo "" 63 | echo "" 64 | echo "Using the pymol session file: $SESSION" 65 | echo "Rendering $FRAMES frames." 66 | echo "And movie instructions found in: $MOVIE_SCRIPT" 67 | echo "If this is correct, type 'go' and hit return." 68 | read GO 69 | if [ $GO = 'go' ]; then 70 | echo "Changing directory to $SESSIONDIR" 71 | cd $SESSIONDIR && $QSUB_COMMAND 72 | echo "Running $QSUB_COMMAND" 73 | echo "Image files should be in $SESSIONDIR." 74 | 75 | else 76 | exit 0 77 | fi 78 | -------------------------------------------------------------------------------- /2013/1022-quality_assurance_wgs_rnaseq/README.md: -------------------------------------------------------------------------------- 1 | ##Quality assurance for WGS/RNA-seq data with PRINSEQ 2 | 3 | ### Introduction 4 | [PRINSEQ](http://prinseq.sourceforge.net/manual.html) is a tool for summarizing, and filtering next-gen sequencing data. 5 | 6 | ### Quality control 7 | 1. Length 8 | 2. Base qualities 9 | 3. Duplicates 10 | 11 | ### Tutorial 12 | [PRINSEQ](http://prinseq.sourceforge.net/manual.html) is available as a [standalone version](http://sourceforge.net/projects/prinseq/files/) (perl script) or as a web service. 13 | 14 | 15 | * Download the [example](http://trace.ddbj.nig.ac.jp/DRASearch/run?acc=SRR069556) RNA-seq data. 16 | * Run PRINSEQ with few parameters. 
17 | 18 | ``` 19 | perl $PRINSEQ_DIR/prinseq-lite.pl -fastq SRR069556.fastq \ 20 | -graph_data SRR069556_graph.gd \ 21 | -log log.txt 22 | ``` 23 | * Upload **SRR069556_graph.gd** to [PRINSEQ webserver](http://edwards.sdsu.edu/cgi-bin/prinseq/prinseq.cgi?report=1). 24 | * Look at [figures](http://edwards.sdsu.edu/cgi-bin/prinseq/tmp/1382390222/SRR069556.fastq_graph.gd.html) to determine reasonable parameter cutoffs. 25 | * Rerun PRINSEQ with filtering options. 26 | 27 | ``` 28 | perl $PRINSEQ_DIR/prinseq-lite.pl -fastq SRR069556.fastq \ 29 | -min_len 20 \ 30 | -trim_qual_right 10 \ 31 | -min_qual_mean 15 \ 32 | -ns_max_n 1 \ 33 | -derep 235 \ 34 | -lc_method dust \ 35 | -lc_threshold 70 \ 36 | -trim_tail_left 5 \ 37 | -trim_tail_right 5 \ 38 | -out_format 3 \ 39 | -graph_data SRR069556_try2_graph.gd \ 40 | -log log.txt \ 41 | -out_good SRR069556_try2_good.fastq \ 42 | -out_bad SRR069556_try2_bad.fastq 43 | ``` 44 | 45 | ### Tips 46 | I strongly recommend you first upload a very small sample (10k sequences) to the web server and then look at the distribution of your data. Afterwards, you can look at the plots to get a good idea of what filtering parameters you should use. 47 | 48 | ### Web demo -------------------------------------------------------------------------------- /2013/1029-RNA-Seq_normalization_assumptions/rnaseq_normalization_assumptions.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2013/1029-RNA-Seq_normalization_assumptions/rnaseq_normalization_assumptions.pptx -------------------------------------------------------------------------------- /2013/1105-batch_effects_in_RNA-Seq/BYOB_batch_20131105_v1.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2013/1105-batch_effects_in_RNA-Seq/BYOB_batch_20131105_v1.pptx -------------------------------------------------------------------------------- /2013/1203_piplining_with_python/README.md: -------------------------------------------------------------------------------- 1 | # Pipelining with Python 2 | by Lee Mendelowitz 3 | LMendelo@umiacs.umd.edu 4 | 5 | This presentation gives an overview of useful built-in Python modules for writing Python scripts to 6 | replace shell scripts, and gives an overview of the [Ruffus](http://www.ruffus.org.uk/index.html) library 7 | for building Bioinformatics pipelines. 8 | 9 | The slides are in presentation.html and should be viewable in a modern web browser. 10 | -------------------------------------------------------------------------------- /2013/1203_piplining_with_python/example1_sys.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | ''' 3 | example1_sys.py 4 | Print 5 | Usage: 6 | ./exampe1_sys.py FILE_PATH 7 | cat FILE_PATH | ./example1_sys.py 8 | ''' 9 | 10 | from sys import stdout, stdin, stderr, argv 11 | 12 | def count_words(file): 13 | ''' 14 | Print the number of words per line in a file. 
15 | ''' 16 | for line_num, line in enumerate(file): 17 | num_words = len(line.split()) 18 | stdout.write('%i: %i\n'%(line_num, num_words)) 19 | 20 | if __name__ == '__main__': 21 | 22 | # By default, read from standard input 23 | file = stdin 24 | filename = 'stdin' 25 | 26 | # If a filepath is provided, use the file instead of 27 | # standard input 28 | args = argv[1:] 29 | if args: 30 | filename = args[0] 31 | file = open(filename) 32 | 33 | # Run 34 | stderr.write('-'*50 + '\n') 35 | stderr.write('Filename: %s\n'%filename) 36 | count_words(file) 37 | 38 | -------------------------------------------------------------------------------- /2013/1203_piplining_with_python/pipeline.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2013/1203_piplining_with_python/pipeline.png -------------------------------------------------------------------------------- /2013/1203_piplining_with_python/ruffus_example.py: -------------------------------------------------------------------------------- 1 | from ruffus import * 2 | from ruffus_example_utils import * 3 | 4 | input_file = 'example1.input' 5 | if not exists(input_file): 6 | touch(input_file) 7 | 8 | @files(input_file, 'A.task1') 9 | def task1(inputs, outputs): 10 | printInfo('task1', inputs, outputs) 11 | createFiles(outputs) 12 | 13 | @follows(task1) 14 | @files('A.task1', '1.task2') 15 | def task2(inputs, outputs): 16 | printInfo('task2', inputs, outputs) 17 | createFiles(outputs) 18 | 19 | pipeline_printout(sys.stdout, [task2]) 20 | pipeline_run([task2]) -------------------------------------------------------------------------------- /2013/1203_piplining_with_python/ruffus_example2.py: -------------------------------------------------------------------------------- 1 | from ruffus import * 2 | from ruffus_example_utils import * 3 | 4 | input_file = 'example2.input' 5 | if not exists(input_file): 6 | touch(input_file) 7 | 8 | @files(input_file, ['A.task1', 'B.task1', 'C.task1', 'D.task1']) 9 | def task1(inputs, outputs): 10 | printInfo('task1', inputs, outputs) 11 | createFiles(outputs) 12 | 13 | task2_jobs = [ ['A.task1', '1.task2'], 14 | ['B.task1', '2.task2'], 15 | ['C.task1', ['3.task2', '4.task2']], 16 | ['D.task1', ['5.task2'] ] 17 | ] 18 | 19 | @follows(task1) 20 | @files(task2_jobs) 21 | def task2(inputs, outputs): 22 | printInfo('task2', inputs, outputs) 23 | createFiles(outputs) 24 | 25 | pipeline_printout(sys.stdout, [task2], ) 26 | pipeline_run([task2], multiprocess=4) -------------------------------------------------------------------------------- /2013/1203_piplining_with_python/ruffus_example_utils.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from os.path import exists 3 | 4 | touch = lambda fname: open(fname, 'w') 5 | 6 | def printInfo(task_name, inputs, outputs): 7 | print 'Running %s with:'%task_name 8 | print '\tinputs: %s'%str(inputs) 9 | print '\toutputs: %s'%str(outputs) 10 | 11 | def createFiles(filelist): 12 | if isinstance(filelist, (list, tuple)): 13 | for f in filelist: 14 | touch(f) 15 | elif isinstance(filelist, str): 16 | touch(filelist) 17 | else: 18 | raise TypeError('fileList is not list, tuple or str') -------------------------------------------------------------------------------- /2013/1203_piplining_with_python/styles.css: -------------------------------------------------------------------------------- 1 | body { 2 | 
font-family:"Verdana", Sans-serif; 3 | font-size: 22px; 4 | } 5 | -------------------------------------------------------------------------------- /2013/README.md: -------------------------------------------------------------------------------- 1 | Contents 2 | ======== 3 | * **09/03** Introduction to BYOB ([Keith Hughitt](https://github.com/khughitt)) 4 | * **09/03** Bacterial Transcriptome Analysis ([Jonathan Goodson](https://github.com/jgoodson)) 5 | * **09/03** Reproducible Research Using Knitr/R ([Keith Hughitt](https://github.com/khughitt)) 6 | * **09/10** De Novo Transcriptome Assembly ([Ted Gibbons](https://github.com/trgibbons)) 7 | * **09/17** A (not so) Complete Guide to Genome Browsers ([Matt Conte](https://github.com/conte1)) 8 | * **09/24** SNP Calling with RNA-Seq Data ([Josie Reinhardt](https://github.com/JosieReinhardt)) 9 | * **10/01** Molecular Evolution Analyses using PAML ([Kawther Abdilleh](https://github.com/kabdilleh)) 10 | * **10/08** Unannotated RNA-Seq Transcripts and Long Noncoding RNA Detection: RNAcode and CPAT ([Kevin Nyberg](https://github.com/kevingnyberg)) 11 | * **10/15** PBS: Please Be Sane ([Ashton Trey Belew](https://github.com/abelew)) 12 | * **10/22** Quality Assurance for WGS and RNA-seq Data ([Chris Hill](https://github.com/cmhill)) 13 | * **10/29** The Assumptions behind RNA-Seq Normalization Methods ([Kwame Okrah](https://github.com/kokrah)) 14 | * **11/05** RNA-Seq: Evaluating and Correcting for Batch Effects in RNA-Seq Data (Laura Dillon) 15 | * **12/03** Pipelining with Python ([Lee Mendelowitz](https://github.com/LeeMendelowitz)) 16 | -------------------------------------------------------------------------------- /2014/0218-rna-seq_normalization_issues/BYOB-14b18-Sailfish-final.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0218-rna-seq_normalization_issues/BYOB-14b18-Sailfish-final.pdf -------------------------------------------------------------------------------- /2014/0311-git_with_confidence/img/git_after_ff_merge.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0311-git_with_confidence/img/git_after_ff_merge.png -------------------------------------------------------------------------------- /2014/0311-git_with_confidence/img/git_after_rebase.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0311-git_with_confidence/img/git_after_rebase.png -------------------------------------------------------------------------------- /2014/0311-git_with_confidence/img/git_after_recursive_merge.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0311-git_with_confidence/img/git_after_recursive_merge.png -------------------------------------------------------------------------------- /2014/0311-git_with_confidence/img/git_before_ff_merge.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0311-git_with_confidence/img/git_before_ff_merge.png 
-------------------------------------------------------------------------------- /2014/0311-git_with_confidence/img/git_before_rebase.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0311-git_with_confidence/img/git_before_rebase.png -------------------------------------------------------------------------------- /2014/0311-git_with_confidence/img/git_before_recursive_merge.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0311-git_with_confidence/img/git_before_recursive_merge.png -------------------------------------------------------------------------------- /2014/0311-git_with_confidence/img/git_branch.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0311-git_with_confidence/img/git_branch.png -------------------------------------------------------------------------------- /2014/0311-git_with_confidence/img/git_commit.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0311-git_with_confidence/img/git_commit.png -------------------------------------------------------------------------------- /2014/0311-git_with_confidence/img/git_commit_parents.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0311-git_with_confidence/img/git_commit_parents.png -------------------------------------------------------------------------------- /2014/0311-git_with_confidence/img/git_tree.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0311-git_with_confidence/img/git_tree.png -------------------------------------------------------------------------------- /2014/0311-git_with_confidence/img/git_workflow.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0311-git_with_confidence/img/git_workflow.png -------------------------------------------------------------------------------- /2014/0311-git_with_confidence/references/git.from.bottom.up.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0311-git_with_confidence/references/git.from.bottom.up.pdf -------------------------------------------------------------------------------- /2014/0408-sequence-clustering/2014-04-08_BYOB_Sequence_Clustering.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0408-sequence-clustering/2014-04-08_BYOB_Sequence_Clustering.pdf -------------------------------------------------------------------------------- /2014/0408-sequence-clustering/README.md: 
-------------------------------------------------------------------------------- 1 | Sequence Clustering 2 | =================== 3 | Ted Gibbons 4 | 5 | Understanding the OrthoMCL pipeline 6 | ----------------------------------- 7 | The [OrthoMCL](http://orthomcl.org/orthomcl/) pipeline was introduced over a 8 | decade ago and remains one of the most popular approaches for clustering 9 | (protein) sequences. Several variants of the pipeline have since been 10 | published, but the major steps remain unchanged. In this first session, I will 11 | explain each step of the OrthoMCL pipeline in detail. 12 | 13 | Effects of graph weighting parameters and inflation values on clustering 14 | ------------------------------------------------------------------------ 15 | The primary differences between the various flavors of the OrthoMCL pipeline 16 | are the metrics used to weight the graph and/or MCL inflation parameter 17 | value(s). The effects of these differences can be very difficult to understand 18 | and the corresponding publications provide very little, if any, helpful 19 | information. I have therefore systematically explored the effects of these 20 | parameters on clustering a subset of sequences from the [euKaryotic Orthologous 21 | Groups (KOG) database](http://www.ncbi.nlm.nih.gov/COG/), simulating the 22 | fragmentation that commonly results from modern high-throughput sequencing 23 | projects. 24 | -------------------------------------------------------------------------------- /2014/0422-functional_variants_snpeff_snpsift/BYOB_SnpEff_SnpSift.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0422-functional_variants_snpeff_snpsift/BYOB_SnpEff_SnpSift.pptx -------------------------------------------------------------------------------- /2014/0422-functional_variants_snpeff_snpsift/README.md: -------------------------------------------------------------------------------- 1 | Prediction of functional variants using SnpEff/SnpSift 2 | ====================================================== 3 | Will Gammerdinger 4 | -------------------------------------------------------------------------------- /2014/0429-tools-for-cnv-detection/BYOB_April29.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0429-tools-for-cnv-detection/BYOB_April29.pptx -------------------------------------------------------------------------------- /2014/0429-tools-for-cnv-detection/DND.svdet.cnv.conf: -------------------------------------------------------------------------------- 1 | 2 | input_format = sam 3 | sv_type = all 4 | mates_orientation=RF 5 | read1_length=151 6 | read2_length=100 7 | mates_file=/N/dc2/scratch/joserein/CNVdata/svdetect/TD_Gom12_Drive.a600.sort.RG.norm.sam 8 | cmap_file=/N/dc2/scratch/joserein/CNVdata/svdetect/genome.scf.20k.lens.svdetect.txt 9 | output_dir=/N/dc2/scratch/joserein/CNVdata/svdetect/ 10 | tmp_dir=/N/dc2/scratch/joserein/tmp/ 11 | num_threads=4 12 | 13 | 14 | 15 | split_mate_file=1 16 | window_size=10000 17 | step_length=5000 18 | mates_file_ref=/N/dc2/scratch/joserein/CNVdata/svdetect/TD_Gom12_ND1.RG.norm.sam 19 | 20 | -------------------------------------------------------------------------------- /2014/0429-tools-for-cnv-detection/README.md: 
-------------------------------------------------------------------------------- 1 | Tools for detecting insertions, deletions, and other CNVs using Illumina data 2 | ============================================================================= 3 | Josie Reinhardt 4 | 5 | For some background on the tools used here, check out: 6 | 7 | - Min Zhao, Qingguo Wang, Quan Wang, Peilin Jia, Zhongming Zhao, (2013) 8 | Computational Tools For Copy Number Variation (Cnv) Detection Using 9 | Next-Generation Sequencing Data: Features And Perspectives. *Bmc 10 | Bioinformatics* **14** S1-NA 11 | [10.1186/1471-2105-14-S11-S1](http://dx.doi.org/10.1186/1471-2105-14-S11-S1)> 12 | -------------------------------------------------------------------------------- /2014/0429-tools-for-cnv-detection/bash_scripts/140424_CNVnator_all1.sh: -------------------------------------------------------------------------------- 1 | #PBS -l walltime=12:00:00,vmem=10gb,mem=10gb,nodes=1 2 | #PBS -N CNVnator_all_bin50 3 | #PBS -k o 4 | #PBS -M josiereinhardt@gmail.com 5 | #PBS -m e 6 | #PBS -e /N/dc2/scratch/joserein/CNVdata/Td_Drive_Cnvnator.e.txt 7 | #PBS -o /N/dc2/scratch/joserein/CNVdata/Td_Drive_Cnvnator.o.txt 8 | 9 | export ROOTSYS=~/bin/root/ 10 | export PATH=$PATH:~/bin/root/bin/ 11 | export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${ROOTSYS}/lib 12 | 13 | cnvnator -root /N/dc2/scratch/joserein/CNVdata/Td_Drive.root -tree /N/dc2/scratch/joserein/CNVdata/TD_Gom12_Drive.a600.sort.RG.bam 14 | cnvnator -root /N/dc2/scratch/joserein/CNVdata/Td_Drive.root -his 50 15 | cnvnator -root /N/dc2/scratch/joserein/CNVdata/Td_Drive.root -stat 50 16 | cnvnator -root /N/dc2/scratch/joserein/CNVdata/Td_Drive.root -partition 50 17 | cnvnator -root /N/dc2/scratch/joserein/CNVdata/Td_Drive.root -call 50 18 | -------------------------------------------------------------------------------- /2014/0429-tools-for-cnv-detection/bash_scripts/140424_CNVnator_extract1.sh: -------------------------------------------------------------------------------- 1 | #PBS -l walltime=12:00:00,vmem=10gb,mem=10gb,nodes=1 2 | #PBS -N CNVnator_extract1 3 | #PBS -j oe 4 | #PBS -k o 5 | #PBS -M josiereinhardt@gmail.com 6 | #PBS -m e 7 | 8 | export ROOTSYS=~/bin/root/ 9 | export PATH=$PATH:~/bin/root/bin/ 10 | export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${ROOTSYS}/lib 11 | 12 | cnvnator -root /N/dc2/scratch/joserein/CNVdata/Td_Drive.root -tree /N/dc2/scratch/joserein/CNVdata/TD_Gom12_Drive.a600.sort.RG.bam 13 | -------------------------------------------------------------------------------- /2014/0429-tools-for-cnv-detection/bash_scripts/140424_CNVnator_extract2.sh: -------------------------------------------------------------------------------- 1 | #PBS -l walltime=12:00:00,vmem=10gb,mem=10gb,nodes=1 2 | #PBS -N CNVnator_extract2 3 | #PBS -k o 4 | #PBS -M josiereinhardt@gmail.com 5 | #PBS -m e 6 | 7 | export ROOTSYS=~/bin/root/ 8 | export PATH=$PATH:~/bin/root/bin/ 9 | export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${ROOTSYS}/lib 10 | 11 | cnvnator -root /N/dc2/scratch/joserein/CNVdata/Td_Nondrive1.root -tree /N/dc2/scratch/joserein/CNVdata/TD_Gom12_ND1.RG.bam 12 | -------------------------------------------------------------------------------- /2014/0429-tools-for-cnv-detection/bash_scripts/140424_breakdancer.sh: -------------------------------------------------------------------------------- 1 | #PBS -l walltime=12:00:00,vmem=10gb,mem=10gb,nodes=1 2 | #PBS -N breakdancer_max 3 | #PBS -k o 4 | #PBS -M josiereinhardt@gmail.com 5 | #PBS -m e 6 | 7 | module add samtools 8 | 
breakdancer_max -r 5 -g /N/dc2/scratch/joserein/CNVdata/breakdancer/Td_Gom_DvsND_breakdancer.bed -d /N/dc2/scratch/joserein/CNVdata/breakdancer/Td_Gom_DvsND_breakdancer /N/dc2/scratch/joserein/CNVdata/breakdancer/TdGom_DvsND_CNV.cfg 9 | 10 | -------------------------------------------------------------------------------- /2014/0429-tools-for-cnv-detection/bash_scripts/140425_Hydra_novoalign1.sh: -------------------------------------------------------------------------------- 1 | #PBS -l walltime=72:00:00,vmem=10gb,mem=10gb,nodes=8 2 | #PBS -N Hydra_novoalign1 3 | #PBS -M josiereinhardt@gmail.com 4 | #PBS -m e 5 | 6 | module add samtools 7 | module add bedtools 8 | module add novoalign 9 | 10 | novoalign -c 8 -d /N/dc2/scratch/joserein/genome.scf.20k.novoalign -f /N/dc2/scratch/joserein/CNVdata/TD_Gom12_Drive.pair1.tier1.fq /N/dc2/scratch/joserein/CNVdata/TD_Gom12_Drive.pair2.tier1.fq -i 500 50 -r Random -o SAM | samtools view -Sb - > /N/dc2/scratch/joserein/CNVdata/TD_Gom12_Drive.tier2.bam 11 | bamToBed -i /N/dc2/scratch/joserein/CNVdata/TD_Gom12_Drive.tier2.bam -tag NM | ~/programs/Hydra-Version-0.5.3/scripts/pairDiscordants.py -i stdin -m hydra -z 722 -y 182 -n 1000 > /N/dc2/scratch/joserein/CNVdata/TD_Gom12_Drive.bedpe 12 | ~/programs/Hydra-Version-0.5.3/scripts/dedupDiscordants.py -i /N/dc2/scratch/joserein/CNVdata/TD_Gom12_Drive.bedpe -s 3 > /N/dc2/scratch/joserein/CNVdata/TD_Gom12_Drive.dedup.bedpe 13 | hydra -in /N/dc2/scratch/joserein/CNVdata/TD_Gom12_Drive.dedup.bedpe -out /N/dc2/scratch/joserein/CNVdata/TD_Gom12_Drive.hydra.breaks -mld 240 -mno 480 14 | -------------------------------------------------------------------------------- /2014/0429-tools-for-cnv-detection/bash_scripts/140425_Hydra_novoalign2.sh: -------------------------------------------------------------------------------- 1 | #PBS -l walltime=72:00:00,vmem=10gb,mem=10gb,nodes=8 2 | #PBS -N Hydra_novoalign2 3 | #PBS -M josiereinhardt@gmail.com 4 | #PBS -m e 5 | 6 | module add samtools 7 | module add bedtools 8 | module add novoalign 9 | 10 | novoalign -c 8 -d /N/dc2/scratch/joserein/genome.scf.20k.novoalign -f /N/dc2/scratch/joserein/CNVdata/TD_Gom12_ND1.pair1.tier1.fq /N/dc2/scratch/joserein/CNVdata/TD_Gom12_ND1.pair2.tier1.fq -i 500 50 -r Random -o SAM | samtools view -Sb - > /N/dc2/scratch/joserein/CNVdata/TD_Gom12_ND1.tier2.bam 11 | bamToBed -i /N/dc2/scratch/joserein/CNVdata/TD_Gom12_ND1.tier2.bam -tag NM | ~/programs/Hydra-Version-0.5.3/scripts/pairDiscordants.py -i stdin -m hydra -z 700 -y 220 -n 1000 > /N/dc2/scratch/joserein/CNVdata/TD_Gom12_ND1.bedpe 12 | ~/programs/Hydra-Version-0.5.3/scripts/dedupDiscordants.py -i /N/dc2/scratch/joserein/CNVdata/TD_Gom12_ND1.bedpe -s 3 > /N/dc2/scratch/joserein/CNVdata/TD_Gom12_ND1.dedup.bedpe 13 | hydra -in /N/dc2/scratch/joserein/CNVdata/TD_Gom12_ND1.dedup.bedpe -out /N/dc2/scratch/joserein/CNVdata/TD_Gom12_ND1.hydra.breaks -mld 240 -mno 480 14 | -------------------------------------------------------------------------------- /2014/0429-tools-for-cnv-detection/bash_scripts/140425_Hydra_samtools1_try2.sh: -------------------------------------------------------------------------------- 1 | #PBS -l walltime=12:00:00,vmem=10gb,mem=10gb,nodes=1 2 | #PBS -N Hydra_samtools1_try2 3 | #PBS -M josiereinhardt@gmail.com 4 | #PBS -m e 5 | 6 | module add samtools 7 | 8 | samtools view -uF 2 /N/dc2/scratch/joserein/CNVdata/TD_Gom12_Drive.a600.sort.RG.bam | samtools sort -n - /N/dc2/scratch/joserein/CNVdata/TD_Gom12_Drive.RG.nopair 9 | bamToFastq -bam 
/N/dc2/scratch/joserein/CNVdata/TD_Gom12_Drive.RG.nopair.bam -fq1 /N/dc2/scratch/joserein/CNVdata/TD_Gom12_Drive.pair1.tier1.fq -fq2 /N/dc2/scratch/joserein/CNVdata/TD_Gom12_Drive.pair2.tier1.fq 10 | -------------------------------------------------------------------------------- /2014/0429-tools-for-cnv-detection/bash_scripts/140425_Hydra_samtools2_try2.sh: -------------------------------------------------------------------------------- 1 | #PBS -l walltime=12:00:00,vmem=10gb,mem=10gb,nodes=1 2 | #PBS -N Hydra_samtools2_try2 3 | #PBS -M josiereinhardt@gmail.com 4 | #PBS -m e 5 | 6 | module add samtools 7 | 8 | samtools view -uF 2 /N/dc2/scratch/joserein/CNVdata/TD_Gom12_ND1.RG.bam | samtools sort -n - /N/dc2/scratch/joserein/CNVdata/TD_Gom12_ND1.RG.nopair 9 | bamToFastq -bam /N/dc2/scratch/joserein/CNVdata/TD_Gom12_ND1.RG.nopair.bam -fq1 /N/dc2/scratch/joserein/CNVdata/TD_Gom12_ND1.pair1.tier1.fq -fq2 /N/dc2/scratch/joserein/CNVdata/TD_Gom12_ND1.pair2.tier1.fq 10 | -------------------------------------------------------------------------------- /2014/0429-tools-for-cnv-detection/bash_scripts/140425_SVdetect_preprocess1.sh: -------------------------------------------------------------------------------- 1 | #PBS -l walltime=12:00:00,vmem=10gb,mem=10gb,nodes=1 2 | #PBS -N svdetect_pre1 3 | #PBS -M josiereinhardt@gmail.com 4 | #PBS -m e 5 | module add samtools 6 | 7 | export PERL5LIB=~/perl5/lib/perl5 8 | perl ~/programs/SVDetect_r0.8b/scripts/BAM_preprocessingPairs.pl -n 10000000 -d -o /N/dc2/scratch/joserein/CNVdata/svdetect/ /N/dc2/scratch/joserein/CNVdata/TD_Gom12_Drive.a600.sort.RG.bam 9 | -------------------------------------------------------------------------------- /2014/0429-tools-for-cnv-detection/bash_scripts/140425_SVdetect_preprocess2.sh: -------------------------------------------------------------------------------- 1 | #PBS -l walltime=12:00:00,vmem=10gb,mem=10gb,nodes=1 2 | #PBS -N svdetect_pre2 3 | #PBS -M josiereinhardt@gmail.com 4 | #PBS -m e 5 | 6 | module add samtools 7 | 8 | export PERL5LIB=~/perl5/lib/perl5 9 | perl ~/programs/SVDetect_r0.8b/scripts/BAM_preprocessingPairs.pl -n 10000000 -d -o /N/dc2/scratch/joserein/CNVdata/svdetect/ /N/dc2/scratch/joserein/CNVdata/TD_Gom12_ND1.RG.bam 10 | -------------------------------------------------------------------------------- /2014/0429-tools-for-cnv-detection/bash_scripts/140425_picard_ins_stats1.sh: -------------------------------------------------------------------------------- 1 | #PBS -l walltime=12:00:00,vmem=10gb,mem=10gb,nodes=1 2 | #PBS -N picard_insstats1 3 | #PBS -M josiereinhardt@gmail.com 4 | #PBS -m e 5 | 6 | module add java 7 | java -Xmx2g -jar ~/programs/picard/CollectInsertSizeMetrics.jar INPUT=/N/dc2/scratch/joserein/CNVdata/TD_Gom12_Drive.a600.sort.RG.bam OUTPUT=/N/dc2/scratch/joserein/CNVdata/TD_Gom12_Drive.picardstats.txt VALIDATION_STRINGENCY=LENIENT HISTOGRAM_FILE=/N/dc2/scratch/joserein/CNVdata/TD_Gom12_Drive.picardstats.hist 8 | -------------------------------------------------------------------------------- /2014/0429-tools-for-cnv-detection/bash_scripts/140425_picard_ins_stats2.sh: -------------------------------------------------------------------------------- 1 | #PBS -l walltime=12:00:00,vmem=10gb,mem=10gb,nodes=1 2 | #PBS -N picard_insstats2 3 | #PBS -M josiereinhardt@gmail.com 4 | #PBS -m e 5 | 6 | module add java 7 | java -Xmx2g -jar ~/programs/picard/CollectInsertSizeMetrics.jar INPUT=/N/dc2/scratch/joserein/CNVdata/TD_Gom12_ND1.RG.bam 
OUTPUT=/N/dc2/scratch/joserein/CNVdata/TD_Gom12_ND1.picardstats.txt VALIDATION_STRINGENCY=LENIENT HISTOGRAM_FILE=/N/dc2/scratch/joserein/CNVdata/TD_Gom12_ND1.picardstats.hist 8 | -------------------------------------------------------------------------------- /2014/0429-tools-for-cnv-detection/bash_scripts/140425_svdetect_DND_script.sh: -------------------------------------------------------------------------------- 1 | #PBS -l walltime=48:00:00,vmem=10gb,mem=10gb,nodes=1 2 | #PBS -N svdetect 3 | #PBS -M josiereinhardt@gmail.com 4 | #PBS -m e 5 | 6 | export PERL5LIB=~/perl5/lib/perl5 7 | module add samtools 8 | module add bedtools 9 | 10 | samtools view /N/dc2/scratch/joserein/CNVdata/svdetect/TD_Gom12_Drive.a600.sort.RG.norm.bam > /N/dc2/scratch/joserein/CNVdata/svdetect/TD_Gom12_Drive.a600.sort.RG.norm.sam 11 | samtools view /N/dc2/scratch/joserein/CNVdata/svdetect/TD_Gom12_Drive.a600.sort.RG.ab.bam > /N/dc2/scratch/joserein/CNVdata/svdetect/TD_Gom12_Drive.a600.sort.RG.ab.sam 12 | samtools view /N/dc2/scratch/joserein/CNVdata/svdetect/TD_Gom12_ND1.RG.norm.bam > /N/dc2/scratch/joserein/CNVdata/svdetect/TD_Gom12_ND1.RG.norm.sam 13 | samtools view /N/dc2/scratch/joserein/CNVdata/svdetect/TD_Gom12_ND1.RG.ab.bam > /N/dc2/scratch/joserein/CNVdata/svdetect/TD_Gom12_ND1.RG.ab.sam 14 | 15 | #Generation and filtering of links from the sample data 16 | SVDetect linking filtering -conf ~/drive.sv.conf 17 | 18 | #Generation and filtering of links from the reference data 19 | SVDetect linking filtering -conf ~/nondrive.sv.conf 20 | 21 | #Comparison of links between the two datasets 22 | SVDetect links2compare -conf ~/drive.sv.conf 23 | 24 | #Calculation of depth-of-coverage log-ratios 25 | SVDetect cnv ratio2bedgraph -conf ~/DND.svdet.cnv.conf 26 | -------------------------------------------------------------------------------- /2014/0429-tools-for-cnv-detection/bash_scripts/140427_CNVnator_histstatpartcall1.sh: -------------------------------------------------------------------------------- 1 | #PBS -l walltime=48:00:00,vmem=10gb,mem=10gb,nodes=1 2 | #PBS -N CNVnator_statpartcall1 3 | #PBS -j oe 4 | #PBS -k o 5 | #PBS -M josiereinhardt@gmail.com 6 | #PBS -m e 7 | 8 | export ROOTSYS=~/bin/root/ 9 | export PATH=$PATH:~/bin/root/bin/ 10 | export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${ROOTSYS}/lib 11 | 12 | cnvnator -root /N/dc2/scratch/joserein/CNVdata/Td_Drive.root -his 5000 -d /N/dc2/scratch/joserein/genome_20k/ 13 | cnvnator -root /N/dc2/scratch/joserein/CNVdata/Td_Drive.root -stat 5000 14 | cnvnator -root /N/dc2/scratch/joserein/CNVdata/Td_Drive.root -partition 5000 15 | cnvnator -root /N/dc2/scratch/joserein/CNVdata/Td_Drive.root -call 5000 > /N/dc2/scratch/joserein/CNVdata/Td_drive.cnvnator.calls.out 16 | -------------------------------------------------------------------------------- /2014/0429-tools-for-cnv-detection/bash_scripts/140427_CNVnator_histstatpartcall2.sh: -------------------------------------------------------------------------------- 1 | #PBS -l walltime=48:00:00,vmem=10gb,mem=10gb,nodes=1 2 | #PBS -N CNVnator_statpartcall2 3 | #PBS -j oe 4 | #PBS -k o 5 | #PBS -M josiereinhardt@gmail.com 6 | #PBS -m e 7 | 8 | export ROOTSYS=~/bin/root/ 9 | export PATH=$PATH:~/bin/root/bin/ 10 | export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${ROOTSYS}/lib 11 | 12 | cnvnator -root /N/dc2/scratch/joserein/CNVdata/Td_Nondrive1.root -his 5000 -d /N/dc2/scratch/joserein/genome_20k/ 13 | cnvnator -root /N/dc2/scratch/joserein/CNVdata/Td_Nondrive1.root -stat 5000 14 | cnvnator -root 
/N/dc2/scratch/joserein/CNVdata/Td_Nondrive1.root -partition 5000 15 | cnvnator -root /N/dc2/scratch/joserein/CNVdata/Td_Nondrive1.root -call 5000 > /N/dc2/scratch/joserein/CNVdata/Td_Nondrive1.cnvnator.calls.out 16 | -------------------------------------------------------------------------------- /2014/0429-tools-for-cnv-detection/bash_scripts/140428_svdetect_componly.sh: -------------------------------------------------------------------------------- 1 | #PBS -l walltime=48:00:00,vmem=10gb,mem=10gb,nodes=1 2 | #PBS -N svdetect 3 | #PBS -M josiereinhardt@gmail.com 4 | #PBS -m e 5 | 6 | export PERL5LIB=~/perl5/lib/perl5 7 | module add samtools 8 | module add bedtools 9 | 10 | ###This is a example of script for using SVDetect 11 | 12 | #Comparison of links between the two datasets 13 | SVDetect links2compare -conf ~/drive.sv.conf 14 | 15 | -------------------------------------------------------------------------------- /2014/0429-tools-for-cnv-detection/bash_scripts/drive.sv.conf: -------------------------------------------------------------------------------- 1 | 2 | input_format = sam 3 | sv_type = all 4 | mates_orientation=RF 5 | read1_length=50 6 | read2_length=50 7 | mates_file=/N/dc2/scratch/joserein/CNVdata/svdetect/TD_Gom12_Drive.a600.sort.RG.ab.sam 8 | cmap_file=/N/dc2/scratch/joserein/CNVdata/svdetect/genome.scf.20k.lens.svdetect.txt 9 | output_dir=/N/dc2/scratch/joserein/CNVdata/svdetect/ 10 | tmp_dir=tmp 11 | num_threads=2 12 | 13 | 14 | 15 | split_mate_file=1 16 | window_size=1000 17 | step_length=200 18 | 19 | 20 | 21 | split_link_file=0 22 | strand_filtering=1 23 | order_filtering=1 24 | insert_size_filtering=1 25 | nb_pairs_threshold=5 26 | nb_pairs_order_threshold=5 27 | indel_sigma_threshold=3 28 | dup_sigma_threshold=2 29 | singleton_sigma_threshold=4 30 | final_score_threshold=0.8 31 | mu_length=435 32 | sigma_length=76 33 | 34 | 35 | 36 | 37 | 190,190,190 = 1,2 38 | 0,0,0 = 3,3 39 | 0,0,255 = 4,4 40 | 0,255,0 = 5,5 41 | 153,50,205 = 6,7 42 | 255,140,0 = 8,10 43 | 255,0,0 = 11,10000 44 | 45 | 46 | 47 | 48 | list_samples=TD_Gom12_Drive.a600.sort.RG,TD_Gom12_ND1.RG 49 | list_read_lengths=50-50,50-50 50 | file_suffix=.ab.sam.all.links.filtered 51 | min_overlap=0.05 52 | same_sv_type=1 53 | circos_output=0 54 | bed_output=1 55 | sv_output=1 56 | 57 | 58 | -------------------------------------------------------------------------------- /2014/0429-tools-for-cnv-detection/bash_scripts/nondrive.sv.conf: -------------------------------------------------------------------------------- 1 | 2 | input_format = sam 3 | sv_type = all 4 | mates_orientation=RF 5 | read1_length=50 6 | read2_length=50 7 | mates_file=/N/dc2/scratch/joserein/CNVdata/svdetect/TD_Gom12_ND1.RG.ab.sam 8 | cmap_file=/N/dc2/scratch/joserein/CNVdata/svdetect/genome.scf.20k.lens.svdetect.txt 9 | output_dir=/N/dc2/scratch/joserein/CNVdata/svdetect/ 10 | tmp_dir=tmp 11 | num_threads=2 12 | 13 | 14 | 15 | split_mate_file=1 16 | window_size=1000 17 | step_length=200 18 | 19 | 20 | 21 | split_link_file=0 22 | strand_filtering=1 23 | order_filtering=1 24 | insert_size_filtering=1 25 | nb_pairs_threshold=5 26 | nb_pairs_order_threshold=5 27 | indel_sigma_threshold=3 28 | dup_sigma_threshold=2 29 | singleton_sigma_threshold=4 30 | final_score_threshold=0.8 31 | mu_length=429 32 | sigma_length=87 33 | 34 | 35 | 36 | 37 | 190,190,190 = 1,2 38 | 0,0,0 = 3,3 39 | 0,0,255 = 4,4 40 | 0,255,0 = 5,5 41 | 153,50,205 = 6,7 42 | 255,140,0 = 8,10 43 | 255,0,0 = 11,10000 44 | 45 | 46 | 47 | 
-------------------------------------------------------------------------------- /2014/0429-tools-for-cnv-detection/install/2014_April24: -------------------------------------------------------------------------------- 1 | Notebook for today 2 | 3 | I am working on trying out some of the CNV tools available to prepare for my talk on Tuesday. 4 | 5 | I am trying to install stuff on mason AND my computer because some tools don't really work on OSX 6 | (this can be part of the talk). 7 | 8 | 9 | ##### MY DATA ###### 10 | 11 | My test data (which is really experimental data) consists of two bam files: one of the nondrive 12 | pools and the drive pool. Some of the programs allow doing this sort of case/control analysis. 13 | 14 | There will be some variants of these data, as some of the tools require some prefiltering (such as 15 | removing the concordant reads, for Hydra). 16 | 17 | breakdancer and maybe some of the other programs require "read group data" if you want to compare 18 | multiple samples. I used a tool to ADD read group data because bwa does not include it by default (whoops). 19 | 20 | AddOrReplaceReadGroups.jar is a picard java application (/Localapps/picard-tools-1.108/picard-tools-1.108/) 21 | 22 | $ java -Xmx2g -jar AddOrReplaceReadGroups.jar INPUT=TD_Gom12_Drive.a600.sort.bam OUTPUT=TD_Gom12_Drive.a600.sort.RG.bam RGID=1 RGLB=drive RGPL=ILLUMINA RGPU=1 RGSM=TdGD VALIDATION_STRINGENCY=LENIENT 23 | $ java -Xmx2g -jar AddOrReplaceReadGroups.jar INPUT=TD_Gom12_ND1.a600.sort.bam OUTPUT=TD_Gom12_ND1.RG.bam RGLB=nondrive1 RGID=2 RGPL=ILLUMINA RGPU=2 RGSM=TdGND1 VALIDATION_STRINGENCY=LENIENT 24 | 25 | ##### CNVNATOR ###### 26 | 27 | I eventually got this to install apparently successfully on OSX, BUT when I try to run it 28 | it fails. I will see if I have better luck on mason! 29 | 30 | To install CNVnator you first have to install a program called ROOT. If you want to install this 31 | from source it's a pain. There is a precompiled version for Mac OS X, but to get ROOT to "work" on Mavericks 32 | you need to do some weird stuff because Mavericks doesn't have gcc anymore. Then, you need to install 33 | CNVnator, and set environment variables before running it. 34 | 35 | Each time I want to run CNVnator on mason I need to set these variables as part of the PBS script: 36 | 37 | export ROOTSYS=~/bin/root/ 38 | export PATH=$PATH:~/bin/root/bin/ 39 | export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${ROOTSYS}/lib 40 | 41 | Test data for CNVnator: 42 | You need to include read group data for the analysis to work with CNVnator. I used a tool to ADD read group 43 | data because bwa does not include it by default (whoops). 44 | 45 | Adding readgroup data: 46 | 47 | 48 | Running CNVnator 49 | This program is multi-step. This is kind of a pain since I need to write scripts to do it, though I 50 | probably could do it all in 1 script; I didn't. I'll try that out sometime. 51 | 52 | 53 | ##### breakdancer ##### 54 | 55 | http://breakdancer.sourceforge.net/ 56 | 57 | Breakdancer uses paired-end information only in order to detect various types of CNVs. It includes 58 | options for comparing multiple libraries/samples, so it seems like it could be quite useful. 59 | 60 | Installing and running on OSX 61 | 62 | Overview: I had difficulty getting the C version of breakdancer to install, but ultimately I got an 63 | early perl version to work. 64 | 65 | (I downloaded BreakDancer_1.0 from the website because it includes the full perl version). 
66 | 67 | $ perl bam2cfg.pl ~/Desktop/GenomePaper/CNV_detection/TD_Gom12_ND1.RG.bam ~/Desktop/GenomePaper/CNV_detection/TD_Gom12_Drive.a600.sort.RG.bam > ~/Desktop/GenomePaper/CNV_detection/TdGom_DvsND_CNV.cfg 68 | $ perl BreakDancerMax.pl -g ~/Desktop/GenomePaper/CNV_detection/TdGom_DvsND_CNV.bed -d ~/Desktop/GenomePaper/CNV_detection/TdGom_DvsND_CNV.bed -r 5 ~/Desktop/GenomePaper/CNV_detection/TdGom_DvsND_CNV.cfg > ~/Desktop/GenomePaper/CNV_detection/TdGom_DvsND_CNV.breakdancer.out.txt 69 | 70 | Installing and running on mason: 71 | 72 | I just downloaded a precompiled binary and it seems to run. I uploaded the files that came out of 73 | bam2cfg.pl above, TdGom_DvsND_CNV.bed and TdGom_DvsND_CNV.cfg; however, I changed the .cfg file to have the 74 | proper paths: 75 | 76 | /N/dc2/scratch/joserein/CNVdata/ 77 | 78 | Running breakdancer: 79 | breakdancer /N/dc2/scratch/joserein/CNVdata/breakdancer/ 80 | -------------------------------------------------------------------------------- /2014/0429-tools-for-cnv-detection/install/2014_April25: -------------------------------------------------------------------------------- 1 | Continuing to try to run the CNV tools. 2 | 3 | ##### CNVNATOR ###### 4 | 5 | The hist runs I did yesterday failed - the program (stupidly) requires you to split your reference 6 | into a single fasta file per chromosome. 7 | 8 | I ran a perl script I have (genesplit.pl) to create a directory of fastas. Then rerun adding that dir: 9 | 10 | cnvnator -root /N/dc2/scratch/joserein/CNVdata/Td_Nondrive1.root -his 100 -d /N/dc2/scratch/joserein/genome_20k/ 11 | 12 | Then rerun the other PBS scripts ("stat", then "part", then "call"). 13 | 14 | 15 | 16 | ##### Hydra ###### 17 | 18 | Instructions can be found: https://code.google.com/p/hydra-sv/wiki/TypicalWorkflow 19 | 20 | I could not compile it successfully on Mavericks. Installation was trivial on mason. 21 | 22 | Hydra only uses discordant read pairs, but apparently is fairly sophisticated in how it uses them. 23 | 24 | Hydra requires a few steps of preprocessing to run. First, do a regular alignment with bwa (or bowtie2, etc.). 25 | 26 | Then pull out only discordant reads using samtools. 27 | 28 | Then, use these reads as a dataset for a second alignment with a slower, more sensitive aligner. 29 | They recommend novoalign, which IS available as a module on mason (so include module add novoalign in the PBS script). 30 | 31 | You can choose to do a third super-sensitive alignment at this stage, or not. 32 | 33 | In order to figure out what values to use for the min and max insert size, as well as the deviation 34 | (these are needed for two of hydra's scripts), I ran a picard program called CollectInsertSizeMetrics: 35 | 36 | java -Xmx2g -jar ~/programs/picard/CollectInsertSizeMetrics.jar INPUT=/N/dc2/scratch/joserein/CNVdata/TD_Gom12_Drive.a600.sort.RG.bam OUTPUT=/N/dc2/scratch/joserein/CNVdata/TD_Gom12_Drive.picardstats.txt VALIDATION_STRINGENCY=LENIENT HISTOGRAM_FILE=/N/dc2/scratch/joserein/CNVdata/TD_Gom12_Drive.picardstats.hist 37 | 38 | results: 39 | drive nondrive 40 | MEDIAN_INSERT_SIZE 452 460 41 | MEDIAN_ABSOLUTE_DEVIATION 27 24 42 | MIN_INSERT_SIZE 1 1 43 | MAX_INSERT_SIZE 759622 1588806 44 | MEAN_INSERT_SIZE 435 429 45 | STANDARD_DEVIATION 76.1 87.5 46 | 47 | Hydra's first script is pairDiscordants.py. The -n option says how often a read should be allowed 48 | to hit. It also requires a -z option for the max concordant insert and -y for the minimum.
49 | From the manual: "pairDiscordants.py allows you to set the upper (-z) and lower 50 | (-y) bound for concordants (this should be derived by plotting the alignment distance of your pairs 51 | and computing median and m.a.d. values)." 52 | 53 | So... for my pairs, since there's little other guidance, I will take a shot in the dark and use 54 | +/- 10*MAD for -y and -z (see the worked numbers at the end of this entry). 55 | 56 | Hence: 57 | for drive: 58 | -i stdin -m hydra -z 722 -y 182 -n 1000 59 | for nondrive: 60 | -i stdin -m hydra -z 700 -y 220 -n 1000 61 | 62 | The second script is dedupDiscordants.py and it seems to be just another filter; they only mention one 63 | option, -s, and not what it does. 64 | 65 | Then, running hydra, 66 | e.g.: hydra -in sample.disc.deduped.bedpe -out sample.breaks -mld 500 -mno 1500 67 | 68 | They say the -mld and -mno values are based on the statistics computed in tier 2. Use the Hydra help 69 | (-h) to find suggested approaches for computing the values for these parameters. 70 | 71 | -h says: 72 | 73 | -mld Maximum allowable length difference b/w mappings. 74 | Typically set to 10 * m.a.d. of the DNA fragment libraries. 75 | see: http://en.wikipedia.org/wiki/Median_absolute_deviation 76 | 77 | -mno Maximum allowable non-overlap b/w mappings. 78 | Typically set to median + (20 * m.a.d.) of the DNA fragment libraries. 79 | 80 | OK, for my data that would be: 81 | drive: -mld 240 -mno 480 82 | nondrive: -mld 240 -mno 480 83 | 84 | I made a single script for each that runs these steps (everything after the first samtools filter): 85 | 86 | ~/scripts/PBSscripts/140425_Hydra_novoalign1/2.sh 87 | 88 | OK, hydra failed because (I realized) I need to first have my original bam files sorted by read name rather 89 | than by position... 90 | 91 | So I am going to "unsort" them as part of the samtools step. 92 | 93 | Then, I will run the multi-step script again. 94 | 95 | ####### SV_DETECT ####### 96 | 97 | SVDetect uses sam (not bam) files as input. HOWEVER, it requires the SV mode to only have anomalously mapped 98 | pairs, and the CNV mode (depth) to only have correctly mapped pairs. 99 | 100 | I could try using the hydra input (since basically the same thing is done), but why not try the script that comes with SVDetect? 101 | 102 | BAM_preprocessingPairs.pl 103 | 104 | Usage: BAM_preprocessing_pairs.pl [options] 105 | 106 | Options: -t BOOLEAN read type: =1 (Illumina), =0 (SOLiD) [1] 107 | -p BOOLEAN pair type: =1 (paired-end), =0 (mate-pair) [1] 108 | -n INTEGER number of pairs for calculating mu and sigma lengths [1000000] 109 | -s INTEGER minimum value of ISIZE for calculating mu and sigma lengths [0] 110 | -S INTEGER maximum value of ISIZE for calculating mu and sigma lengths [10000] 111 | -f REAL minimal number of sigma fold for filtering pairs [3] 112 | -d dump normal pairs into a file [] (optional) 113 | -o STRING output directory [working directory] 114 | 115 | perl ~/programs/SVDetect_r0.8b/scripts/BAM_preprocessingPairs.pl -n 10000000 -d -o /N/dc2/scratch/joserein/CNVdata/svdetect/ /N/dc2/scratch/joserein/CNVdata/TD_Gom12_Drive.a600.sort.RG.bam 116 | perl ~/programs/SVDetect_r0.8b/scripts/BAM_preprocessingPairs.pl -n 10000000 -d -o /N/dc2/scratch/joserein/CNVdata/svdetect/ /N/dc2/scratch/joserein/CNVdata/TD_Gom12_ND1.RG.bam 117 | 118 | SVDetect uses configuration files to run anything. Examples are included with the software. 
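Note on the -y/-z cutoffs chosen above, worked out from the Picard insert-size stats (median +/- 10*MAD):
  drive:    median 452, MAD 27  ->  -y = 452 - 10*27 = 182,  -z = 452 + 10*27 = 722
  nondrive: median 460, MAD 24  ->  -y = 460 - 10*24 = 220,  -z = 460 + 10*24 = 700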
119 | -------------------------------------------------------------------------------- /2014/0429-tools-for-cnv-detection/results/Breakdancer_SVout.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0429-tools-for-cnv-detection/results/Breakdancer_SVout.xlsx -------------------------------------------------------------------------------- /2014/0429-tools-for-cnv-detection/results/SV_CNV_hist.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0429-tools-for-cnv-detection/results/SV_CNV_hist.jpg -------------------------------------------------------------------------------- /2014/0429-tools-for-cnv-detection/results/SV_CNV_hist.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0429-tools-for-cnv-detection/results/SV_CNV_hist.pdf -------------------------------------------------------------------------------- /2014/0429-tools-for-cnv-detection/results/SV_detect_CNVout.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0429-tools-for-cnv-detection/results/SV_detect_CNVout.xlsx -------------------------------------------------------------------------------- /2014/0429-tools-for-cnv-detection/results/SV_detect_SVout.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0429-tools-for-cnv-detection/results/SV_detect_SVout.xlsx -------------------------------------------------------------------------------- /2014/0429-tools-for-cnv-detection/results/SV_detect_details.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0429-tools-for-cnv-detection/results/SV_detect_details.xlsx -------------------------------------------------------------------------------- /2014/0506-r-graphics/Graphics.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0506-r-graphics/Graphics.png -------------------------------------------------------------------------------- /2014/0506-r-graphics/README.md: -------------------------------------------------------------------------------- 1 | The Traditional R Graphical System 2 | ================================== 3 | -------------------------------------------------------------------------------- /2014/0506-r-graphics/cool.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0506-r-graphics/cool.jpg -------------------------------------------------------------------------------- /2014/0506-r-graphics/example.r: -------------------------------------------------------------------------------- 1 | # For demo only. 2 | # Please refer to Paul Murrells R Graphics for precise and detailed info. 
3 | 4 | set.seed(55) 5 | true.expr = round(exp(rgamma(5000, shape=5, rate=1))) 6 | names(true.expr) = paste("gene_", 1:length(true.expr)) 7 | head(true.expr) 8 | 9 | # ---- D. melanogaster data 10 | melan.fem = true.expr 11 | mal.scal = rep(c(4, 0.25, 1), c(500, 500, 4000)) 12 | melan.mal = true.expr * mal.scal 13 | col.melan.sex = ifelse(mal.scal==4, "red", 14 | ifelse(mal.scal==0.25, "yellow", "gray40")) 15 | table(col.melan.sex) 16 | 17 | # ---- D. persimilis data 18 | pers.fem = true.expr * rep(c(2, 1), c(200, 4800)) 19 | pers.mal = true.expr * rep(c(4, 0.25, 2, 1), c(500, 200, 300, 4000)) 20 | fc = rep(c(4, 0.25, 2, 1), c(500, 200, 300, 4000)) / rep(c(2, 1), c(200, 4800)) 21 | table(fc) 22 | col.pers.sex = ifelse(fc==0.25, "blue", 23 | ifelse(fc==2, "green", 24 | ifelse(fc==4, "gray60", "gray40"))) 25 | table(col.pers.sex) 26 | 27 | # --- Plot real expression levels 28 | # (x, y) plot: difficult to read 29 | plot(log2(melan.fem), log2(melan.mal), 30 | col=col.melan.sex, 31 | main="melan.fem - melan.mal = 0 ?") 32 | 33 | plot(log2(pers.fem), log2(pers.mal), 34 | col=col.pers.sex, 35 | main="pers.fem - pers.mal = 0 ?") 36 | 37 | # MA-plot: Easy to interpret vertical deviation 38 | M = log2(melan.fem) - log2(melan.mal) 39 | A = 0.5*(log2(melan.fem) + log2(melan.mal)) 40 | plot(A, M, col=col.melan.sex, 41 | main="melan.fem - melan.mal = 0 ?") 42 | 43 | M = log2(pers.fem) - log2(pers.mal) 44 | A = 0.5*(log2(pers.fem) + log2(pers.mal)) 45 | plot(A, M, col=col.pers.sex, 46 | main="pers.fem - pers.mal = 0 ?") 47 | 48 | # Add noise 49 | melan.fem.x = rnbinom(length(melan.fem), mu=melan.fem, size=1/0.1) 50 | melan.mal.x = rnbinom(length(melan.mal), mu=melan.mal, size=1/0.1) 51 | 52 | # (x, y) plot: difficult to read 53 | plot(log2(melan.fem.x), log2(melan.mal.x), 54 | col=densCols(log2(melan.fem.x), log2(melan.mal.x)), 55 | cex=0.3, pch=19) 56 | points(log2(melan.fem.x)[1:1000], log2(melan.mal.x)[1:1000], 57 | col=col.melan.sex[1:1000], cex=0.8) 58 | abline(a=0, b=1, col="red") 59 | 60 | # MA-plot: Easy to interpret vertical deviation 61 | M = log2(melan.fem.x) - log2(melan.mal.x) 62 | A = 0.5*(log2(melan.fem.x) + log2(melan.mal.x)) 63 | plot(A, M, col=densCols(A, M), pch=19, cex=0.3) 64 | points(A[1:1000], M[1:1000], col=col.melan.sex[1:1000], 65 | cex=0.8) 66 | abline(h=c(-2, -1, 0, 1, 2), lty=c(3, 2, 1, 2, 3)) 67 | box("figure", col="red") 68 | 69 | # change background 70 | plot(0, 0, 71 | ylim=c(-4.5, 4.5), 72 | xlim=c(0, 25), 73 | col=densCols(A, M), pch=19, cex=0.3, 74 | main="melan.fem - melan.mal = 0 ?") 75 | 76 | rect(par("usr")[1], 77 | par("usr")[3], 78 | par("usr")[2], 79 | par("usr")[4], 80 | col="gray30") 81 | 82 | points(A, M, col=densCols(A, M), pch=19, cex=0.5) 83 | points(A[1:1000], M[1:1000], col=col.melan.sex[1:1000]) 84 | abline(h=c(-2, -1, 0, 1, 2), lty=c(2, 2, 1, 2, 2), 85 | col="gray80") 86 | box("figure", col="red") 87 | 88 | # change margin 89 | par("mar") # see default 90 | par(mar=c(2.5, 2.5, 1, 0.5)) 91 | plot(A, M, col=densCols(A, M), pch=19, cex=0.3, 92 | main="melan.fem - melan.mal = 0 ?", 93 | xaxt="none", yaxt="none", 94 | xlab="", ylab="") 95 | 96 | rect(par("usr")[1], 97 | par("usr")[3], 98 | par("usr")[2], 99 | par("usr")[4], 100 | col="gray30") 101 | 102 | points(A, M, col=densCols(A, M), pch=19, cex=0.5) 103 | points(A[1:1000], M[1:1000], col=col.melan.sex[1:1000]) 104 | abline(h=c(-2, -1, 0, 1, 2), lty=c(2, 2, 1, 2, 2), 105 | col="gray80") 106 | box("figure", col="red") 107 | 108 | # x ---margins 109 | axis(side=1, at=seq(0, 25), labels=FALSE) 110 | 
mtext(text=seq(0, 25), side=1, line=0.5, 111 | at=seq(0, 25), 112 | cex=0.5) 113 | mtext(text="average expr", side=1, line=1.2, cex=0.8) 114 | 115 | # y ---margins 116 | axis(side=2, at=seq(-4.5, 4.5), labels=FALSE) 117 | mtext(text=seq(-4.5, 4.5), side=2, line=0.5, 118 | at=seq(-4.5, 4.5), 119 | cex=0.5) 120 | mtext(text="log FC", side=2, line=1.2, cex=0.8) 121 | 122 | # add legend 123 | rect(21, 3.7, 24.5, 4.8, col="white", cex=0.8) 124 | points(24, 4.5, pch=19, col="yellow", cex=0.8) 125 | points(24, 4, pch=19, col="red") 126 | text(24, 4.5, "fem baised", pos=2, cex=0.5) 127 | text(24, 4, "mal baised", pos=2, cex=0.5) 128 | 129 | box("figure", col="red") 130 | -------------------------------------------------------------------------------- /2014/0506-r-graphics/hiveF1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0506-r-graphics/hiveF1.jpg -------------------------------------------------------------------------------- /2014/0506-r-graphics/r_graphics_byob.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0506-r-graphics/r_graphics_byob.pdf -------------------------------------------------------------------------------- /2014/0506-r-graphics/tsi_fig.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0506-r-graphics/tsi_fig.pdf -------------------------------------------------------------------------------- /2014/0908-unix_tools/test.sam.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0908-unix_tools/test.sam.gz -------------------------------------------------------------------------------- /2014/0908-unix_tools/test_insert.sam.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0908-unix_tools/test_insert.sam.gz -------------------------------------------------------------------------------- /2014/0916-ncgas-and-sda/BYOB_NCGAS_SDA.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0916-ncgas-and-sda/BYOB_NCGAS_SDA.pptx -------------------------------------------------------------------------------- /2014/0923-conserved-protein-domains/BYOB_presentation_Thomas_Peterson_09-23-14.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0923-conserved-protein-domains/BYOB_presentation_Thomas_Peterson_09-23-14.pptx -------------------------------------------------------------------------------- /2014/0930-extreme-motif-detection/images/Trypanosoma_parasiteblood_cells_ger.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0930-extreme-motif-detection/images/Trypanosoma_parasiteblood_cells_ger.jpg 
-------------------------------------------------------------------------------- /2014/0930-extreme-motif-detection/images/cluster_motif_profiles.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0930-extreme-motif-detection/images/cluster_motif_profiles.png -------------------------------------------------------------------------------- /2014/0930-extreme-motif-detection/images/clustering_example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0930-extreme-motif-detection/images/clustering_example.png -------------------------------------------------------------------------------- /2014/0930-extreme-motif-detection/images/embj7594407-fig-0002-m.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0930-extreme-motif-detection/images/embj7594407-fig-0002-m.jpg -------------------------------------------------------------------------------- /2014/0930-extreme-motif-detection/images/example_motif.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0930-extreme-motif-detection/images/example_motif.png -------------------------------------------------------------------------------- /2014/0930-extreme-motif-detection/images/motivation_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/0930-extreme-motif-detection/images/motivation_1.png -------------------------------------------------------------------------------- /2014/1007-phylotranscriptomics-using-orthograph/BYOB_2014-10-07-files.tar: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/1007-phylotranscriptomics-using-orthograph/BYOB_2014-10-07-files.tar -------------------------------------------------------------------------------- /2014/1007-phylotranscriptomics-using-orthograph/BYOB_2014-10-07.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/1007-phylotranscriptomics-using-orthograph/BYOB_2014-10-07.pptx -------------------------------------------------------------------------------- /2014/1021-dont-fear-the-reapr/BYOB_10_21_14_Reapr.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2014/1021-dont-fear-the-reapr/BYOB_10_21_14_Reapr.pdf -------------------------------------------------------------------------------- /2014/README.md: -------------------------------------------------------------------------------- 1 | Contents 2 | ======== 3 | * **01/28** Introduction to Git and Github for Scientists ([Keith Hughitt](https://github.com/khughitt)) 4 | * **02/18** FPKM, RPKM, KPKM, TPM, TMM and Other RNA-Seq Normalization Issues ([Steve 
Mount](http://www.clfs.umd.edu/labs/mount/)) 5 | * **03/11** Git with Confidence ([Lee Mendelowitz](https://github.com/LeeMendelowitz)) 6 | * **04/08** Sequence Clustering ([Ted Gibbons](https://github.com/trgibbons)) 7 | * **04/22** Prediction of functional variants using SnpEff/SnpSift (Will Gammerdinger) 8 | * **04/29** Tools for detecting insertions, deletions, and other CNVs using Illumina data ([Josie Reinhardt]) 9 | * **05/06** The Traditional R Graphical System ([Kwame Okrah](https://github.com/kokrah)) 10 | * **09/08** Unix command line tools: tips and tricks ([Chris Hill](https://github.com/cmhill)) 11 | * **09/16** The NCGAS Mason Cluster and the Scholarly Data Archive (Will Gammerdinger) 12 | * **09/23** Comparative Genomics for Human Disease Using Conserved Protein Domains (Thomas Peterson) 13 | * **09/30** An EXTREME approach to motif detection ([Keith Hughitt](https://github.com/khughitt)) 14 | * **10/07** Phylotranscriptomics using Orthograph (Josie Reinhardt) 15 | * **10/21** Don't fear the REAPR: improved genome assembly with mate-pairs and long reads (Matt Conte) 16 | -------------------------------------------------------------------------------- /2015/0210-RNA-Seq_expression_threshold/BYOB_2-10-15.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2015/0210-RNA-Seq_expression_threshold/BYOB_2-10-15.pptx -------------------------------------------------------------------------------- /2015/0210-RNA-Seq_expression_threshold/README.md: -------------------------------------------------------------------------------- 1 | BYOB: February 10, 2015 2 | Kevin Nyberg 3 | 4 | Choosing an expression threshold for your RNA-Seq data. -------------------------------------------------------------------------------- /2015/0224-Provean-mutation-effects/PROVEAN_BYOB.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2015/0224-Provean-mutation-effects/PROVEAN_BYOB.pptx -------------------------------------------------------------------------------- /2015/0224-Provean-mutation-effects/README.md: -------------------------------------------------------------------------------- 1 | BYOB: February 24, 2015 2 | Will Gammerdinger 3 | 4 | PROVEAN: A Tool for Predicting Functional Impacts of Nonsynonymous Mutations -------------------------------------------------------------------------------- /2015/0303-shiny-interactive-data/demo1/README.md: -------------------------------------------------------------------------------- 1 | Shiny demo: K-means clustering 2 | ============================== 3 | 4 | Overview 5 | -------- 6 | 7 | ![I. versicolor](../images/1280px-Iris_versicolor_3.jpg) 8 | (source: [Wikipedia](http://en.wikipedia.org/wiki/Iris_flower_data_set#mediaviewer/File:Iris_versicolor_3.jpg)) 9 | 10 | The first demo application is a very simple Shiny app which includes one UI 11 | control (a slider) and one output (a scatterplot). The application demonstrates 12 | the use of kmeans clustering to group datapoints in [Fisher's iris 13 | dataset](http://en.wikipedia.org/wiki/Iris_flower_data_set) which includes 14 | three related iris flower species and some properties (sepal height, petal 15 | length, etc.) for a collection of representative flowers from each species. 
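For orientation, the clustering step the app performs amounts to the following standalone sketch; it mirrors what `server.R` does, except that `k` is fixed at 3 here instead of being taken from the slider:

```r
library(ggplot2)

# Cluster the four numeric measurement columns of iris into k groups,
# then colour a Sepal.Length vs. Sepal.Width scatterplot by cluster
k <- 3
dat <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]
result <- kmeans(dat, centers = k)
dat$cluster <- factor(result$cluster)
ggplot(dat, aes(Sepal.Length, Sepal.Width, color = cluster)) + geom_point()
```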
16 | 17 | Usage 18 | ----- 19 | 20 | ![demo 1](../images/demo1.png) 21 | 22 | To run this demo, open up an R console in the directory containing `server.R` 23 | and `ui.R` and type: 24 | 25 | ```r 26 | library(shiny) 27 | runApp() 28 | ``` 29 | 30 | -------------------------------------------------------------------------------- /2015/0303-shiny-interactive-data/demo1/server.R: -------------------------------------------------------------------------------- 1 | library(shiny) 2 | library(dplyr) 3 | library(ggplot2) 4 | 5 | shinyServer(function(input, output) { 6 | output$plot = renderPlot({ 7 | dat = tbl_df(iris) %>% select(-Species) 8 | result = kmeans(dat, input$k) 9 | dat = cbind(dat, cluster=result$cluster) 10 | qplot(Sepal.Length, Sepal.Width, data=dat, color=factor(cluster)) 11 | }) 12 | }) 13 | 14 | -------------------------------------------------------------------------------- /2015/0303-shiny-interactive-data/demo1/ui.R: -------------------------------------------------------------------------------- 1 | library(shiny) 2 | 3 | shinyUI(fluidPage( 4 | titlePanel("Kmeans clustering demo"), 5 | sidebarLayout( 6 | sidebarPanel( 7 | sliderInput("k", "Number of clusters:", min= 1, max = 10, value=2) 8 | ), 9 | mainPanel( 10 | plotOutput("plot") 11 | ) 12 | ) 13 | )) 14 | -------------------------------------------------------------------------------- /2015/0303-shiny-interactive-data/demo2/README.md: -------------------------------------------------------------------------------- 1 | Shiny demo #2: RNA-Seq normalization 2 | ==================================== 3 | 4 | Overview 5 | -------- 6 | 7 | ![Fission yeast](../images/Fission_yeast.jpg) 8 | (Source: [Wikipedia](http://en.wikipedia.org/wiki/Schizosaccharomyces_pombe#mediaviewer/File:Fission_yeast.jpg)) 9 | 10 | In this demo, a Shiny app is constructed to allow users to explore the impacts 11 | of various normalization methods on RNA-Seq data. 12 | 13 | The two methods compared are: 14 | 15 | - Size factor normalization (using 16 | [DESeq](http://bioconductor.org/packages/release/bioc/html/DESeq.html)) 17 | - Quantile normalization 18 | 19 | 20 | A publically available [fission yeast time series 21 | dataset](http://bioconductor.org/packages/release/data/experiment/vignettes/fission/inst/doc/fission.html) 22 | is used for the demonstration. You can read more about the original experiment 23 | [here](http://www.ncbi.nlm.nih.gov/pubmed/24853205). 24 | 25 | One thing to note with this demo compared to the previous one is the use of 26 | [reactive expressions](http://shiny.rstudio.com/tutorial/lesson6/). There are 27 | a couple variables in `server.R` which are defined using the `reactive()` 28 | function. The effect of this is that the values of the variables will be 29 | cached, and the expressions will only be re-evaluated when something they 30 | depend on changes. In this case, the selection of differing normalization 31 | methods using the select box triggers both of the reactive expressions to be 32 | re-evaluated. The `renderPlot()` function used to create the output plot is 33 | also implicitly reactive and is re-run each time its underlying data changes. 
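As a minimal, self-contained illustration of that pattern (a toy app, not part of this demo; the dataset and transform are just placeholders):

```r
library(shiny)

ui <- fluidPage(
  selectInput("method", "Transform:", c("none", "log2")),
  plotOutput("hist")
)

server <- function(input, output) {
  # reactive(): the result is cached and only re-computed when input$method changes
  values <- reactive({
    if (input$method == "log2") log2(faithful$eruptions) else faithful$eruptions
  })
  # renderPlot() is implicitly reactive: it re-runs whenever values() changes
  output$hist <- renderPlot(hist(values(), main = input$method))
}

# runApp(shinyApp(ui, server))
```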
34 | 35 | Usage 36 | ----- 37 | 38 | ![demo 2](../images/demo2.png) 39 | 40 | To run this demo, open up an R console in the directory containing `server.R` 41 | and `ui.R` and type: 42 | 43 | ```r 44 | library(shiny) 45 | runApp() 46 | ``` 47 | 48 | -------------------------------------------------------------------------------- /2015/0303-shiny-interactive-data/demo2/server.R: -------------------------------------------------------------------------------- 1 | library(shiny) 2 | library(dplyr) 3 | library(ggplot2) 4 | library(reshape2) 5 | library(preprocessCore) 6 | library(DESeq) 7 | library(fission) 8 | 9 | shinyServer(function(input, output) { 10 | # Count matrix and metadata 11 | data(fission) 12 | count_matrix = data.frame(assay(fission)) 13 | metadata = colData(fission) 14 | 15 | # normalized counts 16 | counts_normed = reactive({ 17 | # No normalization 18 | if(input$norm_method == 'none') { 19 | return(count_matrix) 20 | } else if(input$norm_method == 'size_factor') { 21 | # Size factor normalization 22 | x = newCountDataSet(count_matrix, metadata$strain) 23 | x = estimateSizeFactors(x) 24 | return(as.data.frame(counts(x, normalize=TRUE))) 25 | } else if (input$norm_method == 'quantile') { 26 | # Quantile normalization 27 | x = as.data.frame(normalize.quantiles(as.matrix(count_matrix), 28 | copy=TRUE)) 29 | rownames(x) = rownames(count_matrix) 30 | colnames(x) = colnames(count_matrix) 31 | return(x) 32 | } 33 | }) 34 | 35 | # Convert counts to long format 36 | counts_long = reactive({ 37 | # get log-normalized counts 38 | counts_wide = log2(counts_normed() + 1) 39 | counts_wide$id = rownames(counts_wide) 40 | long = melt(counts_wide, id=c("id")) 41 | colnames(long) = c("gene_id", "sample_id", "expr") 42 | long$strain = metadata$strain[match(long$sample_id, 43 | rownames(metadata))] 44 | return(long) 45 | }) 46 | 47 | # Boxplot 48 | output$count_distributions = renderPlot({ 49 | plt = ggplot(data=counts_long(), aes(x=sample_id, y=expr)) + 50 | geom_boxplot(aes(fill=strain)) + 51 | theme(axis.text.x=element_text(angle=90, hjust=1)) 52 | print(plt) 53 | }) 54 | 55 | # Summary table 56 | output$metadata_table = renderDataTable( 57 | cbind(id=rownames(metadata), data.frame(metadata)), 58 | options=list(pageLength=5) 59 | ) 60 | }) 61 | 62 | -------------------------------------------------------------------------------- /2015/0303-shiny-interactive-data/demo2/ui.R: -------------------------------------------------------------------------------- 1 | library(shiny) 2 | 3 | shinyUI(fluidPage( 4 | titlePanel("Shiny demo #2: RNA-Seq normalization methods"), 5 | sidebarLayout( 6 | sidebarPanel( 7 | selectInput("norm_method", "Normalization method:", 8 | c("None" = "none", 9 | "Size Factor Normalization" = "size_factor", 10 | "Quantile Normalization" = "quantile")), 11 | width=2 12 | ), 13 | mainPanel( 14 | plotOutput("count_distributions"), 15 | dataTableOutput('metadata_table') 16 | 17 | ) 18 | ) 19 | )) 20 | -------------------------------------------------------------------------------- /2015/0303-shiny-interactive-data/images/1280px-Iris_versicolor_3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2015/0303-shiny-interactive-data/images/1280px-Iris_versicolor_3.jpg -------------------------------------------------------------------------------- /2015/0303-shiny-interactive-data/images/Fission_yeast.jpg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2015/0303-shiny-interactive-data/images/Fission_yeast.jpg -------------------------------------------------------------------------------- /2015/0303-shiny-interactive-data/images/demo1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2015/0303-shiny-interactive-data/images/demo1.png -------------------------------------------------------------------------------- /2015/0303-shiny-interactive-data/images/demo2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2015/0303-shiny-interactive-data/images/demo2.png -------------------------------------------------------------------------------- /2015/0303-shiny-interactive-data/images/shiny-hello-world.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2015/0303-shiny-interactive-data/images/shiny-hello-world.png -------------------------------------------------------------------------------- /2015/0310-Hombrew-at-BYOB/homebrew_install.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2015/0310-Hombrew-at-BYOB/homebrew_install.png -------------------------------------------------------------------------------- /2015/0310-Hombrew-at-BYOB/unix_timeline.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2015/0310-Hombrew-at-BYOB/unix_timeline.png -------------------------------------------------------------------------------- /2015/0331-Amplicon-sequencing-microbial-communities/Allard_BYOB.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2015/0331-Amplicon-sequencing-microbial-communities/Allard_BYOB.pptx -------------------------------------------------------------------------------- /2015/0331-Amplicon-sequencing-microbial-communities/README.md: -------------------------------------------------------------------------------- 1 | BYOB: March 31, 2015 2 | Sarah Allard 3 | 4 | Using amplicon sequencing to analyze microbial community structure and diversity. 
-------------------------------------------------------------------------------- /2015/0414-Annovar-annotate-genetic-variants/ANNOVAR_Tutorial_BYOB_04-13-15_Peterson.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2015/0414-Annovar-annotate-genetic-variants/ANNOVAR_Tutorial_BYOB_04-13-15_Peterson.pdf -------------------------------------------------------------------------------- /2015/0414-Annovar-annotate-genetic-variants/README.md: -------------------------------------------------------------------------------- 1 | BYOB: April 14, 2015 2 | Thomas Peterson 3 | 4 | Using Annovar to annotate genetic variants -------------------------------------------------------------------------------- /2015/0421-DELLY-annotate-CNVs/150216_bwamem.sh: -------------------------------------------------------------------------------- 1 | #PBS -l walltime=24:00:00,vmem=10gb,mem=10gb,nodes=8 2 | #PBS -N 150216_bwamem_TDGD 3 | #PBS -M josiereinhardt@gmail.com 4 | #PBS -m e 5 | 6 | module add bwa 7 | module add samtools 8 | 9 | bwa mem -t 8 /N/dc2/scratch/joserein/Genome2015/assembly.CA.coords9.79.15.17.0.05u.fa /N/dc2/scratch/joserein/rawreads/TD_Gom12_Drive_R1.fastq.gz /N/dc2/scratch/joserein/rawreads/TD_Gom12_Drive_R2.fastq.gz | samtools view -bS - > /N/dc2/scratch/joserein/bwa_split_newgenome/TDGD_full2015_150212_bwasplit2.bam 10 | samtools sort /N/dc2/scratch/joserein/bwa_split_newgenome/TDGD_full2015_150212_bwasplit2.bam /N/dc2/scratch/joserein/bwa_split_newgenome/TDGD_full2015_150212_bwasplit2.sort 11 | samtools index /N/dc2/scratch/joserein/bwa_split_newgenome/TDGD_full2015_150212_bwasplit2.sort.bam -------------------------------------------------------------------------------- /2015/0421-DELLY-annotate-CNVs/150217_delly.sh: -------------------------------------------------------------------------------- 1 | #PBS -l walltime=36:00:00,vmem=5gb,mem=5gb,nodes=1 2 | #PBS -N 150217_delly 3 | #PBS -M josiereinhardt@gmail.com 4 | #PBS -m e 5 | 6 | delly -t DEL -o /N/dc2/scratch/joserein/DELLY/TDG_deletions.vcr -g /N/dc2/scratch/joserein/Genome2015/assembly.CA.coords9.79.15.17.0.05u.fa /N/dc2/scratch/joserein/bwa_split_newgenome/TDGD_full2015_150212_bwasplit2.sort.bam /N/dc2/scratch/joserein/bwa_split_newgenome/TDGND1_full2015contig_150212_bwamem.sort.bam /N/dc2/scratch/joserein/bwa_split_newgenome/TDGND2_full2015contig_150212_bwamem.sort.bam 7 | -------------------------------------------------------------------------------- /2015/0421-DELLY-annotate-CNVs/150217_dellydup.sh: -------------------------------------------------------------------------------- 1 | #PBS -l walltime=36:00:00,vmem=5gb,mem=5gb,nodes=1 2 | #PBS -N 150217_dellydup 3 | #PBS -M josiereinhardt@gmail.com 4 | #PBS -m e 5 | 6 | delly -t DUP -o /N/dc2/scratch/joserein/DELLY/TDG_duplications.vcr -g /N/dc2/scratch/joserein/Genome2015/assembly.CA.coords9.79.15.17.0.05u.fa /N/dc2/scratch/joserein/bwa_split_newgenome/TDGD_full2015_150212_bwasplit2.sort.bam /N/dc2/scratch/joserein/bwa_split_newgenome/TDGND1_full2015contig_150212_bwamem.sort.bam /N/dc2/scratch/joserein/bwa_split_newgenome/TDGND2_full2015contig_150212_bwamem.sort.bam 7 | -------------------------------------------------------------------------------- /2015/0421-DELLY-annotate-CNVs/150217_dellyinv.sh: -------------------------------------------------------------------------------- 1 | #PBS -l walltime=36:00:00,vmem=5gb,mem=5gb,nodes=1 2 | #PBS -N 
150217_dellyinv 3 | #PBS -M josiereinhardt@gmail.com 4 | #PBS -m e 5 | 6 | delly -t INV -o /N/dc2/scratch/joserein/DELLY/TDG_inversions.vcr -g /N/dc2/scratch/joserein/Genome2015/assembly.CA.coords9.79.15.17.0.05u.fa /N/dc2/scratch/joserein/bwa_split_newgenome/TDGD_full2015_150212_bwasplit2.sort.bam /N/dc2/scratch/joserein/bwa_split_newgenome/TDGND1_full2015contig_150212_bwamem.sort.bam /N/dc2/scratch/joserein/bwa_split_newgenome/TDGND2_full2015contig_150212_bwamem.sort.bam 7 | -------------------------------------------------------------------------------- /2015/0421-DELLY-annotate-CNVs/CNV_parse_dellyVCF_PASSVAR.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/perl 2 | 3 | # filters through VCF files from Delly, looks for the cases where 1 sample has a variant 4 | # call and the other 2 don't. 5 | 6 | # col 6 - pass filter 7 | # col 9 - sample 1 8 | # col 10 - sample 2 9 | # col 11 - sample 3 10 | # etc... 11 | 12 | open IN, $ARGV[0]; 13 | open OUT, ">$ARGV[1]"; 14 | 15 | $header = ; 16 | $first = 0; 17 | $second = 0; 18 | $third = 0; 19 | $firstandsecond = 0; 20 | $firstandthird = 0; 21 | $secondandthird = 0; 22 | 23 | until (eof IN) { 24 | $line = ; 25 | chomp $line; 26 | @lineA = split /\t/, $line; 27 | $passfail = $lineA[6]; 28 | $sam1 = $lineA[9]; 29 | $sam2 = $lineA[10]; 30 | $sam3 = $lineA[11]; 31 | if ($line =~ /^#/) { 32 | print OUT $line . "\n"; 33 | next; 34 | } 35 | 36 | if ($sam1 =~ /^1\/1/) { 37 | if ($sam2 =~ /^0\/0/ and $sam3 =~ /^0\/0/) { 38 | if ($sam1 =~ /PASS/) { 39 | print OUT $line . "\tfirst" . "\n"; 40 | ++$first; 41 | } 42 | } elsif ($sam2 =~ /^0\/0/ and $sam3 =~ /^1\/1/) { 43 | if ($sam1 =~ /PASS/ and $sam3 =~ /PASS/) { 44 | print OUT $line . "\t1stand3rd" . "\n"; 45 | ++$firstandthird; 46 | } 47 | } elsif ($sam3 =~ /^0\/0/ and $sam2 =~ /^1\/1/) { 48 | if ($sam1 =~ /PASS/ and $sam2 =~ /PASS/) { 49 | print OUT $line . "\t1stand2nd" . "\n"; 50 | ++$firstandsecond; 51 | } 52 | } 53 | } elsif ($sam2 =~ /^1\/1/) { 54 | if ($sam3 =~ /^0\/0/ and $sam1 =~ /^0\/0/) { 55 | if ($sam2 =~ /PASS/) { 56 | print OUT $line . "\t2nd" . "\n"; 57 | ++$second; 58 | } 59 | } elsif ($sam1 =~ /^0\/0/ and $sam3 =~ /^1\/1/) { 60 | if ($sam2 =~ /PASS/ and $sam3 =~ /PASS/) { 61 | print OUT $line . "\t2ndand3rd" . "\n"; 62 | ++$secondandthird; 63 | } 64 | } 65 | } elsif ($sam3 =~ /^1\/1/) { 66 | if ($sam1 =~ /^0\/0/ and $sam2 =~ /^0\/0/) { 67 | if ($sam3 =~ /PASS/) { 68 | print OUT $line . "\t3rd" . "\n"; 69 | ++$third; 70 | } 71 | } 72 | } 73 | } 74 | print "first\tsecond\tthird\tfirstandsecond\tfirstandthird\tsecondandthird\n"; 75 | print $first . "\t" . $second . "\t" . $third ."\t" . $firstandsecond . "\t" . $firstandthird . "\t" . $secondandthird . 
"\n"; 76 | -------------------------------------------------------------------------------- /2015/0421-DELLY-annotate-CNVs/DELLY_BYOB.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2015/0421-DELLY-annotate-CNVs/DELLY_BYOB.pptx -------------------------------------------------------------------------------- /2015/0421-DELLY-annotate-CNVs/Delly_results_sm.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2015/0421-DELLY-annotate-CNVs/Delly_results_sm.xlsx -------------------------------------------------------------------------------- /2015/0421-DELLY-annotate-CNVs/README.md: -------------------------------------------------------------------------------- 1 | BYOB: April 21, 2015 2 | Josie Reinhardt 3 | 4 | Using DELLY to annotate CNVs in genomic resequencing data -------------------------------------------------------------------------------- /2015/0910-reproducible-manuscripts-with-rmarkdown/README.md: -------------------------------------------------------------------------------- 1 | Writing reproducible manuscripts with RMarkdown 2 | =============================================== 3 | 4 | [Keith Hughitt](mailto:khughitt@umd.edu) 5 | 6 | September 10, 2015 7 | 8 | ![Markdown document with corresponding PDF](images/markdown_pdf_example.png) 9 | 10 | Overview 11 | -------- 12 | 13 | In this tutorial, we will explore the use of 14 | [Markdown](http://daringfireball.net/projects/markdown/) for writing scientific 15 | manuscripts. After discussing some of the strengths and weaknesses of using 16 | Markdown over other approaches such as [LaTeX](http://www.latex-project.org/) 17 | or [Google Docs](https://www.google.com/docs/about/), the basic steps of 18 | putting together a Markdown-based paper will be described. Next, examples will 19 | be provided on various aspects of Markdown manuscript generation, including: 20 | 21 | - Figures and tables 22 | - PDF generation using [pandoc](http://pandoc.org/getting-started.html) 23 | - Bibliography management using [BibTeX](http://www.bibtex.org/) and the 24 | [Pandoc citeproc extension](https://github.com/jgm/pandoc-citeproc) 25 | - Extending Markdown documents with LaTeX. 26 | 27 | Finally, to tie things together, we will discuss the use of 28 | [knitr](http://yihui.name/knitr/) and [RMarkdown](http://rmarkdown.rstudio.com/) 29 | to create fully-reproducible manuscripts containing figures and tables 30 | generated in [R](https://www.r-project.org/). 31 | 32 | Why Markdown? 33 | ------------- 34 | 35 | Before we get to far, it is worth considering the merits (and drawbacks) of 36 | using Markdown for writing scientific manuscripts in the first place. 37 | 38 | ### Advantages 39 | 40 | - Its _easy_ 41 | - Plain-text 42 | - Simple, light-weight syntax 43 | - Use any editor 44 | - Un-rendered markdown is nearly as readable as rendered 45 | - Can use version control to track history 46 | - Markdown is becoming increasingly common (Github, Stack overflow, etc.) 47 | - Easier to learn than LaTeX 48 | - Supports embedded LaTeX and HTML 49 | - Can be rendered to PDF, HTML, LaTeX, Word, etc. 50 | - Easy bibliography management using BibTeX and Pandoc 51 | - Can be readily combined with R to automatically generate plots and tables 52 | using R code. 
53 | 54 | ### Disadvantages 55 | 56 | - Collaborative writing is not as straightforward 57 | - For the technically savvy, using Git provides a simple way to collaborate 58 | on a document 59 | - [Markx](https://github.com/yoavram/markx) is another potentially interesting 60 | collaborative Markdown editor aimed at scientists. 61 | - Still, useful features like reviewer comments are missing. 62 | - Less flexible than LaTeX or Word 63 | - Because of the simplicity of the Markdown syntax, the formatting options 64 | available are fairly limited. 65 | - In particular, image handling is very basic in Markdown. 66 | - This can be overcome, however, by using bits of embedded LaTeX. 67 | 68 | Converting Markdown to PDF 69 | -------------------------- 70 | 71 | To render a plain Markdown document (as opposed to an RMarkdown document), you 72 | can use the Pandoc command: 73 | 74 | ```sh 75 | pandoc input.md -o output.pdf 76 | ``` 77 | 78 | Example 1: Basic Markdown document 79 | ---------------------------------- 80 | 81 | To begin, let's create a simple Markdown example to demonstrate how to include 82 | figures, tables, and formulas. 83 | 84 | - [Example 1](examples/01-simple-markdown-document.md) 85 | 86 | 87 | Example 2: Adding LaTeX to improve flexibility 88 | ---------------------------------------------- 89 | 90 | In the next example, we will see how adding a bit of LaTeX to our Markdown 91 | documents can give us much finer control over image placement and other 92 | formatting issues. 93 | 94 | - [Example 2](examples/02-markdown-and-latex.md) 95 | 96 | Example 3: Using BibTeX to add a bibliography 97 | --------------------------------------------- 98 | 99 | In the next example, we will see how BibTeX and the pandoc `citeproc` extension 100 | can be used to include references in a manuscript. 101 | 102 | - [Example 3](examples/03-including-a-bibliography.md) 103 | 104 | For RMarkdown documents, another approach would be to use the `knitcitations` 105 | package by Carl Boettiger: 106 | 107 | http://www.carlboettiger.info/2012/05/30/knitcitations.html 108 | 109 | This allows you to simply include DOIs inline and have a bibliography 110 | automatically generated for you. 111 | 112 | Example 4: Reproducible manuscripts with RMarkdown 113 | -------------------------------------------------- 114 | 115 | Finally, bringing it all together, we will look at an example RMarkdown 116 | document which includes all of the above elements, with figures and 117 | tables generated on-the-fly using knitr. 118 | 119 | - [Example 4](examples/04-rmarkdown-manuscript-example.Rmd) 120 | 121 | Further reading 122 | --------------- 123 | 124 | **Markdown-based manuscripts** 125 | 126 | 1. [Simple template for scientific manuscripts in R markdown](http://www.petrkeil.com/?p=2401) 127 | 2. [Writing academic papers in Markdown using Sublime Text and Pandoc](http://nikolasander.com/writing-in-markdown/) 128 | 3. [Writing Scientific Papers Using Markdown](https://danieljhocking.wordpress.com/2014/12/09/writing-scientific-papers-using-markdown/) 129 | 4. [Markdown-for-Manuscripts](https://github.com/djhocking/Markdown-for-Manuscripts) 130 | 5. [Opening Science (a book written in Markdown)](http://book.openingscience.org/) 131 | 6.
[Library Data: R Markdown and Figures](http://www.rci.rutgers.edu/~ag978/litdata/figs/) 132 | 133 | -------------------------------------------------------------------------------- /2015/0910-reproducible-manuscripts-with-rmarkdown/examples/01-simple-markdown-document.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: A Simple Markdown Document 3 | author: Keith Hughitt 4 | date: 2015/09/09 5 | --- 6 | 7 | Overview 8 | ======== 9 | 10 | This is a simple example demonstrating the use of external figures, tables, and 11 | formulas in a Markdown document. 12 | 13 | There are actually multiple Markdown standards in common use. In additional to 14 | the [main standard](http://daringfireball.net/projects/markdown/), there are also 15 | [extensions](https://en.wikipedia.org/wiki/Markdown#Extensions) to the standard 16 | from from [Github](https://help.github.com/articles/github-flavored-markdown/), 17 | [Pandoc](http://pandoc.org/README.html#pandocs-markdown), etc. Here, we will 18 | make use of some of the extended table support in the Pandoc markdown 19 | extension. 20 | 21 | ### Table example 22 | 23 | Below is an example table in the Markdown pipe-style table. This is the default 24 | output from the knitr `kable` function. 25 | 26 | | | Sepal.Length| Sepal.Width| Petal.Length| Petal.Width|Species | 27 | |:---|------------:|-----------:|------------:|-----------:|:----------| 28 | |1 | 5.1| 3.5| 1.4| 0.2|setosa | 29 | |2 | 4.9| 3.0| 1.4| 0.2|setosa | 30 | |3 | 4.7| 3.2| 1.3| 0.2|setosa | 31 | |50 | 5.0| 3.3| 1.4| 0.2|setosa | 32 | |51 | 7.0| 3.2| 4.7| 1.4|versicolor | 33 | |52 | 6.4| 3.2| 4.5| 1.5|versicolor | 34 | |53 | 6.9| 3.1| 4.9| 1.5|versicolor | 35 | |100 | 5.7| 2.8| 4.1| 1.3|versicolor | 36 | |101 | 6.3| 3.3| 6.0| 2.5|virginica | 37 | |102 | 5.8| 2.7| 5.1| 1.9|virginica | 38 | |103 | 7.1| 3.0| 5.9| 2.1|virginica | 39 | |104 | 6.3| 2.9| 5.6| 1.8|virginica | 40 | 41 | Table: Subset of the Iris [1] dataset. 42 | 43 | For more examples of table styles supported by Pandoc Markdown, see the 44 | [Pandoc 45 | Documentation](http://pandoc.org/README.html#tables). 46 | 47 | ### Formulas 48 | 49 | Pandoc and RMarkdown (which uses Pandoc under the hood) both support arbitrary 50 | inline or block LaTeX expressions. 51 | 52 | $$ 53 | \arg\min_{\mathbf{w},\mathbf{\xi}, b } \left\{\frac{1}{2} \|\mathbf{w}\|^2 + C\sum_{i=1}^n \xi_i \right\} 54 | $$ 55 | 56 | Soft margin SVM objective function. (source: 57 | [Wikipedia](http://en.wikipedia.org/wiki/Support_vector_machine)) 58 | 59 | ### Figure example 60 | 61 | ![Figure 1: Classification of Iris datasets using SVMs](images/iris_svm.png) 62 | 63 | 64 | References 65 | ---------- 66 | 67 | 1. R. A. FISHER. “THE USE OF MULTIPLE MEASUREMENTS IN TAXONOMIC PROBLEMS”. In: _Annals of Eugenics_ 7.2 (Sep. 68 | 1936), pp. 179-188. DOI: 10.1111/j.1469-1809.1936.tb02137.x. . 
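(A note on the table in the overview above: as mentioned, it is `kable` output. A call along the following lines, reconstructed here for illustration rather than copied from the original source, reproduces a pipe table of that subset.)

```r
library(knitr)
# Rows 1-3, 50-53 and 100-104 of the iris data, rendered as a Markdown pipe table
kable(iris[c(1:3, 50:53, 100:104), ])
```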
70 | 71 | 72 | 73 | -------------------------------------------------------------------------------- /2015/0910-reproducible-manuscripts-with-rmarkdown/examples/01-simple-markdown-document.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2015/0910-reproducible-manuscripts-with-rmarkdown/examples/01-simple-markdown-document.pdf -------------------------------------------------------------------------------- /2015/0910-reproducible-manuscripts-with-rmarkdown/examples/02-markdown-and-latex.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2015/0910-reproducible-manuscripts-with-rmarkdown/examples/02-markdown-and-latex.pdf -------------------------------------------------------------------------------- /2015/0910-reproducible-manuscripts-with-rmarkdown/examples/03-including-a-bibliography.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Using BibTeX to add a bibliography 3 | author: Keith Hughitt 4 | date: 2015/09/10 5 | bibliography: references.bib 6 | --- 7 | 8 | Overview 9 | ======== 10 | 11 | In this example, the use of BibTeX for incorporating references into a Markdown 12 | manuscript is described. 13 | 14 | [BibTeX](http://www.bibtex.org/Format/) is a file format used for describing 15 | references and is commonly used in conjunction with LaTeX documents. A `.bib` 16 | file contains one or more reference entries, each of which contains all of the 17 | necessary information to describe a single bibliography entry. 18 | 19 | An example entry might look like: 20 | 21 | ```tex 22 | @article{Yizhak2015, 23 | author = {Yizhak, Keren and Chaneton, Barbara and Gottlieb, Eyal and Ruppin, Eytan}, 24 | keywords = {11,15252,20145307,2015,817,accepted 26 may 2015,cancer metabolism,doi 10,genome-scale simulations,metabolic modeling,mol syst biol,msb,received 22 october 2014,revised 4 april}, 25 | pages = {1--17}, 26 | title = {{Modeling cancer metabolism on a genome scale}}, 27 | year = {2015} 28 | } 29 | 30 | Additional fields for including an abstract, isbn, etc. may also be included. 31 | ``` 32 | 33 | Notice in particular the string just after the opening bracket on the first 34 | line: `Yizhak2015`. This will be the reference key you use to cite the artcile 35 | in your document text (more on this later.) 36 | 37 | Most popular reference managers are capable of generating .bib files for a 38 | collection of references. For example, in Mendeley, you can use the "export" 39 | option on a selected set of references to generate a BibTeX file. An easy way to 40 | organize your references then is to simply have a separate folder for each 41 | manuscript in Mendeley. Indeed there is even an option to have Mendeley 42 | automatically generate a .bib file for each folder you create, such that 43 | whenever a new reference is added to that folder, the corresponding .bib file 44 | is automatically updated. 45 | 46 | Adding a BibTeX bibliography 47 | ============================ 48 | 49 | After generating a BibTeX bibliography, you need to tell Pandoc where to find 50 | it. 
To do this, edit the YAML block at the top of your Markdown document, and 51 | add a `bibliography` entry: 52 | 53 | ``` 54 | --- 55 | title: Using BibTeX to add a bibliography 56 | author: Keith Hughitt 57 | date: 2015/09/10 58 | bibliography: references.bib 59 | --- 60 | ``` 61 | 62 | To cite an article in your bibliography, you simply add an entry of the 63 | following form: 64 | 65 | ``` 66 | [@Ref1; @Ref2; etc.] 67 | ``` 68 | 69 | For example, here we cite the first two entries from our bibliography 70 | [@Shuman2013; @Yan2015], and a few more [@Yao2015; @Clarke2008; @Yin2015; 71 | @Yizhak2015]. 72 | 73 | To have a references section automaticaly generated and appended to the bottom 74 | of the file, we can then use the excellent [pandoc citeproc 75 | extension](https://github.com/jgm/pandoc-citeproc): 76 | 77 | ``` 78 | pandoc --filter pandoc-citeproc input.md -o output.pdf 79 | ``` 80 | 81 | Specifying the citation style to use 82 | ==================================== 83 | 84 | To specify which citation and bibliography style should be used for the output 85 | PDF, you can use the `--csl` citeproc parameter to choose from one of [over 7500 86 | possible citation styles](https://github.com/citation-style-language/styles). 87 | 88 | Simply download the citation style (.csl) file for the format you wish to use 89 | and modify your pandoc command to include it, e.g.: 90 | 91 | ``` 92 | pandoc --filter pandoc-citeproc input.md --csl nucleic-acids-research.csl -o output.pdf 93 | ``` 94 | 95 | Some additional examples of how to include citations in Markdown are described 96 | in the [RMarkdown documentation](http://rmarkdown.rstudio.com/authoring_bibliographies_and_citations.html). 97 | 98 | 99 | References 100 | ========== 101 | 102 | -------------------------------------------------------------------------------- /2015/0910-reproducible-manuscripts-with-rmarkdown/examples/03-including-a-bibliography.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2015/0910-reproducible-manuscripts-with-rmarkdown/examples/03-including-a-bibliography.pdf -------------------------------------------------------------------------------- /2015/0910-reproducible-manuscripts-with-rmarkdown/examples/04-rmarkdown-manuscript-example.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: RMarkdown Manuscript Example 3 | author: Keith Hughitt 4 | date: 2015/09/10 5 | bibliography: 04-rmarkdown-manuscript-example.bib 6 | csl: nucleic-acids-research.csl 7 | header-includes: 8 | - \usepackage[font=small,labelfont=bf]{caption} 9 | - \usepackage{wrapfig} 10 | - \usepackage{framed} 11 | - \usepackage{float} 12 | - \pagenumbering{gobble} 13 | output: 14 | pdf_document: 15 | keep_tex: true 16 | 17 | fig_caption: yes 18 | --- 19 | 20 | 23 | \setlength\intextsep{0pt} 24 | 25 | 30 |
31 | ![](images/placeholder.png) 32 |
33 | 34 | Overview 35 | -------- 36 | 37 | RMarkdown [@Allaire2015] documents are simply Markdown documents with chunks of 38 | R code embedded in them. When building the document, R code chunks are executed 39 | using knitr [@Yihui2015] and the outputs from each code block are embedded in 40 | the resulting file. 41 | 42 | To parse our example RMarkdown file, instead of calling pandoc directly, we 43 | will now use the `render()` function of the 44 | [rmarkdown](http://rmarkdown.rstudio.com/) library. 45 | 46 | For example, to render this file, open up an R console in the directory 47 | containing this file and run: 48 | 49 | ```r 50 | library('rmarkdown') 51 | render('04-rmarkdown-manuscript-example.Rmd') 52 | ``` 53 | 54 | You will have to first install rmarkdown if it is not already installed on your 55 | system. 56 | 57 | RMarkdown figures 58 | ----------------- 59 | 60 | Example figure from the [ggtree](https://github.com/GuangchuangYu/ggtree) 61 | [@Guangchuang2015; @Wickham2009] vignette: 62 | 63 | ```{r load_libraries, include=FALSE, echo=FALSE} 64 | # load libraries 65 | library("ggtree") 66 | library("ggplot2") 67 | library("colorspace") 68 | ``` 69 | 70 | ```{r ggtree_example, echo=FALSE, warning=FALSE, dpi=96, fig.width=600/96, fig.cap='ggtree example figure'} 71 | # load tree 72 | nwk = system.file("extdata", "sample.nwk", package="ggtree") 73 | tree = read.tree(nwk) 74 | 75 | # assign OTUs 76 | cls = list(c1=c("A", "B", "C", "D", "E"), 77 | c2=c("F", "G", "H"), 78 | c3=c("L", "K", "I", "J"), 79 | c4="M") 80 | tree = groupOTU(tree, cls) 81 | 82 | # plot tree 83 | ggtree(tree, aes(color=group, linetype=group)) + 84 | geom_text(aes(label=label), hjust=-.25) + 85 | scale_color_manual(values=c("black", rainbow_hcl(4))) + 86 | theme(legend.position="right") 87 | ``` 88 | 89 | Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor 90 | incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis 91 | nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. 92 | Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu 93 | fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in 94 | culpa qui officia deserunt mollit anim id est laborum. 95 | 96 | ```{r ggtree_example2, dpi=96, out.width='0.5\\textwidth', fig.align='right', fig.cap='Another example from the ggtree vignette'} 97 | ggtree(tree) %>% hilight(21, "steelblue") 98 | ``` 99 | 100 | Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor 101 | incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis 102 | nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. 103 | Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu 104 | fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in 105 | culpa qui officia deserunt mollit anim id est laborum. 106 | 107 | Some additional tips 108 | -------------------- 109 | 110 | ### Getting BibTeX entries for R packages 111 | 112 | To get a BibTeX-formatted reference entry for an R package, use the 113 | `citation()` and `toBibtex()` functions: 114 | 115 | ```r 116 | toBibtex(citation('package-name')) 117 | ``` 118 | 119 | Note that you will have to manually add a reference @key for each entry 120 | generated this way. 
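One way to capture that output is to write it straight to a `.bib` file and then fill in the keys by hand. A sketch (the package and file name below are arbitrary examples):

```r
# Write the BibTeX entries for the 'knitr' package to a file; the generated
# entries have empty keys (e.g. "@Manual{,"), so add keys manually afterwards
writeLines(as.character(toBibtex(citation("knitr"))), "extra-references.bib")
```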
121 | 122 | ### Set fig_caption: true to ensure that captions show up 123 | 124 | Figure captions can be specified in knitr using the `fig.cap` [chunk 125 | option](http://yihui.name/knitr/options/). To ensure that the captions are 126 | displayed properly, however, you will want to set `fig_caption: true` in the 127 | YAML header block, under the `pdf_document` section. 128 | 129 | ### Set keep_tex: true to help with debugging 130 | 131 | RMarkdown documents are converted to LaTeX first before being converted into 132 | PDF and other formats. When debugging formatting, etc. issues, it may be help 133 | to enable this option in the `pdf_document` section on the YAML metadata block 134 | at the top of your file. 135 | 136 | System information 137 | ------------------ 138 | 139 | ```{r sysinfo} 140 | sessionInfo() 141 | ``` 142 | 143 | References 144 | ---------- 145 | 146 | -------------------------------------------------------------------------------- /2015/0910-reproducible-manuscripts-with-rmarkdown/examples/04-rmarkdown-manuscript-example.bib: -------------------------------------------------------------------------------- 1 | @Article{Guangchuang2015, 2 | title = {ggtree: an R package for visualization and annotation of phylogenetic tree with different types of meta-data}, 3 | author = {Guangchuang Yu and David Smith and Huachen Zhu and Yi Guan and Tommy Tsan-Yuk Lam}, 4 | year = {submitted}, 5 | journal = {Methods in Ecology and Evolution}, 6 | } 7 | @Book{Wickham2009, 8 | author = {Hadley Wickham}, 9 | title = {ggplot2: elegant graphics for data analysis}, 10 | publisher = {Springer New York}, 11 | year = {2009}, 12 | isbn = {978-0-387-98140-6}, 13 | url = {http://had.co.nz/ggplot2/book}, 14 | } 15 | @Book{Yihui2015, 16 | title = {Dynamic Documents with {R} and knitr}, 17 | author = {Yihui Xie}, 18 | publisher = {Chapman and Hall/CRC}, 19 | address = {Boca Raton, Florida}, 20 | year = {2015}, 21 | edition = {2nd}, 22 | note = {ISBN 978-1498716963}, 23 | url = {http://yihui.name/knitr/}, 24 | } 25 | @Manual{Allaire2015, 26 | title = {rmarkdown: Dynamic Documents for R}, 27 | author = {JJ Allaire and Joe Cheng and Yihui Xie and Jonathan McPherson and Winston Chang and Jeff Allen and Hadley Wickham and Aron Atkins and Rob Hyndman}, 28 | year = {2015}, 29 | note = {R package version 0.8}, 30 | url = {http://CRAN.R-project.org/package=rmarkdown}, 31 | } 32 | > 33 | -------------------------------------------------------------------------------- /2015/0910-reproducible-manuscripts-with-rmarkdown/examples/04-rmarkdown-manuscript-example.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2015/0910-reproducible-manuscripts-with-rmarkdown/examples/04-rmarkdown-manuscript-example.pdf -------------------------------------------------------------------------------- /2015/0910-reproducible-manuscripts-with-rmarkdown/examples/images/iris_svm.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2015/0910-reproducible-manuscripts-with-rmarkdown/examples/images/iris_svm.png -------------------------------------------------------------------------------- /2015/0910-reproducible-manuscripts-with-rmarkdown/examples/images/placeholder.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2015/0910-reproducible-manuscripts-with-rmarkdown/examples/images/placeholder.png -------------------------------------------------------------------------------- /2015/0910-reproducible-manuscripts-with-rmarkdown/examples/images/ponyo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2015/0910-reproducible-manuscripts-with-rmarkdown/examples/images/ponyo.png -------------------------------------------------------------------------------- /2015/0910-reproducible-manuscripts-with-rmarkdown/examples/nucleic-acids-research.csl: -------------------------------------------------------------------------------- 1 | 2 | 130 | -------------------------------------------------------------------------------- /2015/0910-reproducible-manuscripts-with-rmarkdown/examples/references.bib: -------------------------------------------------------------------------------- 1 | @article{Shuman2013, 2 | abstract = {In applications such as social, energy, transportation, sensor, and neuronal networks, high-dimensional data naturally reside on the vertices of weighted graphs. The emerging field of signal processing on graphs merges algebraic and spectral graph theoretic concepts with computational harmonic analysis to process such signals on graphs. In this tutorial overview, we outline the main challenges of the area, discuss different ways to define graph spectral domains, which are the analogs to the classical frequency domain, and highlight the importance of incorporating the irregular structures of graph data domains when processing signals on graphs. We then review methods to generalize fundamental operations such as filtering, translation, modulation, dilation, and downsampling to the graph setting and survey the localized, multiscale transforms that have been proposed to efficiently extract information from high-dimensional data on graphs. We conclude with a brief discussion of open issues and possible extensions.}, 3 | archivePrefix = {arXiv}, 4 | arxivId = {1211.0053}, 5 | author = {Shuman, David I. and Narang, Sunil K. 
and Frossard, Pascal and Ortega, Antonio and Vandergheynst, Pierre}, 6 | doi = {10.1109/MSP.2012.2235192}, 7 | eprint = {1211.0053}, 8 | isbn = {1053-5888 VO - 30}, 9 | issn = {10535888}, 10 | journal = {IEEE Signal Processing Magazine}, 11 | number = {3}, 12 | pages = {83--98}, 13 | title = {{The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains}}, 14 | volume = {30}, 15 | year = {2013} 16 | } 17 | @article{Yan2015, 18 | author = {Yan, Gang and Tsekenis, Georgios and Barzel, Baruch and Slotine, Jean-Jacques and Liu, Yang-Yu and Barab\'{a}si, Albert-L\'{a}szl\'{o}}, 19 | doi = {10.1038/nphys3422}, 20 | issn = {1745-2473}, 21 | journal = {Nature Physics}, 22 | number = {August}, 23 | title = {{Spectrum of controlling and observing complex networks}}, 24 | url = {http://www.nature.com/doifinder/10.1038/nphys3422}, 25 | year = {2015} 26 | } 27 | @article{Yao2015, 28 | author = {Yao, Shun and Yoo, Shinjae and Yu, Dantong}, 29 | doi = {10.1186/s12859-015-0710-1}, 30 | issn = {1471-2105}, 31 | journal = {BMC Bioinformatics}, 32 | keywords = {gene expression data,gene regulatory networks,granger causality,time series}, 33 | number = {1}, 34 | pages = {273}, 35 | title = {{Prior knowledge driven Granger causality analysis on gene regulatory network discovery}}, 36 | url = {http://www.biomedcentral.com/1471-2105/16/273}, 37 | volume = {16}, 38 | year = {2015} 39 | } 40 | @article{Clarke2008, 41 | abstract = {High-throughput genomic and proteomic technologies are widely used in cancer research to build better predictive models of diagnosis, prognosis and therapy, to identify and characterize key signalling networks and to find new targets for drug development. These technologies present investigators with the task of extracting meaningful statistical and biological information from high-dimensional data spaces, wherein each sample is defined by hundreds or thousands of measurements, usually concurrently obtained. The properties of high dimensionality are often poorly understood or overlooked in data modelling and analysis. From the perspective of translational science, this Review discusses the properties of high-dimensional data spaces that arise in genomic and proteomic studies and the challenges they can pose for data analysis and interpretation.}, 42 | author = {Clarke, Robert and Ressom, Habtom W and Wang, Antai and Xuan, Jianhua and Liu, Minetta C and Gehan, Edmund a and Wang, Yue}, 43 | doi = {10.1038/nrc2294}, 44 | isbn = {1474-1768 (Electronic)$\backslash$r1474-175X (Linking)}, 45 | issn = {1474-175X}, 46 | journal = {Nature reviews. Cancer}, 47 | number = {1}, 48 | pages = {37--49}, 49 | pmid = {18097463}, 50 | title = {{The properties of high-dimensional data spaces: implications for exploring gene and protein expression data.}}, 51 | volume = {8}, 52 | year = {2008} 53 | } 54 | @article{Yin2015, 55 | author = {Yin, Weiwei and Garimalla, Swetha and Moreno, Alberto and Galinski, Mary R. 
and Styczynski, Mark P.}, 56 | doi = {10.1186/s12918-015-0194-7}, 57 | issn = {1752-0509}, 58 | journal = {BMC Systems Biology}, 59 | keywords = {Bayesian networks,Network learning algorithm,Tree-,bayesian networks,malaria,network learning algorithm,non-human primate,tree-like networks}, 60 | number = {1}, 61 | pages = {49}, 62 | publisher = {BMC Systems Biology}, 63 | title = {{A tree-like Bayesian structure learning algorithm for small-sample datasets from complex biological model systems}}, 64 | url = {http://www.biomedcentral.com/1752-0509/9/49}, 65 | volume = {9}, 66 | year = {2015} 67 | } 68 | @article{Yizhak2015, 69 | author = {Yizhak, Keren and Chaneton, Barbara and Gottlieb, Eyal and Ruppin, Eytan}, 70 | keywords = {11,15252,20145307,2015,817,accepted 26 may 2015,cancer metabolism,doi 10,genome-scale simulations,metabolic modeling,mol syst biol,msb,received 22 october 2014,revised 4 april}, 71 | pages = {1--17}, 72 | title = {{Modeling cancer metabolism on a genome scale}}, 73 | year = {2015} 74 | } 75 | -------------------------------------------------------------------------------- /2015/0910-reproducible-manuscripts-with-rmarkdown/images/markdown_pdf_example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2015/0910-reproducible-manuscripts-with-rmarkdown/images/markdown_pdf_example.png -------------------------------------------------------------------------------- /2015/0924-trees-with-ete2/README.md: -------------------------------------------------------------------------------- 1 | Jupyter Notebook version of presentation is in this directory. 2 | -------------------------------------------------------------------------------- /2015/0924-trees-with-ete2/named.fasttree: -------------------------------------------------------------------------------- 1 | ((A._aeolicus_NusG:0.49304,(E._coli_NusG:0.43700,(Synechocystis_sp._NusG:0.57409,(B._subtilis_NusG:0.25014,(T._thermophilus_NusG:0.36426,M._tuberculosis_NusG:0.39848)0.488:0.08259)0.863:0.11053)0.176:0.03586)0.859:0.16494)0.994:0.45988,((M._xanthus_TaA:0.73658,Bacteroides_sp._UpxY:0.81904)0.985:0.50710,((V._cholera_RfaH:0.42967,(Y._pestis_RfaH:0.15659,(S._marcescens_RfaH:0.04928,(S._typhi_RfaH:0.06213,(E._coli_RfaH:0.00055,S._flexneri_RfaH:0.01019)0.926:0.05084)0.999:0.20091)0.815:0.10181)0.977:0.31461)0.964:0.48726,(K._pneumoniae_ActX:0.97385,((E._coli_ActX:0.0,M._morganii_ActX:0.0):0.00456,S._enteritidis_ActX:0.00055)0.996:1.18849)0.890:0.47524)0.915:0.35747)0.784:0.15982,((M._jannaschii_Spt5:0.30499,(P._furiosus_Spt5:0.38398,(S._acidocaldarius_Spt5:0.24418,A._ambivalens_Spt5:0.11073)1.000:0.59354)0.729:0.18574)1.000:1.07722,((((B._amyloliquefaciens_LoaP:0.18383,B._brevis_LoaP:0.24965)0.996:0.45593,(P._polymyxa_LoaP:0.29078,C._cellulolyticum_LoaP:0.25178)1.000:0.72688)0.789:0.13866,(T._wiegelii_LoaP:0.21551,C._subterraneus_LoaP:0.21879)0.986:0.30125)0.772:0.10435,A._metalliredigens_LoaP:0.47937)0.947:0.35903)0.559:0.07384); 2 | -------------------------------------------------------------------------------- /2015/1008-snakemake/Snakefile: -------------------------------------------------------------------------------- 1 | 2 | samples = ['A', 'B', 'C'] 3 | final_files = expand("mapped_reads/{s}.bam", s=samples) 4 | #print(final_files) 5 | 6 | rule a: 7 | input: final_files 8 | 9 | rule bwa_map: 10 | input: 11 | fasta="data/genome.fa", 12 | fastq="data/samples/{sample}.fastq" 13 | 14 | 
output: "mapped_reads/{sample}.bam" 15 | shell: 16 | """ 17 | bwa mem {input.fasta} {input.fastq} \\ 18 | | samtools view -Sb - > {output} 19 | """ 20 | 21 | # vim: ft=python 22 | -------------------------------------------------------------------------------- /2015/1008-snakemake/prepare-environment.sh: -------------------------------------------------------------------------------- 1 | if [ ! -e snakemake-tutorial-data.tar.gz ]; then 2 | wget https://bitbucket.org/johanneskoester/snakemake/downloads/snakemake-tutorial-data.tar.gz 3 | else 4 | echo "tarball exists; skipping download" 5 | fi 6 | if [ ! -e data/genome.fa ]; then 7 | tar -xf snakemake-tutorial-data.tar.gz 8 | else 9 | echo "tarball appears to already be extracted; skipping" 10 | fi 11 | 12 | if [ ! $(conda env list | grep "snakemake-tutorial" | wc -l) -ne 0 ]; then 13 | conda create -n snakemake-tutorial -c bioconda --file requirements.txt python=3 14 | else 15 | echo "conda environment 'snakemake-tutorial' already exists" 16 | fi 17 | -------------------------------------------------------------------------------- /2015/1008-snakemake/rnaseq-dag.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2015/1008-snakemake/rnaseq-dag.png -------------------------------------------------------------------------------- /2015/1008-snakemake/serve-slides.sh: -------------------------------------------------------------------------------- 1 | python -m SimpleHTTPServer 8000 2 | -------------------------------------------------------------------------------- /2015/1008-snakemake/slides.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | Title 5 | 6 | 18 | 19 | 20 | 22 | 25 | 26 | 27 | -------------------------------------------------------------------------------- /2015/1008-snakemake/source.markdown: -------------------------------------------------------------------------------- 1 | class: center, middle 2 | # Reproducible bioinformatics workflows with Snakemake 3 | 4 | .right[Ryan Dale, PhD] 5 | .right[Laboratory of Cellular and Developmental Biology] 6 | .right[NIDDK, NIH] 7 | 8 | --- 9 | class: middle 10 | 11 | # What is snakemake? 12 | [Snakemake](https://bitbucket.org/johanneskoester/snakemake) is a *workflow 13 | management system*. 14 | 15 | Snakemake *workflows* are essentially Python scripts plus some declarative code 16 | that define *rules*. 17 | 18 | Rules describe how to create output files from input files. 19 | 20 | When your script gets too unwieldy and you think *"there must be a better way"* 21 | then it's probably time to use a workflow management system. 22 | --- 23 | 24 | # Example use-case: complete RNA-seq analysis 25 | 26 | Start with FASTQ files and a config file describing experimental design 27 | - Trim reads (`cutadapt`) 28 | - Map to genome (`HISAT2`) 29 | - Evaluate rRNA and directional bias (`picard`) 30 | - Remove multimappers (`samtools`) 31 | - Count reads in genes (`subread/FeatureCounts`) 32 | - DE analysis (`R/Bioconductor/DESeq2`) 33 | - Run `cufflinks` (soon `kallisto`) 34 | - FastQC at every stage 35 | - Aggregate QC and results into HTML reports 36 | 37 | End with HTML reports with links to final results, plots, stats, descriptions 38 | ready to send to collaborator. 
39 | 40 | --- 41 | 42 | background-image: url(rnaseq-dag.png) 43 | 44 | --- 45 | # Other workflow management systems 46 | 47 | Tool | Link 48 | --------------:|------: 49 | `make` | https://www.gnu.org/software/make/ 50 | `ruffus` | http://www.ruffus.org.uk/ 51 | `Rake` | http://docs.seattlerb.org/rake/ 52 | `luigi` | https://github.com/spotify/luigi 53 | `bpipe` | http://docs.bpipe.org/ 54 | Galaxy | https://usegalaxy.org/ 55 | `paver` | http://paver.github.io/paver/ 56 | 57 | Most handle dependencies and only running those jobs that need updating. 58 | 59 | Differences are in philosophy, syntax, features. 60 | 61 | --- 62 | class: middle 63 | # Rules 64 | 65 | A *rule* defines how to go from an input file to an output file. 66 | 67 | Typically written in a file called `Snakefile`. 68 | 69 | ```python 70 | rule sort: 71 | input: "words.txt" 72 | output: "sorted-words.txt" 73 | shell: "sort {input} > {output}" 74 | ``` 75 | 76 | Execute with: 77 | ```bash 78 | snakemake 79 | ``` 80 | 81 | --- 82 | class: middle 83 | # Wildcards 84 | Often we use wildcards in a rule 85 | 86 | Ask for a file at the command line; snakemake will figure out what to do. 87 | 88 | ```python 89 | rule sort: 90 | input: "{filename}" 91 | output: "sorted-{filename}" 92 | shell: "sort {input} > {output}" 93 | ``` 94 | 95 | Execute with: 96 | ```bash 97 | snakemake sorted-words.txt 98 | ``` 99 | 100 | 101 | --- 102 | ###How to think like snakemake 103 | ```bash 104 | snakemake sorted-words.txt 105 | ``` 106 | -- 107 | Look in the current directory for a file called `Snakefile` and parse it. 108 | 109 | -- 110 | 111 | Look for a rule whose "`output:`" matches the provided file, `sorted-words.txt` 112 | 113 | -- 114 | 115 | ```python 116 | rule sort: 117 | input: "{filename}" 118 | output: "sorted-{filename}" 119 | shell: "sort {input} > {output}" 120 | ``` 121 | -- 122 | 123 | Use the discovered wildcards (here, `{filename} = words.txt`) to fill in `input`. 124 | 125 | -- 126 | 127 | ```python 128 | rule sort: 129 | input: "words.txt" 130 | output: "sorted-words.txt" 131 | shell: "sort {input} > {output}" 132 | ``` 133 | -- 134 | 135 | Is `sorted-words.txt` missing? Older than `words.txt`? Run the rule, filling in 136 | `{input}` and `{output}` as appropriate. 137 | 138 | -- 139 | 140 | ```bash 141 | sort words.txt > sorted-words.txt 142 | ``` 143 | 144 | --- 145 | class: middle 146 | 147 | # Why snakemake in particular? 
148 | 149 | --- 150 | class: middle 151 | 152 | ## Parallelization is trivial 153 | 154 | runs on single machine or cluster with *no changes to code* 155 | 156 | ```bash 157 | snakemake # one core 158 | ``` 159 | 160 | ```bash 161 | snakemake -j8 # 8 cores 162 | ``` 163 | 164 | ```bash 165 | snakemake -j100 --cluster "qsub" # submit up to 100 jobs at a time, PBS/SGE 166 | ``` 167 | ```bash 168 | snakemake -j100 --cluster "sbatch" # same, but SLURM 169 | ``` 170 | 171 | --- 172 | class: middle 173 | ## Bash and Python can be used interchangeably 174 | 175 | ```python 176 | rule sort_bash: 177 | input: "words.txt" 178 | output: "sorted-words.txt" 179 | shell: "sort {input} > {output}" 180 | ``` 181 | ```python 182 | rule sort_python: 183 | input: "words.txt" 184 | output: "sorted-words.txt" 185 | run: 186 | with open(str(output[0])) as fout: 187 | for line in sorted(open(str(input[0]))): 188 | fout.write(line) 189 | ``` 190 | 191 | ```python 192 | rule sort_bash_from_python: 193 | input: "words.txt" 194 | output: "sorted-words.txt" 195 | run: 196 | print("starting to sort", input[0]) 197 | shell("sort {input} > {output}") 198 | print("sorted", output[0]) 199 | ``` 200 | --- 201 | class: middle 202 | # Other nice things 203 | 204 | - Snakemake is an extension of the Python language. *Any valid Python code is 205 | valid Snakemake code*. 206 | 207 | - Generate HTML reports. 208 | 209 | - Temporary and protected files 210 | 211 | - Create output dirs 212 | 213 | - Diagrams showing the DAG of jobs to be run 214 | 215 | - Tracking changes in code: you can detect when parameters changed 216 | 217 | - Configure per-job resources (number of threads, cluster memory, etc) 218 | 219 | - Call R via `rpy2`. (In practice, I tend to have separate scripts since `rpy2` 220 | can be finicky) 221 | 222 | 223 | --- 224 | class: middle 225 | 226 | # [snakemake tutorial](http://htmlpreview.github.io/?https://bitbucket.org/johanneskoester/snakemake/raw/master/snakemake-tutorial.html) 227 | 228 | 229 | 230 | -------------------------------------------------------------------------------- /2015/1022-allele-specific-expression/1022-AlleleSpecificExpression.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2015/1022-allele-specific-expression/1022-AlleleSpecificExpression.pptx -------------------------------------------------------------------------------- /2015/1022-allele-specific-expression/README.md: -------------------------------------------------------------------------------- 1 | BYOB: October 22, 2015 2 | Kevin Nyberg 3 | 4 | Calculating allele-specific expression data using RNA-Seq -------------------------------------------------------------------------------- /2016/0301-kallisto-ASE/0301-kallisto-ASE.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2016/0301-kallisto-ASE/0301-kallisto-ASE.pptx -------------------------------------------------------------------------------- /2016/0301-kallisto-ASE/README.md: -------------------------------------------------------------------------------- 1 | BYOB: March 1, 2016 2 | Kevin Nyberg 3 | 4 | Allele-specific expression of RNA-Seq data using Kallisto. 
-------------------------------------------------------------------------------- /2016/0315-labnote/README.md: -------------------------------------------------------------------------------- 1 | Labnote - A light-weight HTML lab notebook generator 2 | ==================================================== 3 | 4 | Labnote is a small tool I wrote to generate HTML lab notebooks from a 5 | collection of analysis outputs, scripts, etc. 6 | 7 | ![Labnote notebook screenshot](example_notebook.png) 8 | 9 | It plays particularly well with: 10 | 11 | 1. [knitr](http://yihui.name/knitr/) 12 | 2. [Jupyter Notebook](http://jupyter.org/) 13 | 14 | Both of which are capable of outputting HTML with code and figures embedded. 15 | 16 | Neither of these are required though and Labnote should be compatible with a 17 | wide range of directory structures and file formats. 18 | 19 | For more information, and to see a working example, check out: 20 | 21 | 1. [Labnote Github repo](https://github.com/khughitt/labnote) 22 | 2. [Labnote documentation](labnote.readthedocs.org) 23 | 24 | -------------------------------------------------------------------------------- /2016/0315-labnote/example_notebook.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2016/0315-labnote/example_notebook.png -------------------------------------------------------------------------------- /2016/0412-chromonomer/BYOB_Chromonomer_April-2016.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umd-byob/presentations/23ad5271b4fcd24c6c3ebc94eb42e1b9eb19bd44/2016/0412-chromonomer/BYOB_Chromonomer_April-2016.pptx -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Bring Your Own Bioinformatics: Presentations 2 | ============================================= 3 | University of Maryland, College Park bioinformatics and computational biology 4 | interest group. 5 | 6 | For more information about the club and upcoming meetings, check out the [BYOB 7 | website](http://umd-byob.github.io/) and [Google Group](https://groups.google.com/forum/#!forum/umd-byob). 8 | 9 | Instructions for uploading a presentation 10 | ========================================= 11 | If you are planning to give a presentation at BYOB, and have related materials 12 | that you are able to share, the below steps will describe how they can be added 13 | to this repository. 14 | 15 | What to include 16 | --------------- 17 | 18 | ### Overview 19 | 20 | Ideally, everything needed to reproduce the results of the presentation, along 21 | with the presentation itself should be included. "Presentation" is used loosely 22 | here are may refer to any number of different formats such as a powerpoint 23 | presentation, a well-commented script or code example, a Knitr source file, 24 | IPython notebook, Markdown notes, etc. 25 | 26 | In cases where datasets have been used as part of the analysis, special care 27 | must be taken. First, when possible, try and make use of datasets that are 28 | publically available and do not require any special access. In cases where the 29 | datasets are very small (less than ~500Kb) then you can simply include them in 30 | the repository directly. 
35 | ### List of things to include 36 | 37 | 1. **README.md** - A brief description of the presentation, including the title 38 | and author, along with descriptions of the materials included and how to 39 | use them. 40 | 2. **Presentation** - PowerPoint slides, Markdown, IPython notebooks, etc. 41 | 3. **Code** - Code examples, or a shell script of the commands used in the presentation. 42 | 4. **Data** - Any datasets used in the tutorial (see above for notes on how to 43 | handle them). 44 | 5. **Output** - If output is generated as part of the tutorial, and the 45 | resulting files are not too large, they may be included directly in the 46 | repo. Otherwise, the output should be described as well as possible so that 47 | users can judge whether they obtained the correct output. 48 | 6. **Other** - As long as you have the rights, and the file sizes are small, any 49 | additional files that are relevant to the presentation, and do not fit well 50 | into any of the above categories, may be included. 51 | 52 | When in doubt, look at examples of other presentations that have been uploaded 53 | in the past to get an idea of what to include. 54 | 55 | ### A note on using Knitr output 56 | 57 | If you used [Knitr/RMarkdown](http://www.rstudio.com/ide/docs/authoring/using_markdown) 58 | to generate all or part of your presentation, it may be more convenient to 59 | rename the generated .md file to README.md and use it as your presentation 60 | overview file described above. If you do this, please be sure to include your 61 | name and presentation title in the output as well. 62 | 63 | Uploading to GitHub 64 | ------------------- 65 | 66 | If this is your first time using Git or GitHub, it is advisable to first spend 67 | a little time familiarizing yourself with the basics of these tools. 68 | Some tutorials have been included at the bottom of the page to help get you 69 | started. Once you feel more comfortable with these tools, continue reading 70 | below. 71 | 72 | ### 1. Create a fork of the BYOB repo 73 | The first step is to [create a fork](https://help.github.com/articles/fork-a-repo) 74 | of the [byob](https://github.com/umd-byob/byob) repo. 75 | 76 | Once you have done this, [clone](http://git-scm.com/book/en/Git-Basics-Getting-a-Git-Repository) 77 | your fork. 78 | 79 | ### 2. Create a new directory 80 | 81 | Enter the 'presentations' directory, and then the directory for the current 82 | year. Create a new directory with the format: 83 | 84 | > mmdd-short_presentation_title 85 | 86 | All of your materials will go inside this directory. 87 | 88 | ### 3. Add your materials 89 | 90 | Begin adding your materials here. Feel free to make multiple [commits](http://gitref.org/basic/) 91 | as you are working on your materials. 92 | 93 | ### 4. Add a README.md 94 | 95 | Create a README.md as described above. 96 | 97 | ### 5. Commit changes 98 | 99 | If you have not already committed all of your changes, do so now. E.g.: 100 | 101 | git add . 102 | git commit -am "Adding presentation xxx" 103 | git push 104 | 105 | The last command will push the changes to your fork of the BYOB repository. 106 |
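Putting the steps so far together, a typical session might look roughly like this (the GitHub user name, year, and presentation title below are placeholders):

    # step 1: after forking umd-byob/byob on GitHub, clone your fork
    git clone https://github.com/<your-username>/byob.git

    # step 2: enter the presentations directory for the current year and
    # create a directory named mmdd-short_presentation_title
    cd byob/presentations/2016
    mkdir 0412-my_presentation_title
    cd 0412-my_presentation_title

    # steps 3-5: add your materials and a README.md, then commit and push
    # using the commands shown above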
107 | ### 6. Submit a pull request 108 | 109 | Once your materials are finished and ready to be added to the main BYOB repo, 110 | submit a [pull request](https://help.github.com/articles/using-pull-requests) 111 | with your changes: 112 | 113 | From the GitHub page for your fork, click on the "pull requests" button on the 114 | right of the page, and then click "new pull request". 115 | 116 | Follow the instructions to enter a description of the changes and submit 117 | your request. 118 | 119 | All done! Once your changes have been reviewed and everything looks okay, 120 | they will be added to the main repo. 121 | 122 | At this point you are finished. If you wish to make additional changes later 123 | on, however, the same steps above may be followed to submit additional pull 124 | requests. 125 | 126 | More Information 127 | ================ 128 | 1. http://net.tutsplus.com/tutorials/other/easy-version-control-with-git/ 129 | 2. http://git-scm.com/docs/gittutorial 130 | 3. http://daringfireball.net/projects/markdown/ 131 | 132 | --------------------------------------------------------------------------------