├── Gifs ├── PCA.gif ├── DTU_Boxplot.gif ├── Process_time.gif ├── Sample2Sample.gif ├── Selected_Genes.gif ├── Sample_variaiblity.gif ├── Volcano Plots_DGE.gif ├── Volcano Plots_DTE.gif ├── nanoporeata_figure.tif.png ├── nanoporeata_supp2_fig1.png └── nanoporeata_supp2_fig2.png ├── app ├── NanopoReaTA_Rpackages.RDS ├── server │ ├── bash_scripts │ │ ├── run_genebodycoverage.sh │ │ ├── run_nextflow.sh │ │ └── get_number_of_mapped_reads.sh │ ├── scripts_nextflow │ │ ├── convert_gtf_to_df.py │ │ ├── get_read_length_from_fastq.sh │ │ ├── infer_experiment_absolute_gene_amount.py │ │ ├── merge_all_fc.py │ │ ├── merge_all_salmon.py │ │ ├── createFeaturePercentiles.py │ │ └── infer_experiment_inner_variability.py │ ├── python_scripts │ │ └── get_geneBody_coverage.py │ └── R_scripts │ │ ├── read_length_distribution_plots.R │ │ ├── dea_function.R │ │ ├── dte_function.R │ │ ├── gene_wise_analysis_function.R │ │ ├── dtu_function.R │ │ ├── dtu_and_dte_function.R │ │ └── infer_experiment_plots.R ├── NanopoReaTA_Rpackage_versions.txt ├── app.R ├── app_docker.R ├── install.R ├── Dockerfile └── requirements_nanoporeata.yml ├── example_conf_files ├── example_metadata.txt └── example_config.txt ├── README.md └── LICENSE /Gifs/PCA.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnWiercze/NanopoReaTA/HEAD/Gifs/PCA.gif -------------------------------------------------------------------------------- /Gifs/DTU_Boxplot.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnWiercze/NanopoReaTA/HEAD/Gifs/DTU_Boxplot.gif -------------------------------------------------------------------------------- /Gifs/Process_time.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnWiercze/NanopoReaTA/HEAD/Gifs/Process_time.gif 
-------------------------------------------------------------------------------- /Gifs/Sample2Sample.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnWiercze/NanopoReaTA/HEAD/Gifs/Sample2Sample.gif -------------------------------------------------------------------------------- /Gifs/Selected_Genes.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnWiercze/NanopoReaTA/HEAD/Gifs/Selected_Genes.gif -------------------------------------------------------------------------------- /Gifs/Sample_variaiblity.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnWiercze/NanopoReaTA/HEAD/Gifs/Sample_variaiblity.gif -------------------------------------------------------------------------------- /Gifs/Volcano Plots_DGE.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnWiercze/NanopoReaTA/HEAD/Gifs/Volcano Plots_DGE.gif -------------------------------------------------------------------------------- /Gifs/Volcano Plots_DTE.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnWiercze/NanopoReaTA/HEAD/Gifs/Volcano Plots_DTE.gif -------------------------------------------------------------------------------- /app/NanopoReaTA_Rpackages.RDS: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnWiercze/NanopoReaTA/HEAD/app/NanopoReaTA_Rpackages.RDS -------------------------------------------------------------------------------- /Gifs/nanoporeata_figure.tif.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnWiercze/NanopoReaTA/HEAD/Gifs/nanoporeata_figure.tif.png 
-------------------------------------------------------------------------------- /Gifs/nanoporeata_supp2_fig1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnWiercze/NanopoReaTA/HEAD/Gifs/nanoporeata_supp2_fig1.png -------------------------------------------------------------------------------- /Gifs/nanoporeata_supp2_fig2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnWiercze/NanopoReaTA/HEAD/Gifs/nanoporeata_supp2_fig2.png -------------------------------------------------------------------------------- /app/server/bash_scripts/run_genebodycoverage.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | echo "STARTED" 3 | python $1 --bamList $2 --gene $3 --converted_gtf $4 --output_dir $5 -------------------------------------------------------------------------------- /app/server/bash_scripts/run_nextflow.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | echo "Nextflow started!" 
3 | echo $1 4 | echo $2 5 | echo $3 6 | nextflow run $1 -params-file $2 -w $3 7 | -------------------------------------------------------------------------------- /example_conf_files/example_metadata.txt: -------------------------------------------------------------------------------- 1 | Samples Condition rep run 2 | barcode01 HEK293 rep1 run1 3 | barcode02 HEK293 rep2 run2 4 | barcode03 HeLa rep1 run1 5 | barcode04 HeLa rep2 run2 6 | -------------------------------------------------------------------------------- /app/server/bash_scripts/get_number_of_mapped_reads.sh: -------------------------------------------------------------------------------- 1 | function getMappedReads { 2 | bam="$1/*bam" 3 | outFile=$2 4 | printf "Sample\tnum_reads\tnum_mapped_reads\n" > $outFile 5 | for i in $bam; do 6 | numReads=$( samtools view -c $i ) 7 | numMappedReads=$( samtools view -c -F 260 $i ) 8 | sample=$( basename $i ) 9 | sample=${sample/.bam/} 10 | printf "%s\t%s\t%s\n" "$sample" "$numReads" "$numMappedReads" >> $outFile 11 | done 12 | } 13 | echo $1 14 | echo $2 15 | getMappedReads $1 $2/mapping_stats.txt 16 | -------------------------------------------------------------------------------- /example_conf_files/example_config.txt: -------------------------------------------------------------------------------- 1 | threads: 16 2 | barcoded: 1 3 | DRS: 0 4 | metadata: /home/stefan/metadata_Hela_Hek293T.txt 5 | general_folder: /home/stefan/GiantDisk/23_03_23_HEK_HeLa_Test/ 6 | genome_fasta: /home/stefan/BigDisk/Work/Gerber_AG/Homo_sapiens.GRCh38.dna.primary_assembly.fa 7 | transcriptome_fasta: /home/stefan/BigDisk/Work/Gerber_AG/gencode.v40.transcripts.fa 8 | genome_gtf: /home/stefan/BigDisk/Work/Gerber_AG/gencode.v40.primary_assembly.annotation.gtf 9 | bed_file: /home/stefan/BigDisk/Work/Gerber_AG/hg38_GENCODE.v38.bed 10 | run_dir: /home/stefan/GiantDisk/run_dir_HEK_Hela2/ 11 | --------------------------------------------------------------------------------
/app/server/scripts_nextflow/convert_gtf_to_df.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import pandas as pd 3 | import numpy as np 4 | import gtfparse as gtf_parse 5 | 6 | opt_parser = argparse.ArgumentParser() 7 | opt_parser.add_argument("-i", "--input", dest="input_file", help="Insert a gtf file to parse", metavar="FILE") 8 | opt_parser.add_argument("-o", "--output",dest="output_file", help="Insert a path for the output file", metavar="FILE") 9 | options = opt_parser.parse_args() 10 | 11 | 12 | 13 | input_file = options.input_file 14 | #print(input_file) 15 | df = gtf_parse.read_gtf(input_file) 16 | 17 | #print(df.head(20)) 18 | #print(type(df)) 19 | #print(len(df)) 20 | 21 | output_file = options.output_file 22 | #print(output_file) 23 | df.to_csv(output_file) 24 | -------------------------------------------------------------------------------- /app/NanopoReaTA_Rpackage_versions.txt: -------------------------------------------------------------------------------- 1 | pkg Version 2 | shinyFiles 0.9.3 3 | gridtext 0.1.5 4 | ComplexHeatmap 2.10.0 5 | tximport 1.22.0 6 | rnaseqDTU 1.14.0 7 | devtools 2.4.5 8 | usethis 2.1.6 9 | rafalib 1.0.0 10 | edgeR 3.36.0 11 | limma 3.50.3 12 | stageR 1.16.0 13 | DEXSeq 1.40.0 14 | DESeq2 1.34.0 15 | GenomicFeatures 1.46.5 16 | AnnotationDbi 1.56.2 17 | DRIMSeq 1.22.0 18 | rstatix 0.7.1 19 | shiny 1.7.2 20 | markdown 1.4 21 | shinydashboard 0.7.2 22 | DT 0.26 23 | shinyWidgets 0.7.5 24 | shinyjs 2.1.0 25 | shinyBS 0.61.1 26 | dashboardthemes 1.1.6 27 | rjson 0.2.21 28 | shinycssloaders 1.0.0 29 | yaml 2.3.5 30 | rstudioapi 0.14 31 | reticulate 1.26 32 | processx 3.7.0 33 | memuse 4.2-2 34 | BiocParallel 1.28.3 35 | foreach 1.5.2 36 | BSgenome 1.62.0 37 | rtracklayer 1.54.0 38 | Biostrings 2.62.0 39 | XVector 0.34.0 40 | SummarizedExperiment 1.24.0 41 | Biobase 2.54.0 42 | GenomicRanges 1.46.1 43 | GenomeInfoDb 1.30.1 44 | IRanges 2.28.0 45 | 
S4Vectors 0.32.4 46 | BiocGenerics 0.40.0 47 | MatrixGenerics 1.6.0 48 | matrixStats 0.63.0 49 | waiter 0.2.5 50 | futile.logger 1.4.3 51 | optparse 1.7.3 52 | stringr 1.4.1 53 | dplyr 1.0.9 54 | purrr 0.3.4 55 | tidyverse 1.3.2 56 | tibble 3.1.8 57 | tidyr 1.2.0 58 | readr 2.1.2 59 | forcats 0.5.2 60 | ggrepel 0.9.2 61 | ggpubr 0.5.0 62 | ggplot2 3.4.0 63 | pheatmap 1.0.12 64 | RColorBrewer 1.1-3 65 | reshape2 1.4.4 66 | scales 1.2.1 67 | data.table 1.14.2 68 | openxlsx 4.2.5.1 69 | gridExtra 2.3 70 | -------------------------------------------------------------------------------- /app/server/scripts_nextflow/get_read_length_from_fastq.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | 4 | ###### Function to extract read length from fastq files 5 | echo "--------------------------------------------------" 6 | echo "#################### READ LENGTHS ################" 7 | echo "--------------------------------------------------" 8 | 9 | fastq_file="$1" # $input_dir/*/*/*/*.fastq.gz 10 | sample="$2" # ERRxyz 11 | output_dir="$3" 12 | 13 | mkdir -p $output_dir 14 | if [ ! -f $output_dir/"$sample"_read_lengths_pass.txt ] 15 | then 16 | echo "Length" > $output_dir/"$sample"_read_lengths_pass.txt 17 | fi 18 | #if [ ! 
-f $output_dir/"$sample"_read_lengths_fail.txt ]; then 19 | # echo "Length" > $output_dir/"$sample"_read_lengths_fail.txt 20 | #fi 21 | 22 | ### Only passed reads will be processed 23 | #if [[ "$fastq_file" == *"pass"* ]]; then 24 | if [[ "$fastq_file" == *".gz" ]] 25 | then 26 | echo "HERE" 27 | zcat $fastq_file | awk 'NR%4==2' | awk '{ print length }' >> $output_dir/"$sample"_read_lengths_pass.txt 28 | else 29 | echo "HERE2" 30 | cat $fastq_file | awk 'NR%4==2' | awk '{ print length }' >> $output_dir/"$sample"_read_lengths_pass.txt 31 | fi 32 | #fi 33 | #if [[ "$fastq_file" == *"fail"* ]]; then 34 | # if [[ "$fastq_file" == *".gz" ]]; then 35 | # zcat $j | awk 'NR%4==2' | awk '{ print length }' >> $output_dir/"$sample"_read_lengths_fail.txt 36 | # else 37 | # cat $j | awk 'NR%4==2' | awk '{ print length }' >> $output_dir/"$sample"_read_lengths_fail.txt 38 | # fi 39 | #fi 40 | 41 | -------------------------------------------------------------------------------- /app/server/scripts_nextflow/infer_experiment_absolute_gene_amount.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import pandas as pd 3 | import numpy as np 4 | import os 5 | 6 | 7 | 8 | opt_parser = argparse.ArgumentParser() 9 | 10 | opt_parser.add_argument("-s", "--sample_file", dest="sample", help="Insert a sample file to add names to", metavar="FILE") 11 | opt_parser.add_argument("-m", "--metadata_file",dest="metadata", help="Insert a metadata file to extract metadata from", metavar="FILE") 12 | opt_parser.add_argument("-o", "--output_path",dest="output", help="Insert an output path for the number of detected genes per sample", metavar="FILE") 13 | 14 | options = opt_parser.parse_args() 15 | 16 | sample = options.sample 17 | metadata = options.metadata 18 | output_path = options.output 19 | 20 | sample_df = pd.read_csv(sample,header = 0, index_col = 0, sep = "\t") 21 | #print(sample_df) 22 | if not os.path.exists(output_path): 23 | output_df = pd.DataFrame() 24 | 
else: 25 | output_df = pd.read_csv(output_path,header=0, sep = "\t") 26 | 27 | samplenames = [] 28 | genes_counted = [] 29 | for i in range(len(sample_df.iloc[0,:])): 30 | column = sample_df.iloc[:,i] 31 | #print(column[0]) 32 | if not type(column[0]) == type(""): 33 | name = column.name 34 | samplenames.append(column.name) 35 | column = pd.DataFrame(column) 36 | #print(column[name]) 37 | length = len(column.loc[column[name] > 0,:]) 38 | genes_counted.append(length) 39 | 40 | #print(samplenames) 41 | #print(genes_counted) 42 | 43 | input_list = [genes_counted] 44 | new_row_df = pd.DataFrame(np.array(input_list), columns = samplenames) 45 | 46 | #print(new_row_df) 47 | 48 | output_df = pd.concat([output_df,new_row_df]) 49 | output_df = output_df.reset_index(drop = True) 50 | #print(output_df) 51 | output_df.to_csv(output_path, sep="\t", index = 0) 52 | #print(output_df) 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | -------------------------------------------------------------------------------- /app/server/scripts_nextflow/merge_all_fc.py: -------------------------------------------------------------------------------- 1 | import os 2 | import pandas as pd 3 | import sys 4 | 5 | 6 | array = [sys.argv[i] for i in range(1,len(sys.argv)-1)] 7 | new_data_path = sys.argv[-1] 8 | print(array) 9 | 10 | path = "" 11 | bool_existence = False 12 | for i,val in enumerate(array): 13 | path = val + "merged_fc/merged_fc.csv" 14 | if os.path.exists(path): 15 | template_df = pd.read_csv(path, sep = "\t",header = 1) 16 | template_df = template_df.iloc[:,0] 17 | bool_existence = True 18 | break 19 | else: 20 | execute = False 21 | print("None of the .csv in this path exist") 22 | 23 | if bool_existence: 24 | for i,val in enumerate(array): 25 | path = val + "merged_fc/merged_fc.csv" 26 | if os.path.exists(path): 27 | folder_name = val.split("/") 28 | folder_name = folder_name[-2] 29 | print("Folder name 2: ",folder_name) 30 | df_i = pd.read_csv(path, sep="\t", header = 
1) 31 | colnames = list(df_i.columns) 32 | colnames[-1] = folder_name 33 | print("Columns 2: ",colnames) 34 | df_i.columns = colnames 35 | template_df = pd.concat([template_df,df_i.iloc[:,-1]], axis = 1) 36 | else: 37 | folder_name = val.split("/") 38 | folder_name = folder_name[-2] 39 | print("Folder name not existent: ", folder_name) 40 | zero_list = [0 for i in range(len(template_df.iloc[:]))] 41 | template_df = pd.concat([template_df,pd.DataFrame(zero_list)], axis = 1) 42 | colnames = list(template_df.columns) 43 | colnames[-1] = folder_name 44 | template_df.columns = colnames 45 | #print(template_df.head()) 46 | 47 | template_df.to_csv(new_data_path, sep="\t", index = False) 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | -------------------------------------------------------------------------------- /app/app.R: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env Rscript 2 | 3 | ################################################################################ 4 | ## ## 5 | ## NANOPOREATA ## 6 | ## ## 7 | ################################################################################ 8 | 9 | # ______________________________________________________________________________ 10 | # LIBRARIES #### 11 | options(repos = list(CRAN="http://cran.rstudio.com/")) 12 | source("install.R", local = T) 13 | 14 | # Save list of required R packages + version 15 | package_versions = data.table::rbindlist(lapply(sessionInfo()$otherPkgs, function(i) data.frame("Version" = i$Version)), idcol = "pkg") 16 | write.table(package_versions, "NanopoReaTA_Rpackage_versions.txt", sep = "\t", col.names = T, row.names = F, quote = F) 17 | 18 | # ______________________________________________________________________________ 19 | # SETTINGS #### 20 | options(shiny.maxRequestSize = 30*1024^2) 21 | 22 | # ______________________________________________________________________________ 23 | # FUNCTIONS #### 24 | ## DEA #### 25 | 
source("server/R_scripts/dea_function.R", local = TRUE) 26 | 27 | ## DTE #### 28 | source("server/R_scripts/dte_function.R", local = TRUE) 29 | 30 | ## DTU #### 31 | source("server/R_scripts/dtu_function.R", local = TRUE) 32 | 33 | ## READ LENGTH DISTRIBUTION #### 34 | source("server/R_scripts/read_length_distribution_plots.R", local = TRUE) 35 | 36 | ## GENE WISE ANALYSIS #### 37 | source("server/R_scripts/gene_wise_analysis_function.R", local = TRUE) 38 | 39 | ## INFER EXPERIMENT #### 40 | source("server/R_scripts/infer_experiment_plots.R", local = TRUE) 41 | #____________________________________________________________________________ 42 | # FRONTEND 43 | source("ui/ui.R", local = TRUE) 44 | # ______________________________________________________________________________ 45 | # BACKEND 46 | source("server/server.R", local = TRUE) 47 | 48 | # ______________________________________________________________________________ 49 | # LAUNCH APP 50 | shinyApp(ui, server) 51 | 52 | 53 | 54 | -------------------------------------------------------------------------------- /app/app_docker.R: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env Rscript 2 | 3 | ################################################################################ 4 | ## ## 5 | ## NANOPOREATA ## 6 | ## ## 7 | ################################################################################ 8 | 9 | # ______________________________________________________________________________ 10 | # LIBRARIES #### 11 | options(repos = list(CRAN="http://cran.rstudio.com/")) 12 | source("install.R", local = T) 13 | 14 | # Save list of required R packages + version 15 | package_versions = data.table::rbindlist(lapply(sessionInfo()$otherPkgs, function(i) data.frame("Version" = i$Version)), idcol = "pkg") 16 | write.table(package_versions, "NanopoReaTA_Rpackage_versions.txt", sep = "\t", col.names = T, row.names = F, quote = F) 17 | 18 | # 
______________________________________________________________________________ 19 | # SETTINGS #### 20 | options(shiny.maxRequestSize = 30*1024^2) 21 | 22 | # ______________________________________________________________________________ 23 | # FUNCTIONS #### 24 | ## DEA #### 25 | source("server/R_scripts/dea_function.R", local = TRUE) 26 | 27 | ## DTE #### 28 | source("server/R_scripts/dte_function.R", local = TRUE) 29 | 30 | ## DTU #### 31 | source("server/R_scripts/dtu_function.R", local = TRUE) 32 | 33 | ## READ LENGTH DISTRIBUTION #### 34 | source("server/R_scripts/read_length_distribution_plots.R", local = TRUE) 35 | 36 | ## GENE WISE ANALYSIS #### 37 | source("server/R_scripts/gene_wise_analysis_function.R", local = TRUE) 38 | 39 | ## INFER EXPERIMENT #### 40 | source("server/R_scripts/infer_experiment_plots.R", local = TRUE) 41 | #____________________________________________________________________________ 42 | # FRONTEND 43 | source("ui/ui.R", local = TRUE) 44 | # ______________________________________________________________________________ 45 | # BACKEND 46 | source("server/server.R", local = TRUE) 47 | 48 | # ______________________________________________________________________________ 49 | # LAUNCH APP 50 | APP <- shinyApp(ui, server) 51 | runApp(APP,host = '0.0.0.0',port=8080) 52 | 53 | -------------------------------------------------------------------------------- /app/server/scripts_nextflow/merge_all_salmon.py: -------------------------------------------------------------------------------- 1 | import os 2 | import pandas as pd 3 | import sys 4 | 5 | 6 | array = [sys.argv[i] for i in range(1,len(sys.argv)-2)] 7 | new_data_path1 = sys.argv[-2] 8 | new_data_path2 = sys.argv[-1] 9 | #print(array) 10 | 11 | path = "" 12 | bool_existence = False 13 | for i,val in enumerate(array): 14 | path = val + "salmon/quant.sf" 15 | if os.path.exists(path): 16 | template_df1 = pd.read_csv(path, sep = "\t",header = 0) 17 | template_df1 = template_df1.iloc[:,0:3] 
18 | template_df2 = pd.read_csv(path, sep = "\t",header = 0) 19 | template_df2 = template_df2.iloc[:,0:3] 20 | bool_existence = True 21 | break 22 | else: 23 | execute = False 24 | #print("None of the quant.sf in this path exist") 25 | 26 | if bool_existence: 27 | for i,val in enumerate(array): 28 | path = val + "salmon/quant.sf" 29 | if os.path.exists(path): 30 | folder_name = val.split("/") 31 | folder_name = folder_name[-2] 32 | print("Folder name 2: ",folder_name) 33 | df_i = pd.read_csv(path, sep="\t", header = 0) 34 | colnames = list(df_i.columns) 35 | colnames[-1] = folder_name 36 | colnames[-2] = folder_name 37 | print("Columns 2: ",colnames) 38 | df_i.columns = colnames 39 | template_df1 = pd.concat([template_df1,df_i.iloc[:,-2]], axis = 1) 40 | template_df2 = pd.concat([template_df2,df_i.iloc[:,-1]], axis = 1) 41 | else: 42 | folder_name = val.split("/") 43 | folder_name = folder_name[-2] 44 | zero_list1 = [0 for i in range(len(template_df1.iloc[:,1]))] 45 | zero_list2 = [0 for i in range(len(template_df2.iloc[:,1]))] 46 | template_df1 = pd.concat([template_df1,pd.DataFrame(zero_list1)], axis = 1) 47 | template_df2 = pd.concat([template_df2,pd.DataFrame(zero_list2)], axis = 1) 48 | colnames1 = list(template_df1.columns) 49 | colnames1[-1] = folder_name 50 | template_df1.columns = colnames1 51 | colnames2 = list(template_df2.columns) 52 | colnames2[-1] = folder_name 53 | template_df2.columns = colnames2 54 | template_df1["Name"] = [i.split("|")[0] for i in list(template_df1["Name"])] 55 | template_df2["Name"] = [i.split("|")[0] for i in list(template_df2["Name"])] 56 | template_df1.to_csv(new_data_path1, sep="\t", index = True) 57 | template_df2.to_csv(new_data_path2, sep="\t", index = True) 58 | 59 | 60 | -------------------------------------------------------------------------------- /app/server/scripts_nextflow/createFeaturePercentiles.py: -------------------------------------------------------------------------------- 1 | #### All Functions from RSeQC 
package version 4.0.0 2 | # Some functions got minimal changes 3 | import sys 4 | import json 5 | import argparse 6 | import math 7 | 8 | # Argument 9 | parser = argparse.ArgumentParser() 10 | parser.add_argument('--bed') 11 | parser.add_argument('--output_dir') 12 | args = parser.parse_args() 13 | 14 | def percentile_list(N): 15 | """ 16 | Find the percentile of a list of values. 17 | @parameter N - is a list of values. Note N MUST BE already sorted. 18 | @return - the list of percentile of the values 19 | """ 20 | if not N:return None 21 | if len(N) <100: return N 22 | per_list=[] 23 | for i in range(1,101): 24 | k = (len(N)-1) * i/100.0 25 | f = math.floor(k) 26 | c = math.ceil(k) 27 | if f == c: 28 | per_list.append( int(N[int(k)]) ) 29 | else: 30 | d0 = N[int(f)] * (c-k) 31 | d1 = N[int(c)] * (k-f) 32 | per_list.append(int(round(d0+d1))) 33 | return per_list 34 | 35 | 36 | def genebody_percentile(refbed, outDir, mRNA_len_cut = 100): 37 | 38 | g_percentiles = {} 39 | transcript_count = 0 40 | for line in open(refbed,'r'): 41 | try: 42 | if line.startswith(('#','track','browser')):continue 43 | # Parse fields from gene tabls 44 | fields = line.split() 45 | chrom = fields[0] 46 | tx_start = int( fields[1] ) 47 | tx_end = int( fields[2] ) 48 | geneName = fields[3] 49 | strand = fields[5] 50 | geneID = geneName 51 | 52 | exon_starts = list(map( int, fields[11].rstrip( ',\n' ).split( ',' ) )) 53 | exon_starts = list(map((lambda x: x + tx_start ), exon_starts)) 54 | exon_ends = list(map( int, fields[10].rstrip( ',\n' ).split( ',' ) )) 55 | exon_ends = list(map((lambda x, y: x + y ), exon_starts, exon_ends)) 56 | transcript_count += 1 57 | except: 58 | print("[NOTE:input bed must be 12-column] skipped this line: " + line, end=' ', file=sys.stderr) 59 | continue 60 | gene_all_base=[] 61 | mRNA_len =0 62 | flag=0 63 | for st,end in zip(exon_starts,exon_ends): 64 | gene_all_base.extend(list(range(st+1,end+1))) #1-based coordinates on genome 65 | if len(gene_all_base) < 
mRNA_len_cut: 66 | continue 67 | g_percentiles[geneID] = (chrom, strand, percentile_list(gene_all_base)) #get 100 points from each gene's coordinates 68 | 69 | with open(outDir + '/g_percentiles.json', 'w') as fp: 70 | json.dump(g_percentiles, fp) 71 | 72 | ## Bed file from https://sourceforge.net/projects/rseqc/files/BED/Human_Homo_sapiens/hg38_GENCODE.v38.bed.gz + gunzip 73 | #refbed = "/path/to/bed/file/hg38_GENCODE.v38.bed" 74 | genebody_percentile(args.bed, args.output_dir) 75 | -------------------------------------------------------------------------------- /app/install.R: -------------------------------------------------------------------------------- 1 | # source of this script: 2 | # Title: NanopoReaTA 3 | # Author: all 4 | # Date: 16.08.2022 5 | # Availability: https://github.com/AnWiercze/testApp 6 | 7 | 8 | ################################################################################ 9 | # Check that the currently-installed version of R 10 | # is the correct version 11 | ################################################################################ 12 | 13 | options(repos = list(CRAN="http://cran.rstudio.com/")) 14 | 15 | 16 | R_min_version = "4.1.2" 17 | R_version = paste0(R.Version()$major, ".", R.Version()$minor) 18 | if(compareVersion(R_version, R_min_version) == -1){ 19 | stop("You need to have at least version ", R_min_version, " of R to run the app.\n", 20 | "Launch should fail.\n", 21 | "Go to http://cran.r-project.org/ and install version ", R_min_version, " of R or higher.") 22 | } 23 | 24 | 25 | #Check if BiocManager is installed and install otherwise 26 | availpacks = .packages(all.available = TRUE) 27 | if (!("BiocManager" %in% availpacks)){ 28 | install.packages("BiocManager") 29 | 30 | } 31 | 32 | ################################################################################ 33 | # Install basic required packages if not available/installed. 
34 | ################################################################################ 35 | install_missing_packages = function(pkg, version = NULL, verbose = TRUE){ 36 | availpacks = .packages(all.available = TRUE) 37 | require("BiocManager") 38 | missingPackage = FALSE 39 | if(!any(pkg %in% availpacks)){ 40 | if(verbose){ 41 | message("The following package is missing.\n", 42 | pkg, "\n", 43 | "Installation will be attempted...") 44 | } 45 | missingPackage <- TRUE 46 | } 47 | if(!is.null(version) & !missingPackage){ 48 | # version provided and package not missing, so compare. 49 | if( compareVersion(a = as.character(packageVersion(pkg)), 50 | b = version) < 0 ){ 51 | if(verbose){ 52 | message("Current version of package\n", 53 | pkg, "\t", 54 | packageVersion(pkg), "\n", 55 | "is less than required. 56 | Update will be attempted.") 57 | } 58 | missingPackage <- TRUE 59 | } 60 | } 61 | if(missingPackage){ 62 | BiocManager::install(pkg, update=FALSE) 63 | } 64 | } 65 | ################################################################################ 66 | # Define list of package names and required versions. 
67 | ################################################################################ 68 | deppkgs = readRDS("NanopoReaTA_Rpackages.RDS") 69 | #deppkgs = readRDS("tryout_versions.RDS") 70 | 71 | # Loop on package check, install, update 72 | pkg1 = mapply(install_missing_packages, 73 | pkg = names(deppkgs), 74 | version = deppkgs, 75 | MoreArgs = list(verbose = TRUE), 76 | SIMPLIFY = FALSE, 77 | USE.NAMES = TRUE) 78 | ################################################################################ 79 | # Load packages 80 | ################################################################################ 81 | for(i in names(deppkgs)){ 82 | suppressPackageStartupMessages({library(i, character.only = TRUE)}) 83 | message(i, " package version:\n", packageVersion(i)) 84 | } 85 | ################################################################################ 86 | -------------------------------------------------------------------------------- /app/server/scripts_nextflow/infer_experiment_inner_variability.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import pandas as pd 3 | import numpy as np 4 | import os 5 | 6 | 7 | 8 | opt_parser = argparse.ArgumentParser() 9 | 10 | opt_parser.add_argument("-s", "--sample_file", dest="sample", help="Insert a sample file to add names to", metavar="FILE") 11 | opt_parser.add_argument("-m", "--metadata_file",dest="metadata", help="Insert a metadata file to extract metadata from", metavar="FILE") 12 | opt_parser.add_argument("-d", "--percentages_output_path",dest="percentages", help="Insert an output path for percentages", metavar="FILE") 13 | opt_parser.add_argument("-o", "--output_path",dest="output", help="Insert an output path for the mean change in percentages per sample", metavar="FILE") 14 | 15 | 16 | options = opt_parser.parse_args() 17 | 18 | sample = options.sample 19 | metadata = options.metadata 20 | output_path = options.output 21 | p_output_path = options.percentages 22 | 23 | 24 | 25 | 
sample_df = pd.read_csv(sample, header=0, index_col=0, sep="\t")

# Load the running outputs from earlier iterations, or start empty on the first run.
if not os.path.exists(output_path):
    output_df = pd.DataFrame()
else:
    output_df = pd.read_csv(output_path, header=0, sep="\t")

if not os.path.exists(p_output_path):
    p_output_df = pd.DataFrame()
else:
    p_output_df = pd.read_csv(p_output_path, header=0, sep="\t")


# Convert every non-string column into per-row fractions of the column total.
samplenames = []
columns_of_percentages = []
for col_idx in range(sample_df.shape[1]):
    column = sample_df.iloc[:, col_idx]
    if not isinstance(column.iloc[0], str):
        samplenames.append(column.name)
        column_total = column.sum()
        percentages = [float(value / column_total) for value in column]
        columns_of_percentages.append(percentages)

# Store each sample's list of fractions in a single-row frame (one list per cell).
new_percentages_row = pd.DataFrame(columns=samplenames, index=[0])

for i, val in enumerate(samplenames):
    new_percentages_row.loc[0, val] = columns_of_percentages[i]

p_output_df = pd.concat([p_output_df, new_percentages_row])

p_output_df = p_output_df.reset_index(drop=True)
p_output_df.to_csv(p_output_path, sep="\t", index=False)

if len(p_output_df) > 1:
    # Mean absolute change of the fractions between the two most recent iterations.
    mean_list = []
    for val in samplenames:
        last = list(p_output_df[val])[-1]
        # The previous row was read back from the TSV, so it is a stringified list.
        before_last = list(p_output_df[val])[-2].replace("[", "")
        before_last = before_last.replace("]", "")
        before_last = before_last.replace(",", "")
        b_last = [float(entry) for entry in before_last.split()]

        tmp_list = []
        for k in range(len(last)):
            difference = abs(abs(last[k]) - abs(b_last[k]))
            tmp_list.append(difference)
        mean_list.append(np.mean(tmp_list))
    print(mean_list)

    new_mean_df = pd.DataFrame([mean_list], columns=samplenames)

    output_df = pd.concat([output_df, new_mean_df])
    print(output_df)
    output_df.to_csv(output_path, sep="\t", index=False)


--------------------------------------------------------------------------------
/app/server/python_scripts/get_geneBody_coverage.py:
--------------------------------------------------------------------------------
import sys
import pysam
import collections
import pandas as pd
import json
from os.path import basename
import glob

import argparse

parser = argparse.ArgumentParser(description='Run Gene Body Coverage.')
parser.add_argument('--bamList', nargs='+')
parser.add_argument('--gene')
parser.add_argument('--converted_gtf')
parser.add_argument('--output_dir')

args = parser.parse_args()

print(args)

def genebody_coverage(bam, position_list):
    '''
    position_list is dict returned from genebody_percentile
    position is 1-based genome coordinate
    '''
    samfile = pysam.AlignmentFile(bam, "rb")
    # Detect whether the BAM uses "chr"-prefixed reference names.
    chr_bool = False
    for read in samfile.fetch():
        if "chr" in str(read):
            chr_bool = True
            break
    aggregated_cvg = collections.defaultdict(int)

    gene_finished = 0
    for chrom, strand, positions in list(position_list.values()):
        # Normalize the chromosome name to match the naming used in the BAM.
        chrom = chrom.replace("chr", "")
        if chr_bool:
            chrom = f"chr{chrom}"
        coverage = {}
        for i in positions:
            coverage[i] = 0.0
        chrom_start = positions[0] - 1
        if chrom_start < 0:
            chrom_start = 0
        chrom_end = positions[-1]
        print(chrom_start, chrom_end)

        for pileupcolumn in samfile.pileup(chrom, chrom_start, chrom_end, truncate=True):
            ref_pos = pileupcolumn.pos + 1
            # coverage holds exactly the percentile positions, so the dict lookup
            # is equivalent to searching the positions list.
            if ref_pos not in coverage:
                continue
            if pileupcolumn.n == 0:
                coverage[ref_pos] = 0
                continue
            cover_read = 0
            for pileupread in pileupcolumn.pileups:
                if pileupread.is_del: continue
                if pileupread.alignment.is_qcfail: continue
                if pileupread.alignment.is_secondary: continue
                if pileupread.alignment.is_unmapped: continue
                if pileupread.alignment.is_duplicate: continue
                cover_read += 1
            coverage[ref_pos] = cover_read
        tmp = [coverage[k] for k in sorted(coverage)]
        # Reverse minus-strand transcripts so that index 0 is always the 5' end.
        if strand == '-':
            tmp = tmp[::-1]
        for i in range(0, len(tmp)):
            aggregated_cvg[i] += tmp[i]
        gene_finished += 1

        if gene_finished % 100 == 0:
            print("\t%d transcripts finished\r" % (gene_finished), end=' ', file=sys.stderr)
    samfile.close()
    return aggregated_cvg

def reduceDict(gp, transOfInterest):
    # Keep only the percentile entries of transcripts that belong to the gene of interest.
    gp_sub = dict((k, gp[k]) for k in transOfInterest['transcript_id'].values
                  if k in gp)
    return gp_sub

def getGeneCoverage(bamFilesList, geneOfInterest, gtfFile, outDir):
    # Output file name
    outFile = outDir + "/" + "samples.geneBodyCoverage.txt"
    with open(outFile, 'w') as OUT1:
        print("Percentile\t" + '\t'.join([str(i) for i in range(1, 101)]), file=OUT1)
        # Load gtf to map transcripts to genes
        gtf = pd.read_csv(gtfFile)
        # Load dict containing transcript percentiles
        with open(outDir + "/g_percentiles.json") as json_file:
            gp = json.load(json_file)
        gtf_sub = gtf[['gene_id', 'transcript_id', 'transcript_name', 'gene_name']].drop_duplicates()
        transOfInterest = gtf_sub[gtf_sub['gene_id'] == geneOfInterest].dropna()
        gp_sub = reduceDict(gp, transOfInterest)
        file_container = []
        for bamfile in bamFilesList:
            print(bamfile)
            cvg = genebody_coverage(bamfile, gp_sub)
            print(cvg)
            if len(cvg) == 0:
                print("\nCannot get coverage signal from " + basename(bamfile) + ' ! Skip', file=sys.stderr)
                continue
            tmp = valid_name(basename(bamfile).replace('.bam', ''))  # make the sample name a valid R identifier
            if file_container.count(tmp) == 0:
                print(tmp + '\t' + '\t'.join([str(cvg[k]) for k in sorted(cvg)]), file=OUT1)
            else:
                # Disambiguate repeated sample names with a numeric suffix.
                print(tmp + '.' + str(file_container.count(tmp)) + '\t' + '\t'.join([str(cvg[k]) for k in sorted(cvg)]), file=OUT1)
            file_container.append(tmp)

def valid_name(s):
    '''make sure the string 's' is valid name for R variable'''
    symbols = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_.'
    digit = '0123456789'
    rid = '_'.join(i for i in s.split())  # replace space(s) with '_'
    if rid[0] in digit: rid = 'V' + rid
    tmp = ''
    for i in rid:
        if i in symbols:
            tmp = tmp + i
        else:
            tmp = tmp + '_'
    return tmp

# Only the first --bamList entry is used, as a path prefix for the glob.
bamFilesList = glob.glob(str(args.bamList[0]) + "*.bam")
print(bamFilesList)
geneOfInterest = args.gene
gtf = args.converted_gtf
outDir = args.output_dir
#bamFilesList = ['/media/anna/MinION_Drive/run_dir_22_09/bam_genome_merged/ERR6053055.bam', '/media/anna/MinION_Drive/run_dir_22_09/bam_genome_merged/ERR6053056.bam', '/media/anna/MinION_Drive/run_dir_22_09/bam_genome_merged/ERR6053097.bam', '/media/anna/MinION_Drive/run_dir_22_09/bam_genome_merged/ERR6053098.bam']
#geneOfInterest = 'ENSG00000160182.3'
#gtf='/media/anna/MinION_Drive/run_dir_22_09/converted_gtf.csv'
#outDir='/media/anna/MinION_Drive/run_dir_22_09/'

getGeneCoverage(bamFilesList, geneOfInterest, gtf, outDir)


--------------------------------------------------------------------------------
/app/Dockerfile:
--------------------------------------------------------------------------------
FROM ubuntu:22.04


# Install base utilities
RUN apt-get update
RUN apt-get install -y build-essential autoconf libtool
RUN apt-get install -y wget
RUN apt-get install -y zip
RUN apt-get install -y apt-utils
RUN apt-get install -y firefox
RUN apt-get update
RUN apt-get install -y gcc-12 gcc-12-base gcc-12-doc g++-12
RUN apt-get install -y libstdc++-12-dev libstdc++-12-doc
RUN apt-get install -y bedops
RUN apt-get update
RUN apt-get clean
RUN rm -rf /var/lib/apt/lists/*

# Install Human reference
RUN mkdir Human_reference_data
RUN wget https://sourceforge.net/projects/rseqc/files/BED/Human_Homo_sapiens/hg38_GENCODE_V42_Comprehensive.bed.gz/download -O hg38_GENCODE_V42_Comprehensive.bed.gz
RUN gunzip hg38_GENCODE_V42_Comprehensive.bed.gz

RUN wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_43/GRCh38.primary_assembly.genome.fa.gz
RUN gunzip GRCh38.primary_assembly.genome.fa.gz

RUN wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_43/gencode.v43.transcripts.fa.gz
RUN gunzip gencode.v43.transcripts.fa.gz

RUN wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_43/gencode.v43.primary_assembly.annotation.gtf.gz
RUN gunzip gencode.v43.primary_assembly.annotation.gtf.gz

RUN mv hg38_GENCODE_V42_Comprehensive.bed Human_reference_data
RUN mv GRCh38.primary_assembly.genome.fa Human_reference_data
RUN mv gencode.v43.transcripts.fa Human_reference_data
RUN mv gencode.v43.primary_assembly.annotation.gtf Human_reference_data

# Install Mouse reference
RUN mkdir Mouse_reference_data
RUN wget https://sourceforge.net/projects/rseqc/files/BED/Mouse_Mus_musculus/GRCm39_GENCODE_VM27.bed.gz/download -O GRCm39_GENCODE_VM27.bed.gz
RUN gunzip GRCm39_GENCODE_VM27.bed.gz

RUN wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M32/GRCm39.primary_assembly.genome.fa.gz
RUN gunzip GRCm39.primary_assembly.genome.fa.gz

RUN wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M32/gencode.vM32.transcripts.fa.gz
RUN gunzip gencode.vM32.transcripts.fa.gz

RUN wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M32/gencode.vM32.primary_assembly.annotation.gtf.gz
RUN gunzip gencode.vM32.primary_assembly.annotation.gtf.gz

RUN mv GRCm39_GENCODE_VM27.bed Mouse_reference_data
RUN mv GRCm39.primary_assembly.genome.fa Mouse_reference_data
RUN mv gencode.vM32.transcripts.fa Mouse_reference_data
RUN mv gencode.vM32.primary_assembly.annotation.gtf Mouse_reference_data

# Install Yeast reference
RUN mkdir Yeast_reference_data
RUN wget https://ftp.ensembl.org/pub/release-111/fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa.gz
RUN gunzip Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa.gz
RUN wget https://ftp.ensembl.org/pub/release-111/fasta/saccharomyces_cerevisiae/cdna/Saccharomyces_cerevisiae.R64-1-1.cdna.all.fa.gz
RUN gunzip Saccharomyces_cerevisiae.R64-1-1.cdna.all.fa.gz
RUN wget https://ftp.ensembl.org/pub/release-111/gtf/saccharomyces_cerevisiae/Saccharomyces_cerevisiae.R64-1-1.111.gtf.gz
RUN gunzip Saccharomyces_cerevisiae.R64-1-1.111.gtf.gz
RUN mv Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa Yeast_reference_data
RUN mv Saccharomyces_cerevisiae.R64-1-1.cdna.all.fa Yeast_reference_data
# Rename the Ensembl biotype attributes to the GENCODE-style names used for human and mouse
RUN cat Saccharomyces_cerevisiae.R64-1-1.111.gtf | sed s/gene_biotype/gene_type/g | sed s/transcript_biotype/transcript_type/g > ./Yeast_reference_data/Saccharomyces_cerevisiae.R64-1-1.111.gtf
RUN rm Saccharomyces_cerevisiae.R64-1-1.111.gtf

RUN mkdir Reference_data
RUN mv Human_reference_data Reference_data
RUN mv Mouse_reference_data Reference_data
RUN mv Yeast_reference_data Reference_data

# Convert the yeast GTF to BED with gxf2bed (installed through cargo)
RUN apt-get update && apt-get install -y curl
RUN curl https://sh.rustup.rs -sSf | bash -s -- -y
ENV PATH="/root/.cargo/bin:${PATH}"
RUN cargo install gxf2bed
RUN gxf2bed --input /Reference_data/Yeast_reference_data/Saccharomyces_cerevisiae.R64-1-1.111.gtf --output /Reference_data/Yeast_reference_data/Saccharomyces_cerevisiae.R64-1-1.111.bed

# Install miniconda
ENV CONDA_DIR=/opt/conda
RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh && \
    /bin/bash ~/miniconda.sh -b -p /opt/conda

# Put conda in path so we can use conda activate
ENV PATH=$CONDA_DIR/bin:$PATH

RUN wget https://github.com/AnWiercze/NanopoReaTA/archive/refs/heads/master.zip

RUN unzip master.zip

WORKDIR ./NanopoReaTA-master/app/

RUN conda env create -f ./requirements_nanoporeata.yml

RUN apt-get install -y liblapack-dev libblas-dev
# Install the R packages inside the conda environment. The original used a SHELL
# instruction for this, which only ran install.R as a side effect of the next RUN.
RUN conda run --no-capture-output -n nanoporeata Rscript install.R
RUN conda init bash

ENV PORT=8080

EXPOSE 8080

ENV SHINY_LOG_STDERR=1


ENTRYPOINT ["conda","run", "--no-capture-output","-n","nanoporeata", "Rscript", "app_docker.R"]


--------------------------------------------------------------------------------
/app/server/R_scripts/read_length_distribution_plots.R:
--------------------------------------------------------------------------------
################################################################################
## Length distribution ##
################################################################################

# This script creates plots of
sample- and group-wise length distributions. 6 | 7 | theme_set(theme_light()) 8 | theme_update( 9 | panel.background = element_rect(fill = "transparent"), # bg of the panel 10 | plot.background = element_rect(fill = "transparent", color = NA), # bg of the plot 11 | legend.background = element_rect(fill = "transparent"), # get rid of legend bg 12 | legend.title = element_text(size = 20, color = "white"), 13 | legend.key = element_rect(colour = "transparent", fill = "transparent"), 14 | legend.text = element_text(size = 20, color = "white"), 15 | axis.text = element_text(angle = 45, hjust = 1, size = 17, color = "white"), 16 | plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "white"), 17 | axis.title = element_text(size = 23, color = "white"), 18 | axis.line = element_line(color = "white"), 19 | axis.ticks = element_line(color = "white")) 20 | 21 | createLengthPlots <- function(readLengths_df_filt, metadata, conditionCol, conditions, color_conditions){ 22 | # Extract conditions of interest 23 | metadata = metadata[metadata[[conditionCol]] %in% conditions, ] 24 | 25 | # Join the metadata information with read length file 26 | readLengths_df_filt = readLengths_df_filt %>% 27 | left_join(metadata, by = c("Sample" = "Samples")) 28 | # Plot sample-wise distribution of all reads 29 | sampleWise_All = ggplot(readLengths_df_filt, aes(x=Length, color=Sample)) + 30 | geom_density() + 31 | scale_x_continuous(labels=scales::comma) + 32 | xlab("Read length") + 33 | ylab("Density") + 34 | ggtitle("Sample-wise length distribution\n(filtered longest 1 % of reads)") + 35 | theme( 36 | # rect = element_rect(fill = "transparent"), 37 | panel.background = element_rect(fill = 'transparent', color = "white"), # bg of the panel 38 | plot.background = element_rect(fill = "transparent", color = NA), # bg of the plot 39 | legend.background = element_rect(fill = "transparent"), # get rid of legend bg 40 | # legend.box.background = element_rect(fill = "transparent"), 41 
| legend.title = element_text(size = 16, color = "white"), 42 | legend.key = element_rect(colour = "transparent", fill = "transparent"), 43 | legend.text = element_text(size = 14, color = "white"), 44 | axis.text = element_text(angle = 45, hjust = 1, size = 14, color = "white"), 45 | plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "white"), 46 | axis.title = element_text(size = 23, color = "white"), 47 | axis.line = element_line(color = "white"), 48 | axis.ticks = element_line(color = "white")) 49 | 50 | # Plot group-wise distribution of all reads 51 | groupWise_All = ggplot(readLengths_df_filt, aes(x = Length, color = Condition)) + 52 | scale_color_manual(values=color_conditions) + 53 | geom_density() + 54 | scale_x_continuous(labels=scales::comma) + 55 | xlab("Read length") + 56 | ylab("Density") + 57 | ggtitle("Condition-wise length distribution\n(filtered longest 1 % of reads)") + 58 | theme( 59 | # rect = element_rect(fill = "transparent"), 60 | panel.background = element_rect(fill = 'transparent', color = "white"), # bg of the panel 61 | plot.background = element_rect(fill = "transparent", color = NA), # bg of the plot 62 | legend.background = element_rect(fill = "transparent"), # get rid of legend bg 63 | # legend.box.background = element_rect(fill = "transparent"), 64 | legend.title = element_text(size = 16, color = "white"), 65 | legend.key = element_rect(colour = "transparent", fill = "transparent"), 66 | legend.text = element_text(size = 14, color = "white"), 67 | axis.text = element_text(angle = 45, hjust = 1, size = 14, color = "white"), 68 | plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "white"), 69 | axis.title = element_text(size = 23, color = "white"), 70 | axis.line = element_line(color = "white"), 71 | axis.ticks = element_line(color = "white")) 72 | 73 | 74 | return(list(sampleWise_All = sampleWise_All, groupWise_All = groupWise_All)) 75 | } 76 | samplewise_read_length.download <- 
function(readLengths_df_filt, metadata, conditionCol, conditions){ 77 | theme_update(legend.title = element_text(size = 20, color = "black"), 78 | # legend.key = element_rect(colour = "transparent", fill = "transparent"), 79 | legend.text = element_text(size = 20, color = "black"), 80 | axis.text = element_text(angle = 45, hjust = 1, size = 17, color = "black"), 81 | plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "black"), 82 | axis.title = element_text(size = 23, color = "black"), 83 | axis.line = element_line(color = "black"), 84 | axis.ticks = element_line(color = "black"), 85 | panel.background = element_rect(fill = "white"), # bg of the panel 86 | plot.background = element_rect(fill = "white"), # bg of the plot 87 | legend.background = element_rect(fill = "white")) 88 | metadata = metadata[metadata[[conditionCol]] %in% conditions, ] 89 | 90 | # Join the metadata information with read length file 91 | readLengths_df_filt = readLengths_df_filt %>% 92 | left_join(metadata, by = c("Sample" = "Samples")) 93 | 94 | # Plot sample-wise distribution of all reads 95 | sampleWise_All = ggplot(readLengths_df_filt, aes(x = Length, color = Sample)) + 96 | geom_density() + 97 | scale_x_continuous(labels=scales::comma) + 98 | xlab("Read length") + 99 | ylab("Density") + 100 | ggtitle("All reads: Sample-wise length distribution\n(filtered longest 1 % of reads)") + 101 | theme(plot.title = element_text(hjust = 0.5)) 102 | 103 | return(sampleWise_All) 104 | 105 | } 106 | groupwise_read_length.download <- function(readLengths_df_filt, metadata, conditionCol, conditions, color_conditions){ 107 | metadata = metadata[metadata[[conditionCol]] %in% conditions, ] 108 | theme_update(legend.title = element_text(size = 20, color = "black"), 109 | # legend.key = element_rect(colour = "transparent", fill = "transparent"), 110 | legend.text = element_text(size = 20, color = "black"), 111 | axis.text = element_text(angle = 45, hjust = 1, size = 17, color = "black"), 
112 | plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "black"), 113 | axis.title = element_text(size = 23, color = "black"), 114 | axis.line = element_line(color = "black"), 115 | axis.ticks = element_line(color = "black"), 116 | panel.background = element_rect(fill = "white"), # bg of the panel 117 | plot.background = element_rect(fill = "white"), # bg of the plot 118 | legend.background = element_rect(fill = "white")) 119 | # Join the metadata information with read length file 120 | readLengths_df_filt = readLengths_df_filt %>% 121 | left_join(metadata, by = c("Sample" = "Samples")) 122 | 123 | # Plot group-wise distribution of all reads 124 | groupWise_All = ggplot(readLengths_df_filt, aes(x = Length, color = Condition)) + 125 | scale_color_manual(values=color_conditions) + 126 | geom_density() + 127 | scale_x_continuous(labels=scales::comma) + 128 | xlab("Read length") + 129 | ylab("Density") + 130 | ggtitle("All reads: Condition-wise length distribution\n(filtered longest 1 % of reads)") + 131 | theme(plot.title = element_text(hjust = 0.5)) 132 | 133 | return(groupWise_All) 134 | } 135 | 136 | 137 | -------------------------------------------------------------------------------- /app/requirements_nanoporeata.yml: -------------------------------------------------------------------------------- 1 | name: nanoporeata 2 | channels: 3 | - anaconda 4 | - bioconda 5 | - conda-forge 6 | - defaults 7 | dependencies: 8 | - _libgcc_mutex=0.1 9 | - _openmp_mutex=4.5 10 | - _r-mutex=1.0.1 11 | - argon2-cffi=21.3.0 12 | - argon2-cffi-bindings=21.2.0 13 | - asttokens=2.0.8 14 | - attrs=22.1.0 15 | - backcall=0.2.0 16 | - backports=1.0 17 | - backports.functools_lru_cache=1.6.4 18 | - bcftools=1.3.1 19 | - beautifulsoup4=4.11.1 20 | - binutils_impl_linux-64=2.38 21 | - binutils_linux-64=2.38.0 22 | - blas=1.0 23 | - bleach=5.0.1 24 | - bottleneck=1.3.5 25 | - bwidget=1.9.14 26 | - bzip2=1.0.8 27 | - c-ares=1.18.1 28 | - ca-certificates=2022.9.24 29 | 
- cairo=1.16.0 30 | - certifi=2022.9.24 31 | - cffi=1.15.1 32 | - cmake=3.22.1 33 | - coreutils=8.25 34 | - curl=7.83.1 35 | - cxx-compiler=1.0.0 36 | - debugpy=1.6.3 37 | - decorator=5.1.1 38 | - defusedxml=0.7.1 39 | - entrypoints=0.4 40 | - executing=0.10.0 41 | - expat=2.4.8 42 | - flit-core=3.7.1 43 | - font-ttf-dejavu-sans-mono=2.37 44 | - font-ttf-inconsolata=3.000 45 | - font-ttf-source-code-pro=2.038 46 | - font-ttf-ubuntu=0.83 47 | - fontconfig=2.14.0 48 | - fonts-conda-ecosystem=1 49 | - fonts-conda-forge=1 50 | - freetype=2.12.1 51 | - freetype-py=2.3.0 52 | - fribidi=1.0.10 53 | - gcc_impl_linux-64=11.2.0 54 | - gcc_linux-64=11.2.0 55 | - gettext=0.19.8.1 56 | - gfortran_impl_linux-64=11.2.0 57 | - graphite2=1.3.13 58 | - gtfparse=1.2.1 59 | - gsl=2.7 60 | - gxx_impl_linux-64=11.2.0 61 | - gxx_linux-64=11.2.0 62 | - harfbuzz=5.1.0 63 | - htslib=1.3.1 64 | - icu=70.1 65 | - importlib-metadata=4.11.4 66 | - importlib_resources=5.9.0 67 | - intel-openmp=2021.4.0 68 | - ipykernel=6.15.1 69 | - ipython=8.4.0 70 | - ipython_genutils=0.2.0 71 | - jedi=0.18.1 72 | - jinja2=3.1.2 73 | - jpeg=9e 74 | - jsonschema=4.14.0 75 | - jupyter_client=7.3.4 76 | - jupyter_core=4.11.1 77 | - jupyterlab_pygments=0.2.2 78 | - k8=0.2.5 79 | - kernel-headers_linux-64=2.6.32 80 | - keyutils=1.6.1 81 | - krb5=1.19.3 82 | - ld_impl_linux-64=2.38 83 | - lerc=4.0.0 84 | - libarchive=3.5.2 85 | - libblas=3.9.0 86 | - libcblas=3.9.0 87 | - libcurl=7.83.1 88 | - libdeflate=1.13 89 | - libedit=3.1.20191231 90 | - libev=4.33 91 | - libffi=3.4.2 92 | - libgcc=7.2.0 93 | - libgcc-devel_linux-64=11.2.0 94 | - libgcc-ng=12.1.0 95 | - libgfortran-ng=12.1.0 96 | - libgfortran5=12.1.0 97 | - libgit2=1.5.0 98 | - libglib=2.72.1 99 | - libgomp=12.1.0 100 | - libiconv=1.16 101 | - libjpeg-turbo=2.1.4 102 | - liblapack=3.9.0 103 | - libmamba=0.25.0 104 | - libnghttp2=1.47.0 105 | - libnsl=2.0.0 106 | - libopenblas=0.3.21 107 | - libpng=1.6.38 108 | - libsodium=1.0.18 109 | - libsolv=0.7.22 110 | - 
libsqlite=3.39.2 111 | - libssh2=1.10.0 112 | - libstdcxx-devel_linux-64=11.2.0 113 | - libstdcxx-ng=12.1.0 114 | - libtiff=4.4.0 115 | - libuuid=2.32.1 116 | - libwebp-base=1.2.4 117 | - libxcb=1.13 118 | - libxml2=2.9.14 119 | - libxslt=1.1.35 120 | - libzlib=1.2.12 121 | - llvm-openmp=14.0.4 122 | - lxml=4.9.1 123 | - lz4-c=1.9.3 124 | - lzo=2.10 125 | - make=4.3 126 | - markupsafe=2.1.1 127 | - matplotlib-inline=0.1.6 128 | - minimap2=2.24 129 | - mistune=2.0.4 130 | - mkl=2021.4.0 131 | - mkl-service=2.4.0 132 | - mkl_fft=1.3.1 133 | - mkl_random=1.2.2 134 | - nbclient=0.6.7 135 | - nbconvert=7.0.0 136 | - nbconvert-core=7.0.0 137 | - nbconvert-pandoc=7.0.0 138 | - nbformat=5.4.0 139 | - ncurses=6.3 140 | - nest-asyncio=1.5.5 141 | - nextflow=21.10.6 142 | - nomkl=3.0 143 | - notebook=6.4.12 144 | - numexpr=2.8.3 145 | - numpy=1.23.1 146 | - numpy-base=1.23.1 147 | - openblas=0.3.21 148 | - openjdk=8.0.152 149 | - openssl=1.1.1q 150 | - packaging=21.3 151 | - pandas=1.4.4 152 | - pandoc=2.19.2 153 | - pandocfilters=1.5.0 154 | - pango=1.50.9 155 | - parallel=20220722 156 | - parso=0.8.3 157 | - pcre=8.45 158 | - pcre2=10.37 159 | - perl=5.32.1 160 | - pexpect=4.8.0 161 | - pickleshare=0.7.5 162 | - pip=22.1.2 163 | - pixman=0.40.0 164 | - pkgutil-resolve-name=1.3.10 165 | - prometheus_client=0.14.1 166 | - prompt-toolkit=3.0.30 167 | - psutil=5.9.1 168 | - pthread-stubs=0.4 169 | - ptyprocess=0.7.0 170 | - pure_eval=0.2.2 171 | - pybind11-abi=4 172 | - pycparser=2.21 173 | - pygments=2.13.0 174 | - pyparsing=3.0.4 175 | - pyrsistent=0.18.1 176 | - pysam=0.9.1 177 | - python=3.9.13 178 | - python-dateutil=2.8.2 179 | - python-fastjsonschema=2.16.1 180 | - python_abi=3.9 181 | - pytz=2022.1 182 | - pyzmq=23.2.1 183 | - r-askpass=1.1 184 | - r-assertthat=0.2.1 185 | - r-backports=1.4.1 186 | - r-base=4.1.3 187 | - r-base64enc=0.1_3 188 | - r-bit=4.0.4 189 | - r-bit64=4.0.5 190 | - r-blob=1.2.3 191 | - r-boot=1.3_28 192 | - r-brew=1.0_8 193 | - r-brio=1.1.3 194 | 
- r-broom=1.0.0 195 | - r-bslib=0.4.0 196 | - r-cachem=1.0.6 197 | - r-callr=3.7.2 198 | - r-caret=6.0_93 199 | - r-cellranger=1.1.0 200 | - r-class=7.3_20 201 | - r-cli=3.3.0 202 | - r-clipr=0.8.0 203 | - r-cluster=2.1.3 204 | - r-codetools=0.2_18 205 | - r-colorspace=2.0_3 206 | - r-commonmark=1.8.0 207 | - r-cpp11=0.4.2 208 | - r-crayon=1.5.1 209 | - r-credentials=1.3.2 210 | - r-crul=1.2.0 211 | - r-curl=4.3.2 212 | - r-data.table=1.14.2 213 | - r-dbi=1.1.3 214 | - r-dbplyr=2.2.1 215 | - r-desc=1.4.1 216 | - r-devtools=2.4.5 217 | - r-diffobj=0.3.5 218 | - r-digest=0.6.29 219 | - r-downlit=0.4.2 220 | - r-dplyr=1.0.9 221 | - r-dtplyr=1.2.2 222 | - r-e1071=1.7_11 223 | - r-ellipsis=0.3.2 224 | - r-essentials=4.1 225 | - r-evaluate=0.16 226 | - r-fansi=1.0.3 227 | - r-farver=2.1.1 228 | - r-fastmap=1.1.0 229 | - r-fontawesome=0.3.0 230 | - r-forcats=0.5.2 231 | - r-foreach=1.5.2 232 | - r-foreign=0.8_82 233 | - r-formatr=1.12 234 | - r-fs=1.5.2 235 | - r-future=1.27.0 236 | - r-future.apply=1.9.0 237 | - r-gargle=1.2.0 238 | - r-generics=0.1.3 239 | - r-gert=1.5.0 240 | - r-ggplot2=3.3.6 241 | - r-gh=1.3.1 242 | - r-gistr=0.9.0 243 | - r-gitcreds=0.1.2 244 | - r-glmnet=4.1_2 245 | - r-globals=0.16.0 246 | - r-glue=1.6.2 247 | - r-googledrive=2.0.0 248 | - r-googlesheets4=1.0.1 249 | - r-gower=1.0.0 250 | - r-gtable=0.3.0 251 | - r-hardhat=1.2.0 252 | - r-haven=2.5.0 253 | - r-hexbin=1.28.2 254 | - r-highr=0.9 255 | - r-hms=1.1.2 256 | - r-htmltools=0.5.3 257 | - r-htmlwidgets=1.5.4 258 | - r-httpcode=0.3.0 259 | - r-httpuv=1.6.5 260 | - r-httr=1.4.4 261 | - r-ids=1.0.1 262 | - r-ini=0.3.1 263 | - r-ipred=0.9_13 264 | - r-irdisplay=1.1 265 | - r-irkernel=1.3 266 | - r-isoband=0.2.5 267 | - r-iterators=1.0.14 268 | - r-jquerylib=0.1.4 269 | - r-jsonlite=1.8.0 270 | - r-kernsmooth=2.23_20 271 | - r-knitr=1.39 272 | - r-labeling=0.4.2 273 | - r-later=1.2.0 274 | - r-lattice=0.20_45 275 | - r-lava=1.6.10 276 | - r-lazyeval=0.2.2 277 | - r-lifecycle=1.0.1 278 | - 
r-listenv=0.8.0 279 | - r-lobstr=1.1.2 280 | - r-lubridate=1.8.0 281 | - r-magrittr=2.0.3 282 | - r-maps=3.4.0 283 | - r-mass=7.3_58.1 284 | - r-matrix=1.4_1 285 | - r-memoise=2.0.1 286 | - r-mgcv=1.8_40 287 | - r-mime=0.12 288 | - r-miniui=0.1.1.1 289 | - r-modelmetrics=1.2.2.2 290 | - r-modelr=0.1.9 291 | - r-munsell=0.5.0 292 | - r-nlme=3.1_159 293 | - r-nnet=7.3_17 294 | - r-numderiv=2016.8_1.1 295 | - r-openssl=2.0.2 296 | - r-parallelly=1.32.1 297 | - r-pbdzmq=0.3_7 298 | - r-pillar=1.8.1 299 | - r-pkgbuild=1.3.1 300 | - r-pkgconfig=2.0.3 301 | - r-pkgdown=2.0.6 302 | - r-pkgload=1.3.0 303 | - r-plyr=1.8.7 304 | - r-praise=1.0.0 305 | - r-prettyunits=1.1.1 306 | - r-proc=1.18.0 307 | - r-processx=3.7.0 308 | - r-prodlim=2019.11.13 309 | - r-profvis=0.3.7 310 | - r-progress=1.2.2 311 | - r-progressr=0.10.1 312 | - r-promises=1.2.0.1 313 | - r-proxy=0.4_27 314 | - r-pryr=0.1.5 315 | - r-ps=1.7.1 316 | - r-purrr=0.3.4 317 | - r-quantmod=0.4.20 318 | - r-r6=2.5.1 319 | - r-ragg=1.2.3 320 | - r-randomforest=4.7_1.1 321 | - r-rappdirs=0.3.3 322 | - r-rbokeh=0.5.2 323 | - r-rcmdcheck=1.4.0 324 | - r-rcolorbrewer=1.1_3 325 | - r-rcpp=1.0.9 326 | - r-rcpptoml=0.1.7 327 | - r-readr=2.1.2 328 | - r-readxl=1.4.1 329 | - r-recipes=1.0.1 330 | - r-recommended=4.1 331 | - r-rematch=1.0.1 332 | - r-rematch2=2.1.2 333 | - r-remotes=2.4.2 334 | - r-repr=1.1.4 335 | - r-reprex=2.0.2 336 | - r-reshape2=1.4.4 337 | - r-rlang=1.0.4 338 | - r-rmarkdown=2.15 339 | - r-roxygen2=7.2.1 340 | - r-rpart=4.1.16 341 | - r-rprojroot=2.0.3 342 | - r-rstudioapi=0.14 343 | - r-rversions=2.1.2 344 | - r-rvest=1.0.3 345 | - r-sass=0.4.2 346 | - r-scales=1.2.1 347 | - r-selectr=0.4_2 348 | - r-sessioninfo=1.2.2 349 | - r-shape=1.4.6 350 | - r-shiny=1.7.2 351 | - r-sourcetools=0.1.7 352 | - r-spatial=7.3_15 353 | - r-squarem=2021.1 354 | - r-stringi=1.7.8 355 | - r-stringr=1.4.1 356 | - r-survival=3.4_0 357 | - r-sys=3.4.1 358 | - r-systemfonts=1.0.4 359 | - r-testthat=3.1.4 360 | - 
r-textshaping=0.3.6 361 | - r-tibble=3.1.8 362 | - r-tidyr=1.2.0 363 | - r-tidyselect=1.1.2 364 | - r-tidyverse=1.3.2 365 | - r-timedate=4021.104 366 | - r-tinytex=0.41 367 | - r-triebeard=0.3.0 368 | - r-ttr=0.24.3 369 | - r-tzdb=0.3.0 370 | - r-urlchecker=1.0.1 371 | - r-urltools=1.7.3 372 | - r-usethis=2.1.6 373 | - r-utf8=1.2.2 374 | - r-uuid=1.1_0 375 | - r-vctrs=0.4.1 376 | - r-viridislite=0.4.1 377 | - r-vroom=1.5.7 378 | - r-waldo=0.4.0 379 | - r-whisker=0.4 380 | - r-withr=2.5.0 381 | - r-xfun=0.32 382 | - r-xml=3.99_0.10 383 | - r-xml2=1.3.3 384 | - r-xopen=1.0.0 385 | - r-xtable=1.8_4 386 | - r-xts=0.12.1 387 | - r-yaml=2.3.5 388 | - r-zip=2.2.1 389 | - r-zoo=1.8_10 390 | - readline=8.1.2 391 | - reproc=14.2.3 392 | - reproc-cpp=14.2.3 393 | - salmon=0.14.2 394 | - samtools=1.3.1 395 | - sed=4.8 396 | - send2trash=1.8.0 397 | - setuptools=63.4.1 398 | - six=1.16.0 399 | - soupsieve=2.3.2.post1 400 | - sqlite=3.39.2 401 | - stack_data=0.4.0 402 | - subread=2.0.1 403 | - sysroot_linux-64=2.12 404 | - tbb=2020.3 405 | - terminado=0.15.0 406 | - tinycss2=1.1.1 407 | - tk=8.6.12 408 | - tktable=2.10 409 | - tornado=6.2 410 | - traitlets=5.3.0 411 | - tzdata=2022a 412 | - wcwidth=0.2.5 413 | - webencodings=0.5.1 414 | - wheel=0.37.1 415 | - xorg-kbproto=1.0.7 416 | - xorg-libice=1.0.10 417 | - xorg-libsm=1.2.3 418 | - xorg-libx11=1.7.2 419 | - xorg-libxau=1.0.9 420 | - xorg-libxdmcp=1.1.3 421 | - xorg-libxext=1.3.4 422 | - xorg-libxrender=0.9.10 423 | - xorg-libxt=1.2.1 424 | - xorg-renderproto=0.11.1 425 | - xorg-xextproto=7.3.0 426 | - xorg-xproto=7.0.31 427 | - xz=5.2.5 428 | - yaml=0.2.5 429 | - yaml-cpp=0.7.0 430 | - zeromq=4.3.4 431 | - zipp=3.8.1 432 | - zlib=1.2.12 433 | - zstd=1.5.2 434 | 435 | -------------------------------------------------------------------------------- /app/server/R_scripts/dea_function.R: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env Rscript 2 | 3 | ###### Functions for DEA 
######

options(warn=-1)

safe_colorblind_palette <- c("#88CCEE", "#CC6677", "#DDCC77", "#117733", "#332288", "#AA4499",
                             "#44AA99", "#999933", "#882255", "#661100", "#6699CC", "#888888")

createDDS2 <- function(counts, metadata, first.level, ref.level){
  flog.info("########## Create DDS object ###########")

  dds <- DESeqDataSetFromMatrix(countData = counts,
                                colData = metadata,
                                design = ~ conditions)


  dds$conditions = factor(dds$conditions, levels = c(first.level, ref.level))

  dds <- DESeq(dds, parallel = TRUE)

  return(dds)
}

createRES <- function(dds, first.level, ref.level, pvalue, gtf_file){
  flog.info("########## Create results object ###########")

  res <- results(dds, contrast = c("conditions", first.level, ref.level), alpha = pvalue, parallel = TRUE)
  res_df <- as.data.frame(res)
  res_df <- na.omit(res_df)
  res_df <- res_df[order(res_df$padj, res_df$pvalue, decreasing = FALSE),]
  res_df <- cbind(names = rownames(res_df), res_df)
  res_df$Significance <- res_df$padj < pvalue
  # Keep the original gene IDs, then replace the row names with unique gene names
  res_df$gencode = res_df$names
  tmp = gtf_file[which(gtf_file$gene_id %in% res_df$names),]
  res_df$genes = tmp[match(res_df$names, tmp$gene_id), "gene_name"]
  res_df$genes = make.unique(res_df$genes)
  res_df = na.omit(res_df)
  row.names(res_df) = res_df$genes
  res_df$names = res_df$genes
  res_df$genes <- NULL
  return(res_df)
}

createPCA<- function(rld, first.level, ref.level, condi_col){
  flog.info("########## Create PCA plot ###########")

  rldObject = rld
  pcaData <- plotPCA(rldObject, intgroup="conditions", returnData=TRUE)

  percentVar <- round(100 * attr(pcaData, "percentVar"))

  pca_plot = ggplot(pcaData, aes(PC1, PC2, color=conditions, label=name)) +
    scale_color_manual(values = condi_col) +
    geom_point(size=5) +
    ggtitle("PCA plot") +
    theme_bw() +
    xlab(paste0("PC1: ",percentVar[1],"% variance")) +
    ylab(paste0("PC2: ",percentVar[2],"% variance")) +
    guides(color = guide_legend(order = 1), fill = guide_legend(order = 0)) +
    geom_text_repel(aes(label = pcaData$name), size = 6, box.padding = 0.5, max.overlaps = Inf) +
    theme(
      panel.background = element_rect(fill = "transparent"), # bg of the panel
      plot.background = element_rect(fill = "transparent", color = NA), # bg of the plot
      legend.background = element_rect(fill = "transparent"), # get rid of legend bg
      legend.title = element_text(size = 20, color = "white"),
      legend.key = element_rect(colour = "transparent", fill = "transparent"),
      legend.text = element_text(size = 20, color = "white"),
      axis.text = element_text(angle = 45, hjust = 1, size = 17, color = "white"),
      plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "white"),
      axis.title = element_text(size = 23, color = "white"))
  return(pca_plot)
}

createVolcano <- function(res_df, condi_col){
  flog.info("########## Create volcano plot ###########")
  res_df$Significance_reg = ifelse(res_df$Significance,
                                   ifelse(res_df$log2FoldChange > 0, "Up", "Down"),
                                   "Not sig.")
  res_df$Significance_reg = factor(res_df$Significance_reg, levels = c("Up", "Down","Not sig."))

  # Subset significant genes and keep at most the 10 with the smallest adjusted p-value for labeling
  gen_subset <- subset(res_df, res_df$Significance == TRUE)
  gen_subset <- gen_subset[order(gen_subset$padj),]
  if (dim(gen_subset)[1] > 10){
    gen_subset_short <- gen_subset[1:10,]
  } else {
    gen_subset_short <- gen_subset
  }

  res_df$geneLabels = ifelse(res_df$names %in% gen_subset_short$names, TRUE, FALSE)
  color_code = list("Up" = condi_col[1],
                    "Down" = condi_col[2],
                    "Not sig." = "gray")

  # Creates volcano plot with up to 10 of the most significant genes labeled

  vol = ggplot(data=res_df, aes(x=log2FoldChange, y=-log10(padj), colour=Significance_reg)) +
    #scale_color_manual(values = colorCode) +
    geom_point(size=1.75) +
    xlab("log2 fold change") + ylab("-log10 p-adjusted")+
    ggtitle(paste0("Differential Expression (", names(condi_col)[1], " vs. ", names(condi_col)[2], ")")) + # add conditions to title
    scale_color_manual(values = color_code) +
    theme_bw() +
    guides(colour = guide_legend(override.aes = list(size=5))) +
    #theme(legend.position = "None")+
    geom_text_repel(max.overlaps = Inf, max.time = 1, aes(x=log2FoldChange, y=-log10(padj)),
                    label = ifelse(res_df$geneLabels == TRUE, as.character(res_df$names),""),
                    box.padding = 0.5, show.legend = FALSE, size = 6) +
    theme(
      panel.background = element_rect(fill = "transparent"), # bg of the panel
      plot.background = element_rect(fill = "transparent", color = NA), # bg of the plot
      # panel.grid.major = element_blank(), # get rid of major grid
      # panel.grid.minor = element_blank(), # get rid of minor grid
      legend.background = element_rect(fill = "transparent"), # get rid of legend bg
      #legend.box.background = element_rect(fill = "transparent"), # get rid of legend panel bg
      #legend.title = element_text(size = 20, color = "white"),
      legend.title = element_blank(), # remove legend title
      legend.key = element_rect(colour = "transparent", fill = "transparent"),
      legend.text = element_text(size = 20, color = "white"),
      axis.text = element_text(angle = 45, hjust = 1, size = 17, color = "white"),
      plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "white"),
      axis.title = element_text(size = 23, color = "white"))
  return(vol)
}

createHeatmap <- function(dds, rld, condi_col, main_color = "RdBu", gtf_file = NA, genes = NA){
  flog.info("########## Create Heatmap of Expression ###########")

  print("Heatmap will be created...")
  colList = list("conditions" = condi_col)
  select <- genes[, "gencode"]
  df <- as.data.frame(colData(dds)["conditions"])
  heat_input = assay(rld)[select,]
  row.names(heat_input) = genes[, "names"]

  # Version with white text for display in the app
  ha = HeatmapAnnotation(Condition = df$conditions,
                         col = list(Condition = condi_col),
                         name = "Condition ",
                         show_annotation_name = FALSE,
                         annotation_height = 2,
                         annotation_width = 2,
                         annotation_legend_param = list(
                           title_gp = gpar(fontsize = 18, col = "white"),
                           labels_gp = gpar(fontsize = 16, col = "white"),
                           title_position = "lefttop-rot"
                         )) # changed from annotation_label = gt_render(c("condition"))
  g = ComplexHeatmap::Heatmap(
    heat_input,
    name = "Norm. counts",
    col = hcl.colors(50, main_color),
    cluster_rows = TRUE,
    cluster_columns = TRUE,
    show_column_dend = FALSE,
    top_annotation = ha,
    show_row_dend = TRUE,
    show_column_names = TRUE,
    column_title = NULL,
    column_names_rot = 45,
    row_dend_gp = gpar(col = "white"),
    row_names_side = "right",
    row_names_gp = gpar(fontsize = 15, col = "white"),
    column_names_gp = gpar(fontsize = 20, col = "white"),
    heatmap_legend_param = list(
      title_gp = gpar(fontsize = 18, col = "white"),
      labels_gp = gpar(fontsize = 16, col = "white"),
      legend_height = unit(6, "cm"),
      grid_width = unit(0.5, "cm"),
      title_position = "lefttop-rot"
    )
  )
  g_draw = draw(g, background = "transparent")

  # Version with black text, returned as "heat.down"
  ha2 = HeatmapAnnotation(Condition = df$conditions,
                          col = list(Condition = condi_col),
                          name = "Condition ",
                          show_annotation_name = FALSE,
                          annotation_height = 2,
                          annotation_width = 2,
                          annotation_legend_param = list(
                            title_gp = gpar(fontsize = 18, col = "black"),
                            labels_gp = gpar(fontsize = 16, col = "black"),
                            title_position = "lefttop-rot"
                          )) # changed from annotation_label = gt_render(c("condition"))

  g2 = ComplexHeatmap::Heatmap(
    heat_input,
    name = "Norm. counts",
    col = hcl.colors(50, main_color),
    cluster_rows = TRUE,
    cluster_columns = TRUE,
    show_column_dend = FALSE,
    top_annotation = ha2,
    show_row_dend = TRUE,
    show_column_names = TRUE,
    column_title = NULL,
    column_names_rot = 45,
    row_names_side = "right",
    row_names_gp = gpar(fontsize = 15, col = "black"),
    column_names_gp = gpar(fontsize = 15, col = "black"),
    heatmap_legend_param = list(
      title_gp = gpar(fontsize = 18, col = "black"),
      labels_gp = gpar(fontsize = 16, col = "black"),
      legend_height = unit(6, "cm"),
      grid_width = unit(0.5, "cm"),
      title_position = "lefttop-rot"
    )
  )

  g_draw2 = draw(g2, background = "transparent")

  return(list("heat" = g_draw, "heat.down" = g_draw2))
}

createSam2Sam <- function(rld){
  flog.info("########## Create Sample to sample distance heatmap ###########")

  sampleDists <- dist(t(assay(rld)))
  sampleDistMatrix <- as.matrix(sampleDists)
  rownames(sampleDistMatrix) <- paste(rld$conditions, rld$Samples, sep="-")
  colnames(sampleDistMatrix) <- NULL
  colors <- colorRampPalette( rev(brewer.pal(9, "Blues")))(255)

  g = ComplexHeatmap::Heatmap(sampleDistMatrix,
                              name = "Distance",
                              cluster_rows = TRUE,
                              cluster_columns = TRUE,
                              show_column_dend = FALSE,
                              show_row_dend = TRUE,
                              show_column_names = FALSE, col = colors,
                              column_title = NULL,
                              row_names_side = "right",
                              heatmap_legend_param = list(
                                title_gp = gpar(fontsize = 22, col = "white"),
                                labels_gp = gpar(fontsize = 20, col = "white"),
                                legend_direction = "horizontal",
                                legend_width = unit(8, "cm"),
                                grid_height = unit(1, "cm"),
                                title_position =
"lefttop"), 242 | row_names_gp = gpar(fontsize = 22, col = "white"), 243 | row_dend_gp = gpar(col = "white")) 244 | g_draw = draw(g, background = "transparent", 245 | heatmap_legend_side = "bottom", 246 | padding = unit(c(2, 2, 2, 30), "mm")) 247 | 248 | g2 = ComplexHeatmap::Heatmap(sampleDistMatrix, 249 | name = "Distance", 250 | cluster_rows = T, 251 | cluster_columns = T, 252 | show_column_dend = F, 253 | show_row_dend = T, 254 | show_column_names=F, col = colors, 255 | column_title = NULL, 256 | row_names_side = "right", 257 | heatmap_legend_param = list( 258 | title_gp = gpar(fontsize = 12, col = "black"), 259 | labels_gp = gpar(fontsize = 10, col = "black"), 260 | legend_direction = "horizontal", 261 | legend_width = unit(4, "cm"), 262 | grid_height = unit(0.5, "cm"), 263 | title_position = "lefttop"), 264 | row_names_gp = gpar(fontsize = 12, col = "black"), 265 | row_dend_gp = gpar(col = "black")) 266 | g_draw2 = draw(g2, background = "transparent", 267 | heatmap_legend_side = "bottom", 268 | padding = unit(c(2, 2, 2, 2), "mm")) 269 | 270 | return(list(g_draw, g_draw2)) 271 | } 272 | 273 | run_preprocessing_dea <- function(meta.file, counts.file, condition.col, first.level, ref.level, pvalue, gtf_file){ 274 | flog.info("########## Differential Expression Analysis ###########") 275 | counts = counts.file 276 | 277 | metadata = meta.file 278 | row.names(metadata) <- metadata$Samples 279 | 280 | missingSampleInfos = colnames(counts)[-which(colnames(counts) %in% metadata$Samples)] 281 | if (length(missingSampleInfos) > 0){ 282 | print(paste0("No metadata found for the following samples: ", paste(missingSampleInfos, collapse = ","))) 283 | print(paste0(">>>>> Counts will be excluded!")) 284 | 285 | } else { 286 | print("All required information is included!") 287 | } 288 | metadata = metadata[which(row.names(metadata) %in% colnames(counts)),] 289 | 290 | colnames(metadata)[colnames(metadata) == condition.col] = "conditions" 291 | 292 | if (is.na(first.level) & 
is.na(ref.level)){ 293 | first.level = unique(metadata$conditions)[1] 294 | ref.level = unique(metadata$conditions)[2] 295 | 296 | } else if (is.na(first.level)){ 297 | first.level = unique(metadata$conditions[!(metadata$conditions == ref.level)])[1] 298 | } else if (is.na(ref.level)){ # only derive ref.level when it was not supplied 299 | ref.level = unique(metadata$conditions[!(metadata$conditions == first.level)])[1] 300 | } 301 | print(metadata) 302 | print(c(first.level, ref.level)) 303 | metadata = metadata[which(metadata$conditions %in% c(first.level, ref.level)),] 304 | metadata = metadata[which(row.names(metadata) %in% intersect(row.names(metadata), colnames(counts))),] 305 | 306 | counts = counts[, match(row.names(metadata), colnames(counts))] 307 | 308 | dds = createDDS2(counts, metadata, first.level, ref.level) 309 | rld = rlog(dds) 310 | res_df = createRES(dds, first.level, ref.level, pvalue, gtf_file) 311 | print(head(res_df)) 312 | 313 | deaProcess = list(res_df = res_df, 314 | rld = rld, 315 | dds = dds, 316 | counts = counts, 317 | metadata = metadata, 318 | first.level = first.level, 319 | ref.level = ref.level) 320 | return(deaProcess) 321 | } 322 | 323 | save_rds <- function(deaResults, output.dir){ 324 | 325 | flog.info("Saving results to the rds object: deaResults.rds") 326 | print(names(deaResults)) 327 | print(paste0(output.dir, "deaResults.rds")) 328 | saveRDS(deaResults, paste0(output.dir, "deaResults.rds")) 329 | 330 | } 331 | -------------------------------------------------------------------------------- /app/server/R_scripts/dte_function.R: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env Rscript 2 | 3 | ###### Functions for DTE ###### 4 | 5 | options(warn=-1) 6 | 7 | safe_colorblind_palette <- c("#88CCEE", "#CC6677", "#DDCC77", "#117733", "#332288", "#AA4499", 8 | "#44AA99", "#999933", "#882255", "#661100", "#6699CC", "#888888") 9 | 10 | createDDS2_DTE <- function(counts, metadata, first.level, ref.level){ 11 | flog.info("########## Create DDS object ###########") 12 | 13 
| dds <- DESeqDataSetFromMatrix(countData = counts, 14 | colData = metadata, 15 | design = ~ conditions) 16 | 17 | 18 | dds$conditions = factor(dds$conditions, levels = c(first.level, ref.level)) 19 | 20 | dds <- DESeq(dds, parallel = T) 21 | 22 | return(dds) 23 | } 24 | 25 | createRES_DTE <- function(dds, first.level, ref.level, pvalue, gtf_file){ 26 | flog.info("########## Create results object ###########") 27 | 28 | res <- results(dds, contrast = c("conditions", first.level, ref.level), alpha = pvalue, parallel = T) 29 | res_df <- as.data.frame(res) 30 | res_df <- na.omit(res_df) 31 | res_df <- res_df[order(res_df$padj, res_df$pvalue, decreasing = F),] 32 | res_df <- cbind(names = rownames(res_df), res_df) 33 | res_df$Significance <- ifelse(res_df$padj < pvalue, TRUE, FALSE) 34 | res_df$gencode = res_df$names 35 | tmp = gtf_file[which(gtf_file$transcript_id %in% res_df$names),] 36 | res_df$transcripts = tmp[match(res_df$names, tmp$transcript_id), "transcript_name"] 37 | #res_df$transcript_ids = tmp[match(res_df$names, tmp$transcript_id), "transcript_id"] 38 | res_df$transcripts = make.unique(res_df$transcripts) 39 | res_df = na.omit(res_df) 40 | row.names(res_df) = res_df$transcripts 41 | res_df$names = res_df$transcripts 42 | res_df$transcripts <- NULL 43 | return(res_df) 44 | } 45 | 46 | createPCA_DTE<- function(rld, first.level, ref.level, condi_col){ 47 | flog.info("########## Create PCA plot ###########") 48 | 49 | rldObject = rld 50 | pcaData <- plotPCA(rldObject, intgroup="conditions", returnData=TRUE) 51 | 52 | percentVar <- round(100 * attr(pcaData, "percentVar")) 53 | 54 | pca_plot = ggplot(pcaData, aes(PC1, PC2, color=conditions, label=name)) + 55 | scale_color_manual(values = condi_col) + 56 | geom_point(size=5) + 57 | ggtitle("PCA plot") + 58 | theme_bw() + 59 | xlab(paste0("PC1: ",percentVar[1],"% variance")) + 60 | ylab(paste0("PC2: ",percentVar[2],"% variance")) + 61 | guides(color = guide_legend(order = 1), fill = guide_legend(order = 0)) + 62 
| geom_text_repel(aes(label = pcaData$name), size = 6, box.padding = 0.5, max.overlaps = Inf) + 63 | theme( 64 | panel.background = element_rect(fill = "transparent"), # bg of the panel 65 | plot.background = element_rect(fill = "transparent", color = NA), # bg of the plot 66 | legend.background = element_rect(fill = "transparent"), # get rid of legend bg 67 | legend.title = element_text(size = 20, color = "white"), 68 | legend.key = element_rect(colour = "transparent", fill = "transparent"), 69 | legend.text = element_text(size = 20, color = "white"), 70 | axis.text = element_text(angle = 45, hjust = 1, size = 17, color = "white"), 71 | plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "white"), 72 | axis.title = element_text(size = 23, color = "white")) 73 | return(pca_plot) 74 | } 75 | 76 | createVolcano_DTE <- function(res_df, condi_col){ 77 | flog.info("########## Create volcano plot ###########") 78 | res_df$Significance_reg = ifelse(res_df$Significance, 79 | ifelse(res_df$log2FoldChange > 0, "Up", "Down"), 80 | "Not sig.") 81 | res_df$Significance_reg = factor(res_df$Significance_reg, levels = c("Up", "Down","Not sig.")) 82 | 83 | 84 | 85 | # Subset significant transcripts; the 10 with the lowest padj are labeled in the plot 86 | transcript_subset <- subset(res_df, res_df$Significance == TRUE) 87 | transcript_subset <- transcript_subset[order(transcript_subset$padj),] 88 | if (dim(transcript_subset)[1] > 10){ 89 | transcript_subset_short <- transcript_subset[1:10,] 90 | } else { 91 | transcript_subset_short <- transcript_subset 92 | } 93 | 94 | res_df$transcriptLabels = ifelse(res_df$names %in% transcript_subset_short$names, TRUE, FALSE) 95 | color_code = list("Up" = condi_col[1], 96 | "Down" = condi_col[2], 97 | "Not sig." 
= "gray") 98 | 99 | # Creates volcano plot with the 10 most significant transcripts labeled 100 | 101 | vol = ggplot(data=res_df, aes(x=log2FoldChange, y=-log10(padj), colour=Significance_reg)) + 102 | #scale_color_manual(values = colorCode) + 103 | geom_point(size=1.75) + 104 | xlab("log2 fold change") + ylab("-log10 p-adjusted")+ 105 | ggtitle(paste0("Differential Expression (", names(condi_col)[1], " vs. ", names(condi_col)[2], ")")) + # add conditions to title 106 | scale_color_manual(values = color_code) + 107 | theme_bw() + 108 | guides(colour = guide_legend(override.aes = list(size=5))) + 109 | #theme(legend.position = "None")+ 110 | geom_text_repel(max.overlaps = Inf, max.time = 1, aes(x=log2FoldChange, y=-log10(padj)), 111 | label = ifelse(res_df$transcriptLabels == TRUE, as.character(res_df$names),""), 112 | box.padding = 0.5, show.legend = F, size = 6) + 113 | theme( 114 | panel.background = element_rect(fill = "transparent"), # bg of the panel 115 | plot.background = element_rect(fill = "transparent", color = NA), # bg of the plot 116 | # panel.grid.major = element_blank(), # get rid of major grid 117 | # panel.grid.minor = element_blank(), # get rid of minor grid 118 | legend.background = element_rect(fill = "transparent"), # get rid of legend bg 119 | #legend.box.background = element_rect(fill = "transparent"), # get rid of legend panel bg 120 | #legend.title = element_text(size = 20, color = "white"), 121 | legend.title = element_blank(), # remove legend title 122 | legend.key = element_rect(colour = "transparent", fill = "transparent"), 123 | legend.text = element_text(size = 20, color = "white"), 124 | axis.text = element_text(angle = 45, hjust = 1, size = 17, color = "white"), 125 | plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "white"), 126 | axis.title = element_text(size = 23, color = "white")) 127 | return(vol) 128 | } 129 | 130 | createHeatmap_DTE <- function(dds, rld, condi_col, main_color = "RdBu", gtf_file = NA, 
transcripts = NA){ 131 | flog.info("########## Create Heatmap of Expression ###########") 132 | 133 | print("Heatmap will be created...") 134 | colList = list("conditions" = condi_col) 135 | select <- transcripts[, "gencode"] 136 | df <- as.data.frame(colData(dds)["conditions"]) 137 | heat_input = assay(rld)[select,] 138 | row.names(heat_input) = transcripts[, "names"] 139 | 140 | ha = HeatmapAnnotation(Condition = df$conditions, 141 | col = list(Condition = condi_col), 142 | name = "Condition ", 143 | show_annotation_name = F, 144 | annotation_height = 2, 145 | annotation_width = 2, 146 | annotation_legend_param = list( 147 | title_gp = gpar(fontsize = 18, col = "white"), 148 | labels_gp = gpar(fontsize = 16, col = "white"), 149 | title_position = "lefttop-rot" 150 | )) # changed from annotation_label = gt_render(c("condition")) 151 | g = ComplexHeatmap::Heatmap( 152 | heat_input, 153 | name = "Norm. counts", 154 | col = hcl.colors(50, main_color), 155 | cluster_rows = T, 156 | cluster_columns = T, 157 | show_column_dend = F, 158 | top_annotation = ha, 159 | show_row_dend = T, 160 | show_column_names=T, 161 | column_title = NULL, 162 | column_names_rot = 45, 163 | row_dend_gp = gpar(col = "white"), 164 | row_names_side = "right", 165 | row_names_gp = gpar(fontsize = 15, col = "white"), 166 | column_names_gp = gpar(fontsize = 20, col = "white"), 167 | heatmap_legend_param = list( 168 | title_gp = gpar(fontsize = 18, col = "white"), 169 | labels_gp = gpar(fontsize = 16, col = "white"), 170 | legend_height = unit(6, "cm"), 171 | grid_width = unit(0.5, "cm"), 172 | title_position = "lefttop-rot" 173 | ) 174 | ) 175 | g_draw = draw(g, background = "transparent") 176 | 177 | ha2 = HeatmapAnnotation(Condition = df$conditions, 178 | col = list(Condition = condi_col), 179 | name = "Condition ", 180 | show_annotation_name = F, 181 | annotation_height = 2, 182 | annotation_width = 2, 183 | annotation_legend_param = list( 184 | title_gp = gpar(fontsize = 18, col = "black"), 
185 | labels_gp = gpar(fontsize = 16, col = "black"), 186 | title_position = "lefttop-rot" 187 | )) # changed from annotation_label = gt_render(c("condition")) 188 | 189 | g2 = ComplexHeatmap::Heatmap( 190 | heat_input, 191 | name = "Norm. counts", 192 | col = hcl.colors(50, main_color), 193 | cluster_rows = T, 194 | cluster_columns = T, 195 | show_column_dend = F, 196 | top_annotation = ha2, 197 | show_row_dend = T, 198 | show_column_names=T, 199 | column_title = NULL, 200 | column_names_rot = 45, 201 | row_names_side = "right", 202 | row_names_gp = gpar(fontsize = 15, col = "black"), 203 | column_names_gp = gpar(fontsize = 15, col = "black"), 204 | heatmap_legend_param = list( 205 | title_gp = gpar(fontsize = 18, col = "black"), 206 | labels_gp = gpar(fontsize = 16, col = "black"), 207 | legend_height = unit(6, "cm"), 208 | grid_width = unit(0.5, "cm"), 209 | title_position = "lefttop-rot" 210 | ) 211 | ) 212 | 213 | g_draw2 = draw(g2, background = "transparent") 214 | flog.info("########## Create Heatmap Finished ###########") 215 | return(list("heat_dte" = g_draw, "heat_dte.down" = g_draw2)) 216 | } 217 | 218 | createSam2Sam_DTE <- function(rld){ 219 | flog.info("########## Create Sample to sample distance heatmap ###########") 220 | 221 | sampleDists <- dist(t(assay(rld))) 222 | sampleDistMatrix <- as.matrix(sampleDists) 223 | rownames(sampleDistMatrix) <- paste(rld$conditions, rld$Samples, sep="-") 224 | colnames(sampleDistMatrix) <- NULL 225 | colors <- colorRampPalette( rev(brewer.pal(9, "Blues")))(255) 226 | 227 | g = ComplexHeatmap::Heatmap(sampleDistMatrix, 228 | name = "Distance", 229 | cluster_rows = T, 230 | cluster_columns = T, 231 | show_column_dend = F, 232 | show_row_dend = T, 233 | show_column_names=F, col = colors, 234 | column_title = NULL, 235 | row_names_side = "right", 236 | heatmap_legend_param = list( 237 | title_gp = gpar(fontsize = 22, col = "white"), 238 | labels_gp = gpar(fontsize = 20, col = "white"), 239 | legend_direction = 
"horizontal", 240 | legend_width = unit(8, "cm"), 241 | grid_height = unit(1, "cm"), 242 | title_position = "lefttop"), 243 | row_names_gp = gpar(fontsize = 22, col = "white"), 244 | row_dend_gp = gpar(col = "white")) 245 | g_draw = draw(g, background = "transparent", 246 | heatmap_legend_side = "bottom", 247 | padding = unit(c(2, 2, 2, 30), "mm")) 248 | 249 | g2 = ComplexHeatmap::Heatmap(sampleDistMatrix, 250 | name = "Distance", 251 | cluster_rows = T, 252 | cluster_columns = T, 253 | show_column_dend = F, 254 | show_row_dend = T, 255 | show_column_names=F, col = colors, 256 | column_title = NULL, 257 | row_names_side = "right", 258 | heatmap_legend_param = list( 259 | title_gp = gpar(fontsize = 12, col = "black"), 260 | labels_gp = gpar(fontsize = 10, col = "black"), 261 | legend_direction = "horizontal", 262 | legend_width = unit(4, "cm"), 263 | grid_height = unit(0.5, "cm"), 264 | title_position = "lefttop"), 265 | row_names_gp = gpar(fontsize = 12, col = "black"), 266 | row_dend_gp = gpar(col = "black")) 267 | g_draw2 = draw(g2, background = "transparent", 268 | heatmap_legend_side = "bottom", 269 | padding = unit(c(2, 2, 2, 2), "mm")) 270 | 271 | return(list(g_draw, g_draw2)) 272 | } 273 | 274 | run_preprocessing_dte <- function(meta.file, counts.file, condition.col, first.level, ref.level, pvalue, gtf_file){ 275 | flog.info("########## Differential Expression Analysis ###########") 276 | counts = counts.file 277 | 278 | metadata = meta.file 279 | row.names(metadata) <- metadata$Samples 280 | 281 | missingSampleInfos = colnames(counts)[-which(colnames(counts) %in% metadata$Samples)] 282 | if (length(missingSampleInfos) > 0){ 283 | print(paste0("No metadata found for the following samples: ", paste(missingSampleInfos, collapse = ","))) 284 | print(paste0(">>>>> Counts will be excluded!")) 285 | 286 | } else { 287 | print("All required information is included!") 288 | } 289 | metadata = metadata[which(row.names(metadata) %in% colnames(counts)),] 290 | 291 | 
colnames(metadata)[colnames(metadata) == condition.col] = "conditions" 292 | 293 | if (is.na(first.level) & is.na(ref.level)){ 294 | first.level = unique(metadata$conditions)[1] 295 | ref.level = unique(metadata$conditions)[2] 296 | 297 | } else if (is.na(first.level)){ 298 | first.level = unique(metadata$conditions[!(metadata$conditions == ref.level)])[1] 299 | } else if (is.na(ref.level)){ # only derive ref.level when it was not supplied 300 | ref.level = unique(metadata$conditions[!(metadata$conditions == first.level)])[1] 301 | } 302 | print(metadata) 303 | print(c(first.level, ref.level)) 304 | metadata = metadata[which(metadata$conditions %in% c(first.level, ref.level)),] 305 | metadata = metadata[which(row.names(metadata) %in% intersect(row.names(metadata), colnames(counts))),] 306 | 307 | counts = counts[, match(row.names(metadata), colnames(counts))] 308 | 309 | dds = createDDS2_DTE(counts, metadata, first.level, ref.level) 310 | rld = rlog(dds) 311 | res_df = createRES_DTE(dds, first.level, ref.level, pvalue, gtf_file) 312 | print(head(res_df)) 313 | 314 | deaProcess = list(res_df = res_df, 315 | rld = rld, 316 | dds = dds, 317 | counts = counts, 318 | metadata = metadata, 319 | first.level = first.level, 320 | ref.level = ref.level) 321 | return(deaProcess) 322 | } 323 | 324 | save_rds_dte <- function(deaResults, output.dir){ 325 | 326 | flog.info("Saving results to the rds object: deaResults.rds") 327 | print(names(deaResults)) 328 | print(paste0(output.dir, "deaResults.rds")) 329 | saveRDS(deaResults, paste0(output.dir, "deaResults.rds")) 330 | 331 | } 332 | -------------------------------------------------------------------------------- /app/server/R_scripts/gene_wise_analysis_function.R: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env Rscript 2 | 3 | ###### Gene-wise analyses ##### 4 | options(warn=-1) 5 | 6 | safe_colorblind_palette <- c("#88CCEE", "#CC6677", "#DDCC77", "#117733", "#332288", "#AA4499", 7 | "#44AA99", "#999933", "#882255", "#661100", "#6699CC", "#888888") 8 | 9 | 
getGeneSymbolFromGTF <- function(gtf.file, output.dir){ 10 | gtf.gr = rtracklayer::import(gtf.file) # creates a GRanges object 11 | gtf.df = as.data.frame(gtf.gr) 12 | genes = unique(gtf.df[ ,c("gene_id","gene_name", "transcript_id", "transcript_name")]) 13 | return(genes) 14 | } 15 | createDDS <- function(counts.file, meta.file, condition.col, first.level, ref.level){ 16 | print("########## Normalization of counts plots ###########") 17 | counts = counts.file 18 | 19 | metadata = meta.file 20 | row.names(metadata) <- metadata$Samples 21 | # Logical negation instead of -which(): x[-which(cond)] returns an empty vector when nothing matches 22 | missingSampleInfos = colnames(counts)[!(colnames(counts) %in% metadata$Samples)] 23 | if (length(missingSampleInfos) > 0){ 24 | print(paste0("No metadata found for the following samples: ", paste(missingSampleInfos, collapse = ","))) 25 | print(paste0(">>>>> Counts will be excluded!")) 26 | 27 | } else { 28 | print("All required information is included!") 29 | } 30 | metadata = metadata[which(row.names(metadata) %in% colnames(counts)),] 31 | 32 | colnames(metadata)[colnames(metadata) == condition.col] = "conditions" 33 | 34 | if (is.na(first.level) & is.na(ref.level)){ 35 | first.level = unique(metadata$conditions)[1] 36 | ref.level = unique(metadata$conditions)[2] 37 | 38 | } else if (is.na(first.level)){ 39 | first.level = unique(metadata$conditions[!(metadata$conditions == ref.level)])[1] 40 | } else if (is.na(ref.level)){ # only derive ref.level when it was not supplied 41 | ref.level = unique(metadata$conditions[!(metadata$conditions == first.level)])[1] 42 | } 43 | 44 | metadata = metadata[which(metadata$conditions %in% c(first.level, ref.level)),] 45 | metadata = metadata[which(row.names(metadata) %in% intersect(row.names(metadata), colnames(counts))),] 46 | counts = counts[, match(row.names(metadata), colnames(counts))] 47 | 48 | x = all(row.names(metadata) == colnames(counts)) 49 | if (x){ 50 | print("Starting normalization ...") 51 | } else { 52 | print("Sample names do not match between counts and metadata!!") 53 | } 54 | 55 | dds <- DESeqDataSetFromMatrix(countData = counts, 
56 | colData = metadata, 57 | design = ~ conditions) 58 | 59 | 60 | dds$conditions = factor(dds$conditions, levels = c(first.level, ref.level)) 61 | 62 | dds <- DESeq(dds, parallel = T) 63 | norm_counts <- counts(dds, normalized = TRUE) 64 | 65 | return(list(counts, norm_counts, metadata)) 66 | } 67 | createCountsPlot <- function(normCounts, genes, metaTab, genes.tab, gtitle, outName, outDir = ".", condi_cols, download = F, ylabel = "Counts"){ 68 | print("########## Create counts plots ###########") 69 | 70 | normmutGenes <- normCounts 71 | print(genes) 72 | mutGenes <- as.data.frame(normmutGenes[as.character(genes[,1]),]) 73 | mutGenes$genes <- rownames(mutGenes) 74 | meltedmutGenes <- melt(mutGenes) 75 | colnames(meltedmutGenes) <- c("gene", "samplename", "normalized_counts") 76 | meltedmutGenes <- merge(meltedmutGenes, metaTab, by.x = "samplename", by.y = "row.names") 77 | 78 | meltedmutGenes_all = merge.data.frame(meltedmutGenes, genes, by.x = "gene", by.y = "gene_id", all.x = T) 79 | if (!download){ 80 | x1 = ggplot(meltedmutGenes_all) + 81 | geom_boxplot(aes(x = gene_name, y = normalized_counts, fill = conditions), color = "white") + 82 | #geom_point(aes(x = gene_name, y = normalized_counts, color = conditions), position = position_jitter(w=0.1, h=0), size=4)+ 83 | scale_y_log10(oob = scales::squish_infinite) + 84 | scale_fill_manual(values = condi_cols) + 85 | 86 | ylab(ylabel) + 87 | xlab("Genes") + 88 | ggtitle(gtitle) + 89 | theme_bw() + 90 | theme( 91 | panel.background = element_rect(fill = "transparent", color = "white"), # bg of the panel 92 | plot.background = element_rect(fill = "transparent", color = NA), # bg of the plot 93 | panel.grid.major = element_line(size = 0.2, linetype = 'solid', 94 | colour = "white"), # thin white major grid lines 95 | panel.grid.minor = element_line(size = 0.2, linetype = 'solid', colour = "white"), # thin white minor grid lines 96 | legend.background = element_rect(fill = "transparent"), # get rid of legend bg 97 | 
legend.box.background = element_rect(fill = "transparent"), # get rid of legend panel bg 98 | legend.title = element_text(size = 20, color = "white"), 99 | legend.key = element_rect(colour = "transparent", fill = "transparent"), 100 | legend.text = element_text(size = 20, color = "white"), 101 | axis.text = element_text(angle = 45, hjust = 1, size = 17, color = "white"), 102 | plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "white"), 103 | axis.title = element_text(size = 23, color = "white")) 104 | 105 | x2 <- ggplot(meltedmutGenes_all) + 106 | #geom_boxplot(aes(x = gene_name, y = normalized_counts, fill = conditions)) + 107 | geom_point(aes(x = gene_name, y = normalized_counts, color = conditions), position = position_jitter(w=0.1, h=0), size=4)+ 108 | scale_y_log10(oob = scales::squish_infinite) + 109 | scale_color_manual(values = condi_cols) + 110 | #scale_fill_manual(values = safe_colorblind_palette[c(3,11)]) + 111 | 112 | ylab(ylabel) + 113 | xlab("Genes") + 114 | ggtitle(gtitle) + 115 | theme_bw() + 116 | theme( 117 | panel.background = element_rect(fill = "transparent", color = "white"), # bg of the panel 118 | plot.background = element_rect(fill = "transparent", color = NA), # bg of the plot 119 | panel.grid.major = element_line(size = 0.2, linetype = 'solid', 120 | colour = "white"), # get rid of major grid 121 | panel.grid.minor = element_line(size = 0.2, linetype = 'solid', colour = "white"), # get rid of minor grid 122 | legend.background = element_rect(fill = "transparent"), # get rid of legend bg 123 | legend.box.background = element_rect(fill = "transparent"), # get rid of legend panel bg 124 | legend.title = element_text(size = 20, color = "white"), 125 | legend.key = element_rect(colour = "transparent", fill = "transparent"), 126 | legend.text = element_text(size = 20, color = "white"), 127 | axis.text = element_text(angle = 45, hjust = 1, size = 17, color = "white"), 128 | plot.title = element_text(hjust = 0.5, face = 
"bold", size = 23, color = "white"), 129 | axis.title = element_text(size = 23, color = "white")) 130 | 131 | x3 <- ggplot(meltedmutGenes_all) + 132 | geom_violin(aes(x = gene_name, y = normalized_counts, fill = conditions), color = "white") + 133 | #geom_point(aes(x = gene_name, y = normalized_counts, color = conditions), position = position_jitter(w=0.1, h=0), size=4)+ 134 | scale_y_log10(oob = scales::squish_infinite) + 135 | scale_fill_manual(values = condi_cols) + 136 | 137 | ylab(ylabel) + 138 | xlab("Genes") + 139 | ggtitle(gtitle) + 140 | theme_bw() + 141 | theme( 142 | panel.background = element_rect(fill = "transparent", color = "white"), # bg of the panel 143 | plot.background = element_rect(fill = "transparent", color = NA), # bg of the plot 144 | panel.grid.major = element_line(size = 0.2, linetype = 'solid', 145 | colour = "white"), # get rid of major grid 146 | panel.grid.minor = element_line(size = 0.2, linetype = 'solid', colour = "white"), # get rid of minor grid 147 | legend.background = element_rect(fill = "transparent"), # get rid of legend bg 148 | legend.box.background = element_rect(fill = "transparent"), # get rid of legend panel bg 149 | legend.title = element_text(size = 20, color = "white"), 150 | legend.key = element_rect(colour = "transparent", fill = "transparent"), 151 | legend.text = element_text(size = 20, color = "white"), 152 | axis.text = element_text(angle = 45, hjust = 1, size = 17, color = "white"), 153 | plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "white"), 154 | axis.title = element_text(size = 23, color = "white")) 155 | 156 | } else { 157 | x1 = ggplot(meltedmutGenes_all) + 158 | geom_boxplot(aes(x = gene_name, y = normalized_counts, fill = conditions), color = "black") + 159 | scale_y_log10(oob = scales::squish_infinite) + 160 | scale_fill_manual(values = condi_cols) + 161 | 162 | ylab(ylabel) + 163 | xlab("Genes") + 164 | ggtitle(gtitle) + 165 | theme_bw() + 166 | theme( 167 | legend.title = 
element_text(size = 20, color = "black"), 168 | legend.text = element_text(size = 20, color = "black"), 169 | axis.text = element_text(angle = 45, hjust = 1, size = 17, color = "black"), 170 | plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "black"), 171 | axis.title = element_text(size = 23, color = "black")) 172 | 173 | x2 <- ggplot(meltedmutGenes_all) + 174 | geom_point(aes(x = gene_name, y = normalized_counts, color = conditions), position = position_jitter(w=0.1, h=0), size=4)+ 175 | scale_y_log10(oob = scales::squish_infinite) + 176 | scale_color_manual(values = condi_cols) + 177 | ylab(ylabel) + 178 | xlab("Genes") + 179 | ggtitle(gtitle) + 180 | theme_bw() + 181 | theme( 182 | legend.title = element_text(size = 20, color = "black"), 183 | legend.text = element_text(size = 20, color = "black"), 184 | axis.text = element_text(angle = 45, hjust = 1, size = 17, color = "black"), 185 | plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "black"), 186 | axis.title = element_text(size = 23, color = "black")) 187 | 188 | x3 <- ggplot(meltedmutGenes_all) + 189 | geom_violin(aes(x = gene_name, y = normalized_counts, fill = conditions), color = "black") + 190 | scale_y_log10(oob = scales::squish_infinite) + 191 | scale_fill_manual(values = condi_cols) + 192 | 193 | ylab(ylabel) + 194 | xlab("Genes") + 195 | ggtitle(gtitle) + 196 | theme_bw() + 197 | theme( 198 | legend.title = element_text(size = 20, color = "black"), 199 | legend.text = element_text(size = 20, color = "black"), 200 | axis.text = element_text(angle = 45, hjust = 1, size = 17, color = "black"), 201 | plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "black"), 202 | axis.title = element_text(size = 23, color = "black")) 203 | 204 | } 205 | return(list("Points" = x2, "Boxplot" = x1, "Violinplot" = x3)) 206 | } 207 | TEA <- function(counts, norm_counts, genes.list, metadata, pvalue, output.dir, condi_cols){ 208 | 
  print("###################################")
  print("## Starting TEA ##")
  print("###################################")
  norm_counts_plot = createCountsPlot(
    normCounts = norm_counts,
    genes = genes.list,
    metaTab = metadata,
    gtitle = "Normalized counts",
    outName = "normalized",
    outDir = output.dir, condi_cols = condi_cols,
    ylabel = "Normalized read counts")

  norm_counts_plot.download = createCountsPlot(
    normCounts = norm_counts,
    genes = genes.list,
    metaTab = metadata,
    gtitle = "Normalized counts",
    outName = "normalized",
    outDir = output.dir, condi_cols = condi_cols, download = TRUE,
    ylabel = "Normalized read counts")

  counts_plot = createCountsPlot(
    normCounts = counts,
    genes = genes.list,
    metaTab = metadata,
    gtitle = "Raw counts",
    outName = "raw",
    outDir = output.dir, condi_cols = condi_cols,
    ylabel = "Raw read counts")

  counts_plot.download = createCountsPlot(
    normCounts = counts,
    genes = genes.list,
    metaTab = metadata,
    gtitle = "Raw counts",
    outName = "raw",
    outDir = output.dir, condi_cols = condi_cols, download = TRUE,
    ylabel = "Raw read counts")


  # Raw and normalized counts side by side, one figure per plot type
  p1 = ggarrange(plotlist = list(counts_plot[["Points"]], norm_counts_plot[["Points"]]), nrow = 1, ncol = 2, common.legend = TRUE)
  p2 = ggarrange(plotlist = list(counts_plot[["Boxplot"]], norm_counts_plot[["Boxplot"]]), nrow = 1, ncol = 2, common.legend = TRUE)
  p3 = ggarrange(plotlist = list(counts_plot[["Violinplot"]], norm_counts_plot[["Violinplot"]]), nrow = 1, ncol = 2, common.legend = TRUE)

  p1.download = ggarrange(plotlist = list(counts_plot.download[["Points"]], norm_counts_plot.download[["Points"]]), nrow = 1, ncol = 2, common.legend = TRUE)
  p2.download = ggarrange(plotlist = list(counts_plot.download[["Boxplot"]], norm_counts_plot.download[["Boxplot"]]), nrow = 1, ncol = 2, common.legend = TRUE)
  p3.download = ggarrange(plotlist = list(counts_plot.download[["Violinplot"]], norm_counts_plot.download[["Violinplot"]]), nrow = 1, ncol = 2, common.legend = TRUE)

  return(list("Dotplot" = p1, "Boxplot" = p2, "Violinplot" = p3, "Dotplot.down" = p1.download, "Boxplot.down" = p2.download, "Violinplot.down" = p3.download))
}
geneBodyCov.plot <- function(gB_results, geneOfInterest_in, metadata, condi_cols){

  theme_set(theme_light())

  geneOfInterest_ID = geneOfInterest_in$gene_id
  geneOfInterest_name = geneOfInterest_in$gene_name
  geneOfInterest = paste0(geneOfInterest_name, "\n(", geneOfInterest_ID, ")")
  geneBodyCov = gB_results %>%
    gather(key = "Position", "Value", -Percentile) %>%
    group_by(Percentile) %>%
    mutate(Position = as.numeric(gsub("X", "", Position)), PercVal = ifelse(Value == 0, 0, Value/sum(Value)))
  n_samples = length(unique(geneBodyCov$Percentile))
  g = ggplot(geneBodyCov, aes(x = Position, y = PercVal, color = Percentile)) +
    geom_smooth(se = FALSE) +
    ylim(0,max(geneBodyCov$PercVal)) +
    ggtitle(geneOfInterest) +
    scale_color_manual("Samples", values = safe_colorblind_palette[c(1:n_samples)]) +
    scale_x_continuous(breaks = c(0, 50, 100), labels = c("5'", "mid gene", "3'")) + # change position labels from 0 to 100 to 5' to 3'
    ylab("Relative Coverage (%)") +
    theme(
      panel.grid.minor.x = element_blank(), # remove minor grid lines from plot
      panel.background = element_rect(fill = "transparent"), # bg of the panel
      plot.background = element_rect(fill = "transparent", color = NA), # bg of the plot
      # panel.grid.major = element_blank(), # get rid of major grid
      # panel.grid.minor = element_blank(), # get rid of minor grid
      legend.background = element_rect(fill = "transparent"), # get rid of legend bg
      #legend.box.background = element_rect(fill = "transparent"), # get rid of legend panel bg
      legend.title = element_text(size = 20, color = "white"),
      legend.key = element_rect(colour = "transparent", fill = "transparent"),
      legend.text = element_text(size = 20, color = "white"),
      axis.text.y = element_text(angle = 45, hjust = 1, size = 17, color = "white"),
      axis.text.x = element_text(size = 17, color = "white"), # removed: angle = 45, hjust = 1, due to the new discrete axis labels
      plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "white"),
      axis.title = element_text(size = 23, color = "white")
    )

  geneBodyCov = geneBodyCov %>%
    left_join(metadata, by = c("Percentile" = "Samples"))

  g_cond = ggplot(geneBodyCov, aes(x = Position, y = PercVal, color = Condition)) +
    geom_smooth(se = FALSE) +
    ylim(0,max(geneBodyCov$PercVal)) +
    ggtitle(geneOfInterest) +
    scale_color_manual("Condition", values = condi_cols) +
    scale_x_continuous(breaks = c(0, 50, 100), labels = c("5'", "mid gene", "3'")) + # change position labels from 0 to 100 to 5' to 3'
    ylab("Relative Coverage (%)") +
    theme(
      panel.grid.minor.x = element_blank(), # remove minor grid lines from plot
      panel.background = element_rect(fill = "transparent", color = "white"), # bg of the panel
      plot.background = element_rect(fill = "transparent", color = NA), # bg of the plot
      panel.grid.major = element_line(size = 0.2, linetype = 'solid',
                                      colour = "white"), # thin white major grid lines
      panel.grid.minor = element_line(size = 0.2, linetype = 'solid', colour = "white"), # thin white minor grid lines
      legend.background = element_rect(fill = "transparent"), # get rid of legend bg
      legend.box.background = element_rect(fill = "transparent"), # get rid of legend panel bg
      legend.title = element_text(size = 20, color = "white"),
      legend.key = element_rect(colour = "transparent", fill = "transparent"),
      legend.text = element_text(size = 20, color = "white"),
      axis.text.y = element_text(angle = 45, hjust = 1, size = 17, color = "white"),
      axis.text.x = element_text(size = 17, color = "white"), # removed: angle = 45, hjust = 1, due to the new discrete axis labels
      plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "white"),
      axis.title = element_text(size = 23, color = "white"))



  return(list("samples" = g, "condition" = g_cond))

}

--------------------------------------------------------------------------------
/app/server/R_scripts/dtu_function.R:
--------------------------------------------------------------------------------


DRIM_seq_prep <- function(table = count.table, run.dir = csv.dir, samps = metadata, condition_col = "Condition", first.level = "a2d3-OE", ref.level = "Ctrl", gtf_file = gtf,cores = 4){

  ###############################################################################################
  #                                                                                             #
  #                              Read count file, gtf and metadata                              #
  #                                                                                             #
  ###############################################################################################

  txdb.filename <- str_split(gtf_file, "/")
  txdb.filename <- as.vector(txdb.filename[[1]])[length(txdb.filename[[1]])]
  txdb.filename <- paste0(run.dir,txdb.filename)
  samps["sample_id"] = samps$Samples
  samps["condition"] = samps[condition_col]
  #print(samps$condition)

  samps <- samps[which(samps$condition %in% c(first.level,ref.level)),]
  print(samps)

  head(table)
  table = table[,which(colnames(table) %in% samps$Samples)]
  #head(table)
  #txdb <- makeTxDbFromGFF(gtf)
  #print(txdb)
  #saveDb(txdb, txdb.filename)
  #txdb <- loadDb(txdb.filename)

  # out is TRUE if a cached TxDb could be loaded, FALSE otherwise
  out <- tryCatch(
    {txdb <- loadDb(txdb.filename)
     TRUE
    },
    error = function(e){
      FALSE
    },
    finally = {
    })


  #print(out)

  if (out){
    txdb <- loadDb(txdb.filename)
  }
  else {
    txdb <- makeTxDbFromGFF(gtf_file)
    saveDb(txdb, txdb.filename)
  }

  txdf <- AnnotationDbi::select(txdb, keys(txdb, "GENEID"),"TXNAME", "GENEID")
  #print(txdf)
  tab <- table(txdf$GENEID)
  txdf$ntx <- tab[match(txdf$GENEID, names(tab))]
  #print(txdf$ntx)


  all(rownames(table) %in% txdf$TXNAME)
  txdf <- txdf[match(rownames(table),txdf$TXNAME),]
  all(rownames(table) == txdf$TXNAME)

  counts <- data.frame(gene_id=txdf$GENEID,
                       feature_id=txdf$TXNAME,
                       table)
  #counts <- counts[which(counts$gene_id == goi_id),]
  #print("Counts")
  #head(counts)


  param = BiocParallel::SerialParam()
  d <- dmDSdata(counts=counts, samples=samps)
  n <- length(samps$sample_id)
  n.small <- min(table(samps$condition))
  # out2 is TRUE if filtering failed, FALSE if it succeeded
  out2 <- tryCatch(
    {d <- dmFilter(d,
                   min_samps_feature_expr=as.integer(n.small), min_feature_expr=5,
                   # min_samps_feature_prop=int(n.small/1.5), min_feature_prop=0.1,
                   min_samps_gene_expr=(n.small), min_gene_expr=20)
     FALSE
    },
    error = function(e){
      TRUE
    },
    finally = {
    })
  if (out2){
    flog.info("########## DrimSeq Filtering failed ###########")
    flog.info("Either only one splicing variant for every gene in dataset or it must be sequenced deeper")
    return(NULL)
  }

  table(table(counts(d)$gene_id))
  input_design <- DRIMSeq::samples(d)
  input_design$condition <- factor(input_design$condition,levels = c(first.level,ref.level))
  print(input_design)
  design_full <- model.matrix(~condition, data=input_design)
  print(design_full)
  print(colnames(design_full)[2])
  #print(design_full)
  set.seed(1)
  system.time({
    d <- dmPrecision(d, design=design_full, BPPARAM = param)
    d <- dmFit(d, design=design_full, BPPARAM = param)
    d <- dmTest(d, coef=colnames(design_full)[2], BPPARAM = param)
  })


  res_txp <- DRIMSeq::results(d, level="feature")

  print("DRIM SEQ MADE IT")
  #print(counts(d))
  list = list()
  list$counts = counts
  list$drim = d
  list$samps = samps
  list$txdf = txdf
  list$res_df = res_txp
  return(list)
}


###############################################################################################
#                                                                                             #
#                                             DTU                                             #
#                                                                                             #
###############################################################################################


DTU_special <- function(d_list = tryout, condition_col = "Condition", first.level = "a2d3-OE", ref.level = "Ctrl", goi_id = "ENSG00000111640.15",gtf_tab = gtf_table, cores = 4, pvalue_input = 0.05){

  d = d_list$drim
  counts = d_list$counts
  samps = d_list$samps

  res <- DRIMSeq::results(d)
  head(res)

  res.txp <- DRIMSeq::results(d, level="feature")
  head(res.txp)

  no.na <- function(x) ifelse(is.na(x), 1, x)
  res$pvalue <- no.na(res$pvalue)
  res.txp$pvalue <- no.na(res.txp$pvalue)

  pScreen <- res$pvalue
  strp <- function(x) substr(x,1,15)
  names(pScreen) <- strp(res$gene_id)
  pConfirmation <- matrix(res.txp$pvalue, ncol=1)
  rownames(pConfirmation) <- strp(res.txp$feature_id)

  tx2gene <- res.txp[,c("feature_id", "gene_id")]
  for (i in 1:2) tx2gene[,i] <- strp(tx2gene[,i])
  goi_df = res.txp[which(res.txp$gene_id == goi_id),]
  goi_df_merged = merge(goi_df,counts(d), by = "feature_id")
  idx <- which(res$gene_id == goi_id)
  #plotProportions(d, res$gene_id[idx], "condition")
  selected_d = counts(d)[which(counts(d)$gene_id == res$gene_id[idx]),]
  samples = samps$sample_id
  feature_ids = unique(selected_d$feature_id)
  plot_dataframe_d = data.frame()
  sum_df = data.frame(sample = as.character(), sum = as.numeric())
  for (i in samples){
    sum = sum(selected_d[i])
    temp_sum = cbind(i,sum)
    colnames(temp_sum) = c("sample", "sum")
    sum_df = rbind(sum_df,temp_sum)
  }
  print(sum_df)
  for (i in samples){
    for (j in feature_ids){
      counts_df = as.numeric(selected_d[which(selected_d$feature_id == j),i])
      feature_id = j
      sample_name = i
      sum_column = as.numeric(sum_df[which(sum_df$sample == i),"sum"])
      percentage = as.numeric(counts_df / sum_column)
      Condition = as.character(samps[which(samps$Samples == i), "Condition"])
      temp_df = cbind(feature_id,counts_df, sample_name, Condition, sum_column, percentage)
      plot_dataframe_d = rbind(plot_dataframe_d,temp_df)
    }


  }
  plot_dataframe_d$counts_df = as.numeric(plot_dataframe_d$counts_df)
  plot_dataframe_d$percentage = as.numeric(plot_dataframe_d$percentage)
  plot_dataframe_d$Significance = res.txp[which(res.txp$gene_id == goi_id),]$adj_pvalue < pvalue_input
  plot_dataframe_d$Significance[is.na(plot_dataframe_d$Significance)] <- FALSE
  goi_name = unique(gtf_tab$gene_name[gtf_tab$gene_id == goi_id])
  print(goi_name)
  #plot_dataframe_d

  if(dim(plot_dataframe_d)[1] > 0){

    # get coordinates to draw significance bars
    sig_bar_coord = data.frame(id = unique(plot_dataframe_d$feature_id)) # create a new dataframe containing the unique IDs from plot_dataframe_d (in the same order as they occur in the original data)
    sig_bar_coord$y = apply(sig_bar_coord, MARGIN = 1, FUN = function(row) max(plot_dataframe_d[plot_dataframe_d$feature_id == row, "percentage"]) + 0.03) # for each ID get the maximum percentage value and add 0.03 => y value for the significance bars
    sig_bar_coord$x_center = as.numeric(rownames(sig_bar_coord)) # extract the rownames and transform these to numeric types => x positions in the middle between each boxplot pair (since the initial order (that the ggboxplot function also uses) is kept, the index can be used as x position)
    sig_bar_coord$x_1 = sig_bar_coord$x_center - 0.25 # starting position of the line
    sig_bar_coord$x_2 = sig_bar_coord$x_center + 0.25 # end position of the line
    sig_bar_coord$significant = apply(sig_bar_coord, MARGIN = 1, FUN = function(row) plot_dataframe_d[plot_dataframe_d$feature_id == row["id"], "Significance"][1]) # for each ID extract the corresponding significance value (TRUE/FALSE)
    sig_bar_coord = sig_bar_coord[sig_bar_coord$significant == TRUE,] # keep only significant IDs; the x and y values are used to draw the line below

    bp <- ggboxplot(plot_dataframe_d, "feature_id", "percentage",
                    color = "Condition", add = "jitter", add.params = list(size = 3, alpha = 1)) + # removed:
      color_palette(palette = "jco")+
      #fill_palette(palette = c("steelblue4","indianred3"))+
      xlab("Feature ID") +
      ylab("Transcript expression (%)") +
      ggtitle(goi_name) +
      scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
      geom_segment(data = sig_bar_coord, aes(x = x_1, y = y, xend = x_2, yend = y), color = "indianred3") + # draw significance bars
      geom_text(data = sig_bar_coord, aes(x = x_center, y = y+0.02), label = "sign.", color = "indianred3") + # add "sign." label above each bar
      theme(
        panel.background = element_rect(fill = "transparent"), # bg of the panel
        plot.background = element_rect(fill = "transparent", color = NA), # bg of the plot
        panel.grid.major = element_blank(), # get rid of major grid
        panel.grid.minor = element_blank(), # get rid of minor grid
        legend.background = element_rect(fill = "transparent"), # get rid of legend bg
        #legend.box.background = element_rect(fill = "transparent"), # get rid of legend panel bg
        legend.title = element_text(size = 20, color = "white"),
        legend.key = element_rect(colour = "transparent", fill = "transparent"),
        legend.text = element_text(size = 8, color = "white"), # dropped the stray "Condition" argument, which element_text() would read as a font family
        axis.line = element_line(color = "white"),
        axis.text = element_text(angle = 45, hjust = 1, size = 10, color = "white"),
        plot.title = element_text(hjust = 0.5, face = "bold", size = 14, color = "white"),
        axis.title = element_text(size = 14, color = "white")
      )

    bp <- ggpar(bp,legend = "right",
                font.legend = c(14))
    output_list = list()
    output_list$bp = bp
    return(output_list)
  }
  else{
    output_list = list()
    x = NA
    output_list$bp = x
    return(output_list)
  }
}



DTU_general <- function(d_list, condition_col = "Condition", first.level = "Hct116", ref.level = "MCF7", samps = metadata, gtf_table, cores = 4, pvalue_input = 0.05){
  ###############################################################################################
  #                                                                                             #
  #                                       DTU with DEXSeq                                       #
  #                                                                                             #
  ###############################################################################################
  output_list = list()
  d = d_list$drim
  counts = d_list$counts
  samps= d_list$samps

  sample.data <- DRIMSeq::samples(d)
  print(sample.data)
  sample.data$condition <- factor(sample.data$condition,levels = c(first.level,ref.level))
  count.data <- round(as.matrix(counts(d)[,-c(1:2)]))
  print(count.data)


  dxd <- DEXSeqDataSet(countData=count.data,
                       sampleData=sample.data,
                       design=~sample + exon + condition:exon,
                       featureID=counts(d)$feature_id,
                       groupID=counts(d)$gene_id)
  #print(dxd)
  dxd$condition = factor(dxd$condition,levels = c(first.level,ref.level))
  system.time({
    dxd <- estimateSizeFactors(dxd)
    dxd <- estimateDispersions(dxd, quiet=TRUE)
    dxd <- testForDEU(dxd, reducedModel=~sample + exon)
    dxd <- estimateExonFoldChanges( dxd, fitExpToVar="condition", denominator=ref.level)
  })
  dxr <- DEXSeqResults(dxd, independentFiltering=FALSE)
  output_list = list()
  dxr_df = as.data.frame(na.omit(dxr))
  print(colnames(dxr_df))
  output_list$dxr = dxr_df
  qval <- perGeneQValue(dxr)
  dxr.g <- data.frame(gene=names(qval),qval)

  columns <- c("featureID","groupID","pvalue")
  dxr <- as.data.frame(dxr[,columns])
  #print(head(dxr))
  dxr_df = dxr_df[order(dxr_df$padj, decreasing = FALSE), ]

  # flag the ten rows with the smallest adjusted p-values for labelling
  counter=0
  data_to_label = c()
  for (i in dxr_df$padj){
    if (counter < 10){
      data_to_label = c(data_to_label,TRUE)
    }
    else{
      data_to_label = c(data_to_label,FALSE)
    }
    counter = counter + 1
  }
  print("Line X passed")

  dxr_df$label = data_to_label
  dxr_df

  regex = c('log2fold')
  column_list = as.vector(colnames(dxr_df))
  log_2_fold_name = as.character(column_list[which(grepl(regex,column_list))])
  dxr_df["log_2_fold_Change"] = dxr_df[log_2_fold_name]


  print("Line Y passed")

  dxr_df$Significance = dxr_df$padj < pvalue_input
  dxr_df$Significance_reg = ifelse(dxr_df$Significance,
                                   ifelse(dxr_df$log_2_fold_Change > 0, "Up", "Down"),
                                   "Not sig.")

  dxr_df$Significance_reg = factor(dxr_df$Significance_reg, levels = c("Up", "Down","Not sig."))
  color_code = list("Up" = "lightgoldenrod",
                    "Down" = "steelblue1",
                    "Not sig." = "gray")

  dxr_df$gene_name = gtf_table[match(dxr_df$groupID,gtf_table$gene_id),"gene_name"]
  dxr_df$transcript_name = gtf_table[match(dxr_df$featureID,gtf_table$transcript_id),"transcript_name"]
  #rownames(dxr_df) = paste(dxr_df$gene_name,dxr_df$featureID,sep = ":")
  rownames(dxr_df) = dxr_df$transcript_name

  volcano_plot_dex <- ggplot(dxr_df,mapping = aes(x = log_2_fold_Change, y=-log10(padj),color = Significance_reg)) +
    geom_point(size=1.75) +
    ggtitle(paste0("Volcano Plot (", first.level, " vs. ", ref.level, ")")) + # add conditions to title
    xlab("log2 fold change") +
    ylab("-log10( p-adjusted )") +
    guides(colour = guide_legend(override.aes = list(size=5))) + # increase the size of the dots in the legend
    geom_text(data = dxr_df %>% filter(label == TRUE) ,
              mapping = aes(label = rownames(dxr_df)[which(dxr_df$label == TRUE)]), size = 6, nudge_y = 0.4, check_overlap = TRUE) +

    scale_color_manual(values = color_code) +
    theme(
      panel.background = element_rect(fill = "transparent"), # bg of the panel
      plot.background = element_rect(fill = "transparent", color = NA), # bg of the plot
      legend.background = element_rect(fill = "transparent"), # get rid of legend bg
      legend.title = element_blank(), # remove legend title
      legend.key = element_rect(colour = "transparent", fill = "transparent"),
      legend.text = element_text(size = 20, color = "white"),
      axis.line = element_line(color = "white"),
      axis.text = element_text(angle = 45, hjust = 1, size = 17, color = "white"),
      plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "white"),
      axis.title = element_text(size = 23, color = "white"))
  output_list$dxr_df = dxr_df
  output_list$volcano_plot = volcano_plot_dex
  return(output_list)
}


--------------------------------------------------------------------------------
/app/server/R_scripts/dtu_and_dte_function.R:
--------------------------------------------------------------------------------


DRIM_seq_prep <- function(table = count.table, run.dir = csv.dir, samps = metadata, condition_col = "Condition", first.level = "a2d3-OE", ref.level = "Ctrl", gtf_file = gtf,cores = 4){

  ###############################################################################################
  #                                                                                             #
  #                              Read count file, gtf and metadata                              #
  #                                                                                             #
  ###############################################################################################

  txdb.filename <- str_split(gtf_file, "/")
  txdb.filename <- as.vector(txdb.filename[[1]])[length(txdb.filename[[1]])]
  txdb.filename <- paste0(run.dir,txdb.filename)
  samps["sample_id"] = samps$Samples
  samps["condition"] = samps[condition_col]
  #print(samps$condition)

  samps <- samps[which(samps$condition %in% c(first.level,ref.level)),]
  print(samps)

  head(table)
  table = table[,which(colnames(table) %in% samps$Samples)]
  #head(table)
  #txdb <- makeTxDbFromGFF(gtf)
  #print(txdb)
  #saveDb(txdb, txdb.filename)
  #txdb <- loadDb(txdb.filename)

  # out is TRUE if a cached TxDb could be loaded, FALSE otherwise
  out <- tryCatch(
    {txdb <- loadDb(txdb.filename)
     TRUE
    },
    error = function(e){
      FALSE
    },
    finally = {
    })


  #print(out)

  if (out){
    txdb <- loadDb(txdb.filename)
  }
  else {
    txdb <- makeTxDbFromGFF(gtf_file)
    saveDb(txdb, txdb.filename)
  }

  txdf <- AnnotationDbi::select(txdb, keys(txdb, "GENEID"),"TXNAME", "GENEID")
  #print(txdf)
  tab <- table(txdf$GENEID)
  txdf$ntx <- tab[match(txdf$GENEID, names(tab))]
  #print(txdf$ntx)


  all(rownames(table) %in% txdf$TXNAME)
  txdf <- txdf[match(rownames(table),txdf$TXNAME),]
  all(rownames(table) == txdf$TXNAME)

  counts <- data.frame(gene_id=txdf$GENEID,
                       feature_id=txdf$TXNAME,
                       table)
  #counts <- counts[which(counts$gene_id == goi_id),]
  #print("Counts")
  #head(counts)


  param = BiocParallel::SerialParam()
  d <- dmDSdata(counts=counts, samples=samps)
  n <- length(samps$sample_id)
  d <- dmFilter(d,
                min_samps_feature_expr=as.integer(n/2), min_feature_expr=10,
                # min_samps_feature_prop=int(n.small/1.5), min_feature_prop=0.1,
                min_samps_gene_expr=(n/2), min_gene_expr=50)

  table(table(counts(d)$gene_id))
  design_full <- model.matrix(~condition, data=DRIMSeq::samples(d))
  #print(design_full)
  set.seed(1)
  system.time({
    d <- dmPrecision(d, design=design_full, BPPARAM = param)
    d <- dmFit(d, design=design_full, BPPARAM = param)
    d <- dmTest(d, coef=colnames(design_full)[2], BPPARAM = param)
  })
  print("DRIM SEQ MADE IT")
  #print(counts(d))
  list = list()
  list$counts = counts
  list$drim = d
  list$samps = samps
  list$txdf = txdf
  return(list)
}


###############################################################################################
#                                                                                             #
#                                             DTU                                             #
#                                                                                             #
###############################################################################################


DTU_special <- function(d_list = tryout, condition_col = "Condition", first.level = "a2d3-OE", ref.level = "Ctrl", goi_id = "ENSG00000111640.15",gtf_tab = gtf_table, cores = 4){

  d = d_list$drim
  counts = d_list$counts
  samps = d_list$samps

  res <- DRIMSeq::results(d)
  head(res)

  res.txp <- DRIMSeq::results(d, level="feature")
  head(res.txp)

  no.na <- function(x) ifelse(is.na(x), 1, x)
  res$pvalue <- no.na(res$pvalue)
  res.txp$pvalue <- no.na(res.txp$pvalue)

  pScreen <- res$pvalue
  strp <- function(x) substr(x,1,15)
  names(pScreen) <- strp(res$gene_id)
  pConfirmation <- matrix(res.txp$pvalue, ncol=1)
  rownames(pConfirmation) <- strp(res.txp$feature_id)

  tx2gene <- res.txp[,c("feature_id", "gene_id")]
  for (i in 1:2) tx2gene[,i] <- strp(tx2gene[,i])
  goi_df = res.txp[which(res.txp$gene_id == goi_id),]
  goi_df_merged = merge(goi_df,counts(d), by = "feature_id")
  idx <- which(res$gene_id == goi_id)
  #plotProportions(d, res$gene_id[idx], "condition")
  selected_d = counts(d)[which(counts(d)$gene_id == res$gene_id[idx]),]
  samples = samps$sample_id
  feature_ids = unique(selected_d$feature_id)
  plot_dataframe_d = data.frame()
  sum_df = data.frame(sample = as.character(), sum = as.numeric())
  for (i in samples){
    sum = sum(selected_d[i])
    temp_sum = cbind(i,sum)
    colnames(temp_sum) = c("sample", "sum")
    sum_df = rbind(sum_df,temp_sum)
  }
  print(sum_df)
  for (i in samples){
    for (j in feature_ids){
      counts_df = as.numeric(selected_d[which(selected_d$feature_id == j),i])
      feature_id = j
      sample_name = i
      sum_column = as.numeric(sum_df[which(sum_df$sample == i),"sum"])
      percentage = as.numeric(counts_df / sum_column)
      Condition = as.character(samps[which(samps$Samples == i), "Condition"])
      temp_df = cbind(feature_id,counts_df, sample_name, Condition, sum_column, percentage)
      plot_dataframe_d = rbind(plot_dataframe_d,temp_df)
    }


  }
  plot_dataframe_d$counts_df = as.numeric(plot_dataframe_d$counts_df)
  plot_dataframe_d$percentage = as.numeric(plot_dataframe_d$percentage)
  plot_dataframe_d$Significance = res.txp[which(res.txp$gene_id == goi_id),]$adj_pvalue < 0.05
  plot_dataframe_d$Significance[is.na(plot_dataframe_d$Significance)] <- FALSE
  goi_name = unique(gtf_tab$gene_name[gtf_tab$gene_id == goi_id])
  print(goi_name)
  #plot_dataframe_d

  if(dim(plot_dataframe_d)[1] > 0){

    # get coordinates to draw significance bars
    sig_bar_coord = data.frame(id = unique(plot_dataframe_d$feature_id)) # create a new dataframe containing the unique IDs from plot_dataframe_d (in the same order as they occur in the original data)
    sig_bar_coord$y = apply(sig_bar_coord, MARGIN = 1, FUN = function(row) max(plot_dataframe_d[plot_dataframe_d$feature_id == row, "percentage"]) + 0.03) # for each ID get the maximum percentage value and add 0.03 => y value for the significance bars
    sig_bar_coord$x_center = as.numeric(rownames(sig_bar_coord)) # extract the rownames and transform these to numeric types => x positions in the middle between each boxplot pair (since the initial order (that the ggboxplot function also uses) is kept, the index can be used as x position)
    sig_bar_coord$x_1 = sig_bar_coord$x_center - 0.25 # starting position of the line
    sig_bar_coord$x_2 = sig_bar_coord$x_center + 0.25 # end position of the line
    sig_bar_coord$significant = apply(sig_bar_coord, MARGIN = 1, FUN = function(row) plot_dataframe_d[plot_dataframe_d$feature_id == row["id"], "Significance"][1]) # for each ID extract the corresponding significance value (TRUE/FALSE)
    sig_bar_coord = sig_bar_coord[sig_bar_coord$significant == TRUE,] # keep only significant IDs; the x and y values are used to draw the line below

    bp <- ggboxplot(plot_dataframe_d, "feature_id", "percentage",
                    color = "Condition", add = "jitter", add.params = list(size = 3, alpha = 1)) + # removed:
      color_palette(palette = "jco")+
      #fill_palette(palette = c("steelblue4","indianred3"))+
      xlab("Feature ID") +
      ylab("Transcript expression (%)") +
      ggtitle(goi_name) +
      scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
      geom_segment(data = sig_bar_coord, aes(x = x_1, y = y, xend = x_2, yend = y), color = "indianred3") + # draw significance bars
      geom_text(data = sig_bar_coord, aes(x = x_center, y = y+0.02), label = "sign.", color = "indianred3") + # add "sign." label above each bar
      theme(
        panel.background = element_rect(fill = "transparent"), # bg of the panel
        plot.background = element_rect(fill = "transparent", color = NA), # bg of the plot
        panel.grid.major = element_blank(), # get rid of major grid
        panel.grid.minor = element_blank(), # get rid of minor grid
        legend.background = element_rect(fill = "transparent"), # get rid of legend bg
        #legend.box.background = element_rect(fill = "transparent"), # get rid of legend panel bg
        legend.title = element_text(size = 20, color = "white"),
        legend.key = element_rect(colour = "transparent", fill = "transparent"),
        legend.text = element_text(size = 8, color = "white"), # dropped the stray "Condition" argument, which element_text() would read as a font family
        axis.line = element_line(color = "white"),
        axis.text = element_text(angle = 45, hjust = 1, size = 10, color = "white"),
        plot.title = element_text(hjust = 0.5, face = "bold", size = 14, color = "white"),
        axis.title = element_text(size = 14, color = "white")
      )

    bp <- ggpar(bp,legend = "right",
                font.legend = c(14))
    output_list = list()
    output_list$bp = bp
    return(output_list)
  }
  else{
    output_list = list()
    x = NA
    output_list$bp = x
    return(output_list)
  }
}



###############################################################################################
#                                                                                             #
#                                             DTE                                             #
#                                                                                             #
###############################################################################################


DTE_general <- function(d_list, condition_col = "Condition", first.level = "Hct116", ref.level = "MCF7", samps = metadata, gtf_table, cores = 4){
  ###############################################################################################
  #                                                                                             #
  #                                       DTE with DEXSeq                                       #
  #                                                                                             #
  ###############################################################################################
  output_list = list()
  d = d_list$drim
  counts = d_list$counts
  samps= d_list$samps

  sample.data <- DRIMSeq::samples(d)
  print(sample.data)
  count.data <- round(as.matrix(counts(d)[,-c(1:2)]))
  print(count.data)
  dxd <- DEXSeqDataSet(countData=count.data,
                       sampleData=sample.data,
                       design=~sample + exon + condition:exon,
                       featureID=counts(d)$feature_id,
                       groupID=counts(d)$gene_id)
  #print(dxd)

  system.time({
    dxd <- estimateSizeFactors(dxd)
    dxd <- estimateDispersions(dxd, quiet=TRUE)
    dxd <- testForDEU(dxd, reducedModel=~sample + exon)
    dxd <- estimateExonFoldChanges( dxd, fitExpToVar="condition")
  })
  dxr <- DEXSeqResults(dxd, independentFiltering=FALSE)
  output_list = list()
  dxr_df = as.data.frame(na.omit(dxr))
  print(colnames(dxr_df))
  output_list$dxr = dxr_df
  qval <- perGeneQValue(dxr)
  dxr.g <- data.frame(gene=names(qval),qval)

  columns <- c("featureID","groupID","pvalue")
  dxr <- as.data.frame(dxr[,columns])
  #print(head(dxr))
  dxr_df = dxr_df[order(dxr_df$padj, decreasing = FALSE), ]

  # flag the ten rows with the smallest adjusted p-values for labelling
  counter=0
  data_to_label = c()
  for (i in dxr_df$padj){
    if (counter < 10){
      data_to_label = c(data_to_label,TRUE)
    }
    else{
      data_to_label = c(data_to_label,FALSE)
    }
    counter = counter + 1
  }
  print("Line X passed")

  dxr_df$label = data_to_label
  dxr_df

  regex = c('log2fold')
  column_list = as.vector(colnames(dxr_df))
  log_2_fold_name = as.character(column_list[which(grepl(regex,column_list))])
  dxr_df["log_2_fold_Change"] = dxr_df[log_2_fold_name]


  print("Line Y passed")

  dxr_df$Significance = dxr_df$padj < 0.05
  dxr_df$Significance_reg = ifelse(dxr_df$Significance,
                                   ifelse(dxr_df$log_2_fold_Change > 0, "Up", "Down"),
                                   "Not sig.")

  dxr_df$Significance_reg = factor(dxr_df$Significance_reg, levels = c("Up", "Down","Not sig."))
  color_code = list("Up" = "lightgoldenrod",
                    "Down" = "steelblue1",
                    "Not sig." = "gray")

  dxr_df$gene_name = gtf_table[match(dxr_df$groupID,gtf_table$gene_id),"gene_name"]
  rownames(dxr_df) = paste(dxr_df$gene_name,dxr_df$featureID,sep = ":")

  volcano_plot_dex <- ggplot(dxr_df,mapping = aes(x = log_2_fold_Change, y=-log10(padj),color = Significance_reg)) +
    geom_point(size=1.75) +
    ggtitle(paste0("Volcano Plot (", first.level, " vs. ", ref.level, ")")) + # add conditions to title
    xlab("log2 fold change") +
    ylab("-log10( p-adjusted )") +
    guides(colour = guide_legend(override.aes = list(size=5))) + # increase the size of the dots in the legend
    geom_text(data = dxr_df %>% filter(label == TRUE) ,
              mapping = aes(label = rownames(dxr_df)[which(dxr_df$label == TRUE)]), size = 6, nudge_y = 0.4, check_overlap = TRUE) +

    scale_color_manual(values = color_code) +
    theme(
      panel.background = element_rect(fill = "transparent"), # bg of the panel
      plot.background = element_rect(fill = "transparent", color = NA), # bg of the plot
      #panel.grid.major = element_blank(), # get rid of major grid
      #panel.grid.minor = element_blank(), # get rid of minor grid
      legend.background = element_rect(fill = "transparent"), # get rid of legend bg
      #legend.box.background = element_rect(fill = "transparent"), # get rid of legend panel bg
      #legend.title = element_text(size = 20, color = "white"),
      legend.title = element_blank(), # remove legend title
      legend.key = element_rect(colour = "transparent", fill = "transparent"),
      legend.text = element_text(size = 20, color = "white"),
      axis.line = element_line(color = "white"),
      axis.text = element_text(angle = 45, hjust = 1, size = 17, color = "white"),
      plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "white"),
      axis.title = element_text(size = 23, color = "white"))
  output_list$dxr_df = dxr_df
  output_list$volcano_plot = volcano_plot_dex
  return(output_list)
}


--------------------------------------------------------------------------------
/app/server/R_scripts/infer_experiment_plots.R:
--------------------------------------------------------------------------------
inner_var_plot_per_sample <- function(table = data.frame()){
  if (nrow(table) == 0){
    return(ggplot() + theme_void())
  }
5 | samples = colnames(table) 6 | table["Iteration"] = as.numeric(rownames(table)) 7 | 8 | data = c() 9 | 10 | for (i in samples){ 11 | tmp_data = table["Iteration"] 12 | tmp_data["Difference_of_ratios"] = table[i] 13 | tmp_data["Sample"] = i 14 | data = rbind(data,tmp_data) 15 | } 16 | data 17 | if ((nrow(data) / length(samples)) < 20) { 18 | p = ggline(data,x="Iteration",y="Difference_of_ratios", color="Sample") + 19 | ylab("Change in gene composition") + 20 | theme( 21 | # rect = element_rect(fill = "transparent"), 22 | panel.background = element_rect(fill = 'transparent', color = "white"), # bg of the panel 23 | plot.background = element_rect(fill = "transparent", color = NA), # bg of the plot 24 | legend.background = element_rect(fill = "transparent"), # get rid of legend bg 25 | # legend.box.background = element_rect(fill = "transparent"), 26 | legend.title = element_text(size = 16, color = "white"), 27 | legend.key = element_rect(colour = "transparent", fill = "transparent"), 28 | legend.text = element_text(size = 14, color = "white"), 29 | axis.text = element_text(angle = 45, hjust = 1, size = 14, color = "white"), 30 | plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "white"), 31 | axis.title = element_text(size = 23, color = "white"), 32 | axis.line = element_line(color = "white"), 33 | axis.ticks = element_line(color = "white")) + 34 | scale_x_discrete(limits = as.character(seq(1, 20))) 35 | } else { 36 | p = ggline(data,x="Iteration",y="Difference_of_ratios", color="Sample") + 37 | ylab("Change in gene composition") + 38 | theme( 39 | # rect = element_rect(fill = "transparent"), 40 | panel.background = element_rect(fill = 'transparent', color = "white"), # bg of the panel 41 | plot.background = element_rect(fill = "transparent", color = NA), # bg of the plot 42 | legend.background = element_rect(fill = "transparent"), # get rid of legend bg 43 | # legend.box.background = element_rect(fill = "transparent"), 44 | legend.title = 
element_text(size = 16, color = "white"), 45 | legend.key = element_rect(colour = "transparent", fill = "transparent"), 46 | legend.text = element_text(size = 14, color = "white"), 47 | axis.text = element_text(angle = 45, hjust = 1, size = 14, color = "white"), 48 | plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "white"), 49 | axis.title = element_text(size = 23, color = "white"), 50 | axis.line = element_line(color = "white"), 51 | axis.ticks = element_line(color = "white")) 52 | } 53 | p = ggpar(p,legend = "top") 54 | return(p) 55 | } 56 | 57 | inner_var_plot_per_condition <- function(table = data.frame(), metadata_table, colors=c("#00AFBB", "#E7B800")){ 58 | if (nrow(table) == 0){ 59 | return(ggplot() + theme_void()) 60 | } 61 | samples = colnames(table) 62 | table["Iteration"] = as.numeric(rownames(table)) 63 | 64 | data = c() 65 | conditions = unique(metadata_table$Condition) 66 | 67 | for (i in conditions){ 68 | tmp_metadata = metadata_table[which(metadata_table$Condition == i),] 69 | sum = c(0) 70 | count = 0 71 | for (j in tmp_metadata$Samples){ 72 | sum = sum + table[j] 73 | count = count + 1 74 | } 75 | condition_values = sum/c(count) 76 | tmp_data = table["Iteration"] 77 | tmp_data["Difference_of_ratios"] = condition_values 78 | tmp_data["Condition"] = i 79 | data = rbind(data,tmp_data) 80 | } 81 | 82 | if ((nrow(data) / length(conditions)) < 20){ # data holds one block of rows per condition, so this is the number of iterations 83 | p = ggline(data,x="Iteration",y="Difference_of_ratios", color="Condition", palette=colors) + 84 | ylab("Change in gene composition") + 85 | theme( 86 | panel.background = element_rect(fill = "transparent"), # bg of the panel 87 | plot.background = element_rect(fill = "transparent", color = NA), # bg of the plot 88 | legend.background = element_rect(fill = "transparent"), # get rid of legend bg 89 | legend.title = element_text(size = 16, color = "white"), 90 | legend.key = element_rect(colour = "transparent", fill = "transparent"), 91 | legend.text = element_text(size = 
14, color = "white"), 92 | axis.text = element_text(angle = 45, hjust = 1, size = 14, color = "white"), 93 | plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "white"), 94 | axis.title = element_text(size = 23, color = "white"), 95 | axis.line = element_line(color = "white"), 96 | axis.ticks = element_line(color = "white")) + 97 | scale_x_discrete(limits = as.character(seq(1, 20))) 98 | } else { 99 | p = ggline(data,x="Iteration",y="Difference_of_ratios", color="Condition", palette=colors) + 100 | ylab("Change in gene composition") + 101 | theme( 102 | panel.background = element_rect(fill = "transparent"), # bg of the panel 103 | plot.background = element_rect(fill = "transparent", color = NA), # bg of the plot 104 | legend.background = element_rect(fill = "transparent"), # get rid of legend bg 105 | legend.title = element_text(size = 16, color = "white"), 106 | legend.key = element_rect(colour = "transparent", fill = "transparent"), 107 | legend.text = element_text(size = 14, color = "white"), 108 | axis.text = element_text(angle = 45, hjust = 1, size = 14, color = "white"), 109 | plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "white"), 110 | axis.title = element_text(size = 23, color = "white"), 111 | axis.line = element_line(color = "white"), 112 | axis.ticks = element_line(color = "white")) 113 | } 114 | p = ggpar(p,legend = "top") 115 | return(p) 116 | } 117 | 118 | 119 | total_genes_counted_plot_per_sample <- function(table = data.frame()){ 120 | if (nrow(table) == 0){ 121 | return(ggplot() + theme_void()) 122 | } 123 | samples = colnames(table) 124 | table["Iteration"] = as.numeric(rownames(table)) 125 | 126 | data = c() 127 | 128 | for (i in samples){ 129 | tmp_data = table["Iteration"] 130 | # print(table[i]) 131 | tmp_data["Counts"] = table[i] 132 | tmp_data["Sample"] = i 133 | data = rbind(data,tmp_data) 134 | } 135 | if ((nrow(data) / length(samples)) < 20){ 136 | q = ggline(data,x="Iteration",y="Counts", 
color="Sample") + 137 | ylab("Number of identified genes") + 138 | theme( 139 | panel.background = element_rect(fill = "transparent"), # bg of the panel 140 | plot.background = element_rect(fill = "transparent", color = NA), # bg of the plot 141 | legend.background = element_rect(fill = "transparent"), # get rid of legend bg 142 | legend.title = element_text(size = 16, color = "white"), 143 | legend.key = element_rect(colour = "transparent", fill = "transparent"), 144 | legend.text = element_text(size = 14, color = "white"), 145 | axis.text = element_text(angle = 45, hjust = 1, size = 14, color = "white"), 146 | plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "white"), 147 | axis.title = element_text(size = 23, color = "white"), 148 | axis.line = element_line(color = "white"), 149 | axis.ticks = element_line(color = "white")) + 150 | scale_x_discrete(limits = as.character(seq(1, 20))) 151 | } else { 152 | q = ggline(data,x="Iteration",y="Counts", color="Sample") + 153 | ylab("Number of identified genes") + 154 | theme( 155 | panel.background = element_rect(fill = "transparent"), # bg of the panel 156 | plot.background = element_rect(fill = "transparent", color = NA), # bg of the plot 157 | legend.background = element_rect(fill = "transparent"), # get rid of legend bg 158 | legend.title = element_text(size = 16, color = "white"), 159 | legend.key = element_rect(colour = "transparent", fill = "transparent"), 160 | legend.text = element_text(size = 14, color = "white"), 161 | axis.text = element_text(angle = 45, hjust = 1, size = 14, color = "white"), 162 | plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "white"), 163 | axis.title = element_text(size = 23, color = "white"), 164 | axis.line = element_line(color = "white"), 165 | axis.ticks = element_line(color = "white")) 166 | } 167 | q = ggpar(q,legend = "top") 168 | return(q) 169 | } 170 | 171 | 172 | total_genes_counted_plot_per_condition <- function(table = 
data.frame(), metadata_table, colors=c("#00AFBB", "#E7B800")){ 173 | if (nrow(table) == 0){ 174 | return(ggplot() + theme_void()) 175 | } 176 | samples = colnames(table) 177 | table["Iteration"] = as.numeric(rownames(table)) 178 | data = c() 179 | conditions = unique(metadata_table$Condition) 180 | 181 | for (i in conditions){ 182 | tmp_metadata = metadata_table[which(metadata_table$Condition == i),] 183 | sum = c(0) 184 | #print(sum) 185 | count = 0 186 | #print(tmp_metadata$Samples) 187 | for (j in tmp_metadata$Samples){ 188 | #print("table J") 189 | #print(table[j]) 190 | sum = sum + table[j] 191 | count = count + 1 192 | } 193 | # print(sum) 194 | condition_values = sum/c(count) 195 | # print(condition_values) 196 | tmp_data = table["Iteration"] 197 | tmp_data["Counts"] = condition_values 198 | tmp_data["Condition"] = i 199 | #print(tmp_data) 200 | data = rbind(data,tmp_data) 201 | } 202 | # print(data) 203 | if ((nrow(data) / length(conditions)) < 20){ # data holds one block of rows per condition, so this is the number of iterations 204 | p = ggline(data,x="Iteration",y="Counts", color="Condition", palette=colors) + 205 | ylab("Number of identified genes") + 206 | theme( 207 | panel.background = element_rect(fill = "transparent"), # bg of the panel 208 | plot.background = element_rect(fill = "transparent", color = NA), # bg of the plot 209 | legend.background = element_rect(fill = "transparent"), # get rid of legend bg 210 | legend.title = element_text(size = 16, color = "white"), 211 | legend.key = element_rect(colour = "transparent", fill = "transparent"), 212 | legend.text = element_text(size = 14, color = "white"), 213 | axis.text = element_text(angle = 45, hjust = 1, size = 14, color = "white"), 214 | plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "white"), 215 | axis.title = element_text(size = 23, color = "white"), 216 | axis.line = element_line(color = "white"), 217 | axis.ticks = element_line(color = "white")) + 218 | scale_x_discrete(limits = as.character(seq(1, 20))) 219 | } else { 220 | p = 
ggline(data,x="Iteration",y="Counts", color="Condition", palette=colors) + 221 | ylab("Number of identified genes") + 222 | theme( 223 | panel.background = element_rect(fill = "transparent"), # bg of the panel 224 | plot.background = element_rect(fill = "transparent", color = NA), # bg of the plot 225 | legend.background = element_rect(fill = "transparent"), # get rid of legend bg 226 | legend.title = element_text(size = 16, color = "white"), 227 | legend.key = element_rect(colour = "transparent", fill = "transparent"), 228 | legend.text = element_text(size = 14, color = "white"), 229 | axis.text = element_text(angle = 45, hjust = 1, size = 14, color = "white"), 230 | plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "white"), 231 | axis.title = element_text(size = 23, color = "white"), 232 | axis.line = element_line(color = "white"), 233 | axis.ticks = element_line(color = "white") 234 | ) 235 | } 236 | p = ggpar(p,legend = "top") 237 | return(p) 238 | } 239 | 240 | 241 | inner_var_plot_per_sample.download <- function(table = data.frame()){ 242 | if (nrow(table) == 0){ 243 | return(ggplot() + theme_void()) 244 | } 245 | samples = colnames(table) 246 | table["Iteration"] = as.numeric(rownames(table)) 247 | 248 | data = c() 249 | 250 | for (i in samples){ 251 | tmp_data = table["Iteration"] 252 | tmp_data["Difference_of_ratios"] = table[i] 253 | tmp_data["Sample"] = i 254 | data = rbind(data,tmp_data) 255 | } 256 | data 257 | p = ggline(data,x="Iteration",y="Difference_of_ratios", color="Sample") + 258 | ylab("Change in gene composition") + 259 | theme( 260 | legend.title = element_text(size = 16, color = "black"), 261 | legend.text = element_text(size = 14, color = "black"), 262 | axis.text = element_text(angle = 45, hjust = 1, size = 14, color = "black"), 263 | plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "black"), 264 | axis.title = element_text(size = 23, color = "black"), 265 | axis.line = element_line(color = 
"black"), 266 | axis.ticks = element_line(color = "black")) 267 | p = ggpar(p,legend = "top") 268 | return(p) 269 | } 270 | 271 | inner_var_plot_per_condition.download <- function(table = data.frame(), metadata_table, colors){ 272 | if (nrow(table) == 0){ 273 | return(ggplot() + theme_void()) 274 | } 275 | samples = colnames(table) 276 | table["Iteration"] = as.numeric(rownames(table)) 277 | 278 | data = c() 279 | conditions = unique(metadata_table$Condition) 280 | 281 | for (i in conditions){ 282 | tmp_metadata = metadata_table[which(metadata_table$Condition == i),] 283 | sum = c(0) 284 | count = 0 285 | for (j in tmp_metadata$Samples){ 286 | sum = sum + table[j] 287 | count = count + 1 288 | } 289 | condition_values = sum/c(count) 290 | tmp_data = table["Iteration"] 291 | tmp_data["Difference_of_ratios"] = condition_values 292 | tmp_data["Condition"] = i 293 | data = rbind(data,tmp_data) 294 | } 295 | 296 | p = ggline(data,x="Iteration",y="Difference_of_ratios", color="Condition", palette=colors) + 297 | ylab("Change in gene composition") + 298 | theme( 299 | legend.title = element_text(size = 16, color = "black"), 300 | legend.text = element_text(size = 14, color = "black"), 301 | axis.text = element_text(angle = 45, hjust = 1, size = 14, color = "black"), 302 | plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "black"), 303 | axis.title = element_text(size = 23, color = "black"), 304 | axis.line = element_line(color = "black"), 305 | axis.ticks = element_line(color = "black")) 306 | p = ggpar(p,legend = "top") 307 | return(p) 308 | } 309 | 310 | 311 | total_genes_counted_plot_per_sample.download <- function(table = data.frame()){ 312 | if (nrow(table) == 0){ 313 | return(ggplot() + theme_void()) 314 | } 315 | samples = colnames(table) 316 | table["Iteration"] = as.numeric(rownames(table)) 317 | 318 | data = c() 319 | 320 | for (i in samples){ 321 | tmp_data = table["Iteration"] 322 | tmp_data["Counts"] = table[i] 323 | tmp_data["Sample"] 
= i 324 | data = rbind(data,tmp_data) 325 | } 326 | 327 | q = ggline(data,x="Iteration",y="Counts", color="Sample") + 328 | ylab("Number of identified genes") + 329 | theme( 330 | legend.title = element_text(size = 16, color = "black"), 331 | legend.text = element_text(size = 14, color = "black"), 332 | axis.text = element_text(angle = 45, hjust = 1, size = 14, color = "black"), 333 | plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "black"), 334 | axis.title = element_text(size = 23, color = "black"), 335 | axis.line = element_line(color = "black"), 336 | axis.ticks = element_line(color = "black")) 337 | q = ggpar(q,legend = "top") 338 | return(q) 339 | } 340 | 341 | 342 | total_genes_counted_plot_per_condition.download <- function(table = data.frame(), metadata_table, colors){ 343 | if (nrow(table) == 0){ 344 | return(ggplot() + theme_void()) 345 | } 346 | samples = colnames(table) 347 | table["Iteration"] = as.numeric(rownames(table)) 348 | data = c() 349 | conditions = unique(metadata_table$Condition) 350 | 351 | for (i in conditions){ 352 | tmp_metadata = metadata_table[which(metadata_table$Condition == i),] 353 | sum = c(0) 354 | count = 0 355 | for (j in tmp_metadata$Samples){ 356 | sum = sum + table[j] 357 | count = count + 1 358 | } 359 | condition_values = sum/c(count) 360 | tmp_data = table["Iteration"] 361 | tmp_data["Counts"] = condition_values 362 | tmp_data["Condition"] = i 363 | data = rbind(data,tmp_data) 364 | } 365 | 366 | p = ggline(data,x="Iteration",y="Counts", color="Condition", palette=colors) + 367 | ylab("Number of identified genes") + 368 | theme( 369 | legend.title = element_text(size = 16, color = "black"), 370 | legend.text = element_text(size = 14, color = "black"), 371 | axis.text = element_text(angle = 45, hjust = 1, size = 14, color = "black"), 372 | plot.title = element_text(hjust = 0.5, face = "bold", size = 23, color = "black"), 373 | axis.title = element_text(size = 23, color = "black"), 374 | axis.line 
= element_line(color = "black"), 375 | axis.ticks = element_line(color = "black")) 376 | p = ggpar(p,legend = "top") 377 | return(p) 378 | } 379 | 380 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | NanopoReaTA - Nanopore Real-Time Transcriptional Analysis Tool 2 | ================================================== 3 | 4 | 5 | [](https://www.nextflow.io/) 6 | [](https://www.r-project.org/) 7 | [](https://shiny.rstudio.com/) 8 | 9 | **NanopoReaTA** is an R Shiny application that integrates both preprocessing and downstream analysis pipelines for RNA sequencing data from [Oxford Nanopore Technologies (ONT)](https://nanoporetech.com/) into a user-friendly interface. NanopoReaTA focuses on the analysis of (direct) cDNA and RNA-sequencing (cDNA, DRS) reads and guides you through the different steps up to the final visualization of results from, e.g., differential expression or gene body coverage analysis. Furthermore, NanopoReaTA can be run in real time right after starting a run via MinKNOW, the sequencing application of ONT. 10 | 11 | 12 | **Currently available analysis modules:** 13 | 1. [Run Overview](#run-overview) - Experiment statistics over time 14 | 2. [Gene-wise analysis](#gene-wise-analysis) - Gene-wise analysis of expression (gene counts, gene body coverage) 15 | 3. [Differential expression analysis](#differential-expression-analysis) - Differential expression and/or usage analysis of genes (DGE) and transcripts (DTE + DTU) 16 | 17 |
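As a rough illustration of what the differential expression module reports, the sketch below classifies features for a volcano plot in the way `DTE_general` in `dte_function.R` appears to: rows are sorted by adjusted p-value, the ten most significant are flagged for labelling, and features with `padj < 0.05` are marked "Up" or "Down" by the sign of the log2 fold change. The helper name `classify_significance`, the column name `log2fc`, and the toy values are assumptions made for this example. They are not part of NanopoReaTA.

```r
# Hypothetical helper (not part of NanopoReaTA): mimics the thresholds used
# for the volcano plot in dte_function.R with vectorized base R.
classify_significance <- function(df, alpha = 0.05, n_label = 10) {
  # Sort by adjusted p-value, most significant first
  df <- df[order(df$padj, decreasing = FALSE), , drop = FALSE]
  # Flag the first n_label rows so they can receive text labels
  df$label <- seq_len(nrow(df)) <= n_label
  # A feature is significant if its adjusted p-value is below alpha
  significant <- !is.na(df$padj) & df$padj < alpha
  # The sign of the log2 fold change decides the direction
  direction <- ifelse(df$log2fc > 0, "Up", "Down")
  df$Significance_reg <- factor(
    ifelse(significant, direction, "Not sig."),
    levels = c("Up", "Down", "Not sig.")
  )
  df
}

# Made-up values, for illustration only
toy <- data.frame(
  feature = c("tx1", "tx2", "tx3"),
  padj    = c(0.20, 0.001, 0.03),
  log2fc  = c(1.5, -2.0, 0.8),
  stringsAsFactors = FALSE
)
res <- classify_significance(toy, n_label = 2)
print(res)
```

The vectorized `seq_len(nrow(df)) <= n_label` stands in for the counter loop in the original function. Both flag the top rows after sorting, so this is a stylistic alternative and behaves the same way.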









