├── License.txt ├── README.md ├── bin ├── FastViromeExplorer$1.class ├── FastViromeExplorer.class └── Read.class ├── imgvr-viruses-list.txt ├── ncbi-viruses-list.txt ├── src ├── FastViromeExplorer.java └── Read.java ├── test ├── reads_1.fq ├── reads_1.fq.gz ├── reads_2.fq ├── reads_2.fq.gz ├── testset-kallisto-index.idx ├── testset-salmon-index │ ├── hash.bin │ ├── header.json │ ├── indexing.log │ ├── quasi_index.log │ ├── refInfo.json │ ├── rsd.bin │ ├── sa.bin │ ├── txpInfo.bin │ └── versionInfo.json └── testset.fa ├── tools-linux ├── kallisto ├── kallisto-license.txt └── samtools ├── tools-mac ├── kallisto ├── kallisto-license.txt ├── salmon └── samtools └── utility-scripts └── generateGenomeList.sh /License.txt: -------------------------------------------------------------------------------- 1 | BSD 2-Clause License 2 | 3 | Copyright (c) 2017-2019, Saima Sultana Tithi, Frank O. Aylward, Roderick V. Jensen, and Liqing Zhang. 4 | All rights reserved. 5 | 6 | Redistribution and use in source and binary forms, with or without 7 | modification, are permitted provided that the following conditions are met: 8 | 9 | * Redistributions of source code must retain the above copyright notice, this 10 | list of conditions and the following disclaimer. 11 | 12 | * Redistributions in binary form must reproduce the above copyright notice, 13 | this list of conditions and the following disclaimer in the documentation 14 | and/or other materials provided with the distribution. 15 | 16 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 17 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 18 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 19 | DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 20 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 21 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 22 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 23 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 24 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 25 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 26 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # FastViromeExplorer 2 | Identify the viruses/phages and their abundance in viral metagenomics data. The paper describing FastViromeExplorer is available here: https://peerj.com/articles/4227/. 3 | 4 | # Installation 5 | FastViromeExplorer requires Java (JDK) 1.8 or later, Samtools 1.4 or later, and Kallisto 0.43.0 or 0.43.1 installed on the user's machine. Because later versions of Kallisto use a different output format for pseudoalignments, please use Kallisto version 0.43.0 or 0.43.1. 6 | ## Download FastViromeExplorer 7 | You can download FastViromeExplorer directly from GitHub and extract it, or clone it using the following command: 8 | ```bash 9 | git clone https://github.com/saima-tithi/FastViromeExplorer.git 10 | ``` 11 | From now on, we will refer to the FastViromeExplorer directory on the user's local machine as the `project directory`. The `project directory` will contain 6 folders: src, bin, test, tools-linux, tools-mac, and utility-scripts. It will also contain two text files: ncbi-viruses-list.txt and imgvr-viruses-list.txt. 12 | ## Install Java 13 | If Java is not already installed, you need to install Java (JDK) 1.8 or later from the following link: http://www.oracle.com/technetwork/java/javase/downloads/index.html. 
From this link, download the appropriate jdk installation file (for linux or macOS), and then install Java by double-clicking the downloaded installation file. 14 | ## Install Kallisto and Samtools 15 | If Kallisto or Samtools is not installed, you can install it from the executables distributed with FastViromeExplorer. 16 | In terminal, go into the project directory. Then go into the `tools-linux` folder if you are using a linux machine or go into the `tools-mac` folder if you are using macOS. Copy the kallisto and samtools executables from this directory to the /usr/local/bin directory. 17 | 18 | ```bash 19 | cd /path-to-FastViromeExplorer/tools-linux 20 | sudo cp kallisto /usr/local/bin/ 21 | sudo cp samtools /usr/local/bin/ 22 | ``` 23 | Or 24 | 25 | ```bash 26 | cd /path-to-FastViromeExplorer/tools-mac 27 | sudo cp kallisto /usr/local/bin/ 28 | sudo cp samtools /usr/local/bin/ 29 | ``` 30 | ## Install FastViromeExplorer 31 | In terminal, go into the project directory, which should contain `src` and `bin` folders. From the project directory, run the following command: 32 | ```bash 33 | javac -d bin src/*.java 34 | ``` 35 | # Run FastViromeExplorer using test data 36 | From the project directory, run the following commands: 37 | ```bash 38 | mkdir test-output 39 | java -cp bin FastViromeExplorer -1 test/reads_1.fq -2 test/reads_2.fq -i test/testset-kallisto-index.idx -o test-output 40 | ``` 41 | The test input files are given in the `test` folder. Here, the input files are: 42 | 1. *reads_1.fq* and *reads_2.fq* : paired-end reads in fastq format 43 | 2. *testset-kallisto-index.idx* : kallisto index file generated for a small set of NCBI RefSeq viruses 44 | 45 | The output files will be generated in the `test-output` directory. The output files are: 46 | 1. *FastViromeExplorer-reads-mapped-sorted.sam* : aligned/mapped reads in sam format 47 | 2. 
*FastViromeExplorer-final-sorted-abundance.tsv* : virus abundance result in tab-delimited format 48 | 49 | In a similar manner, we can run FastViromeExplorer for single-end reads without specifying the "-2" parameter. An example of running FastViromeExplorer for single-end reads: 50 | ```bash 51 | mkdir test-output 52 | java -cp bin FastViromeExplorer -1 test/reads_1.fq -i test/testset-kallisto-index.idx -o test-output 53 | ``` 54 | 55 | By default, FastViromeExplorer uses `kallisto` as the alignment tool. FastViromeExplorer can also be run using `Salmon` as the alignment tool for the pseudoalignment step. For running using the `Salmon` tool, from the project directory, run the following commands: 56 | ```bash 57 | mkdir test-output-salmon 58 | java -cp bin FastViromeExplorer -1 test/reads_1.fq -2 test/reads_2.fq -i test/testset-salmon-index -o test-output-salmon -salmon true 59 | ``` 60 | 61 | # Run FastViromeExplorer using NCBI RefSeq database 62 | Some pre-computed kallisto index files are given in the following link: http://bench.cs.vt.edu/FastViromeExplorer/. 63 | Download the kallisto index file for NCBI RefSeq database "ncbi-virus-kallisto-index-k31.idx" and save it. From terminal, run the following command: 64 | ```bash 65 | mkdir $outputDirectory 66 | java -cp /path-to-FastViromeExplorer/bin FastViromeExplorer -1 $read1File -2 $read2File -i /path-to-index-file/ncbi-virus-kallisto-index-k31.idx -o $outputDirectory 67 | ``` 68 | # Run FastViromeExplorer using IMG/VR database 69 | Download the kallisto index file for IMG/VR database "imgvr-virus-kallisto-index-k31.idx" from http://bench.cs.vt.edu/FastViromeExplorer/ and save it. 
From terminal, run the following command: 70 | ```bash 71 | mkdir $outputDirectory 72 | java -cp /path-to-FastViromeExplorer/bin FastViromeExplorer -1 $read1File -2 $read2File -i /path-to-index-file/imgvr-virus-kallisto-index-k31.idx -l imgvr-viruses-list.txt -o $outputDirectory 73 | ``` 74 | For running FastViromeExplorer using IMG/VR database, we need to specify the kallisto index file and the list of viruses in the database along with their genome length, which is given in the file "imgvr-viruses-list.txt". 75 | 76 | # Run FastViromeExplorer using custom database 77 | For running FastViromeExplorer using any custom database, please look at our detailed manual at http://fastviromeexplorer.readthedocs.io/en/latest/. 78 | 79 | # Usage 80 | java -cp /path-to-FastViromeExplorer/bin FastViromeExplorer -1 $read1File -2 $read2File -i $indexFile -o $outputDirectory 81 | 82 | The full parameter list of FastViromeExplorer: 83 | 1. -1: input .fastq file or .fastq.gz file for read sequences (paired-end 1), mandatory field. 84 | 2. -2: input .fastq file or .fastq.gz file for read sequences (paired-end 2). 85 | 3. -i: kallisto/salmon index file, mandatory field. 86 | 4. -db: reference database file in fasta/fa format. 87 | 5. -o: output directory, default option is the project directory. 88 | 6. -l: virus list containing all viruses present in the reference database along with their length. 89 | 7. -cr: the value of ratio criteria, default: 0.3. 90 | 8. -co: the value of coverage criteria, default: 0.1. 91 | 9. -cn: the value of number of reads criteria, default: 10. 92 | 10. -salmon: use salmon instead of kallisto, default: false. To use salmon pass '-salmon true' as parameter. 93 | 94 | # Support 95 | If you are having issues, please look at the detailed manual at http://fastviromeexplorer.readthedocs.io/en/latest/ or contact us at saima5@vt.edu 96 | # License 97 | This project is licensed under the BSD 2-clause "Simplified" License. 
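For reference, the three filtering criteria in the parameter list (`-cr`, `-co`, `-cn`) combine in `src/FastViromeExplorer.java` roughly as sketched below. This is a minimal standalone sketch, not the tool's API: the class and method names (`FilterCriteriaSketch`, `predictedSupport`, `ratio`, `passes`) are hypothetical, and the zero guard in `ratio` is an addition made only for the sketch.

```java
// Hypothetical sketch of the three filters; names are illustrative only.
final class FilterCriteriaSketch {

    // Fraction of the genome expected to be covered if reads were spread
    // uniformly: 1 - e^(-coverage), with coverage = reads * readLen / genomeLen.
    static double predictedSupport(long numReads, double avgReadLen, long genomeLen) {
        double coverage = (numReads * avgReadLen) / genomeLen;
        return 1.0 - Math.exp(-coverage);
    }

    // Smaller of observed and predicted support divided by the larger, so the
    // value lies in [0, 1]. The zero guard is an addition for this sketch.
    static double ratio(double support, double predicted) {
        double larger = Math.max(support, predicted);
        return larger == 0.0 ? 0.0 : Math.min(support, predicted) / larger;
    }

    // A virus is reported only if all three criteria hold.
    static boolean passes(double support, double predicted, double estimatedReads,
                          double ratioCriteria, double coverageCriteria, int numReadsCriteria) {
        return ratio(support, predicted) >= ratioCriteria   // -cr
                && support >= coverageCriteria               // -co
                && estimatedReads >= numReadsCriteria;       // -cn
    }

    public static void main(String[] args) {
        // 50 reads of average length 100 bp on a 10,000 bp genome.
        double predicted = predictedSupport(50, 100.0, 10_000);
        // Reads spread over 35% of the genome, checked against the default thresholds.
        System.out.println(passes(0.35, predicted, 50.0, 0.3, 0.1, 10));
        // Same read count concentrated on 2% of the genome.
        System.out.println(passes(0.02, predicted, 50.0, 0.3, 0.1, 10));
    }
}
```

In the second call the observed support is far below the predicted support, so both the ratio and the coverage thresholds reject the entry even though the read count alone would pass.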
98 | # Citation 99 | If you are using our tool, please cite us: 100 | 101 | Saima Sultana Tithi, Frank O. Aylward, Roderick V. Jensen, and Liqing Zhang. "FastViromeExplorer: a pipeline for virus and phage identification and abundance profiling in metagenomics data." PeerJ 6 (2018): e4227. 102 | -------------------------------------------------------------------------------- /bin/FastViromeExplorer$1.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/saima-tithi/FastViromeExplorer/bf9e73c6aa316db49b140f5b9154edc364187327/bin/FastViromeExplorer$1.class -------------------------------------------------------------------------------- /bin/FastViromeExplorer.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/saima-tithi/FastViromeExplorer/bf9e73c6aa316db49b140f5b9154edc364187327/bin/FastViromeExplorer.class -------------------------------------------------------------------------------- /bin/Read.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/saima-tithi/FastViromeExplorer/bf9e73c6aa316db49b140f5b9154edc364187327/bin/Read.class -------------------------------------------------------------------------------- /src/FastViromeExplorer.java: -------------------------------------------------------------------------------- 1 | import java.io.BufferedReader; 2 | import java.io.BufferedWriter; 3 | import java.io.File; 4 | import java.io.FileReader; 5 | import java.io.FileWriter; 6 | import java.io.InputStreamReader; 7 | import java.lang.ProcessBuilder.Redirect; 8 | import java.nio.file.Paths; 9 | import java.text.DecimalFormat; 10 | import java.util.Collections; 11 | import java.util.Comparator; 12 | import java.util.HashMap; 13 | import java.util.Iterator; 14 | import java.util.LinkedHashMap; 15 | import java.util.LinkedList; 16 | import java.util.List; 17 | import 
java.util.Map; 18 | import java.util.NavigableSet; 19 | import java.util.TreeSet; 20 | import java.util.Map.Entry; 21 | 22 | public class FastViromeExplorer { 23 | private static String outDir = ""; 24 | private static String read1 = ""; 25 | private static String read2 = ""; 26 | private static String kallistoIndexFile = ""; 27 | private static String refDbFile = ""; 28 | private static String virusListFile = ""; 29 | private static double ratioCriteria = 0.3; 30 | private static double coverageCriteria = 0.1; 31 | private static int numReadsCriteria = 10; 32 | private static double avgReadLen = 0; 33 | private static boolean ASC = true; 34 | private static boolean DESC = false; 35 | private static Map<String, Integer> virusLength; 36 | private static Map<String, String> virusLineage; 37 | private static Map<String, String> virusRatio; 38 | private static boolean useSalmon = false; 39 | private static boolean reportRatio = false; 40 | 41 | // sort a map by value 42 | private static Map<String, Double> sortByComparator(Map<String, Double> unsortMap, final boolean order) { 43 | List<Entry<String, Double>> list = new LinkedList<Entry<String, Double>>(unsortMap.entrySet()); 44 | 45 | // Sorting the list based on values 46 | Collections.sort(list, new Comparator<Entry<String, Double>>() { 47 | public int compare(Entry<String, Double> o1, Entry<String, Double> o2) { 48 | if (order) { 49 | return o1.getValue().compareTo(o2.getValue()); 50 | } else { 51 | return o2.getValue().compareTo(o1.getValue()); 52 | 53 | } 54 | } 55 | }); 56 | 57 | // Maintaining insertion order with the help of LinkedHashMap 58 | Map<String, Double> sortedMap = new LinkedHashMap<String, Double>(); 59 | for (Entry<String, Double> entry : list) { 60 | sortedMap.put(entry.getKey(), entry.getValue()); 61 | } 62 | 63 | return sortedMap; 64 | } 65 | 66 | private static void parseArguments(String[] args) { 67 | if (args.length == 0) { 68 | printUsage(); 69 | System.exit(1); 70 | } else { 71 | for (int i = 0; i < args.length; i++) { 72 | if (args[i].startsWith("-")) { 73 | if ((i + 1) >= args.length) { 74 | System.out.println("Missing argument after " + args[i] + " ."); 75 | printUsage(); 76 | System.exit(1); 77 | } else { 78 | if 
(args[i].equals("-o")) { 79 | outDir = args[i + 1]; 80 | } else if (args[i].equals("-1")) { 81 | read1 = args[i + 1]; 82 | } else if (args[i].equals("-2")) { 83 | read2 = args[i + 1]; 84 | } else if (args[i].equals("-i")) { 85 | kallistoIndexFile = args[i + 1]; 86 | } else if (args[i].equals("-db")) { 87 | refDbFile = args[i + 1]; 88 | } else if (args[i].equals("-l")) { 89 | virusListFile = args[i + 1]; 90 | } else if (args[i].equals("-cr")) { 91 | ratioCriteria = Double.parseDouble(args[i + 1]); 92 | } else if (args[i].equals("-co")) { 93 | coverageCriteria = Double.parseDouble(args[i + 1]); 94 | } else if (args[i].equals("-cn")) { 95 | numReadsCriteria = Integer.parseInt(args[i + 1]); 96 | } else if (args[i].equals("-salmon")) { 97 | if (args[i + 1].equalsIgnoreCase("true")) { 98 | useSalmon = true; 99 | } else { 100 | useSalmon = false; 101 | } 102 | } else if (args[i].equals("-reportRatio")) { 103 | if (args[i + 1].equalsIgnoreCase("true")) { 104 | reportRatio = true; 105 | } else { 106 | reportRatio = false; 107 | } 108 | } else { 109 | System.out.println("Invalid argument."); 110 | printUsage(); 111 | System.exit(1); 112 | } 113 | } 114 | } 115 | } 116 | } // finish parsing arguments 117 | if (read1.isEmpty()) { 118 | System.out.println("Please provide the read file."); 119 | printUsage(); 120 | System.exit(1); 121 | } 122 | if (kallistoIndexFile.isEmpty() && refDbFile.isEmpty()) { 123 | System.out.println("Please provide the reference database or kallisto index file or salmon index directory."); 124 | printUsage(); 125 | System.exit(1); 126 | } 127 | if (virusListFile.isEmpty()) { 128 | virusListFile = "ncbi-viruses-list.txt"; 129 | } 130 | if (outDir.isEmpty()) { 131 | outDir = Paths.get(".").toAbsolutePath().normalize().toString(); 132 | } 133 | if (ratioCriteria < 0.0 || ratioCriteria > 1.0) { 134 | System.out.println("The ratio criteria should be between 0.0 and 1.0. 
" 135 | + "Using the default value: 0.3."); 136 | ratioCriteria = 0.3; 137 | } 138 | if (coverageCriteria < 0.0 || coverageCriteria > 1.0) { 139 | System.out.println("The coverage criteria should be between 0.0 and 1.0. " 140 | + "Using the default value: 0.1."); 141 | coverageCriteria = 0.1; 142 | } 143 | } 144 | 145 | private static void printUsage() { 146 | System.out.println("Usage:"); 147 | System.out.println( 148 | "java -cp /path-to-FastViromeExplorer/bin FastViromeExplorer -1 $read1File -2 $read2File -i $indexFile -o $outputDirectory"); 149 | System.out.println("-1: input .fastq file for read sequences (paired-end 1), mandatory field."); 150 | System.out.println("-2: input .fastq file for read sequences (paired-end 2)."); 151 | System.out.println("-i: kallisto/salmon index file, mandatory field."); 152 | System.out.println("-db: reference database file in fasta/fa format."); 153 | System.out.println("-o: output directory. Default option is the project directory."); 154 | System.out.println("-l: virus list containing " 155 | + "all viruses present in the reference database along with their length."); 156 | System.out.println("-cr: the value of ratio criteria, default: 0.3."); 157 | System.out.println("-co: the value of coverage criteria, default: 0.1."); 158 | System.out.println("-cn: the value of number of reads criteria, default: 10."); 159 | System.out.println( 160 | "-salmon: use salmon instead of kallisto, default: false. To use salmon pass '-salmon true' as parameter."); 161 | System.out.println( 162 | "-reportRatio: default: false. 
To get ratio pass '-reportRatio true' as parameter."); 163 | } 164 | 165 | private static void checkInputs() { 166 | File file = new File(read1); 167 | if (!file.exists() || file.isDirectory()) { 168 | System.out.println("Could not find read file: " + read1); 169 | System.exit(1); 170 | } 171 | if(!read2.isEmpty()) { 172 | file = new File(read2); 173 | if (!file.exists() || file.isDirectory()) { 174 | System.out.println("Could not find read file: " + read2); 175 | System.exit(1); 176 | } 177 | } 178 | if(!kallistoIndexFile.isEmpty()) { 179 | file = new File(kallistoIndexFile); 180 | if (useSalmon) { 181 | if (!file.exists()) { 182 | System.out.println("Could not find salmon index directory: " + kallistoIndexFile); 183 | System.exit(1); 184 | } 185 | } 186 | else { 187 | if (!file.exists() || file.isDirectory()) { 188 | System.out.println("Could not find kallisto index file: " + kallistoIndexFile); 189 | System.exit(1); 190 | } 191 | } 192 | } 193 | if(!refDbFile.isEmpty()) { 194 | file = new File(refDbFile); 195 | if (!file.exists() || file.isDirectory()) { 196 | System.out.println("Could not find reference database file: " + refDbFile); 197 | System.exit(1); 198 | } 199 | } 200 | } 201 | 202 | private static void callKallisto() { 203 | File f1 = new File(virusListFile); 204 | if (!f1.isFile()) { 205 | System.out.println( 206 | "Could not find the list of viruses (ncbi-viruses-list.txt) " + "in the project directory."); 207 | printUsage(); 208 | System.exit(1); 209 | } 210 | try { 211 | String command = ""; 212 | // index file is given 213 | if (!kallistoIndexFile.isEmpty()) { 214 | if (read2.isEmpty()) { 215 | command = "kallisto quant -i " + kallistoIndexFile + " -o " + outDir 216 | + " --single -l 200 -s 50 --pseudobam " + read1 217 | + " | samtools view -bS - | samtools view -h -F 0x04 -b - | " + "samtools sort - -o " 218 | + outDir + "/FastViromeExplorer-reads-mapped-sorted.sam\n"; 219 | } else { 220 | command = "kallisto quant -i " + kallistoIndexFile + " -o 
" + outDir + " --pseudobam " + read1 + " " 221 | + read2 + " | samtools view -bS - | samtools view -h -F 0x04 -b - | " 222 | + "samtools sort - -o " + outDir + "/FastViromeExplorer-reads-mapped-sorted.sam\n"; 223 | } 224 | } else if (!refDbFile.isEmpty()) { 225 | if (read2.isEmpty()) { 226 | command = "kallisto index -i kallisto-index.idx " + refDbFile + "\n" 227 | + "kallisto quant -i kallisto-index.idx " + "-o " + outDir 228 | + " --single -l 200 -s 50 --pseudobam " + read1 229 | + " | samtools view -bS - | samtools view -h -F 0x04 -b - | " + "samtools sort - -o " 230 | + outDir + "/FastViromeExplorer-reads-mapped-sorted.sam\n"; 231 | } else { 232 | command = "kallisto index -i kallisto-index.idx " + refDbFile + "\n" 233 | + "kallisto quant -i kallisto-index.idx " + "-o " + outDir + " --pseudobam " + read1 + " " 234 | + read2 + " | samtools view -bS - | samtools view -h -F 0x04 -b - | " 235 | + "samtools sort - -o " + outDir + "/FastViromeExplorer-reads-mapped-sorted.sam\n"; 236 | } 237 | } 238 | 239 | FileWriter shellFileWriter = new FileWriter(outDir + "/run.sh"); 240 | shellFileWriter.write("#!/bin/bash\n"); 241 | shellFileWriter.write(command); 242 | shellFileWriter.close(); 243 | 244 | ProcessBuilder builder = new ProcessBuilder("sh", outDir + "/run.sh"); 245 | builder.redirectError(new File(outDir + "/log.txt")); 246 | Process process = builder.start(); 247 | BufferedReader reader = new BufferedReader(new InputStreamReader(process.getInputStream())); 248 | while (reader.readLine() != null) { 249 | } 250 | process.waitFor(); 251 | } catch (Exception ex) { 252 | ex.printStackTrace(); 253 | } 254 | } 255 | 256 | private static void callSalmon() { 257 | File f1 = new File(virusListFile); 258 | if (!f1.isFile()) { 259 | System.out.println( 260 | "Could not find the list of viruses (ncbi-viruses-list.txt) " + "in the project directory."); 261 | printUsage(); 262 | System.exit(1); 263 | } 264 | try { 265 | String command = ""; 266 | // index file is given 267 | if 
(!kallistoIndexFile.isEmpty()) { 268 | if (read2.isEmpty()) { 269 | command = "salmon quant -i " + kallistoIndexFile 270 | + " -l A -r " + read1 + " -o " + outDir 271 | + " --writeMappings | samtools view -bS - | samtools view -h -F 0x04 -b - | " 272 | + "samtools sort - -o " + outDir + "/FastViromeExplorer-reads-mapped-sorted.sam\n"; 273 | } else { 274 | command = "salmon quant -i " + kallistoIndexFile 275 | + " -l A -1 " + read1 + " -2 " + read2 + " -o " 276 | + outDir + " --writeMappings | samtools view -bS - | samtools view -h -F 0x04 -b - | " 277 | + "samtools sort - -o " + outDir + "/FastViromeExplorer-reads-mapped-sorted.sam\n"; 278 | } 279 | } else if (!refDbFile.isEmpty()) { 280 | if (read2.isEmpty()) { 281 | command = "salmon index -t " + refDbFile + " -i salmon-index\n" 282 | + "salmon quant -i salmon-index" 283 | + " -l A -r " + read1 + " -o " + outDir 284 | + " --writeMappings | samtools view -bS - | samtools view -h -F 0x04 -b - | " 285 | + "samtools sort - -o " + outDir + "/FastViromeExplorer-reads-mapped-sorted.sam\n"; 286 | } else { 287 | command = "salmon index -t " + refDbFile + " -i salmon-index\n" 288 | + "salmon quant -i salmon-index" 289 | + " -l A -1 " + read1 + " -2 " + read2 + " -o " + outDir 290 | + " --writeMappings | samtools view -bS - | samtools view -h -F 0x04 -b - | " 291 | + "samtools sort - -o " + outDir + "/FastViromeExplorer-reads-mapped-sorted.sam\n"; 292 | } 293 | } 294 | 295 | FileWriter shellFileWriter = new FileWriter(outDir + "/run.sh"); 296 | shellFileWriter.write("#!/bin/bash\n"); 297 | shellFileWriter.write(command); 298 | shellFileWriter.close(); 299 | 300 | ProcessBuilder builder = new ProcessBuilder("sh", outDir + "/run.sh"); 301 | builder.redirectError(new File(outDir + "/log.txt")); 302 | Process process = builder.start(); 303 | BufferedReader reader = new BufferedReader(new InputStreamReader(process.getInputStream())); 304 | while (reader.readLine() != null) { 305 | } 306 | process.waitFor(); 307 | } catch 
(Exception ex) { 308 | ex.printStackTrace(); 309 | } 310 | } 311 | 312 | private static void getAverageReadLength() { 313 | try { 314 | String command = ""; 315 | if (read1.endsWith(".gz")) { 316 | command = "gzip -dc " + read1 317 | + " | awk 'NR%4 == 2 {lenSum+=length($0); readCount++;} END {print lenSum/readCount}'"; 318 | } else { 319 | command = "awk 'NR%4 == 2 {lenSum+=length($0); readCount++;} END {print lenSum/readCount}' " 320 | + read1; 321 | } 322 | 323 | FileWriter shellFileWriter = new FileWriter(outDir + "/run.sh"); 324 | shellFileWriter.write("#!/bin/bash\n"); 325 | shellFileWriter.write(command); 326 | shellFileWriter.close(); 327 | 328 | ProcessBuilder builder = new ProcessBuilder("sh", outDir + "/run.sh"); 329 | builder.redirectError(Redirect.appendTo(new File(outDir + "/log.txt"))); 330 | Process process = builder.start(); 331 | BufferedReader reader = new BufferedReader(new InputStreamReader(process.getInputStream())); 332 | String str = ""; 333 | while ((str = reader.readLine()) != null) { 334 | avgReadLen = Double.parseDouble(str); 335 | } 336 | process.waitFor(); 337 | } catch (Exception ex) { 338 | ex.printStackTrace(); 339 | } 340 | 341 | if (avgReadLen == 0) { 342 | System.out.println("Error: Could not extract average read length from read file."); 343 | System.exit(1); 344 | } 345 | } 346 | 347 | private static void getVirusLength() { 348 | virusLength = new HashMap(); 349 | virusLineage = new HashMap(); 350 | BufferedReader br = null; 351 | try { 352 | br = new BufferedReader(new FileReader(virusListFile)); 353 | String str = ""; 354 | while ((str = br.readLine()) != null) { 355 | String[] results = str.split("\t"); 356 | if (results.length == 4) { 357 | int length = Integer.parseInt(results[3]); 358 | virusLength.put(results[0].trim(), length); 359 | virusLineage.put(results[0].trim(), results[1].trim() + "\t" + results[2].trim()); 360 | } else { 361 | int length = Integer.parseInt(results[1]); 362 | virusLength.put(results[0].trim(), 
length); 363 | virusLineage.put(results[0].trim(), "N/A\tN/A"); 364 | } 365 | } 366 | br.close(); 367 | } catch (Exception e) { 368 | e.printStackTrace(); 369 | } 370 | } 371 | 372 | private static void getRatio() { 373 | virusRatio = new HashMap<String, String>(); 374 | // read from sam 375 | TreeSet<Read> readSet = new TreeSet<Read>(); 376 | int totalReads = 0; 377 | int numReads = 0; 378 | BufferedReader br = null; 379 | try { 380 | br = new BufferedReader(new FileReader(outDir + "/FastViromeExplorer-reads-mapped-sorted.sam")); 381 | String str = ""; 382 | String prevVirusName = null; 383 | double ratio = 0.0; 384 | // read sam file 385 | while ((str = br.readLine()) != null) { 386 | if (!str.startsWith("@")) { 387 | totalReads++; 388 | String[] results = str.split("\t"); 389 | String virusName = results[2].trim(); 390 | int startPos = Integer.parseInt(results[3].trim()); 391 | int endPos = results[9].trim().length() + startPos - 1; 392 | if (prevVirusName != null && !virusName.equals(prevVirusName)) 393 | { 394 | // finish calculating ratio for prev virus 395 | double coveredBps = 0.0; 396 | if (!virusLength.containsKey(prevVirusName)) { 397 | System.out.println("Could not get the genome length of " + prevVirusName 398 | + ". 
Please make sure you provided the right genome-length file using -l parameter."); 399 | } 400 | double genomeLen = virusLength.get(prevVirusName); 401 | 402 | for (int i = 1; i <= genomeLen; i++) { 403 | Read tempRead = new Read(i, i + (2 * (int) avgReadLen)); 404 | NavigableSet<Read> smallSet = readSet.headSet(tempRead, true); 405 | Iterator<Read> it = smallSet.descendingIterator(); 406 | while (it.hasNext()) { 407 | tempRead = it.next(); 408 | if (i >= tempRead.getStartPos() && i <= tempRead.getEndPos()) { 409 | coveredBps++; 410 | break; 411 | } 412 | if (tempRead.getEndPos() + 2 * avgReadLen < i) { 413 | break; 414 | } 415 | } 416 | } 417 | 418 | double support = coveredBps / genomeLen; 419 | double cov = (numReads * avgReadLen) / genomeLen; 420 | double predictedSupport = 1 - Math.exp(-cov); 421 | ratio = 0.0; 422 | if (support < predictedSupport) { 423 | ratio = support / predictedSupport; 424 | } else { 425 | ratio = predictedSupport / support; 426 | } 427 | if (ratio >= ratioCriteria && support >= coverageCriteria) { 428 | if (reportRatio) { 429 | virusRatio.put(prevVirusName, support + "\t" + predictedSupport + "\t" 430 | + new DecimalFormat("##.####").format(ratio)); 431 | } 432 | else { 433 | virusRatio.put(prevVirusName, ""); 434 | } 435 | } 436 | // create new readSet 437 | readSet = new TreeSet<Read>(); 438 | numReads = 0; 439 | Read read = new Read(startPos, endPos); 440 | numReads++; 441 | readSet.add(read); 442 | } 443 | else { 444 | Read read = new Read(startPos, endPos); 445 | numReads++; 446 | readSet.add(read); 447 | } 448 | prevVirusName = virusName; 449 | } 450 | } 451 | // calculate ratio for the last virus 452 | if (prevVirusName != null) { 453 | double coveredBps = 0.0; 454 | if (!virusLength.containsKey(prevVirusName)) { 455 | System.out.println("Could not get the genome length of " + prevVirusName 456 | + ". 
Please make sure you provided the right genome-length file using -l parameter."); 457 | } 458 | double genomeLen = virusLength.get(prevVirusName); 459 | for (int i = 1; i <= genomeLen; i++) { 460 | Read tempRead = new Read(i, i + (2 * (int) avgReadLen)); 461 | NavigableSet<Read> smallSet = readSet.headSet(tempRead, true); 462 | Iterator<Read> it = smallSet.descendingIterator(); 463 | while (it.hasNext()) { 464 | tempRead = it.next(); 465 | if (i >= tempRead.getStartPos() && i <= tempRead.getEndPos()) { 466 | coveredBps++; 467 | break; 468 | } 469 | if (tempRead.getEndPos() + 2 * avgReadLen < i) { 470 | break; 471 | } 472 | } 473 | } 474 | 475 | double support = coveredBps / genomeLen; 476 | double cov = (numReads * avgReadLen) / genomeLen; 477 | double predictedSupport = 1 - Math.exp(-cov); 478 | ratio = 0.0; 479 | if (support < predictedSupport) { 480 | ratio = support / predictedSupport; 481 | } else { 482 | ratio = predictedSupport / support; 483 | } 484 | 485 | if (ratio >= ratioCriteria && support >= coverageCriteria) { 486 | if (reportRatio) { 487 | virusRatio.put(prevVirusName, support + "\t" + predictedSupport + "\t" 488 | + new DecimalFormat("##.####").format(ratio)); 489 | } 490 | else { 491 | virusRatio.put(prevVirusName, ""); 492 | } 493 | } 494 | } 495 | 496 | readSet = null; 497 | br.close(); 498 | } catch (Exception e) { 499 | e.printStackTrace(); 500 | } 501 | if (totalReads == 0) { 502 | System.out.println("Error: The sam file " 503 | + "FastViromeExplorer-reads-mapped-sorted.sam is empty. Please check the " 504 | + "kallisto and samtools version. 
Please use kallisto 0.43.1 and samtools 1.4 or later."); 505 | System.exit(1); 506 | } 507 | else { 508 | System.out.println("Processed " + totalReads + " reads from " 509 | + "FastViromeExplorer-reads-mapped-sorted.sam."); 510 | } 511 | } 512 | 513 | private static int getSortedAbundanceRatio() { 514 | int numFinalViruses = 0; 515 | Map<String, Double> map = new HashMap<String, Double>(); 516 | BufferedReader br = null; 517 | try { 518 | br = new BufferedReader(new FileReader(outDir + "/abundance.tsv")); 519 | br.readLine(); 520 | String str = ""; 521 | while ((str = br.readLine()) != null) { 522 | String[] results = str.split("\t"); 523 | double est_count = Double.parseDouble(results[3]); 524 | if (est_count != 0.0) { 525 | map.put(results[0], est_count); 526 | } 527 | } 528 | br.close(); 529 | } catch (Exception e) { 530 | e.printStackTrace(); 531 | } 532 | 533 | map = sortByComparator(map, DESC); 534 | 535 | // write in file 536 | BufferedWriter bw = null; 537 | try { 538 | bw = new BufferedWriter(new FileWriter(outDir + "/FastViromeExplorer-final-sorted-abundance.tsv")); 539 | if (reportRatio) { 540 | bw.write( 541 | "#NCBIAccession\tName\tkingdom;phylum;class;order;family;genus;species\tEstimatedAbundance\tSupport\tPredictedSupport\tRatio\n"); 542 | for (Entry<String, Double> entry : map.entrySet()) { 543 | if (virusRatio.containsKey(entry.getKey()) && entry.getValue() >= numReadsCriteria) { 544 | numFinalViruses++; 545 | bw.write(entry.getKey() + "\t" + virusLineage.get(entry.getKey()) + "\t" + entry.getValue() + "\t" 546 | + virusRatio.get(entry.getKey()) + "\n"); 547 | } 548 | } 549 | } 550 | else { 551 | bw.write( 552 | "#VirusIdentifier\tVirusName\tkingdom;phylum;class;order;family;genus;species\tEstimatedAbundance\n"); 553 | for (Entry<String, Double> entry : map.entrySet()) { 554 | if (virusRatio.containsKey(entry.getKey()) && entry.getValue() >= numReadsCriteria) { 555 | numFinalViruses++; 556 | bw.write(entry.getKey() + "\t" + virusLineage.get(entry.getKey()) + "\t" + entry.getValue() + "\n"); 557 | } 558 | } 559 
| } 560 | bw.close(); 561 | } 562 | catch (Exception e) { 563 | e.printStackTrace(); 564 | } 565 | return numFinalViruses; 566 | } 567 | 568 | private static int getSortedAbundanceRatioFromSalmon() { 569 | int numFinalViruses = 0; 570 | Map<String, Double> map = new HashMap<String, Double>(); 571 | BufferedReader br = null; 572 | try { 573 | br = new BufferedReader(new FileReader(outDir + "/quant.sf")); 574 | br.readLine(); 575 | String str = ""; 576 | while ((str = br.readLine()) != null) { 577 | String[] results = str.split("\t"); 578 | double est_count = Double.parseDouble(results[4]); 579 | if (est_count != 0.0) { 580 | map.put(results[0], est_count); 581 | } 582 | } 583 | br.close(); 584 | } catch (Exception e) { 585 | e.printStackTrace(); 586 | } 587 | 588 | map = sortByComparator(map, DESC); 589 | 590 | // write in file 591 | BufferedWriter bw = null; 592 | try { 593 | bw = new BufferedWriter(new FileWriter(outDir + "/FastViromeExplorer-final-sorted-abundance.tsv")); 594 | if (reportRatio) { 595 | bw.write( 596 | "#NCBIAccession\tName\tkingdom;phylum;class;order;family;genus;species\tEstimatedAbundance\tSupport\tPredictedSupport\tRatio\n"); 597 | for (Entry<String, Double> entry : map.entrySet()) { 598 | if (virusRatio.containsKey(entry.getKey()) && entry.getValue() >= numReadsCriteria) { 599 | numFinalViruses++; 600 | bw.write(entry.getKey() + "\t" + virusLineage.get(entry.getKey()) + "\t" + entry.getValue() + "\t" 601 | + virusRatio.get(entry.getKey()) + "\n"); 602 | } 603 | } 604 | } 605 | else { 606 | bw.write( 607 | "#VirusIdentifier\tVirusName\tkingdom;phylum;class;order;family;genus;species\tEstimatedAbundance\n"); 608 | for (Entry<String, Double> entry : map.entrySet()) { 609 | if (virusRatio.containsKey(entry.getKey()) && entry.getValue() >= numReadsCriteria) { 610 | numFinalViruses++; 611 | bw.write(entry.getKey() + "\t" + virusLineage.get(entry.getKey()) + "\t" + entry.getValue() + "\n"); 612 | } 613 | } 614 | } 615 | bw.close(); 616 | } 617 | catch (Exception e) { 618 | e.printStackTrace(); 619 | } 620 | 
        return numFinalViruses;
    }

    public static void main(String[] args) {
        parseArguments(args);
        checkInputs();
        System.out.println("Finished parsing inputs.");
        if (useSalmon) {
            callSalmon();
        } else {
            callKallisto();
        }
        getAverageReadLength();
        getVirusLength();
        getRatio();
        int numFinalViruses = 0;
        if (useSalmon) {
            numFinalViruses = getSortedAbundanceRatioFromSalmon();
        } else {
            numFinalViruses = getSortedAbundanceRatio();
        }
        if (numFinalViruses == 0) {
            System.out.println("None of the viruses in the given database passed all the 3 filtering criteria. "
                    + "So the output file FastViromeExplorer-final-sorted-abundance.tsv is empty. "
                    + "To get some output, you can relax the filtering criteria or you can "
                    + "change the database to better fit this sample.");
        }
        else {
            System.out.println("FastViromeExplorer reported " + numFinalViruses
                    + " viruses/genomes in the output file FastViromeExplorer-final-sorted-abundance.tsv");
        }
        System.out.println("Finished running FastViromeExplorer.");
    }
}

--------------------------------------------------------------------------------
/src/Read.java:
--------------------------------------------------------------------------------
public class Read implements Comparable<Read> {
    private int startPos;
    private int endPos;

    public Read() {

    }

    public Read(int startPos, int endPos) {
        this.startPos = startPos;
        this.endPos = endPos;
    }

    public int getStartPos() {
        return startPos;
    }

    public void setStartPos(int startPos) {
        this.startPos = startPos;
    }

    public int getEndPos() {
        return endPos;
    }

    public void setEndPos(int endPos) {
        this.endPos = endPos;
    }

    // order reads by start position, then by end position
    @Override
    public int compareTo(Read o) {
        int diff = this.startPos - o.startPos;
        if (diff == 0) {
            return (this.endPos - o.endPos);
        }
        return diff;
    }

    @Override
    public boolean equals(Object o) {
        return (o instanceof Read) && (this.compareTo((Read)o) == 0);
    }
}

--------------------------------------------------------------------------------
/test/reads_1.fq.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/saima-tithi/FastViromeExplorer/bf9e73c6aa316db49b140f5b9154edc364187327/test/reads_1.fq.gz

--------------------------------------------------------------------------------
/test/reads_2.fq.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/saima-tithi/FastViromeExplorer/bf9e73c6aa316db49b140f5b9154edc364187327/test/reads_2.fq.gz

--------------------------------------------------------------------------------
/test/testset-kallisto-index.idx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/saima-tithi/FastViromeExplorer/bf9e73c6aa316db49b140f5b9154edc364187327/test/testset-kallisto-index.idx

--------------------------------------------------------------------------------
/test/testset-salmon-index/hash.bin:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/saima-tithi/FastViromeExplorer/bf9e73c6aa316db49b140f5b9154edc364187327/test/testset-salmon-index/hash.bin

--------------------------------------------------------------------------------
/test/testset-salmon-index/header.json:
--------------------------------------------------------------------------------
{
    "value0": {
        "IndexType": 1,
        "IndexVersion": "q5",
        "UsesKmers": true,
        "KmerLen": 31,
        "BigSA": false,
        "PerfectHash": false,
        "SeqHash": "668dc4e8bfda265b61019978a3f25cbbc9105f2116832fac6fdfabe48dd773eb",
        "NameHash": "763b68f34093851e2704b920cca419a4c709416a5575181b2a1ff535bdc6b891"
    }
}

--------------------------------------------------------------------------------
/test/testset-salmon-index/indexing.log:
--------------------------------------------------------------------------------
[2018-01-17 16:06:49.093] [jLog] [info] building index
[2018-01-17 16:06:49.612] [jLog] [info] done building index

--------------------------------------------------------------------------------
/test/testset-salmon-index/quasi_index.log:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/saima-tithi/FastViromeExplorer/bf9e73c6aa316db49b140f5b9154edc364187327/test/testset-salmon-index/quasi_index.log

--------------------------------------------------------------------------------
/test/testset-salmon-index/refInfo.json:
--------------------------------------------------------------------------------
{
    "ReferenceFiles": [
        "testset.fa"
    ]
}

--------------------------------------------------------------------------------
/test/testset-salmon-index/rsd.bin:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/saima-tithi/FastViromeExplorer/bf9e73c6aa316db49b140f5b9154edc364187327/test/testset-salmon-index/rsd.bin

--------------------------------------------------------------------------------
/test/testset-salmon-index/sa.bin:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/saima-tithi/FastViromeExplorer/bf9e73c6aa316db49b140f5b9154edc364187327/test/testset-salmon-index/sa.bin

--------------------------------------------------------------------------------
/test/testset-salmon-index/txpInfo.bin:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/saima-tithi/FastViromeExplorer/bf9e73c6aa316db49b140f5b9154edc364187327/test/testset-salmon-index/txpInfo.bin

--------------------------------------------------------------------------------
/test/testset-salmon-index/versionInfo.json:
--------------------------------------------------------------------------------
{
    "indexVersion": 2,
    "hasAuxIndex": false,
    "auxKmerLength": 31,
    "indexType": 1
}

--------------------------------------------------------------------------------
/tools-linux/kallisto:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/saima-tithi/FastViromeExplorer/bf9e73c6aa316db49b140f5b9154edc364187327/tools-linux/kallisto

--------------------------------------------------------------------------------
/tools-linux/kallisto-license.txt:
--------------------------------------------------------------------------------
BSD 2-Clause License

Copyright (c) 2017, Nicolas Bray, Harold Pimentel, Páll Melsted and Lior Pachter
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

--------------------------------------------------------------------------------
/tools-linux/samtools:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/saima-tithi/FastViromeExplorer/bf9e73c6aa316db49b140f5b9154edc364187327/tools-linux/samtools

--------------------------------------------------------------------------------
/tools-mac/kallisto:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/saima-tithi/FastViromeExplorer/bf9e73c6aa316db49b140f5b9154edc364187327/tools-mac/kallisto

--------------------------------------------------------------------------------
/tools-mac/kallisto-license.txt:
--------------------------------------------------------------------------------
BSD 2-Clause License

Copyright (c) 2017, Nicolas Bray, Harold Pimentel, Páll Melsted and Lior Pachter
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

--------------------------------------------------------------------------------
/tools-mac/salmon:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/saima-tithi/FastViromeExplorer/bf9e73c6aa316db49b140f5b9154edc364187327/tools-mac/salmon

--------------------------------------------------------------------------------
/tools-mac/samtools:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/saima-tithi/FastViromeExplorer/bf9e73c6aa316db49b140f5b9154edc364187327/tools-mac/samtools

--------------------------------------------------------------------------------
/utility-scripts/generateGenomeList.sh:
--------------------------------------------------------------------------------
#!/bin/bash

# exit script if one command fails
set -o errexit

# exit script if variable is not set
set -o nounset

# $1: input FASTA file, $2: output genome list file
# index the FASTA file; samtools writes the index to $1.fai
samtools faidx "$1"
# write one line per sequence: name, N/A, N/A, length
awk '{print $1"\tN/A\tN/A\t"$2}' "$1.fai" > "$2"

--------------------------------------------------------------------------------
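As a rough illustration of what the awk step in generateGenomeList.sh produces, the sketch below applies the same awk program to a hand-written .fai file, so it can be tried without samtools. The function name `fai_to_genome_list` and the sample sequence name and numbers are made up for this example and are not part of the repository. The only assumption about the input is the usual faidx index layout, where the first two columns hold the sequence name and its length.

```shell
#!/bin/bash
# Hypothetical helper (not part of the repository): runs only the awk step of
# generateGenomeList.sh on an existing .fai index passed as "$1".
fai_to_genome_list() {
    # fields: $1 = sequence name, $2 = sequence length
    awk '{print $1"\tN/A\tN/A\t"$2}' "$1"
}

# Build a one-line .fai file by hand; the name and numbers are invented.
tmp_fai="$(mktemp)"
printf 'virus_A\t48502\t9\t70\t71\n' > "$tmp_fai"

# Writes the tab-separated fields virus_A, N/A, N/A, 48502 to stdout.
fai_to_genome_list "$tmp_fai"
rm -f "$tmp_fai"
```

In the real script the .fai file comes from `samtools faidx`, and the two `N/A` columns are placeholders that the script writes for every sequence.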