├── rcst ├── ex.pdf ├── comp.csv ├── locate.csv ├── synth.csv ├── Makefile ├── locate_bars.R ├── synth.R ├── locate.R └── concl.tex ├── logs_spire2014 ├── results.ods └── log_bwt.txt ├── .gitignore ├── logs_old_rcst ├── index_multi_0.0001.err ├── index_multi_0.0003.err ├── index_multi_0.0010.err ├── index_multi_0.0030.err ├── index_multi_0.0300.err ├── index_multi_0.0100.err ├── index_multi_0.1000.err ├── README.md ├── cst_size.py ├── cst_traverse_4379375.log ├── cst_traverse_4379376.log ├── cst_compare_4379377.log ├── cst_compare_4379378.log ├── verify_lcp_3888648.log ├── verify_lcp_3888650.log ├── build_rfm_4358894.log ├── build_rfm_4358895.log ├── old_rfm_3891260.log ├── old_rfm_3892021.log ├── old_rfm_3891261.log ├── old_rfm_3892022.log ├── verify_psi_3917632.log ├── verify_psi_3917633.log ├── index_synth_0.0003.log ├── index_synth_0.0030.log ├── index_synth_0.0001.log ├── index_synth_0.0010.log ├── index_synth_0.0100.log ├── index_synth_0.0300.log ├── index_synth_0.1000.log ├── locate_test_7_64.log ├── locate_test_17_64.log ├── locate_test_31_64.log ├── locate_test_61_64.log ├── locate_test_127_127.log ├── test_rfm_3891909.log ├── test_rfm_3876245.log └── test_rfm_3876247.log ├── logs_rst ├── rlcp_female_maternal2.log ├── rlcp_human_maternal.log ├── breakdown_human_maternal.log ├── breakdown_female_maternal2.log ├── query_human_maternal.log ├── query_female_maternal2.log ├── README.md ├── old_human_maternal.log ├── old_female_maternal2.log ├── locate_female_maternal2_17.log ├── locate_female_maternal2_31.log ├── locate_female_maternal2_127.log ├── locate_female_maternal2_61.log ├── locate_female_maternal2_7.log ├── build_human_maternal.log ├── build_female_maternal2.log ├── verify_human_maternal.log ├── verify_female_maternal2.log ├── cst_female_maternal2.log ├── cst_size.py ├── old_female_maternal2.err └── old_human_maternal.err ├── LICENSE ├── Makefile ├── rlcp_size.cpp ├── scripts ├── fasta2seq.cpp └── col2vector.cpp ├── logs_spire2015 └── 
verify_psi_3963100.log ├── utils.cpp ├── new_relative_lcp.h ├── mutate.cpp ├── spire2014 └── relative.bib └── align_bwts.cpp /rcst/ex.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jltsiren/relative-fm/HEAD/rcst/ex.pdf -------------------------------------------------------------------------------- /logs_spire2014/results.ods: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jltsiren/relative-fm/HEAD/logs_spire2014/results.ods -------------------------------------------------------------------------------- /rcst/comp.csv: -------------------------------------------------------------------------------- 1 | RST;1.897410170497;2.60642192931;4.083947005119;6.677788673318;12.111519432752;14.834728285323;15.092026425534 2 | GCT;1.54805873553;1.93883420797;2.41493374465;3.04908091423;3.66712044889;3.82198714562;3.52361322795 3 | -------------------------------------------------------------------------------- /rcst/locate.csv: -------------------------------------------------------------------------------- 1 | Rate;7;17;31;61;127 2 | SSA;8.51921;5.83014;4.98004;4.47237;3.95172 3 | SSA;170.13;571.42;1317.27;3235.51;8043.51 4 | SSA-RRR;6.82859;4.13952;3.28942;2.78175;2.2611 5 | SSA-RRR;1424.32;4189.3;8579.72;19122;45394.7 6 | RFM;1.10643;1.10643;1.10643;1.10643;1.10643 7 | RFM;1556.7;2085.13;2939.68;4999.69;9993.2 8 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.o 2 | librfm.a 3 | 4 | *.aux 5 | *.bak 6 | *.bbl 7 | *.blg 8 | *.pdf 9 | 10 | relative.log 11 | paper.log 12 | 13 | build_bwt 14 | align_bwts 15 | build_rlcp 16 | query_test 17 | cst_traverse 18 | cst_compare 19 | rlcp_size 20 | mutate 21 | verify 22 | 23 | col2vector 24 | fasta2seq 25 | 26 | .DS_Store 27 | 
-------------------------------------------------------------------------------- /rcst/synth.csv: -------------------------------------------------------------------------------- 1 | RFM;0.708735378279;0.837742123038;1.16687640158;1.90319723581;3.71157777432;6.48250452931;8.34207185172 2 | RLCP;0.615638882996;1.19564374294;2.3440318777;4.20155643327;7.82689705007;7.77917983734;6.17695042476 3 | Reference;0.573035909222;0.573036063332;0.573038725839;0.573035004238;0.573044608362;0.573043918673;0.573004149054 4 | RST;1.897410170497;2.60642192931;4.083947005119;6.677788673318;12.111519432752;14.834728285323;15.092026425534 5 | -------------------------------------------------------------------------------- /logs_old_rcst/index_multi_0.0001.err: -------------------------------------------------------------------------------- 1 | Building TextIndex 2 | change q to 32 again 3 | Verbatim grammar: 1.6183e+10 4 | Dictionary: 14132036 5 | C: 44979336 6 | Sampled pos in C: 6102436 7 | Sampled offset in C: 2435776 8 | Heap excess: 3014634 9 | Heap min excess: 9153508 10 | Rules MIN excess: 1665959 11 | Rules lengths: 4259202 12 | Rules total excess: 1419438 13 | Rules leafs: 3373654 14 | Sampled leafs in C: 7373756 15 | All fsb/lstb of leafs: 3544000 16 | CST size: 101453735 17 | -------------------------------------------------------------------------------- /logs_old_rcst/index_multi_0.0003.err: -------------------------------------------------------------------------------- 1 | Building TextIndex 2 | change q to 32 again 3 | Verbatim grammar: 1.61757e+10 4 | Dictionary: 21921837 5 | C: 55840116 6 | Sampled pos in C: 6349468 7 | Sampled offset in C: 2360268 8 | Heap excess: 3125784 9 | Heap min excess: 9143104 10 | Rules MIN excess: 2788274 11 | Rules lengths: 5939181 12 | Rules total excess: 2308347 13 | Rules leafs: 4623615 14 | Sampled leafs in C: 7365368 15 | All fsb/lstb of leafs: 5298240 16 | CST size: 127063602 17 | 
-------------------------------------------------------------------------------- /logs_old_rcst/index_multi_0.0010.err: -------------------------------------------------------------------------------- 1 | Building TextIndex 2 | change q to 32 again 3 | Verbatim grammar: 1.61391e+10 4 | Dictionary: 31391137 5 | C: 68585236 6 | Sampled pos in C: 6329204 7 | Sampled offset in C: 2296092 8 | Heap excess: 3204126 9 | Heap min excess: 9113908 10 | Rules MIN excess: 4082242 11 | Rules lengths: 8257588 12 | Rules total excess: 3700773 13 | Rules leafs: 6375488 14 | Sampled leafs in C: 7341860 15 | All fsb/lstb of leafs: 7586912 16 | CST size: 158264566 17 | -------------------------------------------------------------------------------- /logs_old_rcst/index_multi_0.0030.err: -------------------------------------------------------------------------------- 1 | Building TextIndex 2 | change q to 32 again 3 | Verbatim grammar: 1.60192e+10 4 | Dictionary: 38077858 5 | C: 98001848 6 | Sampled pos in C: 6271208 7 | Sampled offset in C: 2190873 8 | Heap excess: 3280121 9 | Heap min excess: 9030392 10 | Rules MIN excess: 5084111 11 | Rules lengths: 9492673 12 | Rules total excess: 4951809 13 | Rules leafs: 7288660 14 | Sampled leafs in C: 7274584 15 | All fsb/lstb of leafs: 8881056 16 | CST size: 199825193 17 | -------------------------------------------------------------------------------- /logs_old_rcst/index_multi_0.0300.err: -------------------------------------------------------------------------------- 1 | Building TextIndex 2 | change q to 32 again 3 | Verbatim grammar: 1.4865e+10 4 | Dictionary: 30426388 5 | C: 167769704 6 | Sampled pos in C: 6030028 7 | Sampled offset in C: 1656731 8 | Heap excess: 3013820 9 | Heap min excess: 8349132 10 | Rules MIN excess: 3781175 11 | Rules lengths: 6892208 12 | Rules total excess: 3290501 13 | Rules leafs: 5185435 14 | Sampled leafs in C: 6725788 15 | All fsb/lstb of leafs: 7353728 16 | CST size: 250474638 17 | 
-------------------------------------------------------------------------------- /logs_old_rcst/index_multi_0.0100.err: -------------------------------------------------------------------------------- 1 | Building TextIndex 2 | change q to 32 again 3 | Verbatim grammar: 1.56078e+10 4 | Dictionary: 39753529 5 | C: 136564568 6 | Sampled pos in C: 6341236 7 | Sampled offset in C: 1892619 8 | Heap excess: 3190901 9 | Heap min excess: 8780044 10 | Rules MIN excess: 5276080 11 | Rules lengths: 9660235 12 | Rules total excess: 5153295 13 | Rules leafs: 7367848 14 | Sampled leafs in C: 7072904 15 | All fsb/lstb of leafs: 9271872 16 | CST size: 240325131 17 | -------------------------------------------------------------------------------- /logs_old_rcst/index_multi_0.1000.err: -------------------------------------------------------------------------------- 1 | Building TextIndex 2 | change q to 32 again 3 | Verbatim grammar: 1.38033e+10 4 | Dictionary: 18736433 5 | C: 173866988 6 | Sampled pos in C: 5608728 7 | Sampled offset in C: 1419708 8 | Heap excess: 2483369 9 | Heap min excess: 7765796 10 | Rules MIN excess: 1886617 11 | Rules lengths: 4006750 12 | Rules total excess: 1457411 13 | Rules leafs: 2920647 14 | Sampled leafs in C: 6255876 15 | All fsb/lstb of leafs: 4528352 16 | CST size: 230936675 17 | -------------------------------------------------------------------------------- /logs_rst/rlcp_female_maternal2.log: -------------------------------------------------------------------------------- 1 | Start time: 1493452843 2 | 3 | Building the relative LCP array 4 | 5 | Reference: female.lcp 6 | Length: 3036320416 7 | 8 | Target: maternal2.lcp 9 | Length: 3036191208 10 | Construction: 25166 seconds 11 | Phrases: 93966191 (average length 32.3115) 12 | LCP array: 499.834 MB (1.38098 bpc) 13 | Tree: 96.7397 MB (0.267279 bpc) 14 | Relative LCP: 596.574 MB (1.64826 bpc) 15 | 16 | Memory usage: 141.528 GB 17 | 18 | Finish time: 1493478094 19 | 
-------------------------------------------------------------------------------- /logs_rst/rlcp_human_maternal.log: -------------------------------------------------------------------------------- 1 | Start time: 1493404830 2 | 3 | Building the relative LCP array 4 | 5 | Reference: human.lcp 6 | Length: 3095693982 7 | 8 | Target: maternal.lcp 9 | Length: 3036191208 10 | Construction: 27229.3 seconds 11 | Phrases: 128458615 (average length 23.6356) 12 | LCP array: 1043.61 MB (2.88336 bpc) 13 | Tree: 189.71 MB (0.524144 bpc) 14 | Relative LCP: 1233.32 MB (3.40751 bpc) 15 | 16 | Memory usage: 141.14 GB 17 | 18 | Finish time: 1493432160 19 | -------------------------------------------------------------------------------- /logs_rst/breakdown_human_maternal.log: -------------------------------------------------------------------------------- 1 | Start time: 1494586821 2 | 3 | RLCP size breakdown 4 | 5 | Reference: human 6 | 7 | LCP: 3862.42 MB (10.4663 bpc) 8 | 9 | 10 | Target: maternal 11 | 12 | LCP array: 1043.61 MB (2.88336 bpc) 13 | Tree: 189.71 MB (0.524144 bpc) 14 | Relative LCP: 1233.32 MB (3.40751 bpc) 15 | 16 | RLZAP: 1043.61 MB (2.88336 bpc) 17 | literals: 556.704 MB (1.5381 bpc) 18 | parse: 486.906 MB (1.34526 bpc) 19 | 20 | 21 | Memory used: 5104.23 MB 22 | 23 | Finish time: 1494586883 24 | -------------------------------------------------------------------------------- /logs_rst/breakdown_female_maternal2.log: -------------------------------------------------------------------------------- 1 | Start time: 1494586834 2 | 3 | RLCP size breakdown 4 | 5 | Reference: female 6 | 7 | LCP: 3690.1 MB (10.1948 bpc) 8 | 9 | 10 | Target: maternal2 11 | 12 | LCP array: 499.834 MB (1.38098 bpc) 13 | Tree: 96.7397 MB (0.267279 bpc) 14 | Relative LCP: 596.574 MB (1.64826 bpc) 15 | 16 | RLZAP: 499.834 MB (1.38098 bpc) 17 | literals: 148.238 MB (0.409562 bpc) 18 | parse: 351.596 MB (0.971415 bpc) 19 | 20 | 21 | Memory used: 4294.18 MB 22 | 23 | Finish time: 1494586881 24 | 
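The RLCP logs above report each component as `name: X MB (Y bpc)`, with `Relative LCP` being the sum of `LCP array` and `Tree`. A minimal Python sketch of reading those lines and checking that sum is shown below. It is not the repository's `cst_size.py`, whose contents are not shown here; the names `SIZE_LINE`, `parse_sizes`, and `SAMPLE` are hypothetical and introduced for illustration only.

```python
import re

# Matches size lines such as "LCP array: 499.834 MB (1.38098 bpc)".
# Assumption: every size line follows this exact "name: X MB (Y bpc)" layout.
SIZE_LINE = re.compile(
    r"^\s*(?P<name>[^:]+):\s+(?P<mb>[\d.]+) MB \((?P<bpc>[\d.]+) bpc\)\s*$"
)


def parse_sizes(text):
    """Return {component name: (megabytes, bits per character)} for each size line."""
    sizes = {}
    for line in text.splitlines():
        match = SIZE_LINE.match(line)
        if match:
            name = match.group("name").strip()
            sizes[name] = (float(match.group("mb")), float(match.group("bpc")))
    return sizes


# Excerpt copied from breakdown_female_maternal2.log (line-number prefixes removed).
SAMPLE = """\
Target: maternal2

LCP array: 499.834 MB (1.38098 bpc)
Tree: 96.7397 MB (0.267279 bpc)
Relative LCP: 596.574 MB (1.64826 bpc)
"""

sizes = parse_sizes(SAMPLE)
total_mb = sizes["LCP array"][0] + sizes["Tree"][0]
total_bpc = sizes["LCP array"][1] + sizes["Tree"][1]

# The component totals should agree with the reported "Relative LCP" line
# up to the rounding used in the log.
print(round(total_mb, 3), sizes["Relative LCP"][0])
print(round(total_bpc, 5), sizes["Relative LCP"][1])
```

Lines without a `bpc` figure, such as `Memory used: 4294.18 MB`, are deliberately skipped by the pattern, so only index components end up in the result.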
-------------------------------------------------------------------------------- /logs_spire2014/log_bwt.txt: -------------------------------------------------------------------------------- 1 | File: human 2 | File size: 3095693981 3 | Text size: 3095693982 4 | BWT built in 1198.28 seconds 5 | Memory usage: 26572.4 MB 6 | File: yanhuang 7 | File size: 3001661309 8 | Text size: 3001661310 9 | BWT built in 1150.38 seconds 10 | Memory usage: 25765.4 MB 11 | File: venter 12 | File size: 2809547336 13 | Text size: 2809547337 14 | BWT built in 1151.45 seconds 15 | Memory usage: 24116.3 MB 16 | File: maternal 17 | File size: 3036191207 18 | Text size: 3036191208 19 | BWT built in 1145.23 seconds 20 | Memory usage: 26061.6 MB 21 | File: paternal 22 | File size: 3036185259 23 | Text size: 3036185260 24 | BWT built in 1144.28 seconds 25 | Memory usage: 26061.6 MB 26 | -------------------------------------------------------------------------------- /logs_rst/query_human_maternal.log: -------------------------------------------------------------------------------- 1 | Start time: 1493802427 2 | 3 | Query test 4 | 5 | Reference: human 6 | Sequence: maternal 7 | Patterns: patterns 8 | 9 | Read 2000000 patterns of total length 64000000 10 | 11 | 12 | SimpleFM: 2110.17 MB (5.83014 bpc) 13 | SimpleFM: Found 2000000 patterns with 254997642 occ in 25.6761 seconds (2.37712 MB/s) 14 | 15 | SimpleFM: 1498.27 MB (4.13952 bpc) 16 | SimpleFM: Found 2000000 patterns with 254997642 occ in 129.845 seconds (0.470062 MB/s) 17 | 18 | RFM: 455.695 MB (1.25903 bpc) 19 | RFM: Found 2000000 patterns with 254997642 occ in 259.389 seconds (0.235304 MB/s) 20 | 21 | 22 | Memory usage: 2980.88 MB 23 | 24 | Finish time: 1493803088 25 | -------------------------------------------------------------------------------- /logs_rst/query_female_maternal2.log: -------------------------------------------------------------------------------- 1 | Start time: 1493803191 2 | 3 | Query test 4 | 5 | Reference: female 6 
| Sequence: maternal2 7 | Patterns: patterns 8 | 9 | Read 2000000 patterns of total length 64000000 10 | 11 | 12 | SimpleFM: 2110.17 MB (5.83014 bpc) 13 | SimpleFM: Found 2000000 patterns with 254997642 occ in 25.8003 seconds (2.36568 MB/s) 14 | 15 | SimpleFM: 1498.27 MB (4.13952 bpc) 16 | SimpleFM: Found 2000000 patterns with 254997642 occ in 129.956 seconds (0.46966 MB/s) 17 | 18 | RFM: 400.462 MB (1.10643 bpc) 19 | RFM: Found 2000000 patterns with 254997642 occ in 246.484 seconds (0.247623 MB/s) 20 | 21 | 22 | Memory usage: 2865.13 MB 23 | 24 | Finish time: 1493803837 25 | -------------------------------------------------------------------------------- /rcst/Makefile: -------------------------------------------------------------------------------- 1 | SOURCES=$(wildcard *.tex) 2 | FIGURES=locate.pdf locate_bars.pdf synth.pdf comp.pdf ex.pdf 3 | 4 | all: paper.pdf 5 | 6 | paper.pdf: $(SOURCES) $(FIGURES) paper.bbl 7 | pdflatex paper 8 | pdflatex paper 9 | 10 | force: $(FIGURES) paper.bbl 11 | rm -f paper.pdf 12 | pdflatex paper 13 | pdflatex paper 14 | 15 | bib: $(FIGURES) 16 | pdflatex paper 17 | bibtex paper 18 | 19 | paper.bbl: paper.bib 20 | pdflatex paper 21 | bibtex paper 22 | 23 | locate.pdf: locate.csv locate.R 24 | R --slave --args locate x legend < locate.R 25 | 26 | locate_bars.pdf: locate.csv locate_bars.R 27 | R --slave --args locate xy legend < locate_bars.R 28 | 29 | synth.pdf: synth.csv synth.R 30 | R --slave --args synth xy legend < synth.R 31 | 32 | comp.pdf: comp.csv synth.R 33 | R --slave --args comp x legend < synth.R 34 | 35 | clean: 36 | rm -f *.aux *.bak *.bbl *.blg *.log paper.pdf $(FIGURES) 37 | -------------------------------------------------------------------------------- /logs_rst/README.md: -------------------------------------------------------------------------------- 1 | # Experiment logs for the RST paper 2 | 3 | * **Note: These logs are for the submitted 2017 version of the paper.** 4 | * "human" is the human reference genome. 
5 | * "female" is the human reference genome without chromosome Y. 6 | * "maternal" and "paternal" are the corresponding haplotypes of NA12878. 7 | * Numbers indicate copies of the same files. 8 | 9 | ## Index construction 10 | 11 | * SSA, Full RFM: `build_*.log` 12 | * Basic RFM: `old_*.log`, `old_*.err` 13 | * RLCP: `rlcp_*.log` 14 | 15 | ## Component sizes 16 | 17 | * RFM: Same as for construction 18 | * RLCP: `breakdown_*.log` 19 | 20 | ## Benchmarks 21 | 22 | * Basic queries: `verify_*.log` 23 | * Find queries: `query_*.log` 24 | * Locate queries: `locate_*.log` 25 | * The last number indicates SA sample interval. 26 | * CST comparison: `cst_*.log` 27 | 28 | ## Synthetic collections 29 | 30 | * `multi_*.log`, where the last number indicates mutation rate. 31 | * Use `cst_size.py` to parse the logs. 32 | * The comparison to GCT comes from the old benchmarks, as the construction of each GCT takes around 2.5 days. 33 | -------------------------------------------------------------------------------- /logs_old_rcst/README.md: -------------------------------------------------------------------------------- 1 | # Experiment logs for the RCST paper 2 | 3 | * **Note: These logs are for the old arXiv version of the paper from 2015.** 4 | * "human" is the human reference genome. 5 | * "female" is the human reference genome without chromosome Y. 6 | * "maternal" and "paternal" are the corresponding haplotypes of NA12878. 7 | * Prefixes and numbers indicate copies of the same files. 
8 | 9 | ## Index construction and index sizes 10 | 11 | * Basic RFM: `old_rfm_*.log` 12 | * SSA, Full RFM: `build_rfm_*.log` 13 | * RCST: `test_rfm_*.log` 14 | 15 | ## Component sizes 16 | 17 | * Basic RFM: `old_rfm_*.log` 18 | * Full RFM, RLCP: `test_rfm_*.log` 19 | 20 | ## Query tests 21 | 22 | * LF/Psi in SSA, RFM: `verify_psi_*.log` 23 | * LCP, RLCP: `verify_lcp_*.log` 24 | * Locate queries: `locate_test_*.log` 25 | 26 | ## Synthetic datasets 27 | 28 | * Varying mutation rate: `index_synth_*.log` 29 | * Comparison to GCT: `index_multi_*.log`, `index_multi_*.err` 30 | * Use `cst_size.py` to parse the `.log` files. 31 | 32 | ## Compressed suffix trees 33 | 34 | * Full traversal: `cst_traverse_*.log` 35 | * Matching statistics: `cst_compare_*.log` 36 | -------------------------------------------------------------------------------- /logs_rst/old_human_maternal.log: -------------------------------------------------------------------------------- 1 | Start time: 1494230825 2 | 3 | Relative FM-index builder 4 | Using OpenMP with 32 threads 5 | 6 | Algorithm: partitioning 7 | Block size: 1024 8 | Maximum diagonal: 8192 9 | Maximum length: 32 10 | Buffers: on demand 11 | 12 | Reference: human 13 | 14 | BWT: 1282.54 MB (3.47538 bpc) 15 | Simple FM: 1282.54 MB (3.47539 bpc) 16 | 17 | 18 | Target: maternal 19 | Reference size: 3095693982 20 | Target size: 3036191208 21 | Found 8327360 ranges with intersection length 3033218320 in 3.8538 seconds 22 | Partitioning misses: reference 1832, target 741 23 | Partitioning losses: reference 62473830, target 2972147 24 | Found a common subsequence of length 2991932598 in 50.5812 seconds 25 | LCS losses: exact 41247729, heuristics 37993 26 | 27 | Index built in 85.0666 seconds 28 | 29 | BWT: 1247.9 MB (3.44778 bpc) 30 | Simple FM: 1247.9 MB (3.44778 bpc) 31 | 32 | ref_minus_lcs: 44.1439 MB (0.121964 bpc) 33 | seq_minus_lcs: 17.3459 MB (0.0479245 bpc) 34 | bwt_lcs: 163.937 MB (0.452938 bpc) 35 | Relative FM: 225.428 MB (0.622828 
bpc) 36 | 37 | 38 | Memory usage: 4.41426 GB 39 | 40 | Finish time: 1494231067 41 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2015, 2016, 2017 Genome Research Ltd. 2 | Copyright (c) 2014 Jouni Siren and Simon Gog 3 | 4 | Permission is hereby granted, free of charge, to any person obtaining a copy 5 | of this software and associated documentation files (the "Software"), to deal 6 | in the Software without restriction, including without limitation the rights 7 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 8 | copies of the Software, and to permit persons to whom the Software is 9 | furnished to do so, subject to the following conditions: 10 | 11 | The above copyright notice and this permission notice shall be included in all 12 | copies or substantial portions of the Software. 13 | 14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 20 | SOFTWARE. 
21 | -------------------------------------------------------------------------------- /logs_rst/old_female_maternal2.log: -------------------------------------------------------------------------------- 1 | Start time: 1494232589 2 | 3 | Relative FM-index builder 4 | Using OpenMP with 32 threads 5 | 6 | Algorithm: partitioning 7 | Block size: 1024 8 | Maximum diagonal: 8192 9 | Maximum length: 32 10 | Buffers: on demand 11 | 12 | Reference: female 13 | 14 | BWT: 1247.95 MB (3.44778 bpc) 15 | Simple FM: 1247.95 MB (3.44778 bpc) 16 | 17 | 18 | Target: maternal2 19 | Reference size: 3036320416 20 | Target size: 3036191208 21 | Found 8315561 ranges with intersection length 3028240572 in 3.60987 seconds 22 | Partitioning misses: reference 698, target 796 23 | Partitioning losses: reference 8079146, target 7949840 24 | Found a common subsequence of length 2990988921 in 50.2578 seconds 25 | LCS losses: exact 37213658, heuristics 37993 26 | 27 | Index built in 79.7281 seconds 28 | 29 | BWT: 1247.9 MB (3.44778 bpc) 30 | Simple FM: 1247.9 MB (3.44778 bpc) 31 | 32 | ref_minus_lcs: 17.7606 MB (0.0490704 bpc) 33 | seq_minus_lcs: 17.7109 MB (0.0489331 bpc) 34 | bwt_lcs: 150.468 MB (0.415724 bpc) 35 | Relative FM: 185.94 MB (0.513729 bpc) 36 | 37 | 38 | Memory usage: 4.37933 GB 39 | 40 | Finish time: 1494232819 41 | -------------------------------------------------------------------------------- /rcst/locate_bars.R: -------------------------------------------------------------------------------- 1 | 2 | # Use R --slave --args name axes [legend] < locate_bars.R 3 | 4 | args = commandArgs() 5 | 6 | name = args[4] 7 | 8 | x = 3.4 9 | y = 3 10 | 11 | data <- read.csv(file = paste(name, ".csv", sep = ""), head = FALSE, sep = ";", dec = ".", check.names = FALSE) 12 | pdf(file = paste(name, "_bars.pdf", sep = ""), width = x, height = y, paper = "special", 13 | family = "Helvetica", pointsize = 11) 14 | par(mar = c(4, 4, 1, 1)) 15 | 16 | nc = ncol(data) 17 | 18 | xtitle = NULL 19 | 20 | 
yrange = c(0.3, 300) 21 | yscale = c(1, 10, 100) 22 | ytitle = "" 23 | ylabs = FALSE 24 | 25 | time_per_occ = 1000000.0 / 254997642 26 | 27 | if(grepl("x", args[5])) 28 | { 29 | xtitle = "Sample interval" 30 | } 31 | if(grepl("y", args[5])) 32 | { 33 | ytitle = "Time (µs / occurrence)" 34 | ylabs = yscale 35 | } 36 | 37 | labs = c(3, 5, 7) 38 | times = cbind(t(data[labs, 2:nc])) * time_per_occ 39 | colnames(times) = data[labs, 1] 40 | rownames(times) = data[1, 2:nc] 41 | times 42 | 43 | colors = gray.colors(3) 44 | bp = barplot(t(times), 45 | beside = TRUE, 46 | col = colors, 47 | axes = FALSE, 48 | xlab = xtitle, 49 | ylab = ytitle, 50 | ylim = yrange, 51 | log = "y") 52 | 53 | axis(side = 2, at = ylabs, lab = ylabs) 54 | box() 55 | 56 | if(length(args) > 5) 57 | { 58 | legend(1, 0.75 * max(yrange), xjust = 0, yjust = 1, colnames(times), fill = colors, cex = 0.8) 59 | } 60 | 61 | dev.off() 62 | q() 63 | -------------------------------------------------------------------------------- /logs_rst/locate_female_maternal2_17.log: -------------------------------------------------------------------------------- 1 | Start time: 1493809175 2 | 3 | BWT construction 4 | Options: isa_sample_rate=64 sa_sample_rate=17 5 | 6 | File: female 7 | Text size: 3036320415 8 | BWT built in 864.965 seconds (3.34772 MB/s) 9 | Alphabet written to female.alpha 10 | BWT written to female.bwt 11 | Samples written to female.samples 12 | 13 | File: maternal2 14 | Text size: 3036191207 15 | BWT built in 865.321 seconds (3.3462 MB/s) 16 | Alphabet written to maternal2.alpha 17 | BWT written to maternal2.bwt 18 | Samples written to maternal2.samples 19 | 20 | Query test 21 | 22 | Reference: female 23 | Sequence: maternal2 24 | Patterns: patterns 25 | 26 | Read 2000000 patterns of total length 64000000 27 | 28 | 29 | Executing locate() queries. 
30 | 31 | SimpleFM: 2110.17 MB (5.83014 bpc) 32 | SimpleFM: Found 2000000 patterns with 254997642 occ in 571.42 seconds (446252 occ/s) 33 | Hash of located positions: 11653246869206397622 34 | 35 | SimpleFM: 1498.27 MB (4.13952 bpc) 36 | SimpleFM: Found 2000000 patterns with 254997642 occ in 4189.3 seconds (60868.8 occ/s) 37 | Hash of located positions: 11653246869206397622 38 | 39 | RFM: 400.462 MB (1.10643 bpc) 40 | RFM: Found 2000000 patterns with 254997642 occ in 2085.13 seconds (122294 occ/s) 41 | Hash of located positions: 11653246869206397622 42 | 43 | 44 | Memory usage: 2865.14 MB 45 | 46 | Finish time: 1493818091 47 | -------------------------------------------------------------------------------- /logs_rst/locate_female_maternal2_31.log: -------------------------------------------------------------------------------- 1 | Start time: 1493818091 2 | 3 | BWT construction 4 | Options: isa_sample_rate=64 sa_sample_rate=31 5 | 6 | File: female 7 | Text size: 3036320415 8 | BWT built in 1042.28 seconds (2.7782 MB/s) 9 | Alphabet written to female.alpha 10 | BWT written to female.bwt 11 | Samples written to female.samples 12 | 13 | File: maternal2 14 | Text size: 3036191207 15 | BWT built in 866.308 seconds (3.34239 MB/s) 16 | Alphabet written to maternal2.alpha 17 | BWT written to maternal2.bwt 18 | Samples written to maternal2.samples 19 | 20 | Query test 21 | 22 | Reference: female 23 | Sequence: maternal2 24 | Patterns: patterns 25 | 26 | Read 2000000 patterns of total length 64000000 27 | 28 | 29 | Executing locate() queries. 
30 | 31 | SimpleFM: 1802.49 MB (4.98004 bpc) 32 | SimpleFM: Found 2000000 patterns with 254997642 occ in 1317.27 seconds (193580 occ/s) 33 | Hash of located positions: 11653246869206397622 34 | 35 | SimpleFM: 1190.58 MB (3.28942 bpc) 36 | SimpleFM: Found 2000000 patterns with 254997642 occ in 8579.72 seconds (29721 occ/s) 37 | Hash of located positions: 11653246869206397622 38 | 39 | RFM: 400.462 MB (1.10643 bpc) 40 | RFM: Found 2000000 patterns with 254997642 occ in 2939.68 seconds (86743.3 occ/s) 41 | Hash of located positions: 11653246869206397622 42 | 43 | 44 | Memory usage: 2557.56 MB 45 | 46 | Finish time: 1493833176 47 | -------------------------------------------------------------------------------- /logs_rst/locate_female_maternal2_127.log: -------------------------------------------------------------------------------- 1 | Start time: 1493862600 2 | 3 | BWT construction 4 | Options: isa_sample_rate=127 sa_sample_rate=127 5 | 6 | File: female 7 | Text size: 3036320415 8 | BWT built in 1142.32 seconds (2.5349 MB/s) 9 | Alphabet written to female.alpha 10 | BWT written to female.bwt 11 | Samples written to female.samples 12 | 13 | File: maternal2 14 | Text size: 3036191207 15 | BWT built in 865.913 seconds (3.34391 MB/s) 16 | Alphabet written to maternal2.alpha 17 | BWT written to maternal2.bwt 18 | Samples written to maternal2.samples 19 | 20 | Query test 21 | 22 | Reference: female 23 | Sequence: maternal2 24 | Patterns: patterns 25 | 26 | Read 2000000 patterns of total length 64000000 27 | 28 | 29 | Executing locate() queries. 
30 | 31 | SimpleFM: 1430.29 MB (3.95172 bpc) 32 | SimpleFM: Found 2000000 patterns with 254997642 occ in 8043.51 seconds (31702.3 occ/s) 33 | Hash of located positions: 11653246869206397622 34 | 35 | SimpleFM: 818.388 MB (2.2611 bpc) 36 | SimpleFM: Found 2000000 patterns with 254997642 occ in 45394.7 seconds (5617.34 occ/s) 37 | Hash of located positions: 11653246869206397622 38 | 39 | RFM: 400.462 MB (1.10643 bpc) 40 | RFM: Found 2000000 patterns with 254997642 occ in 9993.2 seconds (25517.1 occ/s) 41 | Hash of located positions: 11653246869206397622 42 | 43 | 44 | Memory usage: 2185.28 MB 45 | 46 | Finish time: 1493928382 47 | -------------------------------------------------------------------------------- /logs_rst/locate_female_maternal2_61.log: -------------------------------------------------------------------------------- 1 | Start time: 1493833176 2 | 3 | BWT construction 4 | Options: isa_sample_rate=64 sa_sample_rate=61 5 | 6 | File: female 7 | Text size: 3036320415 8 | BWT built in 865.678 seconds (3.34496 MB/s) 9 | Alphabet written to female.alpha 10 | BWT written to female.bwt 11 | Samples written to female.samples 12 | 13 | File: maternal2 14 | Text size: 3036191207 15 | BWT built in 865.913 seconds (3.34391 MB/s) 16 | Alphabet written to maternal2.alpha 17 | BWT written to maternal2.bwt 18 | Samples written to maternal2.samples 19 | 20 | Query test 21 | 22 | Reference: female 23 | Sequence: maternal2 24 | Patterns: patterns 25 | 26 | Read 2000000 patterns of total length 64000000 27 | 28 | 29 | Executing locate() queries. 
30 | 31 | SimpleFM: 1618.74 MB (4.47237 bpc) 32 | SimpleFM: Found 2000000 patterns with 254997642 occ in 3235.51 seconds (78812.2 occ/s) 33 | Hash of located positions: 11653246869206397622 34 | 35 | SimpleFM: 1006.83 MB (2.78175 bpc) 36 | SimpleFM: Found 2000000 patterns with 254997642 occ in 19122 seconds (13335.3 occ/s) 37 | Hash of located positions: 11653246869206397622 38 | 39 | RFM: 400.462 MB (1.10643 bpc) 40 | RFM: Found 2000000 patterns with 254997642 occ in 4999.69 seconds (51002.6 occ/s) 41 | Hash of located positions: 11653246869206397622 42 | 43 | 44 | Memory usage: 2373.74 MB 45 | 46 | Finish time: 1493862600 47 | -------------------------------------------------------------------------------- /logs_rst/locate_female_maternal2_7.log: -------------------------------------------------------------------------------- 1 | Start time: 1493803915 2 | 3 | BWT construction 4 | Options: isa_sample_rate=64 sa_sample_rate=7 5 | 6 | File: female 7 | Text size: 3036320415 8 | BWT built in 874.064 seconds (3.31287 MB/s) 9 | Alphabet written to female.alpha 10 | BWT written to female.bwt 11 | Samples written to female.samples 12 | 13 | File: maternal2 14 | Text size: 3036191207 15 | BWT built in 873.627 seconds (3.31439 MB/s) 16 | Alphabet written to maternal2.alpha 17 | BWT written to maternal2.bwt 18 | Samples written to maternal2.samples 19 | 20 | Query test 21 | 22 | Reference: female 23 | Sequence: maternal2 24 | Patterns: patterns 25 | 26 | Read 2000000 patterns of total length 64000000 27 | 28 | 29 | Executing locate() queries. 
30 | 31 | SimpleFM: 3083.46 MB (8.51921 bpc) 32 | SimpleFM: Found 2000000 patterns with 254997642 occ in 170.13 seconds (1.49884e+06 occ/s) 33 | Hash of located positions: 11653246869206397622 34 | 35 | SimpleFM: 2471.56 MB (6.82859 bpc) 36 | SimpleFM: Found 2000000 patterns with 254997642 occ in 1424.32 seconds (179031 occ/s) 37 | Hash of located positions: 11653246869206397622 38 | 39 | RFM: 400.462 MB (1.10643 bpc) 40 | RFM: Found 2000000 patterns with 254997642 occ in 1556.7 seconds (163807 occ/s) 41 | Hash of located positions: 11653246869206397622 42 | 43 | 44 | Memory usage: 3838.38 MB 45 | 46 | Finish time: 1493809175 47 | -------------------------------------------------------------------------------- /rcst/synth.R: -------------------------------------------------------------------------------- 1 | 2 | # Use R --slave --args name axes [legend] < synth.R 3 | 4 | args = commandArgs() 5 | 6 | name = args[4] 7 | 8 | x = 3.4 9 | y = 3 10 | 11 | data <- read.csv(file = paste(name, ".csv", sep = ""), head = FALSE, sep = ";", dec = ".", check.names = FALSE) 12 | pdf(file = paste(name, ".pdf", sep = ""), width = x, height = y, paper = "special", 13 | family = "Helvetica", pointsize = 11) 14 | par(mar = c(4, 4, 1, 1)) 15 | 16 | xrange = c(0.0001, 0.1) 17 | xscale = c("0.0001", "0.001", "0.01", "0.1") 18 | xtitle = "" 19 | xlabs = FALSE 20 | 21 | yrange = c(0, 15) 22 | yscale = c(0, 3, 6, 9, 12, 15) 23 | ytitle = "" 24 | ylabs = FALSE 25 | 26 | if(grepl("x", args[5])) 27 | { 28 | xtitle = "Mutation rate" 29 | xlabs = xscale 30 | } 31 | if(grepl("y", args[5])) 32 | { 33 | ytitle = "Size (bpc)" 34 | ylabs = yscale 35 | } 36 | 37 | plot(c(1), 38 | c(1), 39 | type = "n", 40 | axes = F, 41 | main = "", 42 | xlab = xtitle, 43 | ylab = ytitle, 44 | xlim = xrange, 45 | ylim = yrange, 46 | log = "x") 47 | 48 | axis(1, at = xscale, lab = xlabs) 49 | axis(2, at = yscale, lab = ylabs) 50 | box() 51 | 52 | nr = nrow(data) 53 | nc = ncol(data) 54 | xpos = c(0.0001, 0.0003, 0.001, 
0.003, 0.01, 0.03, 0.1) 55 | 56 | symbols = c(00, 01, 02, 07) 57 | if(nr == 2) 58 | { 59 | symbols = c(07, 12) 60 | } 61 | 62 | for(i in c(1:nr)) 63 | { 64 | points(xpos, data[i, 2:nc], type = "b", pch = symbols[i], cex = 1.5) 65 | } 66 | 67 | if(length(args) > 5) 68 | { 69 | legend(min(xrange), max(yrange), xjust = 0, yjust = 1, data[1:nr, 1], 70 | pch = symbols, pt.cex = 1.5) 71 | } 72 | 73 | dev.off() 74 | q() 75 | -------------------------------------------------------------------------------- /rcst/locate.R: -------------------------------------------------------------------------------- 1 | 2 | # Use R --slave --args name axes [legend] < locate.R 3 | 4 | args = commandArgs() 5 | 6 | name = args[4] 7 | 8 | x = 3.4 9 | y = 3 10 | 11 | data <- read.csv(file = paste(name, ".csv", sep = ""), head = FALSE, sep = ";", dec = ".", check.names = FALSE) 12 | pdf(file = paste(name, ".pdf", sep = ""), width = x, height = y, paper = "special", 13 | family = "Helvetica", pointsize = 11) 14 | par(mar = c(4, 4, 1, 1)) 15 | 16 | xrange = c(0, 10) 17 | xscale = c(0, 2, 4, 6, 8, 10) 18 | xtitle = "" 19 | xlabs = F 20 | 21 | yrange = c(0.3, 300) 22 | yscale = c(1, 10, 100) 23 | ytitle = "" 24 | ylabs = F 25 | 26 | if(grepl("x", args[5])) 27 | { 28 | xtitle = "Size (bpc)" 29 | xlabs = xscale 30 | } 31 | if(grepl("y", args[5])) 32 | { 33 | ytitle = "Time (µs / occurrence)" 34 | ylabs = yscale 35 | } 36 | 37 | plot(c(1), 38 | c(1), 39 | type = "n", 40 | axes = F, 41 | main = "", 42 | xlab = xtitle, 43 | ylab = ytitle, 44 | xlim = xrange, 45 | ylim = yrange, 46 | log = "y") 47 | 48 | axis(1, at = xscale, lab = xlabs) 49 | axis(2, at = yscale, lab = ylabs) 50 | box() 51 | 52 | nc = ncol(data) 53 | labs = c(2, 4, 6) 54 | time_per_occ = 1000000.0 / 254997642 55 | 56 | points(data[2, 2:nc], data[3, 2:nc] * time_per_occ, type = "b", pch = 01, cex = 1.5) # SSA 57 | points(data[4, 2:nc], data[5, 2:nc] * time_per_occ, type = "b", pch = 02, cex = 1.5) # SSA-RRR 58 | points(data[6, 2:nc], 
data[7, 2:nc] * time_per_occ, type = "b", pch = 00, cex = 1.5) # RFM 59 | 60 | if(length(args) > 5) 61 | { 62 | legend(max(xrange), max(yrange), xjust = 1, yjust = 1, data[labs, 1], 63 | pch = c(01, 02, 00), cex = 0.8, pt.cex = 1.2) 64 | } 65 | 66 | dev.off() 67 | q() 68 | -------------------------------------------------------------------------------- /logs_rst/build_human_maternal.log: -------------------------------------------------------------------------------- 1 | Start time: 1493281710 2 | 3 | BWT construction 4 | Options: alphabet isa_sample_rate=64 lcp sa_sample_rate=17 5 | 6 | File: human 7 | Text size: 3095693981 8 | BWT built in 1007.3 seconds (2.9309 MB/s) 9 | Alphabet written to human.alpha 10 | BWT written to human.bwt 11 | Samples written to human.samples 12 | LCP array built in 723.609 seconds (4.07994 MB/s) 13 | 14 | File: maternal 15 | Text size: 3036191207 16 | BWT built in 951.642 seconds (3.04268 MB/s) 17 | Alphabet written to maternal.alpha 18 | BWT written to maternal.bwt 19 | Samples written to maternal.samples 20 | LCP array built in 738.175 seconds (3.92256 MB/s) 21 | 22 | Relative FM-index builder 23 | Using OpenMP with 32 threads 24 | 25 | Algorithm: invariant 26 | SA sample rate: 257 27 | ISA sample rate: 512 28 | 29 | Reference: human 30 | 31 | BWT: 1282.54 MB (3.47538 bpc) 32 | SA samples: 694.655 MB (1.88235 bpc) 33 | ISA samples: 184.518 MB (0.5 bpc) 34 | Simple FM: 2161.71 MB (5.85774 bpc) 35 | 36 | 37 | Target: maternal 38 | Reference size: 3095693982 39 | Target size: 3036191208 40 | Built the merging bitvector in 2317.66 seconds 41 | Matched 3032774943 positions in 5773.64 seconds 42 | Found a common subsequence of length 2979962648 in 972.89 seconds 43 | Built the bwt_lcs bitvectors and samples in 1405.35 seconds 44 | Index built in 10507.6 seconds 45 | 46 | BWT: 1247.9 MB (3.44778 bpc) 47 | SA samples: 681.303 MB (1.88235 bpc) 48 | ISA samples: 180.971 MB (0.5 bpc) 49 | Simple FM: 2110.17 MB (5.83014 bpc) 50 | 51 | 
ref_minus_lcs: 49.6847 MB (0.137273 bpc) 52 | seq_minus_lcs: 22.0453 MB (0.0609084 bpc) 53 | bwt_lcs: 189.846 MB (0.524521 bpc) 54 | text_lcs: 126.431 MB (0.349311 bpc) 55 | SA samples: 45.0667 MB (0.124514 bpc) 56 | ISA samples: 22.6214 MB (0.0625 bpc) 57 | Relative FM: 455.695 MB (1.25903 bpc) 58 | 59 | 60 | Memory usage: 83.9516 GB 61 | 62 | Finish time: 1493295922 63 | -------------------------------------------------------------------------------- /logs_rst/build_female_maternal2.log: -------------------------------------------------------------------------------- 1 | Start time: 1493390047 2 | 3 | BWT construction 4 | Options: isa_sample_rate=64 lcp sa_sample_rate=17 5 | 6 | File: female 7 | Text size: 3036320415 8 | BWT built in 870.53 seconds (3.32632 MB/s) 9 | Alphabet written to female.alpha 10 | BWT written to female.bwt 11 | Samples written to female.samples 12 | LCP array built in 748.465 seconds (3.8688 MB/s) 13 | 14 | File: maternal2 15 | Text size: 3036191207 16 | BWT built in 944.253 seconds (3.06649 MB/s) 17 | Alphabet written to maternal2.alpha 18 | BWT written to maternal2.bwt 19 | Samples written to maternal2.samples 20 | LCP array built in 744.381 seconds (3.88986 MB/s) 21 | 22 | Relative FM-index builder 23 | Using OpenMP with 32 threads 24 | 25 | Algorithm: invariant 26 | SA sample rate: 257 27 | ISA sample rate: 512 28 | 29 | Reference: female 30 | 31 | BWT: 1247.95 MB (3.44778 bpc) 32 | SA samples: 681.332 MB (1.88235 bpc) 33 | ISA samples: 180.979 MB (0.5 bpc) 34 | Simple FM: 2110.26 MB (5.83013 bpc) 35 | 36 | 37 | Target: maternal2 38 | Reference size: 3036320416 39 | Target size: 3036191208 40 | Built the merging bitvector in 2302.16 seconds 41 | Matched 3001221512 positions in 5711.42 seconds 42 | Found a common subsequence of length 2980010799 in 895.85 seconds 43 | Built the bwt_lcs bitvectors and samples in 1457.24 seconds 44 | Index built in 10399.1 seconds 45 | 46 | BWT: 1247.9 MB (3.44778 bpc) 47 | SA samples: 681.303 MB (1.88235 
bpc) 48 | ISA samples: 180.971 MB (0.5 bpc) 49 | Simple FM: 2110.17 MB (5.83014 bpc) 50 | 51 | ref_minus_lcs: 22.0769 MB (0.0609955 bpc) 52 | seq_minus_lcs: 22.0266 MB (0.0608567 bpc) 53 | bwt_lcs: 163.138 MB (0.450729 bpc) 54 | text_lcs: 125.532 MB (0.34683 bpc) 55 | SA samples: 45.0667 MB (0.124514 bpc) 56 | ISA samples: 22.6214 MB (0.0625 bpc) 57 | Relative FM: 400.462 MB (1.10643 bpc) 58 | 59 | 60 | Memory usage: 82.5713 GB 61 | 62 | Finish time: 1493404044 63 | -------------------------------------------------------------------------------- /logs_rst/verify_human_maternal.log: -------------------------------------------------------------------------------- 1 | Start time: 1494582021 2 | 3 | RFM and RLCP verifier 4 | 5 | Reference: human 6 | 7 | FM-index: 2161.71 MB (5.85774 bpc) 8 | 9 | LCP array: 3862.42 MB (10.4663 bpc) 10 | 11 | 12 | Target: maternal 13 | 14 | BWT: 1247.9 MB (3.44778 bpc) 15 | SA samples: 681.303 MB (1.88235 bpc) 16 | ISA samples: 180.971 MB (0.5 bpc) 17 | Simple FM: 2110.17 MB (5.83014 bpc) 18 | 19 | ref_minus_lcs: 49.6847 MB (0.137273 bpc) 20 | seq_minus_lcs: 22.0453 MB (0.0609084 bpc) 21 | bwt_lcs: 189.846 MB (0.524521 bpc) 22 | text_lcs: 126.431 MB (0.349311 bpc) 23 | SA samples: 45.0667 MB (0.124514 bpc) 24 | ISA samples: 22.6214 MB (0.0625 bpc) 25 | Relative FM: 455.695 MB (1.25903 bpc) 26 | 27 | LCP array: 3689.77 MB (10.1944 bpc) 28 | 29 | LCP array: 1043.61 MB (2.88336 bpc) 30 | Tree: 189.71 MB (0.524144 bpc) 31 | Relative LCP: 1233.32 MB (3.40751 bpc) 32 | 33 | LF (FM): 10000000 queries in 3.27694 seconds (0.327694 µs/query) 34 | 35 | LF (RRR): 10000000 queries in 19.891 seconds (1.9891 µs/query) 36 | 37 | LF (RFM): 10000000 queries in 30.5445 seconds (3.05445 µs/query) 38 | 39 | Psi (FM): 10000000 queries in 10.4753 seconds (1.04753 µs/query) 40 | 41 | Psi (RRR): 10000000 queries in 27.0867 seconds (2.70867 µs/query) 42 | 43 | Psi (RFM, slow): 10000000 queries in 430.949 seconds (43.0949 µs/query) 44 | 45 | Select structures 
built in 415.42 seconds 46 | Relative select: 189.888 MB (0.524637 bpc) 47 | 48 | Psi (RFM, fast): 10000000 queries in 51.9583 seconds (5.19583 µs/query) 49 | 50 | LCP (random): 100000000 queries in 5.37034 seconds (0.0537034 µs/query) 51 | LCP (seq): 3036191208 queries in 6.19624 seconds (0.00204079 µs/query) 52 | RLCP (random): 100000000 queries in 158.02 seconds (1.5802 µs/query) 53 | RLCP (seq): 3036191208 queries in 71.4982 seconds (0.0235486 µs/query) 54 | 55 | RMQ: 100000000 queries in 298.518 seconds (2.98518 µs/query) 56 | 57 | PSV: 100000000 queries in 189.86 seconds (1.8986 µs/query) 58 | 59 | NSV: 100000000 queries in 190.854 seconds (1.90854 µs/query) 60 | 61 | 62 | Memory used: 17130.4 MB 63 | 64 | Finish time: 1494584223 65 | -------------------------------------------------------------------------------- /logs_rst/verify_female_maternal2.log: -------------------------------------------------------------------------------- 1 | Start time: 1494584309 2 | 3 | RFM and RLCP verifier 4 | 5 | Reference: female 6 | 7 | FM-index: 2110.26 MB (5.83013 bpc) 8 | 9 | LCP array: 3690.1 MB (10.1948 bpc) 10 | 11 | 12 | Target: maternal2 13 | 14 | BWT: 1247.9 MB (3.44778 bpc) 15 | SA samples: 681.303 MB (1.88235 bpc) 16 | ISA samples: 180.971 MB (0.5 bpc) 17 | Simple FM: 2110.17 MB (5.83014 bpc) 18 | 19 | ref_minus_lcs: 22.0769 MB (0.0609955 bpc) 20 | seq_minus_lcs: 22.0266 MB (0.0608567 bpc) 21 | bwt_lcs: 163.138 MB (0.450729 bpc) 22 | text_lcs: 125.532 MB (0.34683 bpc) 23 | SA samples: 45.0667 MB (0.124514 bpc) 24 | ISA samples: 22.6214 MB (0.0625 bpc) 25 | Relative FM: 400.462 MB (1.10643 bpc) 26 | 27 | LCP array: 3689.77 MB (10.1944 bpc) 28 | 29 | LCP array: 499.834 MB (1.38098 bpc) 30 | Tree: 96.7397 MB (0.267279 bpc) 31 | Relative LCP: 596.574 MB (1.64826 bpc) 32 | 33 | LF (FM): 10000000 queries in 3.27086 seconds (0.327086 µs/query) 34 | 35 | LF (RRR): 10000000 queries in 19.8811 seconds (1.98811 µs/query) 36 | 37 | LF (RFM): 10000000 queries in 28.9438 
seconds (2.89438 µs/query) 38 | 39 | Psi (FM): 10000000 queries in 10.4681 seconds (1.04681 µs/query) 40 | 41 | Psi (RRR): 10000000 queries in 27.0742 seconds (2.70742 µs/query) 42 | 43 | Psi (RFM, slow): 10000000 queries in 404.783 seconds (40.4783 µs/query) 44 | 45 | Select structures built in 400.001 seconds 46 | Relative select: 163.182 MB (0.450851 bpc) 47 | 48 | Psi (RFM, fast): 10000000 queries in 50.0009 seconds (5.00009 µs/query) 49 | 50 | LCP (random): 100000000 queries in 5.37228 seconds (0.0537228 µs/query) 51 | LCP (seq): 3036191208 queries in 6.17201 seconds (0.00203281 µs/query) 52 | RLCP (random): 100000000 queries in 148.016 seconds (1.48016 µs/query) 53 | RLCP (seq): 3036191208 queries in 52.8455 seconds (0.0174052 µs/query) 54 | 55 | RMQ: 100000000 queries in 307.795 seconds (3.07795 µs/query) 56 | 57 | PSV: 100000000 queries in 178.783 seconds (1.78783 µs/query) 58 | 59 | NSV: 100000000 queries in 183.385 seconds (1.83385 µs/query) 60 | 61 | 62 | Memory used: 16155.9 MB 63 | 64 | Finish time: 1494586457 65 | -------------------------------------------------------------------------------- /logs_old_rcst/cst_size.py: -------------------------------------------------------------------------------- 1 | #! /usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | # cst_size.py logfile1 [logfile2 ...] 5 | 6 | # This computes the total size of a collection of Relative CSTs, including 7 | # the size of the reference. 
8 | 9 | import sys, re 10 | 11 | def printSize(type, megabytes, bytes): 12 | print (type + ":"), megabytes, "MB,", (megabytes * 1048576.0 * 8 / bytes), "bpc" 13 | 14 | def main(): 15 | 16 | if len(sys.argv) < 2: 17 | return 18 | 19 | size_pattern = "Target size:\s*(.*)" 20 | size = re.compile(size_pattern) 21 | 22 | fm_pattern = "Simple FM:\s*(.*) MB" 23 | fm = re.compile(fm_pattern) 24 | 25 | lcp_pattern = "LCP array:\s*(.*) MB" 26 | lcp = re.compile(lcp_pattern) 27 | 28 | rfm_pattern = "Relative FM:\s*(.*) MB" 29 | rfm = re.compile(rfm_pattern) 30 | 31 | rlcp_pattern = "Relative LCP:\s*(.*) MB" 32 | rlcp = re.compile(rlcp_pattern) 33 | 34 | repet_pattern = "Total:\s*(.*)" 35 | repet = re.compile(repet_pattern) 36 | 37 | for arg in range(1, len(sys.argv)): 38 | fm_size = 0.0 39 | lcp_size = 0.0 40 | rfm_size = 0.0 41 | rlcp_size = 0.0 42 | total_size = 0.0 43 | repet_size = 0.0 44 | bytes = 0 45 | infile = open(sys.argv[arg], "rb") 46 | state = 0 47 | 48 | for line in infile: 49 | if state == 0: 50 | res = size.search(line) 51 | if res: 52 | bytes += int(res.group(1)) 53 | res = fm.search(line) 54 | if res: 55 | fm_size = float(res.group(1)) 56 | state = 1 57 | elif state == 1: 58 | res = lcp.search(line) 59 | if res: 60 | lcp_size = float(res.group(1)) 61 | state = 2 62 | else: 63 | res = rfm.search(line) 64 | if res: 65 | rfm_size += float(res.group(1)) 66 | res = rlcp.search(line) 67 | if res: 68 | rlcp_size += float(res.group(1)) 69 | res = repet.search(line) 70 | if res: 71 | repet_size = int(res.group(1)) / 1048576.0 72 | 73 | infile.close() 74 | total_size = fm_size + lcp_size + rfm_size + rlcp_size 75 | 76 | print "File:", sys.argv[arg] 77 | print "Original:", (bytes / 1048576.0), "MB" 78 | printSize("Reference", fm_size + lcp_size, bytes) 79 | printSize("Sequences", rfm_size + rlcp_size, bytes) 80 | printSize("Relative CST", total_size, bytes) 81 | printSize("Repetitive CST", repet_size, bytes) 82 | print 83 | 84 | if __name__ == "__main__": 85 | main() 
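The size conversion in `printSize` above can be checked against the build logs. The following is a minimal Python 3 sketch, not part of the repository: the function name `bits_per_character` and the variable names are illustrative assumptions, and the two input figures are copied from `logs_rst/build_female_maternal2.log` ("Target size" and "Relative FM").

```python
# Hypothetical helper (not in the repository): the same MB -> bpc
# conversion that printSize() performs in cst_size.py.
MB = 1048576.0  # bytes per megabyte, the constant used by cst_size.py


def bits_per_character(megabytes, text_bytes):
    """Convert a structure size in MB into bits per character of the text."""
    return megabytes * MB * 8 / text_bytes


# Figures copied from logs_rst/build_female_maternal2.log.
target_size = 3036191208   # "Target size" line, in bytes
relative_fm_mb = 400.462   # "Relative FM" line, in MB

bpc = bits_per_character(relative_fm_mb, target_size)
print(round(bpc, 3))
```

The result should agree with the 1.10643 bpc reported in the log, up to the rounding of the MB value printed there.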
86 | -------------------------------------------------------------------------------- /logs_old_rcst/cst_traverse_4379375.log: -------------------------------------------------------------------------------- 1 | Start 2 | Tue Jun 23 15:28:49 BST 2015 3 | 4 | CST traverse timings 5 | 6 | Reference: human 7 | Target: maternal 8 | 9 | DFS traversal in compressed suffix trees 10 | 11 | Reference: human 12 | 13 | FM-index: 1999.67 MB (5.41863 bpc) 14 | LCP array: 3862.42 MB (10.4663 bpc) 15 | Reference data: 5862.09 MB (15.8849 bpc) 16 | 17 | 18 | Sequence: maternal 19 | 20 | Relative CST: 1546.64 MB (4.27318 bpc) 21 | Relative CST: 5127716113 nodes in 2496.22 seconds 22 | 23 | cst_sct3_dac: 6543.94 MB (18.0801 bpc) 24 | cst_sct3_dac: 5127716113 nodes in 1418.5 seconds 25 | 26 | cst_sct3_plcp: 3905.7 MB (10.7909 bpc) 27 | cst_sct3_plcp: 5127716113 nodes in 1381.21 seconds 28 | 29 | cst_sada: 4462.21 MB (12.3285 bpc) 30 | cst_sada: 5127716113 nodes in 502.511 seconds 31 | 32 | cst_fully: 1802.96 MB (4.98135 bpc) 33 | 34 | 35 | Memory used: 32110.4 MB 36 | 37 | 38 | End 39 | Wed Jun 24 04:53:15 BST 2015 40 | 41 | 42 | ------------------------------------------------------------ 43 | Sender: LSF System 44 | Subject: Job 4379375: in cluster Done 45 | 46 | Job was submitted from host by user in cluster . 47 | Job was executed on host(s) <32*vr-1-1-03>, in queue , as user in cluster . 48 | was used as the home directory. 49 | was used as the working directory. 50 | Started at Tue Jun 23 15:28:49 2015 51 | Results reported at Wed Jun 24 04:53:15 2015 52 | 53 | Your job looked like: 54 | 55 | ------------------------------------------------------------ 56 | # LSBATCH: User input 57 | /nfs/users/nfs_j/js35/job_scripts/cst_traverse human maternal 58 | ------------------------------------------------------------ 59 | 60 | Successfully completed. 61 | 62 | Resource usage summary: 63 | 64 | CPU time : 49075.85 sec. 
65 | Max Memory : 32118 MB 66 | Average Memory : 13408.78 MB 67 | Total Requested Memory : 65536.00 MB 68 | Delta Memory : 33418.00 MB 69 | (Delta: the difference between total requested memory and actual max usage.) 70 | Max Swap : 32176 MB 71 | 72 | Max Processes : 4 73 | Max Threads : 5 74 | 75 | The output (if any) is above this job summary. 76 | 77 | 78 | 79 | PS: 80 | 81 | Read file for stderr output of this job. 82 | 83 | -------------------------------------------------------------------------------- /logs_old_rcst/cst_traverse_4379376.log: -------------------------------------------------------------------------------- 1 | Start 2 | Tue Jun 23 15:28:49 BST 2015 3 | 4 | CST traverse timings 5 | 6 | Reference: female 7 | Target: maternal2 8 | 9 | DFS traversal in compressed suffix trees 10 | 11 | Reference: female 12 | 13 | FM-index: 1952.6 MB (5.39454 bpc) 14 | LCP array: 3690.1 MB (10.1948 bpc) 15 | Reference data: 5642.7 MB (15.5894 bpc) 16 | 17 | 18 | Sequence: maternal2 19 | 20 | Relative CST: 1144.74 MB (3.16278 bpc) 21 | Relative CST: 5127716113 nodes in 2360.38 seconds 22 | 23 | cst_sct3_dac: 6543.94 MB (18.0801 bpc) 24 | cst_sct3_dac: 5127716113 nodes in 1083.43 seconds 25 | 26 | cst_sct3_plcp: 3905.7 MB (10.7909 bpc) 27 | cst_sct3_plcp: 5127716113 nodes in 1069.23 seconds 28 | 29 | cst_sada: 4462.21 MB (12.3285 bpc) 30 | cst_sada: 5127716113 nodes in 301.68 seconds 31 | 32 | cst_fully: 1802.96 MB (4.98135 bpc) 33 | 34 | 35 | Memory used: 31886 MB 36 | 37 | 38 | End 39 | Wed Jun 24 02:59:25 BST 2015 40 | 41 | 42 | ------------------------------------------------------------ 43 | Sender: LSF System 44 | Subject: Job 4379376: in cluster Done 45 | 46 | Job was submitted from host by user in cluster . 47 | Job was executed on host(s) <32*vr-4-1-11>, in queue , as user in cluster . 48 | was used as the home directory. 49 | was used as the working directory. 
50 | Started at Tue Jun 23 15:28:49 2015 51 | Results reported at Wed Jun 24 02:59:25 2015 52 | 53 | Your job looked like: 54 | 55 | ------------------------------------------------------------ 56 | # LSBATCH: User input 57 | /nfs/users/nfs_j/js35/job_scripts/cst_traverse female maternal2 58 | ------------------------------------------------------------ 59 | 60 | Successfully completed. 61 | 62 | Resource usage summary: 63 | 64 | CPU time : 42158.84 sec. 65 | Max Memory : 31894 MB 66 | Average Memory : 13099.15 MB 67 | Total Requested Memory : 65536.00 MB 68 | Delta Memory : 33642.00 MB 69 | (Delta: the difference between total requested memory and actual max usage.) 70 | Max Swap : 31952 MB 71 | 72 | Max Processes : 4 73 | Max Threads : 5 74 | 75 | The output (if any) is above this job summary. 76 | 77 | 78 | 79 | PS: 80 | 81 | Read file for stderr output of this job. 82 | 83 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | SDSL_DIR=../sdsl-lite 2 | RLZAP_DIR=../rlzap 3 | 4 | # In OS X, getrusage() returns maximum resident set size in bytes. 5 | # In Linux, the value is in kilobytes, so this line should be commented out. 6 | #RUSAGE_FLAGS=-DRUSAGE_IN_BYTES 7 | 8 | # Compute run and gap measures for the bitvectors in RelativeFM::reportSize(). 9 | # This makes reportSize() significantly slower. 10 | #RUN_FLAGS=-DREPORT_RUNS 11 | 12 | # Print some additional information. Status info goes to stderr. 13 | VERBOSE_FLAGS=-DVERBOSE_OUTPUT 14 | #VERBOSE_FLAGS=-DVERBOSE_OUTPUT -DVERBOSE_STATUS_INFO 15 | 16 | # Hybrid bitvectors are slower, but they can sometimes be smaller. 17 | #VECTOR_FLAGS=-DUSE_HYBRID_BITVECTORS 18 | 19 | # Multithreading with OpenMP. No longer compiles without OpenMP support. 20 | # Currently used for RFM construction. 
21 | PARALLEL_FLAGS=-fopenmp -D_GLIBCXX_PARALLEL 22 | 23 | OTHER_FLAGS=$(RUSAGE_FLAGS) $(RUN_FLAGS) $(VERBOSE_FLAGS) $(VECTOR_FLAGS) $(PARALLEL_FLAGS) 24 | 25 | include $(SDSL_DIR)/Make.helper 26 | CXX_FLAGS=$(MY_CXX_FLAGS) $(OTHER_FLAGS) $(MY_CXX_OPT_FLAGS) -I$(INC_DIR) -I$(RLZAP_DIR)/include -I$(RLZAP_DIR)/ext_libs/boost/include -I$(RLZAP_DIR)/ext_libs/sais/include 27 | LIBOBJS=relative_fm.o utils.o support.o new_relative_lcp.o 28 | SOURCES=$(wildcard *.cpp) 29 | HEADERS=$(wildcard *.h) 30 | OBJS=$(SOURCES:.cpp=.o) 31 | LIBS=-L$(LIB_DIR) -L$(RLZAP_DIR)/build -lsdsl -ldivsufsort -ldivsufsort64 -lrlz_lib 32 | LIBRARY=librfm.a 33 | PROGRAMS=build_bwt align_bwts build_rlcp query_test cst_traverse cst_compare rlcp_size mutate verify 34 | 35 | all: $(PROGRAMS) 36 | 37 | %.o:%.cpp $(HEADERS) 38 | $(MY_CXX) $(CXX_FLAGS) -c $< 39 | 40 | $(LIBRARY):$(LIBOBJS) 41 | ar rcs $@ $(LIBOBJS) 42 | 43 | build_bwt:build_bwt.o $(LIBRARY) 44 | $(MY_CXX) $(CXX_FLAGS) -o $@ $< $(LIBRARY) $(LIBS) 45 | 46 | align_bwts:align_bwts.o $(LIBRARY) 47 | $(MY_CXX) $(CXX_FLAGS) -o $@ $< $(LIBRARY) $(LIBS) 48 | 49 | build_rlcp:build_rlcp.o $(LIBRARY) 50 | $(MY_CXX) $(CXX_FLAGS) -o $@ $< $(LIBRARY) $(LIBS) 51 | 52 | query_test:query_test.o $(LIBRARY) 53 | $(MY_CXX) $(CXX_FLAGS) -o $@ $< $(LIBRARY) $(LIBS) 54 | 55 | cst_traverse:cst_traverse.o $(LIBRARY) 56 | $(MY_CXX) $(CXX_FLAGS) -o $@ $< $(LIBRARY) $(LIBS) 57 | 58 | cst_compare:cst_compare.o $(LIBRARY) 59 | $(MY_CXX) $(CXX_FLAGS) -o $@ $< $(LIBRARY) $(LIBS) 60 | 61 | rlcp_size:rlcp_size.o $(LIBRARY) 62 | $(MY_CXX) $(CXX_FLAGS) -o $@ $< $(LIBRARY) $(LIBS) 63 | 64 | mutate:mutate.o $(LIBRARY) 65 | $(MY_CXX) $(CXX_FLAGS) -o $@ $< $(LIBRARY) $(LIBS) 66 | 67 | verify:verify.o $(LIBRARY) 68 | $(MY_CXX) $(CXX_FLAGS) -o $@ $< $(LIBRARY) $(LIBS) 69 | 70 | clean: 71 | rm -f $(PROGRAMS) $(LIBRARY) $(OBJS) 72 | -------------------------------------------------------------------------------- /rcst/concl.tex: 
-------------------------------------------------------------------------------- 1 | 2 | 3 | \section{Discussion}\label{section:discussion} 4 | 5 | We have introduced relative suffix trees (\RCST), a new kind of compressed suffix tree for repetitive sequence collections. Our \RCST{} compresses the suffix tree of an individual sequence relative to the suffix tree of a reference sequence. It combines an already known relative suffix array with a novel relative-compressed longest common prefix representation (\RLCP). When the sequences are similar enough (e.g., two human genomes), the \RCST{} requires about 3 bits per symbol on each target sequence. This is close to the space used by the most space-efficient compressed suffix trees designed to store repetitive collections in a single tree, but the \RCST{} provides a different functionality as it indexes each sequence individually. The \RCST{} supports query and navigation operations within a few microseconds, which is competitive with the largest and fastest compressed suffix trees. 6 | 7 | The size of \RCST{} is proportional to the amount of sequence that is present either in the reference or in the target, but not both. This is unusual for relative compression, where any additional material in the reference is generally harmless. Sorting the suffixes in lexicographic order tends to distribute the additional suffixes all over the suffix array, creating many mismatches between the suffix-based structures of the reference and the target. For example, the 60~million suffixes from chromosome~Y created 34~million new phrases in the RLZ parse of the \DLCP{} array of a female genome, doubling the size of the \RLCP{} array. Having multiple references (e.g.~male and female) can hence be worthwhile when building relative data structures for many target sequences. 8 | 9 | While our \RCST{} implementation provides competitive time/space trade-offs, there is still much room for improvement.
Most importantly, some of the construction algorithms require significant amounts of time and memory. In many places, we have chosen simple and fast implementation options, even though there could be alternatives that require significantly less space without being too much slower. 10 | 11 | Our \RCST{} is a relative version of the \CSTnpr. Another alternative for future work is a relative \CSTsada, using \RLZ{} compressed bitvectors for suffix tree topology and \PLCP. %Based on our preliminary experiments, the main obstacle is the compression of phrase pointers. Relative pointers work well when most differences between the reference and the target are single-character substitutions. As suffix sorting multiplies the differences and transforms substitutions into insertions and deletions, we need new compression schemes for the pointers. 12 | 13 | -------------------------------------------------------------------------------- /logs_old_rcst/cst_compare_4379377.log: -------------------------------------------------------------------------------- 1 | Start 2 | Tue Jun 23 15:28:50 BST 2015 3 | 4 | Matching statistics timings 5 | 6 | Reference: human 7 | Target: maternal 8 | 9 | Finding the matching statistics with a CST 10 | 11 | Reference: human 12 | FM-index: 1999.67 MB (5.41863 bpc) 13 | LCP array: 3862.42 MB (10.4663 bpc) 14 | Reference data: 5862.09 MB (15.8849 bpc) 15 | 16 | Target: maternal 17 | Sequence: paternal_chr1_noNruns (225271390 bytes) 18 | 19 | Relative (slow): 1546.64 MB (4.27318 bpc) 20 | Relative (slow): Average maximal match: 427.958 (69505.4 seconds) 21 | 22 | Relative (fast): 1736.53 MB (4.79782 bpc) 23 | Relative (fast): Average maximal match: 427.958 (26792.3 seconds) 24 | 25 | cst_sct3_dac: 6543.94 MB (18.0801 bpc) 26 | cst_sct3_dac: Average maximal match: 427.958 (7836.72 seconds) 27 | 28 | cst_sct3_plcp: 3905.7 MB (10.7909 bpc) 29 | cst_sct3_plcp: Average maximal match: 427.958 (14071.1 seconds) 30 | 31 | cst_sada: 4462.21 MB (12.3285 bpc) 32 
| cst_sada: Average maximal match: 427.958 (23407.2 seconds) 33 | 34 | cst_fully: 1802.96 MB (4.98135 bpc) 35 | 36 | Memory used: 23404.7 MB 37 | 38 | 39 | End 40 | Thu Jun 25 06:52:53 BST 2015 41 | 42 | 43 | ------------------------------------------------------------ 44 | Sender: LSF System 45 | Subject: Job 4379377: in cluster Done 46 | 47 | Job was submitted from host by user in cluster . 48 | Job was executed on host(s) <32*vr-1-1-06>, in queue , as user in cluster . 49 | was used as the home directory. 50 | was used as the working directory. 51 | Started at Tue Jun 23 15:28:49 2015 52 | Results reported at Thu Jun 25 06:52:53 2015 53 | 54 | Your job looked like: 55 | 56 | ------------------------------------------------------------ 57 | # LSBATCH: User input 58 | /nfs/users/nfs_j/js35/job_scripts/cst_compare human maternal 59 | ------------------------------------------------------------ 60 | 61 | Successfully completed. 62 | 63 | Resource usage summary: 64 | 65 | CPU time : 144219.45 sec. 66 | Max Memory : 23400 MB 67 | Average Memory : 13169.21 MB 68 | Total Requested Memory : 65536.00 MB 69 | Delta Memory : 42136.00 MB 70 | (Delta: the difference between total requested memory and actual max usage.) 71 | Max Swap : 26471 MB 72 | 73 | Max Processes : 4 74 | Max Threads : 5 75 | 76 | The output (if any) is above this job summary. 77 | 78 | 79 | 80 | PS: 81 | 82 | Read file for stderr output of this job. 
83 | 84 | -------------------------------------------------------------------------------- /logs_old_rcst/cst_compare_4379378.log: -------------------------------------------------------------------------------- 1 | Start 2 | Tue Jun 23 15:28:50 BST 2015 3 | 4 | Matching statistics timings 5 | 6 | Reference: female 7 | Target: maternal2 8 | 9 | Finding the matching statistics with a CST 10 | 11 | Reference: female 12 | FM-index: 1952.6 MB (5.39454 bpc) 13 | LCP array: 3690.1 MB (10.1948 bpc) 14 | Reference data: 5642.7 MB (15.5894 bpc) 15 | 16 | Target: maternal2 17 | Sequence: paternal_chr1_noNruns (225271390 bytes) 18 | 19 | Relative (slow): 1144.74 MB (3.16278 bpc) 20 | Relative (slow): Average maximal match: 427.958 (54576.2 seconds) 21 | 22 | Relative (fast): 1307.93 MB (3.61363 bpc) 23 | Relative (fast): Average maximal match: 427.958 (23346.4 seconds) 24 | 25 | cst_sct3_dac: 6543.94 MB (18.0801 bpc) 26 | cst_sct3_dac: Average maximal match: 427.958 (7201.41 seconds) 27 | 28 | cst_sct3_plcp: 3905.7 MB (10.7909 bpc) 29 | cst_sct3_plcp: Average maximal match: 427.958 (11702.3 seconds) 30 | 31 | cst_sada: 4462.21 MB (12.3285 bpc) 32 | cst_sada: Average maximal match: 427.958 (18876.6 seconds) 33 | 34 | cst_fully: 1802.96 MB (4.98135 bpc) 35 | 36 | Memory used: 23180.2 MB 37 | 38 | 39 | End 40 | Wed Jun 24 23:40:28 BST 2015 41 | 42 | 43 | ------------------------------------------------------------ 44 | Sender: LSF System 45 | Subject: Job 4379378: in cluster Done 46 | 47 | Job was submitted from host by user in cluster . 48 | Job was executed on host(s) <32*vr-4-1-10>, in queue , as user in cluster . 49 | was used as the home directory. 50 | was used as the working directory. 
51 | Started at Tue Jun 23 15:28:49 2015 52 | Results reported at Wed Jun 24 23:40:28 2015 53 | 54 | Your job looked like: 55 | 56 | ------------------------------------------------------------ 57 | # LSBATCH: User input 58 | /nfs/users/nfs_j/js35/job_scripts/cst_compare female maternal2 59 | ------------------------------------------------------------ 60 | 61 | Successfully completed. 62 | 63 | Resource usage summary: 64 | 65 | CPU time : 118037.72 sec. 66 | Max Memory : 23169 MB 67 | Average Memory : 12733.99 MB 68 | Total Requested Memory : 65536.00 MB 69 | Delta Memory : 42367.00 MB 70 | (Delta: the difference between total requested memory and actual max usage.) 71 | Max Swap : 25222 MB 72 | 73 | Max Processes : 4 74 | Max Threads : 5 75 | 76 | The output (if any) is above this job summary. 77 | 78 | 79 | 80 | PS: 81 | 82 | Read file for stderr output of this job. 83 | 84 | -------------------------------------------------------------------------------- /logs_rst/cst_female_maternal2.log: -------------------------------------------------------------------------------- 1 | Start time: 1493930548 2 | 3 | DFS traversal in compressed suffix trees 4 | 5 | Timer: interval=1000000, max_time=86400 6 | 7 | Reference: female 8 | 9 | FM-index: 2110.26 MB (5.83013 bpc) 10 | LCP array: 3690.1 MB (10.1948 bpc) 11 | Reference data: 5800.36 MB (16.025 bpc) 12 | 13 | 14 | Target: maternal2 15 | 16 | Relative (slow): 997.036 MB (2.75468 bpc) 17 | Relative (slow): 5127716113 nodes in 4606.37 seconds 18 | 19 | Select structures built in 230.437 seconds 20 | 21 | Relative (fast): 1160.22 MB (3.20553 bpc) 22 | Relative (fast): 5127716113 nodes in 4639.74 seconds 23 | 24 | cst_sct3_dac: 6543.94 MB (18.0801 bpc) 25 | cst_sct3_dac: 5127716113 nodes in 1172.17 seconds 26 | 27 | cst_sct3_plcp: 3905.7 MB (10.7909 bpc) 28 | cst_sct3_plcp: 5127716113 nodes in 1162.48 seconds 29 | 30 | cst_sada: 4462.21 MB (12.3285 bpc) 31 | cst_sada: 5127716113 nodes in 297.567 seconds 32 | 33 | cst_fully: 
1802.96 MB (4.98135 bpc) 34 | cst_fully: 273000000 nodes in 86623.8 seconds (timeout) 35 | 36 | 37 | Memory usage: 12.4798 GB 38 | 39 | Finding maximal matches with a CST 40 | 41 | Timer: interval=1000000, max_time=86400 42 | 43 | Reference: female 44 | 45 | FM-index: 2110.26 MB (5.83013 bpc) 46 | LCP array: 3690.1 MB (10.1948 bpc) 47 | Reference data: 5800.36 MB (16.025 bpc) 48 | 49 | Target: maternal2 50 | Query: paternal_chr1_noN (225271390 bytes) 51 | 52 | Relative (slow): 997.036 MB (2.75468 bpc) 53 | Forward: 1474642 matches of average length 170.377 in 46997.1 seconds 54 | Backward: 1474642 matches of average length 170.377 in 837.827 seconds 55 | 56 | Relative (fast): 1160.22 MB (3.20553 bpc) 57 | Forward: 1474642 matches of average length 170.377 in 18066.7 seconds 58 | Backward: 1474642 matches of average length 170.377 in 836.332 seconds 59 | 60 | cst_sct3_dac: 6543.94 MB (18.0801 bpc) 61 | Forward: 1474642 matches of average length 170.377 in 6690.52 seconds 62 | Backward: 1474642 matches of average length 170.377 in 89.063 seconds 63 | 64 | cst_sct3_plcp: 3905.7 MB (10.7909 bpc) 65 | Forward: 1474642 matches of average length 170.377 in 10036.2 seconds 66 | Backward: 1474642 matches of average length 170.377 in 104.011 seconds 67 | 68 | cst_sada: 4462.21 MB (12.3285 bpc) 69 | Forward: 1474642 matches of average length 170.377 in 18014.2 seconds 70 | Backward: 1474642 matches of average length 170.377 in 1157.68 seconds 71 | 72 | cst_fully: 1802.96 MB (4.98135 bpc) 73 | Forward: 1474642 matches of average length 170.377 in 74969.9 seconds 74 | Backward: 1474642 matches of average length 170.377 in 705.192 seconds 75 | 76 | Maximal matches verified. 
77 | 78 | Memory usage: 12.8711 GB 79 | 80 | Finish time: 1494208172 81 | -------------------------------------------------------------------------------- /logs_old_rcst/verify_lcp_3888648.log: -------------------------------------------------------------------------------- 1 | Start 2 | Thu Apr 16 11:41:56 BST 2015 3 | 4 | Verifying the relative LCP array 5 | 6 | Reference: human 7 | Target: maternal 8 | 9 | RLCP verifier 10 | 11 | Reference: human 12 | 13 | LCP array: 3862.42 MB (10.4663 bpc) 14 | 15 | 16 | Target: maternal 17 | 18 | LCP array: 3689.77 MB (10.1944 bpc) 19 | 20 | Phrases: 520.881 MB (1.43913 bpc) 21 | Blocks: 124.647 MB (0.344384 bpc) 22 | Samples: 196.667 MB (0.543366 bpc) 23 | Tree: 257.812 MB (0.712303 bpc) 24 | Relative LCP: 1100.01 MB (3.03918 bpc) 25 | 26 | LCP (random): 100000000 queries in 5.19795 seconds (0.0519795 µs/query) 27 | LCP (seq): 3036191208 queries in 2.9926 seconds (0.000985642 µs/query) 28 | RLCP (random): 100000000 queries in 109.64 seconds (1.0964 µs/query) 29 | RLCP (seq): 3036191208 queries in 361.823 seconds (0.11917 µs/query) 30 | 31 | RMQ: 100000000 queries in 276.931 seconds (2.76931 µs/query) 32 | 33 | PSV: 100000000 queries in 193.526 seconds (1.93526 µs/query) 34 | 35 | PSEV: 100000000 queries in 191.532 seconds (1.91532 µs/query) 36 | 37 | NSV: 100000000 queries in 190.958 seconds (1.90958 µs/query) 38 | 39 | NSEV: 100000000 queries in 185.394 seconds (1.85394 µs/query) 40 | 41 | 42 | End 43 | Thu Apr 16 12:08:01 BST 2015 44 | 45 | 46 | ------------------------------------------------------------ 47 | Sender: LSF System 48 | Subject: Job 3888648: in cluster Done 49 | 50 | Job was submitted from host by user in cluster . 51 | Job was executed on host(s) <32*vr-4-1-14>, in queue , as user in cluster . 52 | was used as the home directory. 53 | was used as the working directory. 
54 | Started at Thu Apr 16 11:41:56 2015 55 | Results reported at Thu Apr 16 12:08:01 2015 56 | 57 | Your job looked like: 58 | 59 | ------------------------------------------------------------ 60 | # LSBATCH: User input 61 | /nfs/users/nfs_j/js35/job_scripts/verify_lcp human maternal 62 | ------------------------------------------------------------ 63 | 64 | Successfully completed. 65 | 66 | Resource usage summary: 67 | 68 | CPU time : 1576.38 sec. 69 | Max Memory : 10197 MB 70 | Average Memory : 9417.69 MB 71 | Total Requested Memory : 16384.00 MB 72 | Delta Memory : 6187.00 MB 73 | (Delta: the difference between total requested memory and actual max usage.) 74 | Max Swap : 10256 MB 75 | 76 | Max Processes : 4 77 | Max Threads : 5 78 | 79 | The output (if any) is above this job summary. 80 | 81 | 82 | 83 | PS: 84 | 85 | Read file for stderr output of this job. 86 | 87 | -------------------------------------------------------------------------------- /logs_old_rcst/verify_lcp_3888650.log: -------------------------------------------------------------------------------- 1 | Start 2 | Thu Apr 16 12:08:01 BST 2015 3 | 4 | Verifying the relative LCP array 5 | 6 | Reference: female 7 | Target: maternal2 8 | 9 | RLCP verifier 10 | 11 | Reference: female 12 | 13 | LCP array: 3690.1 MB (10.1948 bpc) 14 | 15 | 16 | Target: maternal2 17 | 18 | LCP array: 3689.77 MB (10.1944 bpc) 19 | 20 | Phrases: 400.738 MB (1.10719 bpc) 21 | Blocks: 97.8164 MB (0.270254 bpc) 22 | Samples: 109.513 MB (0.302572 bpc) 23 | Tree: 141.778 MB (0.391715 bpc) 24 | Relative LCP: 749.846 MB (2.07173 bpc) 25 | 26 | LCP (random): 100000000 queries in 6.96667 seconds (0.0696667 µs/query) 27 | LCP (seq): 3036191208 queries in 3.03326 seconds (0.000999036 µs/query) 28 | RLCP (random): 100000000 queries in 126.268 seconds (1.26268 µs/query) 29 | RLCP (seq): 3036191208 queries in 377.46 seconds (0.12432 µs/query) 30 | 31 | RMQ: 100000000 queries in 260.523 seconds (2.60523 µs/query) 32 | 33 | PSV: 100000000 
queries in 192.33 seconds (1.9233 µs/query) 34 | 35 | PSEV: 100000000 queries in 180.109 seconds (1.80109 µs/query) 36 | 37 | NSV: 100000000 queries in 180.52 seconds (1.8052 µs/query) 38 | 39 | NSEV: 100000000 queries in 174.404 seconds (1.74404 µs/query) 40 | 41 | 42 | End 43 | Thu Apr 16 12:33:55 BST 2015 44 | 45 | 46 | ------------------------------------------------------------ 47 | Sender: LSF System 48 | Subject: Job 3888650: in cluster Done 49 | 50 | Job was submitted from host by user in cluster . 51 | Job was executed on host(s) <32*vr-4-1-14>, in queue , as user in cluster . 52 | was used as the home directory. 53 | was used as the working directory. 54 | Started at Thu Apr 16 12:08:01 2015 55 | Results reported at Thu Apr 16 12:33:55 2015 56 | 57 | Your job looked like: 58 | 59 | ------------------------------------------------------------ 60 | # LSBATCH: User input 61 | /nfs/users/nfs_j/js35/job_scripts/verify_lcp female maternal2 62 | ------------------------------------------------------------ 63 | 64 | Successfully completed. 65 | 66 | Resource usage summary: 67 | 68 | CPU time : 1562.30 sec. 69 | Max Memory : 9668 MB 70 | Average Memory : 8875.98 MB 71 | Total Requested Memory : 16384.00 MB 72 | Delta Memory : 6716.00 MB 73 | (Delta: the difference between total requested memory and actual max usage.) 74 | Max Swap : 9729 MB 75 | 76 | Max Processes : 4 77 | Max Threads : 5 78 | 79 | The output (if any) is above this job summary. 80 | 81 | 82 | 83 | PS: 84 | 85 | Read file for stderr output of this job. 
86 | 87 | -------------------------------------------------------------------------------- /logs_old_rcst/build_rfm_4358894.log: -------------------------------------------------------------------------------- 1 | Start 2 | Fri Jun 19 13:25:00 BST 2015 3 | 4 | RFM construction benchmark 5 | 6 | Reference: human 7 | Target: maternal 8 | Threads: 8 9 | 10 | Relative FM-index builder 11 | Using OpenMP with 8 threads 12 | 13 | Algorithm: invariant 14 | Input format: plain 15 | SA sample rate: 257 16 | ISA sample rate: 512 17 | 18 | Reference: human 19 | 20 | BWT: 1120.49 MB (3.03628 bpc) 21 | SA samples: 694.655 MB (1.88235 bpc) 22 | ISA samples: 184.518 MB (0.5 bpc) 23 | Simple FM: 1999.67 MB (5.41863 bpc) 24 | 25 | 26 | Target: maternal 27 | Reference size: 3095693982 28 | Target size: 3036191208 29 | Built the merging bitvector in 3280.44 seconds 30 | Matched 3032774943 positions in 8422.83 seconds 31 | Found a common subsequence of length 2979962648 in 919.432 seconds 32 | Built the bwt_lcs bitvectors and samples in 1629.56 seconds 33 | Index built in 14300.1 seconds 34 | 35 | BWT: 1090.24 MB (3.01219 bpc) 36 | SA samples: 681.303 MB (1.88235 bpc) 37 | ISA samples: 180.971 MB (0.5 bpc) 38 | Simple FM: 1952.51 MB (5.39455 bpc) 39 | 40 | ref_minus_lcs: 43.4075 MB (0.119929 bpc) 41 | seq_minus_lcs: 19.2637 MB (0.0532231 bpc) 42 | bwt_lcs: 189.846 MB (0.524521 bpc) 43 | text_lcs: 126.431 MB (0.349311 bpc) 44 | SA samples: 45.0667 MB (0.124514 bpc) 45 | ISA samples: 22.6214 MB (0.0625 bpc) 46 | Relative FM: 446.637 MB (1.234 bpc) 47 | 48 | 49 | Memory usage: 85741.6 MB 50 | 51 | 52 | End 53 | Fri Jun 19 17:26:19 BST 2015 54 | 55 | 56 | ------------------------------------------------------------ 57 | Sender: LSF System 58 | Subject: Job 4358894: in cluster Done 59 | 60 | Job was submitted from host by user in cluster . 61 | Job was executed on host(s) <8*vr-4-1-04>, in queue , as user in cluster . 62 | was used as the home directory. 
63 | was used as the working directory. 64 | Started at Fri Jun 19 13:24:59 2015 65 | Results reported at Fri Jun 19 17:26:19 2015 66 | 67 | Your job looked like: 68 | 69 | ------------------------------------------------------------ 70 | # LSBATCH: User input 71 | /nfs/users/nfs_j/js35/job_scripts/build_rfm human maternal 8 72 | ------------------------------------------------------------ 73 | 74 | Successfully completed. 75 | 76 | Resource usage summary: 77 | 78 | CPU time : 16295.37 sec. 79 | Max Memory : 85747 MB 80 | Average Memory : 12521.13 MB 81 | Total Requested Memory : 131072.00 MB 82 | Delta Memory : 45325.00 MB 83 | (Delta: the difference between total requested memory and actual max usage.) 84 | Max Swap : 89858 MB 85 | 86 | Max Processes : 4 87 | Max Threads : 12 88 | 89 | The output (if any) is above this job summary. 90 | 91 | 92 | 93 | PS: 94 | 95 | Read file for stderr output of this job. 96 | 97 | -------------------------------------------------------------------------------- /logs_old_rcst/build_rfm_4358895.log: -------------------------------------------------------------------------------- 1 | Start 2 | Fri Jun 19 13:25:00 BST 2015 3 | 4 | RFM construction benchmark 5 | 6 | Reference: female 7 | Target: maternal 8 | Threads: 8 9 | 10 | Relative FM-index builder 11 | Using OpenMP with 8 threads 12 | 13 | Algorithm: invariant 14 | Input format: plain 15 | SA sample rate: 257 16 | ISA sample rate: 512 17 | 18 | Reference: female 19 | 20 | BWT: 1090.28 MB (3.01219 bpc) 21 | SA samples: 681.332 MB (1.88235 bpc) 22 | ISA samples: 180.979 MB (0.5 bpc) 23 | Simple FM: 1952.6 MB (5.39454 bpc) 24 | 25 | 26 | Target: maternal 27 | Reference size: 3036320416 28 | Target size: 3036191208 29 | Built the merging bitvector in 2925.82 seconds 30 | Matched 3001221512 positions in 6958.24 seconds 31 | Found a common subsequence of length 2980010799 in 972.847 seconds 32 | Built the bwt_lcs bitvectors and samples in 1942.21 seconds 33 | Index built in 12839.9 
seconds 34 | 35 | BWT: 1090.24 MB (3.01219 bpc) 36 | SA samples: 681.303 MB (1.88235 bpc) 37 | ISA samples: 180.971 MB (0.5 bpc) 38 | Simple FM: 1952.51 MB (5.39455 bpc) 39 | 40 | ref_minus_lcs: 19.2911 MB (0.053299 bpc) 41 | seq_minus_lcs: 19.2473 MB (0.0531778 bpc) 42 | bwt_lcs: 163.138 MB (0.450729 bpc) 43 | text_lcs: 125.532 MB (0.34683 bpc) 44 | SA samples: 45.0667 MB (0.124514 bpc) 45 | ISA samples: 22.6214 MB (0.0625 bpc) 46 | Relative FM: 394.897 MB (1.09105 bpc) 47 | 48 | 49 | Memory usage: 84237 MB 50 | 51 | 52 | End 53 | Fri Jun 19 17:02:06 BST 2015 54 | 55 | 56 | ------------------------------------------------------------ 57 | Sender: LSF System 58 | Subject: Job 4358895: in cluster Done 59 | 60 | Job was submitted from host by user in cluster . 61 | Job was executed on host(s) <8*vr-3-1-09>, in queue , as user in cluster . 62 | was used as the home directory. 63 | was used as the working directory. 64 | Started at Fri Jun 19 13:25:00 2015 65 | Results reported at Fri Jun 19 17:02:06 2015 66 | 67 | Your job looked like: 68 | 69 | ------------------------------------------------------------ 70 | # LSBATCH: User input 71 | /nfs/users/nfs_j/js35/job_scripts/build_rfm female maternal 8 72 | ------------------------------------------------------------ 73 | 74 | Successfully completed. 75 | 76 | Resource usage summary: 77 | 78 | CPU time : 14613.86 sec. 79 | Max Memory : 84242 MB 80 | Average Memory : 12916.67 MB 81 | Total Requested Memory : 131072.00 MB 82 | Delta Memory : 46830.00 MB 83 | (Delta: the difference between total requested memory and actual max usage.) 84 | Max Swap : 88399 MB 85 | 86 | Max Processes : 4 87 | Max Threads : 12 88 | 89 | The output (if any) is above this job summary. 90 | 91 | 92 | 93 | PS: 94 | 95 | Read file for stderr output of this job. 
96 | 97 | -------------------------------------------------------------------------------- /logs_rst/cst_size.py: -------------------------------------------------------------------------------- 1 | #! /usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | # cst_size.py logfile1 [logfile2 ...] 5 | 6 | # This computes the total size of a collection of Relative CSTs, including 7 | # the size of the reference. 8 | 9 | import sys, re 10 | 11 | def printSize(type, megabytes, bytes): 12 | print (type + ":"), megabytes, "MB,", (megabytes * 1048576.0 * 8 / bytes), "bpc" 13 | 14 | def main(): 15 | 16 | if len(sys.argv) < 2: 17 | return 18 | 19 | repet_sizes = {} 20 | repet_sizes["0.0001"] = 101453735 21 | repet_sizes["0.0003"] = 127063602 22 | repet_sizes["0.001"] = 158264566 23 | repet_sizes["0.003"] = 199825193 24 | repet_sizes["0.01"] = 240325131 25 | repet_sizes["0.03"] = 250474638 26 | repet_sizes["0.1"] = 230936675 27 | 28 | rate_pattern = "Mutation rate:\s(.*)" 29 | rate = re.compile(rate_pattern) 30 | 31 | size_pattern = "Target size:\s*(.*)" 32 | size = re.compile(size_pattern) 33 | 34 | fm_pattern = "Simple FM:\s*(.*) MB" 35 | fm = re.compile(fm_pattern) 36 | 37 | lcp_pattern = "LCP array:\s*(.*) MB" 38 | lcp = re.compile(lcp_pattern) 39 | 40 | rfm_pattern = "Relative FM:\s*(.*) MB" 41 | rfm = re.compile(rfm_pattern) 42 | 43 | rlcp_pattern = "Relative LCP:\s*(.*) MB" 44 | rlcp = re.compile(rlcp_pattern) 45 | 46 | repet_pattern = "Total:\s*(.*)" 47 | repet = re.compile(repet_pattern) 48 | 49 | for arg in range(1, len(sys.argv)): 50 | fm_size = 0.0 51 | lcp_size = 0.0 52 | rfm_size = 0.0 53 | rlcp_size = 0.0 54 | total_size = 0.0 55 | repet_size = 0.0 56 | bytes = 0 57 | infile = open(sys.argv[arg], "rb") 58 | state = 0 59 | 60 | for line in infile: 61 | if state == 0: # Mutation rate 62 | res = rate.search(line) 63 | if res: 64 | if res.group(1) in repet_sizes: 65 | repet_size = repet_sizes[res.group(1)] / 1048576.0 66 | state = 1 67 | elif state == 1: # 
Collection size 68 | res = size.search(line) 69 | if res: 70 | bytes += int(res.group(1)) 71 | res = fm.search(line) 72 | if res: 73 | fm_size = float(res.group(1)) 74 | state = 2 75 | elif state == 2: # RFM size 76 | res = rfm.search(line) 77 | if res: 78 | rfm_size += float(res.group(1)) 79 | res = lcp.search(line) 80 | if res: 81 | lcp_size = float(res.group(1)) 82 | state = 3 83 | elif state == 3: # RLCP size 84 | res = rlcp.search(line) 85 | if res: 86 | rlcp_size += float(res.group(1)) 87 | 88 | infile.close() 89 | total_size = fm_size + lcp_size + rfm_size + rlcp_size 90 | 91 | print "File:", sys.argv[arg] 92 | print "Original:", (bytes / 1048576.0), "MB" 93 | printSize("Reference", fm_size + lcp_size, bytes) 94 | printSize("RFM", rfm_size, bytes) 95 | printSize("RLCP", rlcp_size, bytes) 96 | printSize("RFM+RLCP", rfm_size + rlcp_size, bytes) 97 | printSize("Relative CST", total_size, bytes) 98 | printSize("Repetitive CST", repet_size, bytes) 99 | print 100 | 101 | if __name__ == "__main__": 102 | main() 103 | -------------------------------------------------------------------------------- /logs_old_rcst/old_rfm_3891260.log: -------------------------------------------------------------------------------- 1 | Start 2 | Fri Apr 17 09:18:50 BST 2015 3 | 4 | Testing the relative FM-index with assembled genomes 5 | 6 | Reference: human 7 | Target: old_maternal 8 | Threads: 8 9 | 10 | BWT construction 11 | Options: alphabet 12 | 13 | File: old_maternal 14 | Text size: 3036191207 15 | BWT built in 997.387 seconds (2.90312 MB/s) 16 | Alphabet written to old_maternal.alpha 17 | BWT written to old_maternal.bwt 18 | 19 | Relative FM-index builder 20 | Using OpenMP with 8 threads 21 | 22 | Algorithm: partitioning 23 | Input format: plain 24 | Block size: 1024 25 | Maximum diagonal: 8192 26 | Maximum length: 32 27 | Buffers: on demand 28 | 29 | Reference: human 30 | 31 | BWT: 1120.49 MB (3.03628 bpc) 32 | SA samples: 694.655 MB (1.88235 bpc) 33 | ISA samples: 184.518 MB 
(0.5 bpc) 34 | Simple FM: 1999.67 MB (5.41863 bpc) 35 | 36 | 37 | Target: old_maternal 38 | Reference size: 3095693982 39 | Target size: 3036191208 40 | Found 8327360 ranges with intersection length 3033218320 in 15.5448 seconds 41 | Partitioning misses: reference 1832, target 741 42 | Partitioning losses: reference 62473830, target 2972147 43 | Found a common subsequence of length 2991932598 in 82.5311 seconds 44 | LCS losses: exact 41247729, heuristics 37993 45 | 46 | Index built in 140.969 seconds 47 | 48 | BWT: 1090.24 MB (3.01219 bpc) 49 | Simple FM: 1090.24 MB (3.01219 bpc) 50 | 51 | ref_minus_lcs: 38.5664 MB (0.106554 bpc) 52 | seq_minus_lcs: 15.1568 MB (0.0418764 bpc) 53 | bwt_lcs: 163.937 MB (0.452938 bpc) 54 | Relative FM: 217.661 MB (0.60137 bpc) 55 | 56 | 57 | Memory usage: 5076.3 MB 58 | 59 | 60 | End 61 | Fri Apr 17 09:42:11 BST 2015 62 | 63 | 64 | ------------------------------------------------------------ 65 | Sender: LSF System 66 | Subject: Job 3891260: in cluster Done 67 | 68 | Job was submitted from host by user in cluster . 69 | Job was executed on host(s) <8*vr-4-1-04>, in queue , as user in cluster . 70 | was used as the home directory. 71 | was used as the working directory. 72 | Started at Fri Apr 17 09:18:50 2015 73 | Results reported at Fri Apr 17 09:42:11 2015 74 | 75 | Your job looked like: 76 | 77 | ------------------------------------------------------------ 78 | # LSBATCH: User input 79 | /nfs/users/nfs_j/js35/job_scripts/old_rfm human old_maternal 8 80 | ------------------------------------------------------------ 81 | 82 | Successfully completed. 83 | 84 | Resource usage summary: 85 | 86 | CPU time : 1906.53 sec. 87 | Max Memory : 26067 MB 88 | Average Memory : 19337.09 MB 89 | Total Requested Memory : 32768.00 MB 90 | Delta Memory : 6701.00 MB 91 | (Delta: the difference between total requested memory and actual max usage.) 
92 | Max Swap : 26128 MB 93 | 94 | Max Processes : 4 95 | Max Threads : 12 96 | 97 | The output (if any) is above this job summary. 98 | 99 | 100 | 101 | PS: 102 | 103 | Read file for stderr output of this job. 104 | 105 | -------------------------------------------------------------------------------- /logs_old_rcst/old_rfm_3892021.log: -------------------------------------------------------------------------------- 1 | Start 2 | Fri Apr 17 15:15:35 BST 2015 3 | 4 | Testing the relative FM-index with assembled genomes 5 | 6 | Reference: human 7 | Target: old_paternal 8 | Threads: 8 9 | 10 | BWT construction 11 | Options: alphabet 12 | 13 | File: old_paternal 14 | Text size: 3036185259 15 | BWT built in 1046.43 seconds (2.76706 MB/s) 16 | Alphabet written to old_paternal.alpha 17 | BWT written to old_paternal.bwt 18 | 19 | Relative FM-index builder 20 | Using OpenMP with 8 threads 21 | 22 | Algorithm: partitioning 23 | Input format: plain 24 | Block size: 1024 25 | Maximum diagonal: 8192 26 | Maximum length: 32 27 | Buffers: on demand 28 | 29 | Reference: human 30 | 31 | BWT: 1120.49 MB (3.03628 bpc) 32 | SA samples: 694.655 MB (1.88235 bpc) 33 | ISA samples: 184.518 MB (0.5 bpc) 34 | Simple FM: 1999.67 MB (5.41863 bpc) 35 | 36 | 37 | Target: old_paternal 38 | Reference size: 3095693982 39 | Target size: 3036185260 40 | Found 8326922 ranges with intersection length 3033209612 in 16.4793 seconds 41 | Partitioning misses: reference 1865, target 699 42 | Partitioning losses: reference 62482505, target 2974949 43 | Found a common subsequence of length 2991871821 in 84.4343 seconds 44 | LCS losses: exact 41299743, heuristics 38048 45 | 46 | Index built in 141.447 seconds 47 | 48 | BWT: 1090.24 MB (3.01219 bpc) 49 | Simple FM: 1090.24 MB (3.0122 bpc) 50 | 51 | ref_minus_lcs: 38.5915 MB (0.106624 bpc) 52 | seq_minus_lcs: 15.1762 MB (0.04193 bpc) 53 | bwt_lcs: 164.004 MB (0.453123 bpc) 54 | Relative FM: 217.772 MB (0.601678 bpc) 55 | 56 | 57 | Memory usage: 5048.17 MB 58 | 
59 | 60 | End 61 | Fri Apr 17 15:39:29 BST 2015 62 | 63 | 64 | ------------------------------------------------------------ 65 | Sender: LSF System 66 | Subject: Job 3892021: in cluster Done 67 | 68 | Job was submitted from host by user in cluster . 69 | Job was executed on host(s) <8*vr-4-1-13>, in queue , as user in cluster . 70 | was used as the home directory. 71 | was used as the working directory. 72 | Started at Fri Apr 17 15:15:35 2015 73 | Results reported at Fri Apr 17 15:39:29 2015 74 | 75 | Your job looked like: 76 | 77 | ------------------------------------------------------------ 78 | # LSBATCH: User input 79 | /nfs/users/nfs_j/js35/job_scripts/old_rfm human old_paternal 8 80 | ------------------------------------------------------------ 81 | 82 | Successfully completed. 83 | 84 | Resource usage summary: 85 | 86 | CPU time : 1954.78 sec. 87 | Max Memory : 26066 MB 88 | Average Memory : 19826.27 MB 89 | Total Requested Memory : 32768.00 MB 90 | Delta Memory : 6702.00 MB 91 | (Delta: the difference between total requested memory and actual max usage.) 92 | Max Swap : 26128 MB 93 | 94 | Max Processes : 4 95 | Max Threads : 12 96 | 97 | The output (if any) is above this job summary. 98 | 99 | 100 | 101 | PS: 102 | 103 | Read file for stderr output of this job. 
104 | 105 | -------------------------------------------------------------------------------- /logs_old_rcst/old_rfm_3891261.log: -------------------------------------------------------------------------------- 1 | Start 2 | Fri Apr 17 09:19:00 BST 2015 3 | 4 | Testing the relative FM-index with assembled genomes 5 | 6 | Reference: female 7 | Target: old_maternal2 8 | Threads: 8 9 | 10 | BWT construction 11 | Options: alphabet 12 | 13 | File: old_maternal2 14 | Text size: 3036191207 15 | BWT built in 913.255 seconds (3.17057 MB/s) 16 | Alphabet written to old_maternal2.alpha 17 | BWT written to old_maternal2.bwt 18 | 19 | Relative FM-index builder 20 | Using OpenMP with 8 threads 21 | 22 | Algorithm: partitioning 23 | Input format: plain 24 | Block size: 1024 25 | Maximum diagonal: 8192 26 | Maximum length: 32 27 | Buffers: on demand 28 | 29 | Reference: female 30 | 31 | BWT: 1090.28 MB (3.01219 bpc) 32 | SA samples: 681.332 MB (1.88235 bpc) 33 | ISA samples: 180.979 MB (0.5 bpc) 34 | Simple FM: 1952.6 MB (5.39454 bpc) 35 | 36 | 37 | Target: old_maternal2 38 | Reference size: 3036320416 39 | Target size: 3036191208 40 | Found 8315561 ranges with intersection length 3028240572 in 15.903 seconds 41 | Partitioning misses: reference 698, target 796 42 | Partitioning losses: reference 8079146, target 7949840 43 | Found a common subsequence of length 2990988921 in 79.6782 seconds 44 | LCS losses: exact 37213658, heuristics 37993 45 | 46 | Index built in 136.634 seconds 47 | 48 | BWT: 1090.24 MB (3.01219 bpc) 49 | Simple FM: 1090.24 MB (3.01219 bpc) 50 | 51 | ref_minus_lcs: 15.5191 MB (0.0428772 bpc) 52 | seq_minus_lcs: 15.4758 MB (0.0427576 bpc) 53 | bwt_lcs: 150.468 MB (0.415724 bpc) 54 | Relative FM: 181.463 MB (0.50136 bpc) 55 | 56 | 57 | Memory usage: 4977.85 MB 58 | 59 | 60 | End 61 | Fri Apr 17 09:40:34 BST 2015 62 | 63 | 64 | ------------------------------------------------------------ 65 | Sender: LSF System 66 | Subject: Job 3891261: in cluster Done 67 | 68 | Job 
was submitted from host by user in cluster . 69 | Job was executed on host(s) <8*vr-4-1-04>, in queue , as user in cluster . 70 | was used as the home directory. 71 | was used as the working directory. 72 | Started at Fri Apr 17 09:19:00 2015 73 | Results reported at Fri Apr 17 09:40:40 2015 74 | 75 | Your job looked like: 76 | 77 | ------------------------------------------------------------ 78 | # LSBATCH: User input 79 | /nfs/users/nfs_j/js35/job_scripts/old_rfm female old_maternal2 8 80 | ------------------------------------------------------------ 81 | 82 | Successfully completed. 83 | 84 | Resource usage summary: 85 | 86 | CPU time : 1789.27 sec. 87 | Max Memory : 26067 MB 88 | Average Memory : 19331.80 MB 89 | Total Requested Memory : 32768.00 MB 90 | Delta Memory : 6701.00 MB 91 | (Delta: the difference between total requested memory and actual max usage.) 92 | Max Swap : 26128 MB 93 | 94 | Max Processes : 4 95 | Max Threads : 12 96 | 97 | The output (if any) is above this job summary. 98 | 99 | 100 | 101 | PS: 102 | 103 | Read file for stderr output of this job. 
104 | 105 | -------------------------------------------------------------------------------- /logs_old_rcst/old_rfm_3892022.log: -------------------------------------------------------------------------------- 1 | Start 2 | Fri Apr 17 15:15:44 BST 2015 3 | 4 | Testing the relative FM-index with assembled genomes 5 | 6 | Reference: female 7 | Target: old_paternal2 8 | Threads: 8 9 | 10 | BWT construction 11 | Options: alphabet 12 | 13 | File: old_paternal2 14 | Text size: 3036185259 15 | BWT built in 1068.8 seconds (2.70914 MB/s) 16 | Alphabet written to old_paternal2.alpha 17 | BWT written to old_paternal2.bwt 18 | 19 | Relative FM-index builder 20 | Using OpenMP with 8 threads 21 | 22 | Algorithm: partitioning 23 | Input format: plain 24 | Block size: 1024 25 | Maximum diagonal: 8192 26 | Maximum length: 32 27 | Buffers: on demand 28 | 29 | Reference: female 30 | 31 | BWT: 1090.28 MB (3.01219 bpc) 32 | SA samples: 681.332 MB (1.88235 bpc) 33 | ISA samples: 180.979 MB (0.5 bpc) 34 | Simple FM: 1952.6 MB (5.39454 bpc) 35 | 36 | 37 | Target: old_paternal2 38 | Reference size: 3036320416 39 | Target size: 3036185260 40 | Found 8315237 ranges with intersection length 3028232265 in 17.0266 seconds 41 | Partitioning misses: reference 745, target 754 42 | Partitioning losses: reference 8087406, target 7952241 43 | Found a common subsequence of length 2990927430 in 88.7386 seconds 44 | LCS losses: exact 37266787, heuristics 38048 45 | 46 | Index built in 142.53 seconds 47 | 48 | BWT: 1090.24 MB (3.01219 bpc) 49 | Simple FM: 1090.24 MB (3.0122 bpc) 50 | 51 | ref_minus_lcs: 15.5404 MB (0.0429363 bpc) 52 | seq_minus_lcs: 15.4946 MB (0.0428098 bpc) 53 | bwt_lcs: 150.539 MB (0.41592 bpc) 54 | Relative FM: 181.574 MB (0.501667 bpc) 55 | 56 | 57 | Memory usage: 4977.84 MB 58 | 59 | 60 | End 61 | Fri Apr 17 15:40:14 BST 2015 62 | 63 | 64 | ------------------------------------------------------------ 65 | Sender: LSF System 66 | Subject: Job 3892022: in cluster Done 67 | 68 | Job 
was submitted from host by user in cluster . 69 | Job was executed on host(s) <8*vr-4-1-14>, in queue , as user in cluster . 70 | was used as the home directory. 71 | was used as the working directory. 72 | Started at Fri Apr 17 15:15:44 2015 73 | Results reported at Fri Apr 17 15:40:14 2015 74 | 75 | Your job looked like: 76 | 77 | ------------------------------------------------------------ 78 | # LSBATCH: User input 79 | /nfs/users/nfs_j/js35/job_scripts/old_rfm female old_paternal2 8 80 | ------------------------------------------------------------ 81 | 82 | Successfully completed. 83 | 84 | Resource usage summary: 85 | 86 | CPU time : 2039.12 sec. 87 | Max Memory : 26067 MB 88 | Average Memory : 19049.72 MB 89 | Total Requested Memory : 32768.00 MB 90 | Delta Memory : 6701.00 MB 91 | (Delta: the difference between total requested memory and actual max usage.) 92 | Max Swap : 26128 MB 93 | 94 | Max Processes : 4 95 | Max Threads : 12 96 | 97 | The output (if any) is above this job summary. 98 | 99 | 100 | 101 | PS: 102 | 103 | Read file for stderr output of this job. 104 | 105 | -------------------------------------------------------------------------------- /rlcp_size.cpp: -------------------------------------------------------------------------------- 1 | /* 2 | Copyright (c) 2017 Genome Research Ltd. 3 | 4 | Author: Jouni Siren 5 | 6 | Permission is hereby granted, free of charge, to any person obtaining a copy 7 | of this software and associated documentation files (the "Software"), to deal 8 | in the Software without restriction, including without limitation the rights 9 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 10 | copies of the Software, and to permit persons to whom the Software is 11 | furnished to do so, subject to the following conditions: 12 | 13 | The above copyright notice and this permission notice shall be included in all 14 | copies or substantial portions of the Software. 
15 | 16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 17 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 18 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 19 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 20 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 21 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 22 | SOFTWARE. 23 | */ 24 | 25 | #include <iostream> 26 | #include <memory> 27 | 28 | #include "new_relative_lcp.h" 29 | 30 | using namespace relative; 31 | 32 | //------------------------------------------------------------------------------ 33 | 34 | void 35 | printRecursive(sdsl::structure_tree_node* v, size_type bytes) 36 | { 37 | if(!(v->name.empty())) { printSize(v->name, v->size, bytes); } 38 | if(v->children.size() > 0 && v->name != "literals" && v->name != "parse") 39 | { 40 | for(const auto& child : v->children) { printRecursive(child.second.get(), bytes); } 41 | } 42 | } 43 | 44 | //------------------------------------------------------------------------------ 45 | 46 | int 47 | main(int argc, char** argv) 48 | { 49 | if(argc < 3) 50 | { 51 | std::cerr << "Usage: rlcp_size ref seq1 [seq2 ...]" << std::endl; 52 | std::cerr << std::endl; 53 | return 1; 54 | } 55 | 56 | std::cout << "RLCP size breakdown" << std::endl; 57 | std::cout << std::endl; 58 | 59 | std::string ref_name = argv[1]; 60 | std::cout << "Reference: " << ref_name << std::endl; 61 | std::cout << std::endl; 62 | 63 | NewRelativeLCP::lcp_type reference; 64 | sdsl::load_from_file(reference, ref_name + LCP_EXTENSION); 65 | printSize("LCP", sdsl::size_in_bytes(reference), reference.size()); std::cout << std::endl; 66 | std::cout << std::endl; 67 | 68 | for(int arg = 2; arg < argc; arg++) 69 | { 70 | std::string seq_name = argv[arg]; 71 | std::cout << "Target: " << seq_name << std::endl; 72 | std::cout << std::endl; 73 | 74 |
NewRelativeLCP target(reference, seq_name); 75 | target.reportSize(true); 76 | std::cout << std::endl; 77 | 78 | std::unique_ptr<sdsl::structure_tree_node> st_node(new sdsl::structure_tree_node("", "type")); 79 | sdsl::nullstream ns; 80 | target.array.serialize(ns, st_node.get(), "RLZAP"); 81 | printRecursive(st_node.get(), target.size()); 82 | std::cout << std::endl; 83 | 84 | std::cout << std::endl; 85 | } 86 | 87 | std::cout << "Memory used: " << inMegabytes(memoryUsage()) << " MB" << std::endl; 88 | std::cout << std::endl; 89 | return 0; 90 | } 91 | 92 | //------------------------------------------------------------------------------ 93 | -------------------------------------------------------------------------------- /logs_old_rcst/verify_psi_3917632.log: -------------------------------------------------------------------------------- 1 | Start 2 | Wed Apr 22 10:40:57 BST 2015 3 | 4 | Verifying LF/Psi operations 5 | 6 | Reference: human 7 | Target: maternal 8 | 9 | RFM and RLCP verifier 10 | 11 | Reference: human 12 | 13 | FM-index: 1999.67 MB (5.41863 bpc) 14 | 15 | LCP array: 3862.42 MB (10.4663 bpc) 16 | 17 | 18 | Target: maternal 19 | 20 | BWT: 1090.24 MB (3.01219 bpc) 21 | SA samples: 681.303 MB (1.88235 bpc) 22 | ISA samples: 180.971 MB (0.5 bpc) 23 | Simple FM: 1952.51 MB (5.39455 bpc) 24 | 25 | ref_minus_lcs: 43.4075 MB (0.119929 bpc) 26 | seq_minus_lcs: 19.2637 MB (0.0532231 bpc) 27 | bwt_lcs: 189.846 MB (0.524521 bpc) 28 | text_lcs: 126.431 MB (0.349311 bpc) 29 | SA samples: 45.0667 MB (0.124514 bpc) 30 | ISA samples: 22.6214 MB (0.0625 bpc) 31 | Relative FM: 446.637 MB (1.234 bpc) 32 | 33 | LCP array: 3689.77 MB (10.1944 bpc) 34 | 35 | Phrases: 520.881 MB (1.43913 bpc) 36 | Blocks: 124.647 MB (0.344384 bpc) 37 | Samples: 196.667 MB (0.543366 bpc) 38 | Tree: 257.812 MB (0.712303 bpc) 39 | Relative LCP: 1100.01 MB (3.03918 bpc) 40 | 41 | LF (FM): 10000000 queries in 5.59998 seconds (0.559998 µs/query) 42 | 43 | LF (RFM): 10000000 queries in 39.7977 seconds (3.97977 µs/query) 44
| 45 | Psi (FM): 10000000 queries in 11.3925 seconds (1.13925 µs/query) 46 | 47 | Psi (RFM, slow): 10000000 queries in 472.488 seconds (47.2488 µs/query) 48 | 49 | Select structures built in 546.911 seconds 50 | Relative select: 189.888 MB (0.524637 bpc) 51 | 52 | Psi (RFM, fast): 10000000 queries in 62.765 seconds (6.2765 µs/query) 53 | 54 | 55 | Memory used: 14546.2 MB 56 | 57 | 58 | End 59 | Wed Apr 22 11:02:52 BST 2015 60 | 61 | 62 | ------------------------------------------------------------ 63 | Sender: LSF System 64 | Subject: Job 3917632: in cluster Done 65 | 66 | Job was submitted from host by user in cluster . 67 | Job was executed on host(s) <32*vr-4-1-10>, in queue , as user in cluster . 68 | was used as the home directory. 69 | was used as the working directory. 70 | Started at Wed Apr 22 10:40:57 2015 71 | Results reported at Wed Apr 22 11:02:52 2015 72 | 73 | Your job looked like: 74 | 75 | ------------------------------------------------------------ 76 | # LSBATCH: User input 77 | /nfs/users/nfs_j/js35/job_scripts/verify_psi human maternal 78 | ------------------------------------------------------------ 79 | 80 | Successfully completed. 81 | 82 | Resource usage summary: 83 | 84 | CPU time : 1325.60 sec. 85 | Max Memory : 14282 MB 86 | Average Memory : 12380.86 MB 87 | Total Requested Memory : 16384.00 MB 88 | Delta Memory : 2102.00 MB 89 | (Delta: the difference between total requested memory and actual max usage.) 90 | Max Swap : 14341 MB 91 | 92 | Max Processes : 4 93 | Max Threads : 5 94 | 95 | The output (if any) is above this job summary. 96 | 97 | 98 | 99 | PS: 100 | 101 | Read file for stderr output of this job. 
102 | 103 | -------------------------------------------------------------------------------- /logs_old_rcst/verify_psi_3917633.log: -------------------------------------------------------------------------------- 1 | Start 2 | Wed Apr 22 10:40:57 BST 2015 3 | 4 | Verifying LF/Psi operations 5 | 6 | Reference: female 7 | Target: maternal2 8 | 9 | RFM and RLCP verifier 10 | 11 | Reference: female 12 | 13 | FM-index: 1952.6 MB (5.39454 bpc) 14 | 15 | LCP array: 3690.1 MB (10.1948 bpc) 16 | 17 | 18 | Target: maternal2 19 | 20 | BWT: 1090.24 MB (3.01219 bpc) 21 | SA samples: 681.303 MB (1.88235 bpc) 22 | ISA samples: 180.971 MB (0.5 bpc) 23 | Simple FM: 1952.51 MB (5.39455 bpc) 24 | 25 | ref_minus_lcs: 19.2911 MB (0.053299 bpc) 26 | seq_minus_lcs: 19.2473 MB (0.0531778 bpc) 27 | bwt_lcs: 163.138 MB (0.450729 bpc) 28 | text_lcs: 125.532 MB (0.34683 bpc) 29 | SA samples: 45.0667 MB (0.124514 bpc) 30 | ISA samples: 22.6214 MB (0.0625 bpc) 31 | Relative FM: 394.897 MB (1.09105 bpc) 32 | 33 | LCP array: 3689.77 MB (10.1944 bpc) 34 | 35 | Phrases: 400.738 MB (1.10719 bpc) 36 | Blocks: 97.8164 MB (0.270254 bpc) 37 | Samples: 109.513 MB (0.302572 bpc) 38 | Tree: 141.778 MB (0.391715 bpc) 39 | Relative LCP: 749.846 MB (2.07173 bpc) 40 | 41 | LF (FM): 10000000 queries in 6.26621 seconds (0.626621 µs/query) 42 | 43 | LF (RFM): 10000000 queries in 38.6133 seconds (3.86133 µs/query) 44 | 45 | Psi (FM): 10000000 queries in 15.6329 seconds (1.56329 µs/query) 46 | 47 | Psi (RFM, slow): 10000000 queries in 550.679 seconds (55.0679 µs/query) 48 | 49 | Select structures built in 563.098 seconds 50 | Relative select: 163.182 MB (0.450851 bpc) 51 | 52 | Psi (RFM, fast): 10000000 queries in 64.8574 seconds (6.48574 µs/query) 53 | 54 | 55 | Memory used: 13888.1 MB 56 | 57 | 58 | End 59 | Wed Apr 22 11:04:59 BST 2015 60 | 61 | 62 | ------------------------------------------------------------ 63 | Sender: LSF System 64 | Subject: Job 3917633: in cluster Done 65 | 66 | Job was submitted from 
host by user in cluster . 67 | Job was executed on host(s) <32*vr-2-3-16>, in queue , as user in cluster . 68 | was used as the home directory. 69 | was used as the working directory. 70 | Started at Wed Apr 22 10:40:57 2015 71 | Results reported at Wed Apr 22 11:04:59 2015 72 | 73 | Your job looked like: 74 | 75 | ------------------------------------------------------------ 76 | # LSBATCH: User input 77 | /nfs/users/nfs_j/js35/job_scripts/verify_psi female maternal2 78 | ------------------------------------------------------------ 79 | 80 | Successfully completed. 81 | 82 | Resource usage summary: 83 | 84 | CPU time : 1452.93 sec. 85 | Max Memory : 13611 MB 86 | Average Memory : 11727.12 MB 87 | Total Requested Memory : 16384.00 MB 88 | Delta Memory : 2773.00 MB 89 | (Delta: the difference between total requested memory and actual max usage.) 90 | Max Swap : 13672 MB 91 | 92 | Max Processes : 4 93 | Max Threads : 5 94 | 95 | The output (if any) is above this job summary. 96 | 97 | 98 | 99 | PS: 100 | 101 | Read file for stderr output of this job. 
102 | 103 | -------------------------------------------------------------------------------- /scripts/fasta2seq.cpp: -------------------------------------------------------------------------------- 1 | #include <cctype> 2 | #include <cstdint> 3 | #include <fstream> 4 | #include <iostream> 5 | #include <string> 6 | #include <vector> 7 | #include <unistd.h> 8 | 9 | 10 | uint64_t fileSize(std::ifstream& file); 11 | 12 | const uint64_t MEGABYTE = 1048576; 13 | 14 | 15 | int 16 | main(int argc, char** argv) 17 | { 18 | if(argc < 2) 19 | { 20 | std::cerr << "Usage: fasta2seq [-n] base_name" << std::endl; 21 | std::cerr << " -n Truncate runs of Ns into single Ns" << std::endl; 22 | return 1; 23 | } 24 | 25 | bool truncate_Ns = false; 26 | int c = 0; 27 | while((c = getopt(argc, argv, "n")) != -1) 28 | { 29 | if(c == 'n') { truncate_Ns = true; } 30 | else { return 2; } 31 | } 32 | if(optind >= argc) { std::cerr << "fasta2seq: Missing base name" << std::endl; return 1; } 33 | std::string base_name = argv[optind]; 34 | std::string fasta_name = base_name + ".fa"; 35 | std::cout << "Extracting sequences from FASTA file " << fasta_name << std::endl; 36 | if(truncate_Ns) 37 | { 38 | std::cout << "Truncating runs of Ns" << std::endl; 39 | } 40 | 41 | std::ifstream fasta_file(fasta_name.c_str(), std::ios_base::binary); 42 | if(!fasta_file) 43 | { 44 | std::cerr << "fasta2seq: Cannot open FASTA file " << fasta_name << std::endl; 45 | return 2; 46 | } 47 | uint64_t bytes = fileSize(fasta_file); 48 | char* buffer = new char[bytes]; 49 | fasta_file.read(buffer, bytes); 50 | fasta_file.close(); 51 | std::cout << "Bytes: " << bytes << std::endl; 52 | std::cout << std::endl; 53 | 54 | std::ofstream seq_file(base_name.c_str(), std::ios_base::binary); 55 | if(!seq_file) 56 | { 57 | std::cerr << "fasta2seq: Cannot open sequence file " << base_name << std::endl; 58 | return 3; 59 | } 60 | 61 | char* write_buffer = new char[MEGABYTE]; 62 | uint64_t sequences = 0, size = 0; 63 | std::vector<uint64_t> counts(256, 0); // occurrences of each byte value 64 | bool header_line = false, in_run = false; 65 | for(uint64_t i = 0; i < bytes; i++) 66 | { 67 | if(header_line) 68 | { 69 | if(buffer[i] == '\n') {
header_line = false; } 70 | continue; 71 | } 72 | if(buffer[i] == '>') { header_line = true; sequences++; continue; } 73 | if(buffer[i] == '\n') { continue; } 74 | unsigned char temp = toupper(buffer[i]); 75 | if(truncate_Ns) 76 | { 77 | if(temp == 'N') 78 | { 79 | if(in_run) { continue; } 80 | in_run = true; 81 | } 82 | else { in_run = false; } 83 | } 84 | counts[temp]++; 85 | write_buffer[size % MEGABYTE] = temp; 86 | size++; 87 | if(size % MEGABYTE == 0) 88 | { 89 | seq_file.write(write_buffer, MEGABYTE); 90 | } 91 | } 92 | if(size % MEGABYTE != 0) { seq_file.write(write_buffer, size % MEGABYTE); } 93 | seq_file.close(); 94 | delete[] write_buffer; write_buffer = 0; 95 | delete[] buffer; buffer = 0; 96 | 97 | std::cout << "Sequences: " << sequences << std::endl; 98 | std::cout << "Bytes: " << size << std::endl; 99 | for(uint64_t i = 0; i < 256; i++) 100 | { 101 | if(counts[i] > 0) { std::cout << "counts[" << i << "] = " << counts[i] << std::endl; } 102 | } 103 | std::cout << std::endl; 104 | 105 | return 0; 106 | } 107 | 108 | uint64_t 109 | fileSize(std::ifstream& file) 110 | { 111 | std::streamoff curr = file.tellg(); 112 | file.seekg(0, std::ios::end); 113 | std::streamoff size = file.tellg(); 114 | file.seekg(0, std::ios::beg); 115 | size -= file.tellg(); 116 | file.seekg(curr, std::ios::beg); 117 | return size; 118 | } 119 | -------------------------------------------------------------------------------- /logs_old_rcst/index_synth_0.0003.log: -------------------------------------------------------------------------------- 1 | Start 2 | Thu Apr 16 09:41:59 BST 2015 3 | 4 | Indexing a synthetic human genome 5 | 6 | Reference: human 7 | Mutation rate: 0.0003 8 | Target: synth_0.0003 9 | Threads: 8 10 | 11 | Sequence mutator 12 | 13 | Source: human 14 | Size: 3095693981 15 | Target: synth_0.0003 16 | Mutation rate: 0.0003 17 | 18 | Target size: 3095693081 19 | Substitutions: 773689 20 | Insertions: 46357, total size 231577 21 | Deletions: 46297, total size 
232477 22 | 23 | BWT construction 24 | Options: alphabet isa_sample_rate=64 lcp sa_sample_rate=17 25 | 26 | File: synth_0.0003 27 | Text size: 3095693081 28 | BWT built in 1063.9 seconds (2.77497 MB/s) 29 | Alphabet written to synth_0.0003.alpha 30 | BWT written to synth_0.0003.bwt 31 | Samples written to synth_0.0003.samples 32 | LCP array built in 731.449 seconds (4.03621 MB/s) 33 | LCP array written to synth_0.0003.lcp 34 | 35 | Relative FM-index and LCP array builder 36 | Using OpenMP with 8 threads 37 | 38 | Algorithm: invariant 39 | Input format: plain 40 | SA sample rate: 257 41 | ISA sample rate: 512 42 | 43 | Reference: human 44 | 45 | BWT: 1120.49 MB (3.03628 bpc) 46 | SA samples: 694.655 MB (1.88235 bpc) 47 | ISA samples: 184.518 MB (0.5 bpc) 48 | Simple FM: 1999.67 MB (5.41863 bpc) 49 | 50 | LCP array: 3862.42 MB (10.4663 bpc) 51 | 52 | 53 | Target: synth_0.0003 54 | Reference size: 3095693982 55 | Target size: 3095693082 56 | Built the merging bitvector in 3530.1 seconds 57 | Matched 3082697677 positions in 9111.88 seconds 58 | Found a common subsequence of length 3072078874 in 903.427 seconds 59 | Built the bwt_lcs bitvectors and samples in 1819.45 seconds 60 | Index built in 15394.9 seconds 61 | 62 | BWT: 1120.5 MB (3.0363 bpc) 63 | SA samples: 694.655 MB (1.88235 bpc) 64 | ISA samples: 184.518 MB (0.5 bpc) 65 | Simple FM: 1999.68 MB (5.41866 bpc) 66 | 67 | ref_minus_lcs: 8.14742 MB (0.0220776 bpc) 68 | seq_minus_lcs: 8.15646 MB (0.0221021 bpc) 69 | bwt_lcs: 124.698 MB (0.337902 bpc) 70 | text_lcs: 103.87 MB (0.281463 bpc) 71 | SA samples: 45.9499 MB (0.124514 bpc) 72 | ISA samples: 23.0647 MB (0.0625 bpc) 73 | Relative FM: 313.886 MB (0.850559 bpc) 74 | 75 | The RLZ parsing of the LCP array consists of 54234873 phrases 76 | Relative LCP array built in 5586.65 seconds 77 | 78 | LCP array: 3857.32 MB (10.4524 bpc) 79 | 80 | Phrases: 206.89 MB (0.560623 bpc) 81 | Blocks: 56.6467 MB (0.153499 bpc) 82 | Samples: 65.6321 MB (0.177848 bpc) 83 | Tree: 
74.3041 MB (0.201347 bpc) 84 | Relative LCP: 403.473 MB (1.09332 bpc) 85 | 86 | 87 | Memory usage: 101320 MB 88 | 89 | 90 | End 91 | Thu Apr 16 16:08:13 BST 2015 92 | 93 | 94 | ------------------------------------------------------------ 95 | Sender: LSF System 96 | Subject: Job 3888641: in cluster Done 97 | 98 | Job was submitted from host by user in cluster . 99 | Job was executed on host(s) <8*vr-4-1-05>, in queue , as user in cluster . 100 | was used as the home directory. 101 | was used as the working directory. 102 | Started at Thu Apr 16 09:41:59 2015 103 | Results reported at Thu Apr 16 16:08:13 2015 104 | 105 | Your job looked like: 106 | 107 | ------------------------------------------------------------ 108 | # LSBATCH: User input 109 | /nfs/users/nfs_j/js35/job_scripts/index_synth 0.0003 8 110 | ------------------------------------------------------------ 111 | 112 | Successfully completed. 113 | 114 | Resource usage summary: 115 | 116 | CPU time : 25063.25 sec. 117 | Max Memory : 101325 MB 118 | Average Memory : 26423.30 MB 119 | Total Requested Memory : 131072.00 MB 120 | Delta Memory : 29747.00 MB 121 | (Delta: the difference between total requested memory and actual max usage.) 122 | Max Swap : 105581 MB 123 | 124 | Max Processes : 4 125 | Max Threads : 12 126 | 127 | The output (if any) is above this job summary. 128 | 129 | 130 | 131 | PS: 132 | 133 | Read file for stderr output of this job. 
134 | 135 | -------------------------------------------------------------------------------- /logs_old_rcst/index_synth_0.0030.log: -------------------------------------------------------------------------------- 1 | Start 2 | Thu Apr 16 09:42:00 BST 2015 3 | 4 | Indexing a synthetic human genome 5 | 6 | Reference: human 7 | Mutation rate: 0.0030 8 | Target: synth_0.0030 9 | Threads: 8 10 | 11 | Sequence mutator 12 | 13 | Source: human 14 | Size: 3095693981 15 | Target: synth_0.0030 16 | Mutation rate: 0.003 17 | 18 | Target size: 3095695940 19 | Substitutions: 7712195 20 | Insertions: 464263, total size 2323116 21 | Deletions: 465186, total size 2321157 22 | 23 | BWT construction 24 | Options: alphabet isa_sample_rate=64 lcp sa_sample_rate=17 25 | 26 | File: synth_0.0030 27 | Text size: 3095695940 28 | BWT built in 1283.34 seconds (2.30047 MB/s) 29 | Alphabet written to synth_0.0030.alpha 30 | BWT written to synth_0.0030.bwt 31 | Samples written to synth_0.0030.samples 32 | LCP array built in 836.026 seconds (3.53133 MB/s) 33 | LCP array written to synth_0.0030.lcp 34 | 35 | Relative FM-index and LCP array builder 36 | Using OpenMP with 8 threads 37 | 38 | Algorithm: invariant 39 | Input format: plain 40 | SA sample rate: 257 41 | ISA sample rate: 512 42 | 43 | Reference: human 44 | 45 | BWT: 1120.49 MB (3.03628 bpc) 46 | SA samples: 694.655 MB (1.88235 bpc) 47 | ISA samples: 184.518 MB (0.5 bpc) 48 | Simple FM: 1999.67 MB (5.41863 bpc) 49 | 50 | LCP array: 3862.42 MB (10.4663 bpc) 51 | 52 | 53 | Target: synth_0.0030 54 | Reference size: 3095693982 55 | Target size: 3095695941 56 | Built the merging bitvector in 3327.42 seconds 57 | Matched 2975880923 positions in 8107.36 seconds 58 | Found a common subsequence of length 2890004816 in 1004.44 seconds 59 | Built the bwt_lcs bitvectors and samples in 1751.62 seconds 60 | Index built in 14266.9 seconds 61 | 62 | BWT: 1120.59 MB (3.03654 bpc) 63 | SA samples: 694.655 MB (1.88235 bpc) 64 | ISA samples: 184.518 MB (0.5 
bpc) 65 | Simple FM: 1999.76 MB (5.41889 bpc) 66 | 67 | ref_minus_lcs: 71.3826 MB (0.19343 bpc) 68 | seq_minus_lcs: 71.4762 MB (0.193684 bpc) 69 | bwt_lcs: 315.278 MB (0.854329 bpc) 70 | text_lcs: 201.417 MB (0.545792 bpc) 71 | SA samples: 45.95 MB (0.124514 bpc) 72 | ISA samples: 23.0647 MB (0.0625 bpc) 73 | Relative FM: 728.569 MB (1.97425 bpc) 74 | 75 | The RLZ parsing of the LCP array consists of 241958257 phrases 76 | Relative LCP array built in 12312 seconds 77 | 78 | LCP array: 3831.58 MB (10.3827 bpc) 79 | 80 | Phrases: 922.998 MB (2.50111 bpc) 81 | Blocks: 190.458 MB (0.516096 bpc) 82 | Samples: 296.441 MB (0.803286 bpc) 83 | Tree: 389.297 MB (1.0549 bpc) 84 | Relative LCP: 1799.19 MB (4.87539 bpc) 85 | 86 | 87 | Memory usage: 101973 MB 88 | 89 | 90 | End 91 | Thu Apr 16 17:47:10 BST 2015 92 | 93 | 94 | ------------------------------------------------------------ 95 | Sender: LSF System 96 | Subject: Job 3888643: in cluster Done 97 | 98 | Job was submitted from host by user in cluster . 99 | Job was executed on host(s) <8*vr-4-1-11>, in queue , as user in cluster . 100 | was used as the home directory. 101 | was used as the working directory. 102 | Started at Thu Apr 16 09:42:00 2015 103 | Results reported at Thu Apr 16 17:47:14 2015 104 | 105 | Your job looked like: 106 | 107 | ------------------------------------------------------------ 108 | # LSBATCH: User input 109 | /nfs/users/nfs_j/js35/job_scripts/index_synth 0.0030 8 110 | ------------------------------------------------------------ 111 | 112 | Successfully completed. 113 | 114 | Resource usage summary: 115 | 116 | CPU time : 30709.48 sec. 117 | Max Memory : 101978 MB 118 | Average Memory : 26725.39 MB 119 | Total Requested Memory : 131072.00 MB 120 | Delta Memory : 29094.00 MB 121 | (Delta: the difference between total requested memory and actual max usage.) 122 | Max Swap : 109675 MB 123 | 124 | Max Processes : 4 125 | Max Threads : 12 126 | 127 | The output (if any) is above this job summary. 
128 | 129 | 130 | 131 | PS: 132 | 133 | Read file for stderr output of this job. 134 | 135 | -------------------------------------------------------------------------------- /logs_old_rcst/index_synth_0.0001.log: -------------------------------------------------------------------------------- 1 | Start 2 | Thu Apr 16 09:41:59 BST 2015 3 | 4 | Indexing a synthetic human genome 5 | 6 | Reference: human 7 | Mutation rate: 0.0001 8 | Target: synth_0.0001 9 | Threads: 8 10 | 11 | Sequence mutator 12 | 13 | Source: human 14 | Size: 3095693981 15 | Target: synth_0.0001 16 | Mutation rate: 0.0001 17 | 18 | Target size: 3095694076 19 | Substitutions: 257442 20 | Insertions: 15343, total size 76784 21 | Deletions: 15520, total size 76689 22 | 23 | BWT construction 24 | Options: alphabet isa_sample_rate=64 lcp sa_sample_rate=17 25 | 26 | File: synth_0.0001 27 | Text size: 3095694076 28 | BWT built in 1028.68 seconds (2.86998 MB/s) 29 | Alphabet written to synth_0.0001.alpha 30 | BWT written to synth_0.0001.bwt 31 | Samples written to synth_0.0001.samples 32 | LCP array built in 783.095 seconds (3.77002 MB/s) 33 | LCP array written to synth_0.0001.lcp 34 | 35 | Relative FM-index and LCP array builder 36 | Using OpenMP with 8 threads 37 | 38 | Algorithm: invariant 39 | Input format: plain 40 | SA sample rate: 257 41 | ISA sample rate: 512 42 | 43 | Reference: human 44 | 45 | BWT: 1120.49 MB (3.03628 bpc) 46 | SA samples: 694.655 MB (1.88235 bpc) 47 | ISA samples: 184.518 MB (0.5 bpc) 48 | Simple FM: 1999.67 MB (5.41863 bpc) 49 | 50 | LCP array: 3862.42 MB (10.4663 bpc) 51 | 52 | 53 | Target: synth_0.0001 54 | Reference size: 3095693982 55 | Target size: 3095694077 56 | Built the merging bitvector in 3585.66 seconds 57 | Matched 3091078254 positions in 8582.31 seconds 58 | Found a common subsequence of length 3087152693 in 974.471 seconds 59 | Built the bwt_lcs bitvectors and samples in 1894.12 seconds 60 | Index built in 15056.1 seconds 61 | 62 | BWT: 1120.5 MB (3.03628 bpc) 63 
| SA samples: 694.655 MB (1.88235 bpc) 64 | ISA samples: 184.518 MB (0.5 bpc) 65 | Simple FM: 1999.67 MB (5.41864 bpc) 66 | 67 | ref_minus_lcs: 2.99465 MB (0.0081148 bpc) 68 | seq_minus_lcs: 2.99774 MB (0.00812318 bpc) 69 | bwt_lcs: 103.815 MB (0.281314 bpc) 70 | text_lcs: 95.5414 MB (0.258895 bpc) 71 | SA samples: 45.95 MB (0.124514 bpc) 72 | ISA samples: 23.0647 MB (0.0625 bpc) 73 | Relative FM: 274.364 MB (0.743462 bpc) 74 | 75 | The RLZ parsing of the LCP array consists of 21466456 phrases 76 | Relative LCP array built in 2829.17 seconds 77 | 78 | LCP array: 3860.64 MB (10.4614 bpc) 79 | 80 | Phrases: 81.888 MB (0.221897 bpc) 81 | Blocks: 26.0157 MB (0.0704965 bpc) 82 | Samples: 28.6602 MB (0.0776626 bpc) 83 | Tree: 29.9523 MB (0.0811638 bpc) 84 | Relative LCP: 166.516 MB (0.45122 bpc) 85 | 86 | 87 | Memory usage: 101267 MB 88 | 89 | 90 | End 91 | Thu Apr 16 15:16:28 BST 2015 92 | 93 | 94 | ------------------------------------------------------------ 95 | Sender: LSF System 96 | Subject: Job 3888640: in cluster Done 97 | 98 | Job was submitted from host by user in cluster . 99 | Job was executed on host(s) <8*vr-4-1-08>, in queue , as user in cluster . 100 | was used as the home directory. 101 | was used as the working directory. 102 | Started at Thu Apr 16 09:41:59 2015 103 | Results reported at Thu Apr 16 15:16:35 2015 104 | 105 | Your job looked like: 106 | 107 | ------------------------------------------------------------ 108 | # LSBATCH: User input 109 | /nfs/users/nfs_j/js35/job_scripts/index_synth 0.0001 8 110 | ------------------------------------------------------------ 111 | 112 | Successfully completed. 113 | 114 | Resource usage summary: 115 | 116 | CPU time : 22166.93 sec. 117 | Max Memory : 101272 MB 118 | Average Memory : 26311.83 MB 119 | Total Requested Memory : 131072.00 MB 120 | Delta Memory : 29800.00 MB 121 | (Delta: the difference between total requested memory and actual max usage.) 
122 | Max Swap : 105578 MB 123 | 124 | Max Processes : 4 125 | Max Threads : 12 126 | 127 | The output (if any) is above this job summary. 128 | 129 | 130 | 131 | PS: 132 | 133 | Read file for stderr output of this job. 134 | 135 | -------------------------------------------------------------------------------- /logs_old_rcst/index_synth_0.0010.log: -------------------------------------------------------------------------------- 1 | Start 2 | Thu Apr 16 09:42:00 BST 2015 3 | 4 | Indexing a synthetic human genome 5 | 6 | Reference: human 7 | Mutation rate: 0.0010 8 | Target: synth_0.0010 9 | Threads: 8 10 | 11 | Sequence mutator 12 | 13 | Source: human 14 | Size: 3095693981 15 | Target: synth_0.0010 16 | Mutation rate: 0.001 17 | 18 | Target size: 3095693265 19 | Substitutions: 2573110 20 | Insertions: 154611, total size 774361 21 | Deletions: 154875, total size 775077 22 | 23 | BWT construction 24 | Options: alphabet isa_sample_rate=64 lcp sa_sample_rate=17 25 | 26 | File: synth_0.0010 27 | Text size: 3095693265 28 | BWT built in 1165.34 seconds (2.5334 MB/s) 29 | Alphabet written to synth_0.0010.alpha 30 | BWT written to synth_0.0010.bwt 31 | Samples written to synth_0.0010.samples 32 | LCP array built in 815.732 seconds (3.61918 MB/s) 33 | LCP array written to synth_0.0010.lcp 34 | 35 | Relative FM-index and LCP array builder 36 | Using OpenMP with 8 threads 37 | 38 | Algorithm: invariant 39 | Input format: plain 40 | SA sample rate: 257 41 | ISA sample rate: 512 42 | 43 | Reference: human 44 | 45 | BWT: 1120.49 MB (3.03628 bpc) 46 | SA samples: 694.655 MB (1.88235 bpc) 47 | ISA samples: 184.518 MB (0.5 bpc) 48 | Simple FM: 1999.67 MB (5.41863 bpc) 49 | 50 | LCP array: 3862.42 MB (10.4663 bpc) 51 | 52 | 53 | Target: synth_0.0010 54 | Reference size: 3095693982 55 | Target size: 3095693266 56 | Built the merging bitvector in 3793.28 seconds 57 | Matched 3049774652 positions in 8669.23 seconds 58 | Found a common subsequence of length 3017972073 in 799.324 seconds 
59 | Built the bwt_lcs bitvectors and samples in 1849.27 seconds 60 | Index built in 15155.1 seconds 61 | 62 | BWT: 1120.53 MB (3.03637 bpc) 63 | SA samples: 694.655 MB (1.88235 bpc) 64 | ISA samples: 184.518 MB (0.5 bpc) 65 | Simple FM: 1999.7 MB (5.41872 bpc) 66 | 67 | ref_minus_lcs: 27.9173 MB (0.0756493 bpc) 68 | seq_minus_lcs: 27.9476 MB (0.0757315 bpc) 69 | bwt_lcs: 190.639 MB (0.516586 bpc) 70 | text_lcs: 131.43 MB (0.356146 bpc) 71 | SA samples: 45.9499 MB (0.124514 bpc) 72 | ISA samples: 23.0647 MB (0.0625 bpc) 73 | Relative FM: 446.949 MB (1.21113 bpc) 74 | 75 | The RLZ parsing of the LCP array consists of 132150310 phrases 76 | Relative LCP array built in 10605 seconds 77 | 78 | LCP array: 3847.63 MB (10.4262 bpc) 79 | 80 | Phrases: 504.113 MB (1.36603 bpc) 81 | Blocks: 117.937 MB (0.319582 bpc) 82 | Samples: 144.591 MB (0.391808 bpc) 83 | Tree: 178.523 MB (0.483757 bpc) 84 | Relative LCP: 945.165 MB (2.56118 bpc) 85 | 86 | 87 | Memory usage: 101494 MB 88 | 89 | 90 | End 91 | Thu Apr 16 17:31:48 BST 2015 92 | 93 | 94 | ------------------------------------------------------------ 95 | Sender: LSF System 96 | Subject: Job 3888642: in cluster Done 97 | 98 | Job was submitted from host by user in cluster . 99 | Job was executed on host(s) <8*vr-4-1-13>, in queue , as user in cluster . 100 | was used as the home directory. 101 | was used as the working directory. 102 | Started at Thu Apr 16 09:41:59 2015 103 | Results reported at Thu Apr 16 17:31:48 2015 104 | 105 | Your job looked like: 106 | 107 | ------------------------------------------------------------ 108 | # LSBATCH: User input 109 | /nfs/users/nfs_j/js35/job_scripts/index_synth 0.0010 8 110 | ------------------------------------------------------------ 111 | 112 | Successfully completed. 113 | 114 | Resource usage summary: 115 | 116 | CPU time : 30209.01 sec. 
117 | Max Memory : 101500 MB 118 | Average Memory : 25498.52 MB 119 | Total Requested Memory : 131072.00 MB 120 | Delta Memory : 29572.00 MB 121 | (Delta: the difference between total requested memory and actual max usage.) 122 | Max Swap : 105578 MB 123 | 124 | Max Processes : 4 125 | Max Threads : 12 126 | 127 | The output (if any) is above this job summary. 128 | 129 | 130 | 131 | PS: 132 | 133 | Read file for stderr output of this job. 134 | 135 | -------------------------------------------------------------------------------- /logs_old_rcst/index_synth_0.0100.log: -------------------------------------------------------------------------------- 1 | Start 2 | Thu Apr 16 09:42:00 BST 2015 3 | 4 | Indexing a synthetic human genome 5 | 6 | Reference: human 7 | Mutation rate: 0.0100 8 | Target: synth_0.0100 9 | Threads: 8 10 | 11 | Sequence mutator 12 | 13 | Source: human 14 | Size: 3095693981 15 | Target: synth_0.0100 16 | Mutation rate: 0.01 17 | 18 | Target size: 3095710626 19 | Substitutions: 25679045 20 | Insertions: 1545753, total size 7724642 21 | Deletions: 1543585, total size 7707997 22 | 23 | BWT construction 24 | Options: alphabet isa_sample_rate=64 lcp sa_sample_rate=17 25 | 26 | File: synth_0.0100 27 | Text size: 3095710626 28 | BWT built in 1070.88 seconds (2.75688 MB/s) 29 | Alphabet written to synth_0.0100.alpha 30 | BWT written to synth_0.0100.bwt 31 | Samples written to synth_0.0100.samples 32 | LCP array built in 687.833 seconds (4.29218 MB/s) 33 | LCP array written to synth_0.0100.lcp 34 | 35 | Relative FM-index and LCP array builder 36 | Using OpenMP with 8 threads 37 | 38 | Algorithm: invariant 39 | Input format: plain 40 | SA sample rate: 257 41 | ISA sample rate: 512 42 | 43 | Reference: human 44 | 45 | BWT: 1120.49 MB (3.03628 bpc) 46 | SA samples: 694.655 MB (1.88235 bpc) 47 | ISA samples: 184.518 MB (0.5 bpc) 48 | Simple FM: 1999.67 MB (5.41863 bpc) 49 | 50 | LCP array: 3862.42 MB (10.4663 bpc) 51 | 52 | 53 | Target: synth_0.0100 54 | 
Reference size: 3095693982 55 | Target size: 3095710627 56 | Built the merging bitvector in 3319.36 seconds 57 | Matched 2721514480 positions in 7630.38 seconds 58 | Found a common subsequence of length 2482662448 in 797.965 seconds 59 | Built the bwt_lcs bitvectors and samples in 1760.83 seconds 60 | Index built in 13676 seconds 61 | 62 | BWT: 1120.83 MB (3.03718 bpc) 63 | SA samples: 694.659 MB (1.88235 bpc) 64 | ISA samples: 184.519 MB (0.5 bpc) 65 | Simple FM: 2000.01 MB (5.41953 bpc) 66 | 67 | ref_minus_lcs: 216.809 MB (0.587499 bpc) 68 | seq_minus_lcs: 217.128 MB (0.588363 bpc) 69 | bwt_lcs: 568.07 MB (1.53933 bpc) 70 | text_lcs: 380.365 MB (1.0307 bpc) 71 | SA samples: 45.9502 MB (0.124514 bpc) 72 | ISA samples: 23.0649 MB (0.0625 bpc) 73 | Relative FM: 1451.39 MB (3.9329 bpc) 74 | 75 | The RLZ parsing of the LCP array consists of 281395718 phrases 76 | Relative LCP array built in 17285.1 seconds 77 | 78 | LCP array: 3820.53 MB (10.3527 bpc) 79 | 80 | Phrases: 1073.44 MB (2.90875 bpc) 81 | Blocks: 220.981 MB (0.598804 bpc) 82 | Samples: 335.674 MB (0.909592 bpc) 83 | Tree: 447.287 MB (1.21204 bpc) 84 | Relative LCP: 2077.38 MB (5.62919 bpc) 85 | 86 | 87 | Memory usage: 103410 MB 88 | 89 | 90 | End 91 | Thu Apr 16 18:53:44 BST 2015 92 | 93 | 94 | ------------------------------------------------------------ 95 | Sender: LSF System 96 | Subject: Job 3888644: in cluster Done 97 | 98 | Job was submitted from host by user in cluster . 99 | Job was executed on host(s) <8*vr-4-1-02>, in queue , as user in cluster . 100 | was used as the home directory. 101 | was used as the working directory. 
102 | Started at Thu Apr 16 09:42:00 2015 103 | Results reported at Thu Apr 16 18:53:53 2015 104 | 105 | Your job looked like: 106 | 107 | ------------------------------------------------------------ 108 | # LSBATCH: User input 109 | /nfs/users/nfs_j/js35/job_scripts/index_synth 0.0100 8 110 | ------------------------------------------------------------ 111 | 112 | Successfully completed. 113 | 114 | Resource usage summary: 115 | 116 | CPU time : 34782.63 sec. 117 | Max Memory : 103417 MB 118 | Average Memory : 27136.62 MB 119 | Total Requested Memory : 131072.00 MB 120 | Delta Memory : 27655.00 MB 121 | (Delta: the difference between total requested memory and actual max usage.) 122 | Max Swap : 109678 MB 123 | 124 | Max Processes : 4 125 | Max Threads : 12 126 | 127 | The output (if any) is above this job summary. 128 | 129 | 130 | 131 | PS: 132 | 133 | Read file for stderr output of this job. 134 | 135 | -------------------------------------------------------------------------------- /logs_old_rcst/index_synth_0.0300.log: -------------------------------------------------------------------------------- 1 | Start 2 | Thu Apr 16 09:42:00 BST 2015 3 | 4 | Indexing a synthetic human genome 5 | 6 | Reference: human 7 | Mutation rate: 0.0300 8 | Target: synth_0.0300 9 | Threads: 8 10 | 11 | Sequence mutator 12 | 13 | Source: human 14 | Size: 3095693981 15 | Target: synth_0.0300 16 | Mutation rate: 0.03 17 | 18 | Target size: 3095676408 19 | Substitutions: 76712447 20 | Insertions: 4615431, total size 23056540 21 | Deletions: 4615538, total size 23074113 22 | 23 | BWT construction 24 | Options: alphabet isa_sample_rate=64 lcp sa_sample_rate=17 25 | 26 | File: synth_0.0300 27 | Text size: 3095676408 28 | BWT built in 1133.86 seconds (2.60373 MB/s) 29 | Alphabet written to synth_0.0300.alpha 30 | BWT written to synth_0.0300.bwt 31 | Samples written to synth_0.0300.samples 32 | LCP array built in 803.386 seconds (3.67478 MB/s) 33 | LCP array written to synth_0.0300.lcp 34 
| 35 | Relative FM-index and LCP array builder 36 | Using OpenMP with 8 threads 37 | 38 | Algorithm: invariant 39 | Input format: plain 40 | SA sample rate: 257 41 | ISA sample rate: 512 42 | 43 | Reference: human 44 | 45 | BWT: 1120.49 MB (3.03628 bpc) 46 | SA samples: 694.655 MB (1.88235 bpc) 47 | ISA samples: 184.518 MB (0.5 bpc) 48 | Simple FM: 1999.67 MB (5.41863 bpc) 49 | 50 | LCP array: 3862.42 MB (10.4663 bpc) 51 | 52 | 53 | Target: synth_0.0300 54 | Reference size: 3095693982 55 | Target size: 3095676409 56 | Built the merging bitvector in 3930.01 seconds 57 | Matched 2138221141 positions in 9807.02 seconds 58 | Found a common subsequence of length 1668016180 in 972.001 seconds 59 | Built the bwt_lcs bitvectors and samples in 1837.77 seconds 60 | Index built in 16888.5 seconds 61 | 62 | BWT: 1121.44 MB (3.03885 bpc) 63 | SA samples: 694.651 MB (1.88235 bpc) 64 | ISA samples: 184.517 MB (0.5 bpc) 65 | Simple FM: 2000.61 MB (5.4212 bpc) 66 | 67 | ref_minus_lcs: 503.44 MB (1.36421 bpc) 68 | seq_minus_lcs: 504.361 MB (1.36671 bpc) 69 | bwt_lcs: 754.906 MB (2.04563 bpc) 70 | text_lcs: 566.424 MB (1.53489 bpc) 71 | SA samples: 45.9497 MB (0.124514 bpc) 72 | ISA samples: 23.0646 MB (0.0625 bpc) 73 | Relative FM: 2398.15 MB (6.49845 bpc) 74 | 75 | The RLZ parsing of the LCP array consists of 269943182 phrases 76 | Relative LCP array built in 16949.5 seconds 77 | 78 | LCP array: 3819.89 MB (10.3511 bpc) 79 | 80 | Phrases: 1029.75 MB (2.7904 bpc) 81 | Blocks: 215.192 MB (0.583122 bpc) 82 | Samples: 329.561 MB (0.893038 bpc) 83 | Tree: 439.413 MB (1.19071 bpc) 84 | Relative LCP: 2013.92 MB (5.45728 bpc) 85 | 86 | 87 | Memory usage: 106092 MB 88 | 89 | 90 | End 91 | Thu Apr 16 19:45:26 BST 2015 92 | 93 | 94 | ------------------------------------------------------------ 95 | Sender: LSF System 96 | Subject: Job 3888645: in cluster Done 97 | 98 | Job was submitted from host by user in cluster . 
99 | Job was executed on host(s) <8*vr-4-1-04>, in queue , as user in cluster . 100 | was used as the home directory. 101 | was used as the working directory. 102 | Started at Thu Apr 16 09:42:00 2015 103 | Results reported at Thu Apr 16 19:45:26 2015 104 | 105 | Your job looked like: 106 | 107 | ------------------------------------------------------------ 108 | # LSBATCH: User input 109 | /nfs/users/nfs_j/js35/job_scripts/index_synth 0.0300 8 110 | ------------------------------------------------------------ 111 | 112 | Successfully completed. 113 | 114 | Resource usage summary: 115 | 116 | CPU time : 38398.93 sec. 117 | Max Memory : 106099 MB 118 | Average Memory : 27990.40 MB 119 | Total Requested Memory : 131072.00 MB 120 | Delta Memory : 24973.00 MB 121 | (Delta: the difference between total requested memory and actual max usage.) 122 | Max Swap : 109678 MB 123 | 124 | Max Processes : 4 125 | Max Threads : 12 126 | 127 | The output (if any) is above this job summary. 128 | 129 | 130 | 131 | PS: 132 | 133 | Read file for stderr output of this job. 
134 | 135 | -------------------------------------------------------------------------------- /logs_old_rcst/index_synth_0.1000.log: -------------------------------------------------------------------------------- 1 | Start 2 | Thu Apr 16 09:42:01 BST 2015 3 | 4 | Indexing a synthetic human genome 5 | 6 | Reference: human 7 | Mutation rate: 0.1000 8 | Target: synth_0.1000 9 | Threads: 8 10 | 11 | Sequence mutator 12 | 13 | Source: human 14 | Size: 3095693981 15 | Target: synth_0.1000 16 | Mutation rate: 0.1 17 | 18 | Target size: 3095641980 19 | Substitutions: 252221891 20 | Insertions: 15173734, total size 75845679 21 | Deletions: 15177311, total size 75897680 22 | 23 | BWT construction 24 | Options: alphabet isa_sample_rate=64 lcp sa_sample_rate=17 25 | 26 | File: synth_0.1000 27 | Text size: 3095641980 28 | BWT built in 1234.2 seconds (2.39202 MB/s) 29 | Alphabet written to synth_0.1000.alpha 30 | BWT written to synth_0.1000.bwt 31 | Samples written to synth_0.1000.samples 32 | LCP array built in 663.215 seconds (4.4514 MB/s) 33 | LCP array written to synth_0.1000.lcp 34 | 35 | Relative FM-index and LCP array builder 36 | Using OpenMP with 8 threads 37 | 38 | Algorithm: invariant 39 | Input format: plain 40 | SA sample rate: 257 41 | ISA sample rate: 512 42 | 43 | Reference: human 44 | 45 | BWT: 1120.49 MB (3.03628 bpc) 46 | SA samples: 694.655 MB (1.88235 bpc) 47 | ISA samples: 184.518 MB (0.5 bpc) 48 | Simple FM: 1999.67 MB (5.41863 bpc) 49 | 50 | LCP array: 3862.42 MB (10.4663 bpc) 51 | 52 | 53 | Target: synth_0.1000 54 | Reference size: 3095693982 55 | Target size: 3095641981 56 | Built the merging bitvector in 3641.75 seconds 57 | Matched 1172867705 positions in 10228.1 seconds 58 | Found a common subsequence of length 471424494 in 738.008 seconds 59 | Built the bwt_lcs bitvectors and samples in 1917.51 seconds 60 | Index built in 17048.3 seconds 61 | 62 | BWT: 1123.65 MB (3.04487 bpc) 63 | SA samples: 694.643 MB (1.88235 bpc) 64 | ISA samples: 184.515 MB 
(0.5 bpc) 65 | Simple FM: 2002.8 MB (5.42722 bpc) 66 | 67 | ref_minus_lcs: 927.859 MB (2.51432 bpc) 68 | seq_minus_lcs: 930.937 MB (2.52266 bpc) 69 | bwt_lcs: 463.036 MB (1.25474 bpc) 70 | text_lcs: 346.566 MB (0.939129 bpc) 71 | SA samples: 45.9492 MB (0.124514 bpc) 72 | ISA samples: 23.0643 MB (0.0625 bpc) 73 | Relative FM: 2737.41 MB (7.41787 bpc) 74 | 75 | The RLZ parsing of the LCP array consists of 248201686 phrases 76 | Relative LCP array built in 15814 seconds 77 | 78 | LCP array: 3819.82 MB (10.351 bpc) 79 | 80 | Phrases: 946.814 MB (2.56569 bpc) 81 | Blocks: 194.355 MB (0.526665 bpc) 82 | Samples: 319.511 MB (0.865816 bpc) 83 | Tree: 426.014 MB (1.15442 bpc) 84 | Relative LCP: 1886.69 MB (5.11259 bpc) 85 | 86 | 87 | Memory usage: 108216 MB 88 | 89 | 90 | End 91 | Thu Apr 16 19:28:58 BST 2015 92 | 93 | 94 | ------------------------------------------------------------ 95 | Sender: LSF System 96 | Subject: Job 3888646: in cluster Done 97 | 98 | Job was submitted from host by user in cluster . 99 | Job was executed on host(s) <8*vr-4-1-10>, in queue , as user in cluster . 100 | was used as the home directory. 101 | was used as the working directory. 102 | Started at Thu Apr 16 09:42:00 2015 103 | Results reported at Thu Apr 16 19:28:58 2015 104 | 105 | Your job looked like: 106 | 107 | ------------------------------------------------------------ 108 | # LSBATCH: User input 109 | /nfs/users/nfs_j/js35/job_scripts/index_synth 0.1000 8 110 | ------------------------------------------------------------ 111 | 112 | Successfully completed. 113 | 114 | Resource usage summary: 115 | 116 | CPU time : 37388.95 sec. 117 | Max Memory : 108221 MB 118 | Average Memory : 28179.34 MB 119 | Total Requested Memory : 131072.00 MB 120 | Delta Memory : 22851.00 MB 121 | (Delta: the difference between total requested memory and actual max usage.) 
122 | Max Swap : 109677 MB 123 | 124 | Max Processes : 4 125 | Max Threads : 12 126 | 127 | The output (if any) is above this job summary. 128 | 129 | 130 | 131 | PS: 132 | 133 | Read file for stderr output of this job. 134 | 135 | -------------------------------------------------------------------------------- /logs_old_rcst/locate_test_7_64.log: -------------------------------------------------------------------------------- 1 | Start 2 | Wed Apr 15 08:01:44 BST 2015 3 | 4 | Testing the locate functionality of the relative FM-index 5 | 6 | Reference: reference_7_64 7 | Target: target_7_64 8 | Threads: 32 9 | 10 | BWT construction 11 | Options: alphabet isa_sample_rate=64 sa_sample_rate=7 12 | 13 | File: reference_7_64 14 | Text size: 3036320415 15 | BWT built in 1062.84 seconds (2.72445 MB/s) 16 | Alphabet written to reference_7_64.alpha 17 | BWT written to reference_7_64.bwt 18 | Samples written to reference_7_64.samples 19 | 20 | File: target_7_64 21 | Text size: 3036191207 22 | BWT built in 941.496 seconds (3.07547 MB/s) 23 | Alphabet written to target_7_64.alpha 24 | BWT written to target_7_64.bwt 25 | Samples written to target_7_64.samples 26 | 27 | Relative FM-index builder 28 | Using OpenMP with 32 threads 29 | 30 | Algorithm: invariant 31 | Input format: plain 32 | SA sample rate: 257 33 | ISA sample rate: 512 34 | 35 | Reference: reference_7_64 36 | 37 | BWT: 1090.28 MB (3.01219 bpc) 38 | SA samples: 1654.66 MB (4.57143 bpc) 39 | ISA samples: 180.979 MB (0.5 bpc) 40 | Simple FM: 2925.93 MB (8.08362 bpc) 41 | 42 | 43 | Target: target_7_64 44 | Reference size: 3036320416 45 | Target size: 3036191208 46 | Built the merging bitvector in 3230.09 seconds 47 | Matched 3001221512 positions in 7260.13 seconds 48 | Found a common subsequence of length 2980010799 in 804.067 seconds 49 | Built the bwt_lcs bitvectors and samples in 1650.94 seconds 50 | Index built in 12982.1 seconds 51 | 52 | BWT: 1090.24 MB (3.01219 bpc) 53 | SA samples: 1654.59 MB (4.57143 bpc) 54 
| ISA samples: 180.971 MB (0.5 bpc) 55 | Simple FM: 2925.8 MB (8.08362 bpc) 56 | 57 | ref_minus_lcs: 19.2911 MB (0.053299 bpc) 58 | seq_minus_lcs: 19.2473 MB (0.0531778 bpc) 59 | bwt_lcs: 163.138 MB (0.450729 bpc) 60 | text_lcs: 125.532 MB (0.34683 bpc) 61 | SA samples: 45.0667 MB (0.124514 bpc) 62 | ISA samples: 22.6214 MB (0.0625 bpc) 63 | Relative FM: 394.897 MB (1.09105 bpc) 64 | 65 | 66 | Memory usage: 86183.8 MB 67 | 68 | Query test 69 | 70 | Reference: reference_7_64 71 | Sequence: target_7_64 72 | Patterns: patterns 73 | 74 | Read 2000000 patterns of total length 64000000 75 | 76 | 77 | Executing locate() queries. 78 | 79 | SimpleFM: 2925.8 MB (8.08362 bpc) 80 | SimpleFM: Found 2000000 patterns with 254997642 occ in 221.885 seconds (1.14923e+06 occ/s) 81 | Hash of located positions: 11653246869206397622 82 | 83 | RFM: 394.897 MB (1.09105 bpc) 84 | RFM: Found 2000000 patterns with 254997642 occ in 1911.33 seconds (133414 occ/s) 85 | Hash of located positions: 11653246869206397622 86 | 87 | 88 | Memory usage: 3675.37 MB 89 | 90 | 91 | End 92 | Wed Apr 15 12:54:22 BST 2015 93 | 94 | 95 | ------------------------------------------------------------ 96 | Sender: LSF System 97 | Subject: Job 3883002: in cluster Done 98 | 99 | Job was submitted from host by user in cluster . 100 | Job was executed on host(s) <32*vr-4-1-13>, in queue , as user in cluster . 101 | was used as the home directory. 102 | was used as the working directory. 103 | Started at Wed Apr 15 08:01:43 2015 104 | Results reported at Wed Apr 15 12:54:22 2015 105 | 106 | Your job looked like: 107 | 108 | ------------------------------------------------------------ 109 | # LSBATCH: User input 110 | /nfs/users/nfs_j/js35/job_scripts/locate_test 7 64 32 111 | ------------------------------------------------------------ 112 | 113 | Successfully completed. 114 | 115 | Resource usage summary: 116 | 117 | CPU time : 19316.05 sec. 
118 | Max Memory : 86191 MB 119 | Average Memory : 14053.21 MB 120 | Total Requested Memory : 131072.00 MB 121 | Delta Memory : 44881.00 MB 122 | (Delta: the difference between total requested memory and actual max usage.) 123 | Max Swap : 90345 MB 124 | 125 | Max Processes : 4 126 | Max Threads : 36 127 | 128 | The output (if any) is above this job summary. 129 | 130 | 131 | 132 | PS: 133 | 134 | Read file for stderr output of this job. 135 | 136 | -------------------------------------------------------------------------------- /logs_old_rcst/locate_test_17_64.log: -------------------------------------------------------------------------------- 1 | Start 2 | Mon Apr 13 11:21:37 BST 2015 3 | 4 | Testing the locate functionality of the relative FM-index 5 | 6 | Reference: reference_17_64 7 | Target: target_17_64 8 | Threads: 32 9 | 10 | BWT construction 11 | Options: alphabet isa_sample_rate=64 sa_sample_rate=17 12 | 13 | File: reference_17_64 14 | Text size: 3036320415 15 | BWT built in 1035.68 seconds (2.79591 MB/s) 16 | Alphabet written to reference_17_64.alpha 17 | BWT written to reference_17_64.bwt 18 | Samples written to reference_17_64.samples 19 | 20 | File: target_17_64 21 | Text size: 3036191207 22 | BWT built in 861.295 seconds (3.36184 MB/s) 23 | Alphabet written to target_17_64.alpha 24 | BWT written to target_17_64.bwt 25 | Samples written to target_17_64.samples 26 | 27 | Relative FM-index builder 28 | Using OpenMP with 32 threads 29 | 30 | Algorithm: invariant 31 | Input format: plain 32 | SA sample rate: 257 33 | ISA sample rate: 512 34 | 35 | Reference: reference_17_64 36 | 37 | BWT: 1090.28 MB (3.01219 bpc) 38 | SA samples: 681.332 MB (1.88235 bpc) 39 | ISA samples: 180.979 MB (0.5 bpc) 40 | Simple FM: 1952.6 MB (5.39454 bpc) 41 | 42 | 43 | Target: target_17_64 44 | Reference size: 3036320416 45 | Target size: 3036191208 46 | Built the merging bitvector in 3566.25 seconds 47 | Matched 3001221512 positions in 8610.87 seconds 48 | Found a common 
subsequence of length 2980010799 in 993.086 seconds 49 | Built the bwt_lcs bitvectors and samples in 1471.61 seconds 50 | Index built in 14678.7 seconds 51 | 52 | BWT: 1090.24 MB (3.01219 bpc) 53 | SA samples: 681.303 MB (1.88235 bpc) 54 | ISA samples: 180.971 MB (0.5 bpc) 55 | Simple FM: 1952.51 MB (5.39455 bpc) 56 | 57 | ref_minus_lcs: 19.2911 MB (0.053299 bpc) 58 | seq_minus_lcs: 19.2473 MB (0.0531778 bpc) 59 | bwt_lcs: 163.138 MB (0.450729 bpc) 60 | text_lcs: 125.532 MB (0.34683 bpc) 61 | SA samples: 45.0667 MB (0.124514 bpc) 62 | ISA samples: 22.6214 MB (0.0625 bpc) 63 | Relative FM: 394.897 MB (1.09105 bpc) 64 | 65 | 66 | Memory usage: 84237.2 MB 67 | 68 | Query test 69 | 70 | Reference: reference_17_64 71 | Sequence: target_17_64 72 | Patterns: patterns 73 | 74 | Read 2000000 patterns of total length 64000000 75 | 76 | 77 | Executing locate() queries. 78 | 79 | SimpleFM: 1952.51 MB (5.39455 bpc) 80 | SimpleFM: Found 2000000 patterns with 254997642 occ in 741.087 seconds (344086 occ/s) 81 | Hash of located positions: 11653246869206397622 82 | 83 | RFM: 394.897 MB (1.09105 bpc) 84 | RFM: Found 2000000 patterns with 254997642 occ in 2452.12 seconds (103991 occ/s) 85 | Hash of located positions: 11653246869206397622 86 | 87 | 88 | Memory usage: 2702.04 MB 89 | 90 | 91 | End 92 | Mon Apr 13 17:01:25 BST 2015 93 | 94 | 95 | ------------------------------------------------------------ 96 | Sender: LSF System 97 | Subject: Job 3870156: in cluster Done 98 | 99 | Job was submitted from host by user in cluster . 100 | Job was executed on host(s) <32*vr-4-1-08>, in queue , as user in cluster . 101 | was used as the home directory. 102 | was used as the working directory. 
103 | Started at Mon Apr 13 11:21:37 2015 104 | Results reported at Mon Apr 13 17:01:25 2015 105 | 106 | Your job looked like: 107 | 108 | ------------------------------------------------------------ 109 | # LSBATCH: User input 110 | /nfs/users/nfs_j/js35/job_scripts/locate_test 17 64 32 111 | ------------------------------------------------------------ 112 | 113 | Successfully completed. 114 | 115 | Resource usage summary: 116 | 117 | CPU time : 21815.71 sec. 118 | Max Memory : 84244 MB 119 | Average Memory : 12147.35 MB 120 | Total Requested Memory : 131072.00 MB 121 | Delta Memory : 46828.00 MB 122 | (Delta: the difference between total requested memory and actual max usage.) 123 | Max Swap : 88399 MB 124 | 125 | Max Processes : 4 126 | Max Threads : 36 127 | 128 | The output (if any) is above this job summary. 129 | 130 | 131 | 132 | PS: 133 | 134 | Read file for stderr output of this job. 135 | 136 | -------------------------------------------------------------------------------- /logs_old_rcst/locate_test_31_64.log: -------------------------------------------------------------------------------- 1 | Start 2 | Mon Apr 13 11:23:02 BST 2015 3 | 4 | Testing the locate functionality of the relative FM-index 5 | 6 | Reference: reference_31_64 7 | Target: target_31_64 8 | Threads: 32 9 | 10 | BWT construction 11 | Options: alphabet isa_sample_rate=64 sa_sample_rate=31 12 | 13 | File: reference_31_64 14 | Text size: 3036320415 15 | BWT built in 1074.77 seconds (2.69422 MB/s) 16 | Alphabet written to reference_31_64.alpha 17 | BWT written to reference_31_64.bwt 18 | Samples written to reference_31_64.samples 19 | 20 | File: target_31_64 21 | Text size: 3036191207 22 | BWT built in 867.572 seconds (3.33752 MB/s) 23 | Alphabet written to target_31_64.alpha 24 | BWT written to target_31_64.bwt 25 | Samples written to target_31_64.samples 26 | 27 | Relative FM-index builder 28 | Using OpenMP with 32 threads 29 | 30 | Algorithm: invariant 31 | Input format: plain 32 | SA 
sample rate: 257 33 | ISA sample rate: 512 34 | 35 | Reference: reference_31_64 36 | 37 | BWT: 1090.28 MB (3.01219 bpc) 38 | SA samples: 373.634 MB (1.03226 bpc) 39 | ISA samples: 180.979 MB (0.5 bpc) 40 | Simple FM: 1644.9 MB (4.54445 bpc) 41 | 42 | 43 | Target: target_31_64 44 | Reference size: 3036320416 45 | Target size: 3036191208 46 | Built the merging bitvector in 3338 seconds 47 | Matched 3001221512 positions in 10867.9 seconds 48 | Found a common subsequence of length 2980010799 in 782.211 seconds 49 | Built the bwt_lcs bitvectors and samples in 1839.64 seconds 50 | Index built in 16865.1 seconds 51 | 52 | BWT: 1090.24 MB (3.01219 bpc) 53 | SA samples: 373.618 MB (1.03226 bpc) 54 | ISA samples: 180.971 MB (0.5 bpc) 55 | Simple FM: 1644.83 MB (4.54445 bpc) 56 | 57 | ref_minus_lcs: 19.2911 MB (0.053299 bpc) 58 | seq_minus_lcs: 19.2473 MB (0.0531778 bpc) 59 | bwt_lcs: 163.138 MB (0.450729 bpc) 60 | text_lcs: 125.532 MB (0.34683 bpc) 61 | SA samples: 45.0667 MB (0.124514 bpc) 62 | ISA samples: 22.6214 MB (0.0625 bpc) 63 | Relative FM: 394.897 MB (1.09105 bpc) 64 | 65 | 66 | Memory usage: 83621.8 MB 67 | 68 | Query test 69 | 70 | Reference: reference_31_64 71 | Sequence: target_31_64 72 | Patterns: patterns 73 | 74 | Read 2000000 patterns of total length 64000000 75 | 76 | 77 | Executing locate() queries. 78 | 79 | SimpleFM: 1644.83 MB (4.54445 bpc) 80 | SimpleFM: Found 2000000 patterns with 254997642 occ in 1800.21 seconds (141649 occ/s) 81 | Hash of located positions: 11653246869206397622 82 | 83 | RFM: 394.897 MB (1.09105 bpc) 84 | RFM: Found 2000000 patterns with 254997642 occ in 3662.53 seconds (69623.4 occ/s) 85 | Hash of located positions: 11653246869206397622 86 | 87 | 88 | Memory usage: 2394.34 MB 89 | 90 | 91 | End 92 | Mon Apr 13 18:17:47 BST 2015 93 | 94 | 95 | ------------------------------------------------------------ 96 | Sender: LSF System 97 | Subject: Job 3870160: in cluster Done 98 | 99 | Job was submitted from host by user in cluster . 
100 | Job was executed on host(s) <32*vr-4-1-04>, in queue , as user in cluster . 101 | was used as the home directory. 102 | was used as the working directory. 103 | Started at Mon Apr 13 11:23:01 2015 104 | Results reported at Mon Apr 13 18:17:47 2015 105 | 106 | Your job looked like: 107 | 108 | ------------------------------------------------------------ 109 | # LSBATCH: User input 110 | /nfs/users/nfs_j/js35/job_scripts/locate_test 31 64 32 111 | ------------------------------------------------------------ 112 | 113 | Successfully completed. 114 | 115 | Resource usage summary: 116 | 117 | CPU time : 26517.55 sec. 118 | Max Memory : 83629 MB 119 | Average Memory : 9996.38 MB 120 | Total Requested Memory : 131072.00 MB 121 | Delta Memory : 47443.00 MB 122 | (Delta: the difference between total requested memory and actual max usage.) 123 | Max Swap : 87783 MB 124 | 125 | Max Processes : 4 126 | Max Threads : 36 127 | 128 | The output (if any) is above this job summary. 129 | 130 | 131 | 132 | PS: 133 | 134 | Read file for stderr output of this job. 
135 | 136 | -------------------------------------------------------------------------------- /logs_old_rcst/locate_test_61_64.log: -------------------------------------------------------------------------------- 1 | Start 2 | Mon Apr 13 11:23:07 BST 2015 3 | 4 | Testing the locate functionality of the relative FM-index 5 | 6 | Reference: reference_61_64 7 | Target: target_61_64 8 | Threads: 32 9 | 10 | BWT construction 11 | Options: alphabet isa_sample_rate=64 sa_sample_rate=61 12 | 13 | File: reference_61_64 14 | Text size: 3036320415 15 | BWT built in 861.813 seconds (3.35996 MB/s) 16 | Alphabet written to reference_61_64.alpha 17 | BWT written to reference_61_64.bwt 18 | Samples written to reference_61_64.samples 19 | 20 | File: target_61_64 21 | Text size: 3036191207 22 | BWT built in 840.184 seconds (3.44631 MB/s) 23 | Alphabet written to target_61_64.alpha 24 | BWT written to target_61_64.bwt 25 | Samples written to target_61_64.samples 26 | 27 | Relative FM-index builder 28 | Using OpenMP with 32 threads 29 | 30 | Algorithm: invariant 31 | Input format: plain 32 | SA sample rate: 257 33 | ISA sample rate: 512 34 | 35 | Reference: reference_61_64 36 | 37 | BWT: 1090.28 MB (3.01219 bpc) 38 | SA samples: 189.879 MB (0.52459 bpc) 39 | ISA samples: 180.979 MB (0.5 bpc) 40 | Simple FM: 1461.14 MB (4.03678 bpc) 41 | 42 | 43 | Target: target_61_64 44 | Reference size: 3036320416 45 | Target size: 3036191208 46 | Built the merging bitvector in 3317.01 seconds 47 | Matched 3001221512 positions in 13689.8 seconds 48 | Found a common subsequence of length 2980010799 in 916.001 seconds 49 | Built the bwt_lcs bitvectors and samples in 1604.07 seconds 50 | Index built in 19563.5 seconds 51 | 52 | BWT: 1090.24 MB (3.01219 bpc) 53 | SA samples: 189.871 MB (0.52459 bpc) 54 | ISA samples: 180.971 MB (0.5 bpc) 55 | Simple FM: 1461.08 MB (4.03678 bpc) 56 | 57 | ref_minus_lcs: 19.2911 MB (0.053299 bpc) 58 | seq_minus_lcs: 19.2473 MB (0.0531778 bpc) 59 | bwt_lcs: 163.138 MB 
(0.450729 bpc) 60 | text_lcs: 125.532 MB (0.34683 bpc) 61 | SA samples: 45.0667 MB (0.124514 bpc) 62 | ISA samples: 22.6214 MB (0.0625 bpc) 63 | Relative FM: 394.897 MB (1.09105 bpc) 64 | 65 | 66 | Memory usage: 83254.3 MB 67 | 68 | Query test 69 | 70 | Reference: reference_61_64 71 | Sequence: target_61_64 72 | Patterns: patterns 73 | 74 | Read 2000000 patterns of total length 64000000 75 | 76 | 77 | Executing locate() queries. 78 | 79 | SimpleFM: 1461.08 MB (4.03678 bpc) 80 | SimpleFM: Found 2000000 patterns with 254997642 occ in 4436.67 seconds (57475 occ/s) 81 | Hash of located positions: 11653246869206397622 82 | 83 | RFM: 394.897 MB (1.09105 bpc) 84 | RFM: Found 2000000 patterns with 254997642 occ in 6638.69 seconds (38410.8 occ/s) 85 | Hash of located positions: 11653246869206397622 86 | 87 | 88 | Memory usage: 2210.59 MB 89 | 90 | 91 | End 92 | Mon Apr 13 20:31:06 BST 2015 93 | 94 | 95 | ------------------------------------------------------------ 96 | Sender: LSF System 97 | Subject: Job 3870161: in cluster Done 98 | 99 | Job was submitted from host by user in cluster . 100 | Job was executed on host(s) <32*vr-4-1-05>, in queue , as user in cluster . 101 | was used as the home directory. 102 | was used as the working directory. 103 | Started at Mon Apr 13 11:23:06 2015 104 | Results reported at Mon Apr 13 20:31:06 2015 105 | 106 | Your job looked like: 107 | 108 | ------------------------------------------------------------ 109 | # LSBATCH: User input 110 | /nfs/users/nfs_j/js35/job_scripts/locate_test 61 64 32 111 | ------------------------------------------------------------ 112 | 113 | Successfully completed. 114 | 115 | Resource usage summary: 116 | 117 | CPU time : 34572.34 sec. 118 | Max Memory : 83261 MB 119 | Average Memory : 8436.65 MB 120 | Total Requested Memory : 131072.00 MB 121 | Delta Memory : 47811.00 MB 122 | (Delta: the difference between total requested memory and actual max usage.) 
123 | Max Swap : 87416 MB 124 | 125 | Max Processes : 4 126 | Max Threads : 36 127 | 128 | The output (if any) is above this job summary. 129 | 130 | 131 | 132 | PS: 133 | 134 | Read file for stderr output of this job. 135 | 136 | -------------------------------------------------------------------------------- /logs_old_rcst/locate_test_127_127.log: -------------------------------------------------------------------------------- 1 | Start 2 | Mon Apr 13 11:23:25 BST 2015 3 | 4 | Testing the locate functionality of the relative FM-index 5 | 6 | Reference: reference_127_127 7 | Target: target_127_127 8 | Threads: 32 9 | 10 | BWT construction 11 | Options: alphabet isa_sample_rate=127 sa_sample_rate=127 12 | 13 | File: reference_127_127 14 | Text size: 3036320415 15 | BWT built in 965.884 seconds (2.99794 MB/s) 16 | Alphabet written to reference_127_127.alpha 17 | BWT written to reference_127_127.bwt 18 | Samples written to reference_127_127.samples 19 | 20 | File: target_127_127 21 | Text size: 3036191207 22 | BWT built in 828.296 seconds (3.49578 MB/s) 23 | Alphabet written to target_127_127.alpha 24 | BWT written to target_127_127.bwt 25 | Samples written to target_127_127.samples 26 | 27 | Relative FM-index builder 28 | Using OpenMP with 32 threads 29 | 30 | Algorithm: invariant 31 | Input format: plain 32 | SA sample rate: 257 33 | ISA sample rate: 512 34 | 35 | Reference: reference_127_127 36 | 37 | BWT: 1090.28 MB (3.01219 bpc) 38 | SA samples: 91.2019 MB (0.251969 bpc) 39 | ISA samples: 91.2019 MB (0.251969 bpc) 40 | Simple FM: 1272.69 MB (3.51613 bpc) 41 | 42 | 43 | Target: target_127_127 44 | Reference size: 3036320416 45 | Target size: 3036191208 46 | Built the merging bitvector in 2991.29 seconds 47 | Matched 3001221512 positions in 22630.5 seconds 48 | Found a common subsequence of length 2980010799 in 947.252 seconds 49 | Built the bwt_lcs bitvectors and samples in 1666.89 seconds 50 | Index built in 28273 seconds 51 | 52 | BWT: 1090.24 MB (3.01219 bpc) 
53 | SA samples: 91.198 MB (0.251969 bpc) 54 | ISA samples: 91.198 MB (0.251969 bpc) 55 | Simple FM: 1272.64 MB (3.51613 bpc) 56 | 57 | ref_minus_lcs: 19.2911 MB (0.053299 bpc) 58 | seq_minus_lcs: 19.2473 MB (0.0531778 bpc) 59 | bwt_lcs: 163.138 MB (0.450729 bpc) 60 | text_lcs: 125.532 MB (0.34683 bpc) 61 | SA samples: 45.0667 MB (0.124514 bpc) 62 | ISA samples: 22.6214 MB (0.0625 bpc) 63 | Relative FM: 394.897 MB (1.09105 bpc) 64 | 65 | 66 | Memory usage: 82877.4 MB 67 | 68 | Query test 69 | 70 | Reference: reference_127_127 71 | Sequence: target_127_127 72 | Patterns: patterns 73 | 74 | Read 2000000 patterns of total length 64000000 75 | 76 | 77 | Executing locate() queries. 78 | 79 | SimpleFM: 1272.64 MB (3.51613 bpc) 80 | SimpleFM: Found 2000000 patterns with 254997642 occ in 11624.1 seconds (21937 occ/s) 81 | Hash of located positions: 11653246869206397622 82 | 83 | RFM: 394.897 MB (1.09105 bpc) 84 | RFM: Found 2000000 patterns with 254997642 occ in 13878.3 seconds (18373.8 occ/s) 85 | Hash of located positions: 11653246869206397622 86 | 87 | 88 | Memory usage: 2022.13 MB 89 | 90 | 91 | End 92 | Tue Apr 14 02:59:21 BST 2015 93 | 94 | 95 | ------------------------------------------------------------ 96 | Sender: LSF System 97 | Subject: Job 3870162: in cluster Done 98 | 99 | Job was submitted from host by user in cluster . 100 | Job was executed on host(s) <32*vr-4-1-14>, in queue , as user in cluster . 101 | was used as the home directory. 102 | was used as the working directory. 103 | Started at Mon Apr 13 11:23:24 2015 104 | Results reported at Tue Apr 14 02:59:21 2015 105 | 106 | Your job looked like: 107 | 108 | ------------------------------------------------------------ 109 | # LSBATCH: User input 110 | /nfs/users/nfs_j/js35/job_scripts/locate_test 127 127 32 111 | ------------------------------------------------------------ 112 | 113 | Successfully completed. 114 | 115 | Resource usage summary: 116 | 117 | CPU time : 58287.63 sec. 
118 | Max Memory : 82884 MB 119 | Average Memory : 6664.29 MB 120 | Total Requested Memory : 131072.00 MB 121 | Delta Memory : 48188.00 MB 122 | (Delta: the difference between total requested memory and actual max usage.) 123 | Max Swap : 87039 MB 124 | 125 | Max Processes : 4 126 | Max Threads : 36 127 | 128 | The output (if any) is above this job summary. 129 | 130 | 131 | 132 | PS: 133 | 134 | Read file for stderr output of this job. 135 | 136 | -------------------------------------------------------------------------------- /logs_spire2015/verify_psi_3963100.log: -------------------------------------------------------------------------------- 1 | Start 2 | Mon May 4 18:59:53 BST 2015 3 | 4 | Verifying LF/Psi operations 5 | 6 | old_maternal vs. human 7 | old_maternal2 vs. female 8 | 9 | RFM and RLCP verifier 10 | 11 | Reference: human 12 | 13 | FM-index: 1999.67 MB (5.41863 bpc) 14 | 15 | LCP array: 3862.42 MB (10.4663 bpc) 16 | 17 | 18 | Target: old_maternal 19 | 20 | BWT: 1090.24 MB (3.01219 bpc) 21 | Simple FM: 1090.24 MB (3.01219 bpc) 22 | 23 | ref_minus_lcs: 38.5664 MB (0.106554 bpc) 24 | seq_minus_lcs: 15.1568 MB (0.0418764 bpc) 25 | bwt_lcs: 163.937 MB (0.452938 bpc) 26 | Relative FM: 217.661 MB (0.60137 bpc) 27 | 28 | LCP array: 2.47955e-05 MB (inf bpc) 29 | 30 | Phrases: 8.58307e-06 MB (inf bpc) 31 | Blocks: 4.76837e-05 MB (inf bpc) 32 | Samples: 2.47955e-05 MB (inf bpc) 33 | Tree: 3.24249e-05 MB (inf bpc) 34 | Relative LCP: 0.000113487 MB (inf bpc) 35 | 36 | LF (FM): 100000000 queries in 55.0136 seconds (0.550136 µs/query) 37 | 38 | LF (RFM): 100000000 queries in 395.389 seconds (3.95389 µs/query) 39 | 40 | Psi (FM): 100000000 queries in 122.035 seconds (1.22035 µs/query) 41 | 42 | Psi (RFM, slow): 100000000 queries in 4797.44 seconds (47.9744 µs/query) 43 | 44 | Select structures built in 317.142 seconds 45 | Relative select: 163.683 MB (0.452234 bpc) 46 | 47 | Psi (RFM, fast): 100000000 queries in 611.072 seconds (6.11072 µs/query) 48 | 49 | 50 | 
Memory used: 8995.61 MB 51 | 52 | RFM and RLCP verifier 53 | 54 | Reference: female 55 | 56 | FM-index: 1952.6 MB (5.39454 bpc) 57 | 58 | LCP array: 3690.1 MB (10.1948 bpc) 59 | 60 | 61 | Target: old_maternal2 62 | 63 | BWT: 1090.24 MB (3.01219 bpc) 64 | Simple FM: 1090.24 MB (3.01219 bpc) 65 | 66 | ref_minus_lcs: 15.5191 MB (0.0428772 bpc) 67 | seq_minus_lcs: 15.4758 MB (0.0427576 bpc) 68 | bwt_lcs: 150.468 MB (0.415724 bpc) 69 | Relative FM: 181.463 MB (0.50136 bpc) 70 | 71 | LCP array: 2.47955e-05 MB (inf bpc) 72 | 73 | Phrases: 8.58307e-06 MB (inf bpc) 74 | Blocks: 4.76837e-05 MB (inf bpc) 75 | Samples: 2.47955e-05 MB (inf bpc) 76 | Tree: 3.24249e-05 MB (inf bpc) 77 | Relative LCP: 0.000113487 MB (inf bpc) 78 | 79 | LF (FM): 100000000 queries in 55.1854 seconds (0.551854 µs/query) 80 | 81 | LF (RFM): 100000000 queries in 384.258 seconds (3.84258 µs/query) 82 | 83 | Psi (FM): 100000000 queries in 110.638 seconds (1.10638 µs/query) 84 | 85 | Psi (RFM, slow): 100000000 queries in 4482.79 seconds (44.8279 µs/query) 86 | 87 | Select structures built in 310.814 seconds 88 | Relative select: 150.229 MB (0.415063 bpc) 89 | 90 | Psi (RFM, fast): 100000000 queries in 611.884 seconds (6.11884 µs/query) 91 | 92 | 93 | Memory used: 8708.63 MB 94 | 95 | 96 | End 97 | Mon May 4 22:30:22 BST 2015 98 | 99 | 100 | ------------------------------------------------------------ 101 | Sender: LSF System 102 | Subject: Job 3963100: in cluster Done 103 | 104 | Job was submitted from host by user in cluster . 105 | Job was executed on host(s) <32*vr-4-1-02>, in queue , as user in cluster . 106 | was used as the home directory. 107 | was used as the working directory. 
108 | Started at Mon May 4 18:59:52 2015 109 | Results reported at Mon May 4 22:30:22 2015 110 | 111 | Your job looked like: 112 | 113 | ------------------------------------------------------------ 114 | # LSBATCH: User input 115 | /nfs/users/nfs_j/js35/job_scripts/verify_psi 116 | ------------------------------------------------------------ 117 | 118 | Successfully completed. 119 | 120 | Resource usage summary: 121 | 122 | CPU time : 13223.80 sec. 123 | Max Memory : 8478 MB 124 | Average Memory : 8060.76 MB 125 | Total Requested Memory : 16384.00 MB 126 | Delta Memory : 7906.00 MB 127 | (Delta: the difference between total requested memory and actual max usage.) 128 | Max Swap : 8662 MB 129 | 130 | Max Processes : 4 131 | Max Threads : 36 132 | 133 | The output (if any) is above this job summary. 134 | 135 | 136 | 137 | PS: 138 | 139 | Read file for stderr output of this job. 140 | 141 | -------------------------------------------------------------------------------- /utils.cpp: -------------------------------------------------------------------------------- 1 | /* 2 | Copyright (c) 2015, 2016, 2017 Genome Research Ltd. 3 | Copyright (c) 2014 Jouni Siren 4 | 5 | Author: Jouni Siren 6 | 7 | Permission is hereby granted, free of charge, to any person obtaining a copy 8 | of this software and associated documentation files (the "Software"), to deal 9 | in the Software without restriction, including without limitation the rights 10 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 11 | copies of the Software, and to permit persons to whom the Software is 12 | furnished to do so, subject to the following conditions: 13 | 14 | The above copyright notice and this permission notice shall be included in all 15 | copies or substantial portions of the Software. 
16 | 17 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 18 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 19 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 20 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 21 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 22 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 23 | SOFTWARE. 24 | */ 25 | 26 | #include 27 | #include 28 | 29 | #include 30 | 31 | #include "utils.h" 32 | 33 | namespace relative 34 | { 35 | 36 | //------------------------------------------------------------------------------ 37 | 38 | const std::string BWT_EXTENSION = ".bwt"; 39 | const std::string NATIVE_BWT_EXTENSION = ".cbwt"; 40 | const std::string ALPHA_EXTENSION = ".alpha"; 41 | const std::string SAMPLE_EXTENSION = ".samples"; 42 | const std::string LCP_EXTENSION = ".lcp"; 43 | const std::string DLCP_EXTENSION = ".dlcp"; 44 | const std::string DLCP_INDEX_EXTENSION = ".dlcp_index"; 45 | const std::string SIMPLE_FM_DEFAULT_ALPHABET("\0ACGNT", 6); 46 | 47 | //------------------------------------------------------------------------------ 48 | 49 | void 50 | printHeader(const std::string& header, size_type indent) 51 | { 52 | std::string padding; 53 | if(header.length() + 1 < indent) { padding = std::string(indent - 1 - header.length(), ' '); } 54 | std::cout << header << ":" << padding; 55 | } 56 | 57 | void 58 | printSize(const std::string& header, size_type bytes, size_type data_size, size_type indent) 59 | { 60 | printHeader(header, indent); 61 | std::cout << inMegabytes(bytes) << " MB (" << inBPC(bytes, data_size) << " bpc)" << std::endl; 62 | } 63 | 64 | void 65 | printTime(const std::string& header, size_type found, size_type matches, size_type bytes, double seconds, bool occs, size_type indent) 66 | { 67 | printHeader(header, indent); 68 | 69 | std::cout << "Found " << 
found << " patterns with " << matches << " occ in " << seconds << " seconds ("; 70 | if(occs) { std::cout << (matches / seconds) << " occ/s)" << std::endl; } 71 | else { std::cout << (inMegabytes(bytes) / seconds) << " MB/s)" << std::endl; } 72 | } 73 | 74 | void 75 | printTime(const std::string& header, size_type queries, double seconds, size_type indent) 76 | { 77 | printHeader(header, indent); 78 | std::cout << queries << " queries in " << seconds << " seconds (" 79 | << inMicroseconds(seconds / queries) << " µs/query)" << std::endl; 80 | } 81 | 82 | //------------------------------------------------------------------------------ 83 | 84 | double 85 | readTimer() 86 | { 87 | return omp_get_wtime(); 88 | } 89 | 90 | size_type 91 | memoryUsage() 92 | { 93 | rusage usage; 94 | getrusage(RUSAGE_SELF, &usage); 95 | #ifdef RUSAGE_IN_BYTES 96 | return usage.ru_maxrss; 97 | #else 98 | return KILOBYTE * usage.ru_maxrss; 99 | #endif 100 | } 101 | 102 | //------------------------------------------------------------------------------ 103 | 104 | size_type 105 | readRows(const std::string& filename, std::vector<std::string>& rows, bool skip_empty_rows) 106 | { 107 | std::ifstream input(filename.c_str(), std::ios_base::binary); 108 | if(!input) 109 | { 110 | std::cerr << "readRows(): Cannot open input file " << filename << std::endl; 111 | return 0; 112 | } 113 | 114 | size_type chars = 0; 115 | while(input) 116 | { 117 | std::string buf; 118 | std::getline(input, buf); 119 | if(skip_empty_rows && buf.length() == 0) { continue; } 120 | rows.push_back(buf); 121 | chars += buf.length(); 122 | } 123 | 124 | input.close(); 125 | return chars; 126 | } 127 | 128 | //------------------------------------------------------------------------------ 129 | 130 | } // namespace relative 131 | -------------------------------------------------------------------------------- /new_relative_lcp.h: -------------------------------------------------------------------------------- 1 | /* 2 | Copyright (c) 2015,
2016, 2017 Genome Research Ltd. 3 | 4 | Author: Jouni Siren 5 | 6 | Permission is hereby granted, free of charge, to any person obtaining a copy 7 | of this software and associated documentation files (the "Software"), to deal 8 | in the Software without restriction, including without limitation the rights 9 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 10 | copies of the Software, and to permit persons to whom the Software is 11 | furnished to do so, subject to the following conditions: 12 | 13 | The above copyright notice and this permission notice shall be included in all 14 | copies or substantial portions of the Software. 15 | 16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 17 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 18 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 19 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 20 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 21 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 22 | SOFTWARE. 23 | */ 24 | 25 | #ifndef _NEW_RELATIVE_FM_RELATIVE_LCP_H 26 | #define _NEW_RELATIVE_FM_RELATIVE_LCP_H 27 | 28 | #include 29 | 30 | #include "support.h" 31 | 32 | namespace relative 33 | { 34 | 35 | //------------------------------------------------------------------------------ 36 | 37 | class NewRelativeLCP 38 | { 39 | public: 40 | typedef SLArray lcp_type; 41 | typedef lcp_type::size_type size_type; 42 | typedef lcp_type::value_type value_type; 43 | 44 | typedef rlz::lcp::LcpIndex rlcp_type; 45 | 46 | const static size_type BRANCHING_FACTOR = 64; // For the range minima tree. 47 | const static size_type MAX_PHRASE = 1024; // Long phrases are a performance issue. 
48 | const static std::string EXTENSION; // .rlcp 49 | 50 | //------------------------------------------------------------------------------ 51 | 52 | // Reference is an LCP array. 53 | NewRelativeLCP(const lcp_type& ref, const lcp_type& seq); 54 | NewRelativeLCP(const lcp_type& ref, const std::string& base_name); 55 | NewRelativeLCP(const lcp_type& ref, std::istream& input); 56 | ~NewRelativeLCP(); 57 | 58 | size_type reportSize(bool print = false) const; 59 | void writeTo(const std::string& base_name) const; 60 | void writeTo(std::ostream& output) const; 61 | 62 | //------------------------------------------------------------------------------ 63 | 64 | inline size_type size() const { return this->array.size(); } 65 | 66 | // For range minima tree. 67 | inline size_type phrases() const { return this->array.phrases(); } 68 | inline size_type values() const { return this->tree.size(); } 69 | inline size_type levels() const { return this->offsets.size() - 1; } 70 | inline size_type branching() const { return this->branching_factor; } 71 | 72 | inline value_type operator[] (size_type i) const { return this->array(i); } 73 | 74 | //------------------------------------------------------------------------------ 75 | 76 | /* 77 | The return value is (res, LCP[res]) or notFound(). RMQ always returns the leftmost 78 | minimum value. 79 | */ 80 | 81 | range_type psv(size_type pos) const; 82 | range_type psev(size_type pos) const; 83 | 84 | range_type nsv(size_type pos) const; 85 | range_type nsev(size_type pos) const; 86 | 87 | range_type rmq(size_type sp, size_type ep) const; 88 | range_type rmq(range_type range) const; 89 | 90 | // Returned when a psv/nsv/rmq query cannot find a suitable value. 
91 | inline range_type notFound() const { return range_type(this->size() + this->values(), this->size()); } 92 | 93 | //------------------------------------------------------------------------------ 94 | 95 | const lcp_type& reference; 96 | rlcp_type array; 97 | 98 | size_type branching_factor; 99 | lcp_type tree; 100 | sdsl::int_vector<64> offsets; 101 | 102 | //------------------------------------------------------------------------------ 103 | 104 | private: 105 | void loadFrom(std::istream& input); 106 | 107 | //------------------------------------------------------------------------------ 108 | 109 | NewRelativeLCP(const NewRelativeLCP&) = delete; 110 | NewRelativeLCP& operator=(const NewRelativeLCP&) = delete; 111 | }; 112 | 113 | //------------------------------------------------------------------------------ 114 | 115 | } // namespace relative 116 | 117 | #endif // _NEW_RELATIVE_FM_RELATIVE_LCP_H 118 | -------------------------------------------------------------------------------- /scripts/col2vector.cpp: -------------------------------------------------------------------------------- 1 | #include <cctype> 2 | #include <cstdint> 3 | #include <fstream> 4 | #include <iostream> 5 | #include <string> 6 | 7 | //------------------------------------------------------------------------------ 8 | 9 | const uint64_t MEGABYTE = 1048576; 10 | 11 | void writeBytes(std::ostream& out, char* bytes, uint64_t n); 12 | 13 | void buildAlphabet(uint64_t* counts, char* char2comp, const std::string& alpha_name); 14 | 15 | //------------------------------------------------------------------------------ 16 | 17 | int 18 | main(int argc, char** argv) 19 | { 20 | if(argc < 5) 21 | { 22 | std::cerr << "Usage: col2vector input column output alphabet" << std::endl; 23 | std::cerr << std::endl; 24 | return 1; 25 | } 26 | 27 | std::cout << "Converting a column into int_vector<8> with a packed alphabet" << std::endl; 28 | std::cout << std::endl; 29 | std::string input_name = argv[1]; 30 | std::cout << "Input: " << input_name <<
std::endl; 31 | uint64_t col = std::stoul(argv[2]); 32 | std::cout << "Column: " << col << std::endl; 33 | std::string output_name = argv[3]; 34 | std::cout << "Output: " << output_name << std::endl; 35 | std::string alpha_name = argv[4]; 36 | std::cout << "Alphabet: " << alpha_name << std::endl; 37 | std::cout << std::endl; 38 | 39 | if(input_name == output_name) 40 | { 41 | std::cerr << "col2vector: Input and output files must be separate!" << std::endl; 42 | return 2; 43 | } 44 | 45 | std::ifstream in(input_name.c_str(), std::ios_base::binary); 46 | if(!in) 47 | { 48 | std::cerr << "col2vector: Cannot open input file " << input_name << std::endl; 49 | return 3; 50 | } 51 | std::ofstream out(output_name.c_str(), std::ios_base::binary); 52 | if(!out) 53 | { 54 | std::cerr << "col2vector: Cannot open output file " << output_name << std::endl; 55 | in.close(); 56 | return 4; 57 | } 58 | 59 | // First pass: Build the alphabet. 60 | uint64_t bits = 0; 61 | uint64_t counts[257] = {}; 62 | char char2comp[256] = {}; 63 | while(in) 64 | { 65 | std::string line; 66 | std::getline(in, line); 67 | if(line.length() <= col) { continue; } 68 | unsigned char c = line[col]; 69 | if(isdigit(c)) { break; } // Ugly hack. 70 | counts[c]++; bits += 8; 71 | } 72 | std::cout << "Vector size: " << (bits / 8) << std::endl; 73 | out.write((char*)&bits, sizeof(bits)); 74 | buildAlphabet(counts, char2comp, alpha_name); 75 | 76 | // Second pass: Write the BWT. 77 | char* buffer = new char[MEGABYTE]; 78 | uint64_t bytes = 0; 79 | in.clear(); in.seekg(0); // Reset eofbit/failbit so the rewind works even if the first pass hit EOF. 80 | while(in) 81 | { 82 | std::string line; 83 | std::getline(in, line); 84 | if(line.length() <= col) { continue; } 85 | unsigned char c = line[col]; 86 | if(isdigit(c)) { break; } // Ugly hack.
87 | buffer[bytes % MEGABYTE] = char2comp[c]; bytes++; 88 | if(bytes % MEGABYTE == 0) { writeBytes(out, buffer, MEGABYTE); } 89 | } 90 | if(bytes % MEGABYTE > 0) { writeBytes(out, buffer, bytes % MEGABYTE); } 91 | 92 | delete[] buffer; buffer = 0; 93 | in.close(); out.close(); 94 | return 0; 95 | } 96 | 97 | //------------------------------------------------------------------------------ 98 | 99 | void 100 | writeBytes(std::ostream& out, char* bytes, uint64_t n) 101 | { 102 | uint64_t padding = 0; 103 | while((n + padding) % sizeof(uint64_t) != 0) { bytes[n + padding] = 0; padding++; } 104 | out.write(bytes, n + padding); 105 | } 106 | 107 | void 108 | writeBuffer(std::ostream& out, char* bytes, uint64_t n) 109 | { 110 | uint64_t bits = 8 * n; 111 | out.write((char*)&bits, sizeof(bits)); 112 | writeBytes(out, bytes, n); 113 | } 114 | 115 | void 116 | writeBuffer(std::ostream& out, uint64_t* data, uint64_t n) 117 | { 118 | uint64_t bits = 64 * n; 119 | out.write((char*)&bits, sizeof(bits)); 120 | out.write((char*)data, n * sizeof(uint64_t)); 121 | } 122 | 123 | //------------------------------------------------------------------------------ 124 | 125 | void 126 | buildAlphabet(uint64_t* counts, char* char2comp, const std::string& alpha_name) 127 | { 128 | uint64_t sigma = 0; 129 | char comp2char[256] = {}; 130 | 131 | for(uint64_t i = 0; i < 256; i++) 132 | { 133 | if(counts[i] > 0) 134 | { 135 | char2comp[i] = sigma; 136 | comp2char[sigma] = i; 137 | counts[sigma] = counts[i]; 138 | sigma++; 139 | } 140 | } 141 | 142 | for(uint64_t i = 0; i < sigma; i++) 143 | { 144 | std::cout << " counts[" << i << "] = " << counts[i] << std::endl; 145 | } 146 | std::cout << std::endl; 147 | 148 | // Cumulative counts. 
149 | for(uint64_t i = sigma; i > 0; i--) { counts[i] = counts[i - 1]; } 150 | counts[0] = 0; 151 | for(uint64_t i = 1; i <= sigma; i++) { counts[i] += counts[i - 1]; } 152 | 153 | std::ofstream out(alpha_name.c_str(), std::ios_base::binary); 154 | if(!out) 155 | { 156 | std::cerr << "col2vector: buildAlphabet(): Cannot open alphabet file " << alpha_name << std::endl; 157 | return; 158 | } 159 | writeBuffer(out, char2comp, 256); 160 | writeBuffer(out, comp2char, sigma); 161 | writeBuffer(out, counts, sigma + 1); 162 | out.write((char*)&sigma, sizeof(sigma)); 163 | out.close(); 164 | } 165 | 166 | //------------------------------------------------------------------------------ 167 | -------------------------------------------------------------------------------- /mutate.cpp: -------------------------------------------------------------------------------- 1 | /* 2 | Copyright (c) 2015, 2016 Genome Research Ltd. 3 | 4 | Author: Jouni Siren 5 | 6 | Permission is hereby granted, free of charge, to any person obtaining a copy 7 | of this software and associated documentation files (the "Software"), to deal 8 | in the Software without restriction, including without limitation the rights 9 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 10 | copies of the Software, and to permit persons to whom the Software is 11 | furnished to do so, subject to the following conditions: 12 | 13 | The above copyright notice and this permission notice shall be included in all 14 | copies or substantial portions of the Software. 15 | 16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 17 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 18 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE 19 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 20 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 21 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 22 | SOFTWARE. 23 | */ 24 | 25 | #include <random> 26 | 27 | #include "utils.h" 28 | 29 | using namespace relative; 30 | 31 | //------------------------------------------------------------------------------ 32 | 33 | const double INDEL_RATE = 0.1; 34 | const double INDEL_EXTEND = 0.8; 35 | 36 | inline double 37 | probability(std::mt19937_64& rng) 38 | { 39 | return rng() / (rng.max() + 1.0); 40 | } 41 | 42 | inline size_type 43 | indelLength(std::mt19937_64& rng) 44 | { 45 | size_type len = 1; 46 | double prob = probability(rng); 47 | while(prob >= (1.0 - INDEL_EXTEND)) 48 | { 49 | prob = (prob - (1.0 - INDEL_EXTEND)) / INDEL_EXTEND; 50 | len++; 51 | } 52 | return len; 53 | } 54 | 55 | const std::string ALPHABET = "ACGT"; 56 | 57 | inline size_type 58 | randomChar(std::mt19937_64& rng) 59 | { 60 | return ALPHABET[rng() % ALPHABET.size()]; 61 | } 62 | 63 | inline size_type 64 | substitute(std::mt19937_64& rng, size_type old_char) 65 | { 66 | size_type res = 0; 67 | do 68 | { 69 | res = randomChar(rng); 70 | } 71 | while(res == old_char); 72 | return res; 73 | } 74 | 75 | //------------------------------------------------------------------------------ 76 | 77 | int 78 | main(int argc, char** argv) 79 | { 80 | if(argc < 4) 81 | { 82 | std::cerr << "Usage: mutate source target rate [seed]" << std::endl; 83 | std::cerr << std::endl; 84 | return 1; 85 | } 86 | 87 | std::cout << "Sequence mutator" << std::endl; 88 | std::cout << std::endl; 89 | 90 | sdsl::int_vector_buffer<8> source, target; 91 | size_type seed = 0xDEADBEEF; 92 | double rate = 0.0; 93 | { 94 | std::string source_name = argv[1]; 95 | std::cout << "Source: " << source_name << std::endl; 96 | source = sdsl::int_vector_buffer<8>(source_name, std::ios::in,
MEGABYTE, 8, true); 97 | std::cout << "Size: " << source.size() << std::endl; 98 | 99 | std::string target_name = argv[2]; 100 | std::cout << "Target: " << target_name << std::endl; 101 | target = sdsl::int_vector_buffer<8>(target_name, std::ios::out, MEGABYTE, 8, true); 102 | 103 | rate = std::stod(argv[3]); 104 | std::cout << "Mutation rate: " << rate << std::endl; 105 | 106 | if(argc > 4) { seed = fnv1a_hash((size_type)std::stoul(argv[4]), seed); } 107 | std::cout << "Seed: " << seed << std::endl; 108 | 109 | std::cout << std::endl; 110 | } 111 | 112 | 113 | std::mt19937_64 rng(seed); 114 | size_type substitutions = 0, insertions = 0, insertion_total = 0, deletions = 0, deletion_total = 0; 115 | for(size_type i = 0; i < source.size(); i++) 116 | { 117 | bool at_n = (source[i] == 'N'), after_n = (i > 0 && source[i - 1] == 'N'); 118 | double mutation = probability(rng); 119 | if(mutation < rate) 120 | { 121 | mutation /= rate; 122 | if(mutation < INDEL_RATE / 2) // Insertion 123 | { 124 | insertions++; 125 | size_type len = indelLength(rng); 126 | insertion_total += len; 127 | if(at_n && after_n) // Insert N's inside a run of N's. 128 | { 129 | for(size_type j = 0; j < len; j++) { target.push_back('N'); } 130 | } 131 | else 132 | { 133 | for(size_type j = 0; j < len; j++) { target.push_back(randomChar(rng)); } 134 | } 135 | target.push_back(source[i]); 136 | } 137 | else if(mutation < INDEL_RATE) // Deletion 138 | { 139 | deletions++; 140 | size_type len = indelLength(rng); 141 | len = std::min(len, source.size() - i); 142 | deletion_total += len; 143 | i += len - 1; 144 | } 145 | else // Substitution 146 | { 147 | if(at_n) // N's cannot be substituted with other characters.
148 | { 149 | target.push_back(source[i]); 150 | } 151 | else 152 | { 153 | substitutions++; 154 | target.push_back(substitute(rng, source[i])); 155 | } 156 | } 157 | } 158 | else { target.push_back(source[i]); } 159 | } 160 | 161 | std::cout << "Target size: " << target.size() << std::endl; 162 | std::cout << "Substitutions: " << substitutions << std::endl; 163 | std::cout << "Insertions: " << insertions << ", total size " << insertion_total << std::endl; 164 | std::cout << "Deletions: " << deletions << ", total size " << deletion_total << std::endl; 165 | std::cout << std::endl; 166 | 167 | target.close(); 168 | return 0; 169 | } 170 | 171 | //------------------------------------------------------------------------------ 172 | -------------------------------------------------------------------------------- /logs_rst/old_female_maternal2.err: -------------------------------------------------------------------------------- 1 | Processed 83156 / 8315561 ranges in 1.53928 seconds 2 | Processed 166312 / 8315561 ranges in 1.94151 seconds 3 | Processed 249467 / 8315561 ranges in 2.35737 seconds 4 | Processed 332623 / 8315561 ranges in 2.77016 seconds 5 | Processed 415779 / 8315561 ranges in 3.18625 seconds 6 | Processed 498934 / 8315561 ranges in 3.60173 seconds 7 | Processed 582090 / 8315561 ranges in 4.02389 seconds 8 | Processed 665245 / 8315561 ranges in 4.44612 seconds 9 | Processed 748401 / 8315561 ranges in 4.86461 seconds 10 | Processed 831557 / 8315561 ranges in 5.27981 seconds 11 | Processed 914712 / 8315561 ranges in 5.69423 seconds 12 | Processed 997868 / 8315561 ranges in 6.09922 seconds 13 | Processed 1081023 / 8315561 ranges in 6.50803 seconds 14 | Processed 1164179 / 8315561 ranges in 6.90779 seconds 15 | Processed 1247335 / 8315561 ranges in 7.2978 seconds 16 | Processed 1330490 / 8315561 ranges in 7.64032 seconds 17 | Processed 1413646 / 8315561 ranges in 8.05455 seconds 18 | Processed 1496801 / 8315561 ranges in 8.47837 seconds 19 | Processed 1579957 
/ 8315561 ranges in 8.90393 seconds 20 | Processed 1663113 / 8315561 ranges in 9.32293 seconds 21 | Processed 1746268 / 8315561 ranges in 9.74425 seconds 22 | Processed 1829424 / 8315561 ranges in 10.1555 seconds 23 | Processed 1912580 / 8315561 ranges in 10.5766 seconds 24 | Processed 1995735 / 8315561 ranges in 10.9936 seconds 25 | Processed 2078891 / 8315561 ranges in 11.4047 seconds 26 | Processed 2162046 / 8315561 ranges in 11.8118 seconds 27 | Processed 2245202 / 8315561 ranges in 12.1979 seconds 28 | Processed 2328358 / 8315561 ranges in 12.6075 seconds 29 | Processed 2411513 / 8315561 ranges in 13.0115 seconds 30 | Processed 2494669 / 8315561 ranges in 13.3759 seconds 31 | Processed 2577824 / 8315561 ranges in 13.798 seconds 32 | Processed 2660980 / 8315561 ranges in 14.2122 seconds 33 | Processed 2744136 / 8315561 ranges in 14.6218 seconds 34 | Processed 2827291 / 8315561 ranges in 15.0333 seconds 35 | Processed 2910447 / 8315561 ranges in 15.4364 seconds 36 | Processed 2993602 / 8315561 ranges in 15.8454 seconds 37 | Processed 3076758 / 8315561 ranges in 16.2706 seconds 38 | Processed 3159914 / 8315561 ranges in 16.6919 seconds 39 | Processed 3243069 / 8315561 ranges in 17.1093 seconds 40 | Processed 3326225 / 8315561 ranges in 17.5206 seconds 41 | Processed 3409381 / 8315561 ranges in 17.9253 seconds 42 | Processed 3492536 / 8315561 ranges in 18.3356 seconds 43 | Processed 3575692 / 8315561 ranges in 18.743 seconds 44 | Processed 3658847 / 8315561 ranges in 19.1635 seconds 45 | Processed 3742003 / 8315561 ranges in 19.5847 seconds 46 | Processed 3825159 / 8315561 ranges in 20.0026 seconds 47 | Processed 3908314 / 8315561 ranges in 20.4226 seconds 48 | Processed 3991470 / 8315561 ranges in 20.8438 seconds 49 | Processed 4074625 / 8315561 ranges in 21.2617 seconds 50 | Processed 4157781 / 8315561 ranges in 21.6731 seconds 51 | Processed 4240937 / 8315561 ranges in 22.0963 seconds 52 | Processed 4324092 / 8315561 ranges in 22.5194 seconds 53 | Processed 
4407248 / 8315561 ranges in 22.9403 seconds 54 | Processed 4490403 / 8315561 ranges in 23.3731 seconds 55 | Processed 4573559 / 8315561 ranges in 23.8026 seconds 56 | Processed 4656715 / 8315561 ranges in 24.2309 seconds 57 | Processed 4739870 / 8315561 ranges in 24.6592 seconds 58 | Processed 4823026 / 8315561 ranges in 25.0601 seconds 59 | Processed 4906181 / 8315561 ranges in 25.46 seconds 60 | Processed 4989337 / 8315561 ranges in 25.8871 seconds 61 | Processed 5072493 / 8315561 ranges in 26.3029 seconds 62 | Processed 5155648 / 8315561 ranges in 26.7172 seconds 63 | Processed 5238804 / 8315561 ranges in 27.1333 seconds 64 | Processed 5321960 / 8315561 ranges in 27.5523 seconds 65 | Processed 5405115 / 8315561 ranges in 27.9821 seconds 66 | Processed 5488271 / 8315561 ranges in 28.3964 seconds 67 | Processed 5571426 / 8315561 ranges in 28.8218 seconds 68 | Processed 5654582 / 8315561 ranges in 29.2489 seconds 69 | Processed 5737738 / 8315561 ranges in 29.6826 seconds 70 | Processed 5820893 / 8315561 ranges in 30.1035 seconds 71 | Processed 5904049 / 8315561 ranges in 30.5038 seconds 72 | Processed 5987204 / 8315561 ranges in 30.8596 seconds 73 | Processed 6070360 / 8315561 ranges in 31.1983 seconds 74 | Processed 6153516 / 8315561 ranges in 31.5638 seconds 75 | Processed 6236671 / 8315561 ranges in 31.9458 seconds 76 | Processed 6319827 / 8315561 ranges in 32.3139 seconds 77 | Processed 6402982 / 8315561 ranges in 32.6961 seconds 78 | Processed 6486138 / 8315561 ranges in 33.0803 seconds 79 | Processed 6569294 / 8315561 ranges in 33.4518 seconds 80 | Processed 6652449 / 8315561 ranges in 33.8271 seconds 81 | Processed 6735605 / 8315561 ranges in 34.1823 seconds 82 | Processed 6818761 / 8315561 ranges in 34.5963 seconds 83 | Processed 6901916 / 8315561 ranges in 34.9977 seconds 84 | Processed 6985072 / 8315561 ranges in 35.4044 seconds 85 | Processed 7068227 / 8315561 ranges in 35.8149 seconds 86 | Processed 7151383 / 8315561 ranges in 36.2297 seconds 87 | 
Processed 7234539 / 8315561 ranges in 36.643 seconds 88 | Processed 7317694 / 8315561 ranges in 37.0583 seconds 89 | Processed 7400850 / 8315561 ranges in 37.4578 seconds 90 | Processed 7484005 / 8315561 ranges in 37.8392 seconds 91 | Processed 7567161 / 8315561 ranges in 38.224 seconds 92 | Processed 7650317 / 8315561 ranges in 38.6047 seconds 93 | Processed 7733472 / 8315561 ranges in 38.9822 seconds 94 | Processed 7816628 / 8315561 ranges in 39.3703 seconds 95 | Processed 7899783 / 8315561 ranges in 39.7856 seconds 96 | Processed 7982939 / 8315561 ranges in 40.204 seconds 97 | Processed 8066095 / 8315561 ranges in 40.594 seconds 98 | Processed 8149250 / 8315561 ranges in 40.997 seconds 99 | Processed 8232406 / 8315561 ranges in 41.3414 seconds 100 | Processed 8315561 / 8315561 ranges in 50.2575 seconds 101 | -------------------------------------------------------------------------------- /logs_rst/old_human_maternal.err: -------------------------------------------------------------------------------- 1 | Processed 83274 / 8327360 ranges in 1.58276 seconds 2 | Processed 166548 / 8327360 ranges in 1.96973 seconds 3 | Processed 249821 / 8327360 ranges in 2.37005 seconds 4 | Processed 333095 / 8327360 ranges in 2.77069 seconds 5 | Processed 416368 / 8327360 ranges in 3.16622 seconds 6 | Processed 499642 / 8327360 ranges in 3.56401 seconds 7 | Processed 582916 / 8327360 ranges in 3.97226 seconds 8 | Processed 666189 / 8327360 ranges in 4.38436 seconds 9 | Processed 749463 / 8327360 ranges in 4.78024 seconds 10 | Processed 832736 / 8327360 ranges in 5.17701 seconds 11 | Processed 916010 / 8327360 ranges in 5.56655 seconds 12 | Processed 999284 / 8327360 ranges in 5.95221 seconds 13 | Processed 1082557 / 8327360 ranges in 6.32959 seconds 14 | Processed 1165831 / 8327360 ranges in 6.67223 seconds 15 | Processed 1249104 / 8327360 ranges in 7.02611 seconds 16 | Processed 1332378 / 8327360 ranges in 7.42026 seconds 17 | Processed 1415652 / 8327360 ranges in 7.82541 seconds 
18 | Processed 1498925 / 8327360 ranges in 8.23881 seconds 19 | Processed 1582199 / 8327360 ranges in 8.65144 seconds 20 | Processed 1665472 / 8327360 ranges in 9.06356 seconds 21 | Processed 1748746 / 8327360 ranges in 9.4666 seconds 22 | Processed 1832020 / 8327360 ranges in 9.86698 seconds 23 | Processed 1915293 / 8327360 ranges in 10.2719 seconds 24 | Processed 1998567 / 8327360 ranges in 10.6684 seconds 25 | Processed 2081840 / 8327360 ranges in 11.047 seconds 26 | Processed 2165114 / 8327360 ranges in 11.4259 seconds 27 | Processed 2248388 / 8327360 ranges in 11.8027 seconds 28 | Processed 2331661 / 8327360 ranges in 12.1721 seconds 29 | Processed 2414935 / 8327360 ranges in 12.5449 seconds 30 | Processed 2498208 / 8327360 ranges in 12.8806 seconds 31 | Processed 2581482 / 8327360 ranges in 13.2954 seconds 32 | Processed 2664756 / 8327360 ranges in 13.6976 seconds 33 | Processed 2748029 / 8327360 ranges in 14.0908 seconds 34 | Processed 2831303 / 8327360 ranges in 14.4851 seconds 35 | Processed 2914577 / 8327360 ranges in 14.8505 seconds 36 | Processed 2997850 / 8327360 ranges in 15.2216 seconds 37 | Processed 3081124 / 8327360 ranges in 15.6284 seconds 38 | Processed 3164397 / 8327360 ranges in 16.0342 seconds 39 | Processed 3247671 / 8327360 ranges in 16.4414 seconds 40 | Processed 3330944 / 8327360 ranges in 16.8515 seconds 41 | Processed 3414218 / 8327360 ranges in 17.2359 seconds 42 | Processed 3497492 / 8327360 ranges in 17.6381 seconds 43 | Processed 3580765 / 8327360 ranges in 18.0411 seconds 44 | Processed 3664039 / 8327360 ranges in 18.4438 seconds 45 | Processed 3747312 / 8327360 ranges in 18.8525 seconds 46 | Processed 3830586 / 8327360 ranges in 19.2638 seconds 47 | Processed 3913860 / 8327360 ranges in 19.6774 seconds 48 | Processed 3997133 / 8327360 ranges in 20.0846 seconds 49 | Processed 4080407 / 8327360 ranges in 20.4867 seconds 50 | Processed 4163680 / 8327360 ranges in 20.8865 seconds 51 | Processed 4246954 / 8327360 ranges in 21.2983 
seconds 52 | Processed 4330228 / 8327360 ranges in 21.711 seconds 53 | Processed 4413501 / 8327360 ranges in 22.1149 seconds 54 | Processed 4496775 / 8327360 ranges in 22.5351 seconds 55 | Processed 4580048 / 8327360 ranges in 22.9519 seconds 56 | Processed 4663322 / 8327360 ranges in 23.371 seconds 57 | Processed 4746596 / 8327360 ranges in 23.778 seconds 58 | Processed 4829869 / 8327360 ranges in 24.1874 seconds 59 | Processed 4913143 / 8327360 ranges in 24.5742 seconds 60 | Processed 4996416 / 8327360 ranges in 24.9757 seconds 61 | Processed 5079690 / 8327360 ranges in 25.3581 seconds 62 | Processed 5162964 / 8327360 ranges in 25.7771 seconds 63 | Processed 5246237 / 8327360 ranges in 26.1932 seconds 64 | Processed 5329511 / 8327360 ranges in 26.5914 seconds 65 | Processed 5412784 / 8327360 ranges in 27.0078 seconds 66 | Processed 5496058 / 8327360 ranges in 27.4188 seconds 67 | Processed 5579332 / 8327360 ranges in 27.8279 seconds 68 | Processed 5662605 / 8327360 ranges in 28.2244 seconds 69 | Processed 5745879 / 8327360 ranges in 28.6442 seconds 70 | Processed 5829153 / 8327360 ranges in 29.0586 seconds 71 | Processed 5912426 / 8327360 ranges in 29.4516 seconds 72 | Processed 5995700 / 8327360 ranges in 29.7713 seconds 73 | Processed 6078973 / 8327360 ranges in 30.1151 seconds 74 | Processed 6162247 / 8327360 ranges in 30.4733 seconds 75 | Processed 6245520 / 8327360 ranges in 30.823 seconds 76 | Processed 6328794 / 8327360 ranges in 31.1678 seconds 77 | Processed 6412068 / 8327360 ranges in 31.4439 seconds 78 | Processed 6495341 / 8327360 ranges in 31.8135 seconds 79 | Processed 6578615 / 8327360 ranges in 32.1638 seconds 80 | Processed 6661888 / 8327360 ranges in 32.5248 seconds 81 | Processed 6745162 / 8327360 ranges in 32.8922 seconds 82 | Processed 6828436 / 8327360 ranges in 33.2494 seconds 83 | Processed 6911709 / 8327360 ranges in 33.5992 seconds 84 | Processed 6994983 / 8327360 ranges in 33.9879 seconds 85 | Processed 7078256 / 8327360 ranges in 
34.3827 seconds 86 | Processed 7161530 / 8327360 ranges in 34.7767 seconds 87 | Processed 7244804 / 8327360 ranges in 35.1719 seconds 88 | Processed 7328077 / 8327360 ranges in 35.5649 seconds 89 | Processed 7411351 / 8327360 ranges in 35.959 seconds 90 | Processed 7494624 / 8327360 ranges in 36.3148 seconds 91 | Processed 7577898 / 8327360 ranges in 36.677 seconds 92 | Processed 7661172 / 8327360 ranges in 37.0022 seconds 93 | Processed 7744445 / 8327360 ranges in 37.3751 seconds 94 | Processed 7827719 / 8327360 ranges in 37.7297 seconds 95 | Processed 7910993 / 8327360 ranges in 38.0874 seconds 96 | Processed 7994266 / 8327360 ranges in 38.4853 seconds 97 | Processed 8077540 / 8327360 ranges in 38.8714 seconds 98 | Processed 8160813 / 8327360 ranges in 39.2557 seconds 99 | Processed 8244087 / 8327360 ranges in 39.5109 seconds 100 | Processed 8327360 / 8327360 ranges in 50.5809 seconds 101 | -------------------------------------------------------------------------------- /spire2014/relative.bib: -------------------------------------------------------------------------------- 1 | % This file was created with JabRef 2.9.2. 2 | % Encoding: UTF8 3 | 4 | @ARTICLE{BBL98, 5 | author = {P. Bose and J. F. Buss and A. Lubiw}, 6 | title = {Pattern Matching for Permutations}, 7 | journal = {Inf. Process. Lett.}, 8 | year = {1998}, 9 | volume = {65}, 10 | pages = {277--283}, 11 | number = {5} 12 | } 13 | 14 | @TECHREPORT{BW94, 15 | author = {M. Burrows and D. J. Wheeler}, 16 | title = {A block sorting lossless data compression algorithm}, 17 | institution = {Digital Equipment Corporation}, 18 | year = {1994}, 19 | number = {124} 20 | } 21 | 22 | @ARTICLE{FGHP14, 23 | author = {H. Ferrada and T. Gagie and T. Hirvola and S. J. Puglisi}, 24 | title = {Hybrid indexes for repetitive datasets}, 25 | journal = {Phil. Trans. Royal Society A}, 26 | year = {2014}, 27 | volume = {372}, 28 | number = {2016} 29 | } 30 | 31 | @ARTICLE{FM05, 32 | author = {P. Ferragina and G. 
Manzini}, 33 | title = {Indexing compressed text}, 34 | journal = {Journal of the ACM}, 35 | year = {2005}, 36 | volume = {52}, 37 | pages = {552--581}, 38 | number = {4} 39 | } 40 | 41 | @INPROCEEDINGS{Gog2014b, 42 | author = {S. Gog and T. Beller and A. Moffat and M. Petri}, 43 | title = {From Theory to Practice: Plug and Play with Succinct Data Structures}, 44 | booktitle = {Proc. 13th International Symposium on Experimental Algorithms (SEA 45 | 2014)}, 46 | year = {2014}, 47 | note = {To appear.}, 48 | owner = {jltsiren}, 49 | timestamp = {2014.05.03} 50 | } 51 | 52 | @INPROCEEDINGS{Kaerkkaeinen2014, 53 | author = {J. K{\"a}rkk{\"a}inen and D. Kempa and S. J. Puglisi}, 54 | title = {Hybrid Compression of Bitvectors for the {FM}-Index}, 55 | booktitle = {Proc. 2014 IEEE Data Compression Conference (DCC 2014)}, 56 | year = {2014}, 57 | note = {To appear.}, 58 | owner = {jltsiren}, 59 | timestamp = {2014.03.20} 60 | } 61 | 62 | @ARTICLE{LandauVN86, 63 | author = {G. M. Landau and U. Vishkin and R. Nussinov}, 64 | title = {An efficient string matching algorithm with k differences for nucleotide 65 | and amino acid sequences}, 66 | journal = {Nucleic Acids Research}, 67 | year = {1986}, 68 | volume = {14}, 69 | pages = {31--46}, 70 | number = {1}, 71 | bibsource = {DBLP, http://dblp.uni-trier.de}, 72 | ee = {http://dx.doi.org/10.1093/nar/14.1.31} 73 | } 74 | 75 | @ARTICLE{LTPS09, 76 | author = {B. Langmead and C. Trapnell and M. Pop and S. L. Salzberg}, 77 | title = {Ultrafast and memory-efficient alignment of short {DNA} sequences 78 | to the human genome}, 79 | journal = {Genome Biology}, 80 | year = {2009}, 81 | volume = {10}, 82 | pages = {R25} 83 | } 84 | 85 | @ARTICLE{LD09, 86 | author = {H. Li and R.
Durbin}, 87 | title = {Fast and accurate short read alignment with {Burrows-Wheeler} transform}, 88 | journal = {Bioinformatics}, 89 | year = {2009}, 90 | volume = {25}, 91 | pages = {1754--1760}, 92 | number = {14} 93 | } 94 | 95 | @ARTICLE{LYLLYKW09, 96 | author = {R. Li and C. Yu and Y. Li and T.-W. Lam and S.-M. Yiu and K. Kristiansen 97 | and J. Wang}, 98 | title = {{SOAP2}: an improved ultrafast tool for short read alignment}, 99 | journal = {Bioinformatics}, 100 | year = {2009}, 101 | volume = {25}, 102 | pages = {1966--1967}, 103 | number = {15} 104 | } 105 | 106 | @ARTICLE{MNSV10, 107 | author = {V. M{\"a}kinen and G. Navarro and J. Sir{\'e}n and N. V{\"a}lim{\"a}ki}, 108 | title = {Storage and retrieval of highly repetitive sequence collections}, 109 | journal = {Journal of Computational Biology}, 110 | year = {2010}, 111 | volume = {17}, 112 | pages = {281--308}, 113 | number = {3} 114 | } 115 | 116 | @ARTICLE{Myers99, 117 | author = {E. W. Myers}, 118 | title = {A fast bit-vector algorithm for approximate string matching based 119 | on dynamic programming}, 120 | journal = {Journal of the ACM}, 121 | year = {1999}, 122 | volume = {46}, 123 | pages = {395--415}, 124 | number = {3} 125 | } 126 | 127 | @ARTICLE{Myers86, 128 | author = {Eugene W. Myers}, 129 | title = {An {O(ND)} difference algorithm and its variations}, 130 | journal = {Algorithmica}, 131 | year = {1986}, 132 | volume = {1}, 133 | pages = {251--266}, 134 | number = {2} 135 | } 136 | 137 | @INPROCEEDINGS{Okanohara2007, 138 | author = {D. Okanohara and K. Sadakane}, 139 | title = {Practical Entropy-Compressed Rank/Select Dictionary}, 140 | booktitle = {Proc. Ninth Workshop on Algorithm Engineering and Experiments (ALENEX 141 | 2007)}, 142 | year = {2007}, 143 | pages = {60--70}, 144 | publisher = {SIAM}, 145 | owner = {jltsiren}, 146 | timestamp = {2010.07.05} 147 | } 148 | 149 | @ARTICLE{Raman2007, 150 | author = {R. Raman and V. Raman and S.
Rao Satti}, 151 | title = {Succinct indexable dictionaries with applications to encoding $k$-ary 152 | trees, prefix sums and multisets}, 153 | journal = {ACM Transactions on Algorithms}, 154 | year = {2007}, 155 | volume = {3}, 156 | pages = {43}, 157 | number = {4}, 158 | doi = {10.1145/1290672.1290680}, 159 | keywords = {dictionaries, multisets, perfect hashing, prefix sums, succinct data 160 | structures, trees}, 161 | owner = {jltsiren}, 162 | timestamp = {2014.05.03} 163 | } 164 | 165 | @ARTICLE{Rozowsky2011, 166 | author = {Joel Rozowsky and Alexej Abyzov and Jing Wang and Pedro Alves and 167 | Debasish Raha and Arif Harmanci and Jing Leng and Robert Bjornson 168 | and Yong Kong and Naoki Kitabayashi and Nitin Bhardwaj and Mark Rubin 169 | and Michael Snyder and Mark Gerstein}, 170 | title = {{AlleleSeq}: analysis of allele‐specific expression and binding in 171 | a network framework}, 172 | journal = {Molecular Systems Biology}, 173 | year = {2011}, 174 | volume = {7}, 175 | pages = {522}, 176 | owner = {jltsiren}, 177 | timestamp = {2014.07.13} 178 | } 179 | 180 | -------------------------------------------------------------------------------- /align_bwts.cpp: -------------------------------------------------------------------------------- 1 | /* 2 | Copyright (c) 2015, 2016, 2017 Genome Research Ltd. 
3 | Copyright (c) 2014 Jouni Siren 4 | 5 | Author: Jouni Siren 6 | 7 | Permission is hereby granted, free of charge, to any person obtaining a copy 8 | of this software and associated documentation files (the "Software"), to deal 9 | in the Software without restriction, including without limitation the rights 10 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 11 | copies of the Software, and to permit persons to whom the Software is 12 | furnished to do so, subject to the following conditions: 13 | 14 | The above copyright notice and this permission notice shall be included in all 15 | copies or substantial portions of the Software. 16 | 17 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 18 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 19 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 20 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 21 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 22 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 23 | SOFTWARE. 
24 | */ 25 | 26 | #include <cstdlib> 27 | #include <unistd.h> 28 | 29 | #include "relative_fm.h" 30 | 31 | using namespace relative; 32 | 33 | //------------------------------------------------------------------------------ 34 | 35 | void mainLoop(int argc, char** argv, const align_parameters& parameters); 36 | 37 | //------------------------------------------------------------------------------ 38 | 39 | int 40 | main(int argc, char** argv) 41 | { 42 | // FIXME add support for mode_ropebwt 43 | if(argc < 3) 44 | { 45 | std::cerr << "Usage: align_bwts [parameters] reference target1 [target2 ...]" << std::endl; 46 | 47 | std::cerr << " -b N Set BWT block size to N (default " 48 | << align_parameters::BLOCK_SIZE << ")" << std::endl; 49 | std::cerr << " -d N Set maximum diagonal in LCS computation to N (default " 50 | << align_parameters::MAX_D << ")" << std::endl; 51 | std::cerr << " -l N Partition by patterns of length up to N (default " 52 | << align_parameters::MAX_LENGTH << ")" << std::endl; 53 | std::cerr << " -p Preallocate buffers for LCS computation" << std::endl; 54 | 55 | std::cerr << " -i Find a BWT-invariant subsequence that supports SA/ISA samples" << std::endl; 56 | std::cerr << std::endl; 57 | return 1; 58 | } 59 | 60 | align_parameters parameters; 61 | int c = 0; 62 | while((c = getopt(argc, argv, "b:d:l:pi")) != -1) 63 | { 64 | switch(c) 65 | { 66 | case 'b': 67 | parameters.block_size = atol(optarg); break; 68 | case 'd': 69 | parameters.max_d = atol(optarg); break; 70 | case 'l': 71 | parameters.max_length = atol(optarg); break; 72 | case 'p': 73 | parameters.preallocate = true; break; 74 | case 'i': 75 | parameters.invariant = true; 76 | if(parameters.sa_sample_rate == align_parameters::SA_SAMPLE_RATE) 77 | { 78 | parameters.sa_sample_rate = align_parameters::SECONDARY_SA_SAMPLE_RATE; 79 | } 80 | if(parameters.isa_sample_rate == align_parameters::ISA_SAMPLE_RATE) 81 | { 82 | parameters.isa_sample_rate = align_parameters::SECONDARY_ISA_SAMPLE_RATE; 83 | } 84 | break;
85 | case '?': 86 | return 2; 87 | default: 88 | return 3; 89 | } 90 | } 91 | 92 | std::cout << "Relative FM-index builder" << std::endl; 93 | std::cout << "Using OpenMP with " << omp_get_max_threads() << " threads" << std::endl; 94 | std::cout << std::endl; 95 | std::cout << "Algorithm: " << (parameters.invariant ? "invariant" : "partitioning") << std::endl; 96 | if(parameters.sa_sample_rate != 0) 97 | { 98 | std::cout << "SA sample rate: " << parameters.sa_sample_rate << std::endl; 99 | } 100 | if(parameters.isa_sample_rate != 0) 101 | { 102 | std::cout << "ISA sample rate: " << parameters.isa_sample_rate << std::endl; 103 | } 104 | if(!(parameters.invariant)) 105 | { 106 | std::cout << "Block size: " << parameters.block_size << std::endl; 107 | std::cout << "Maximum diagonal: " << parameters.max_d << std::endl; 108 | std::cout << "Maximum length: " << parameters.max_length << std::endl; 109 | std::cout << "Buffers: " << (parameters.preallocate ? "preallocated" : "on demand") << std::endl; 110 | } 111 | std::cout << std::endl; 112 | std::cout << "Reference: " << argv[optind] << std::endl; 113 | std::cout << std::endl; 114 | 115 | mainLoop(argc - optind, argv + optind, parameters); 116 | 117 | std::cout << "Memory usage: " << inGigabytes(memoryUsage()) << " GB" << std::endl; 118 | std::cout << std::endl; 119 | 120 | return 0; 121 | } 122 | 123 | //------------------------------------------------------------------------------ 124 | 125 | void 126 | mainLoop(int argc, char** argv, const align_parameters& parameters) 127 | { 128 | std::string ref_name = argv[0]; 129 | SimpleFM ref(ref_name); 130 | ref.reportSize(true); std::cout << std::endl; 131 | std::cout << std::endl; 132 | 133 | for(int arg = 1; arg < argc; arg++) 134 | { 135 | std::string seq_name = argv[arg]; 136 | std::cout << "Target: " << seq_name << std::endl; 137 | SimpleFM seq(seq_name); 138 | double start = readTimer(); 139 | RelativeFM rel(ref, seq, parameters, true); 140 | double seconds = readTimer() 
- start; 141 | std::cout << "Index built in " << seconds << " seconds" << std::endl; 142 | std::cout << std::endl; 143 | 144 | rel.writeTo(seq_name); 145 | seq.reportSize(true); std::cout << std::endl; 146 | rel.reportSize(true); std::cout << std::endl; 147 | std::cout << std::endl; 148 | } 149 | } 150 | 151 | //------------------------------------------------------------------------------ 152 | -------------------------------------------------------------------------------- /logs_old_rcst/test_rfm_3891909.log: -------------------------------------------------------------------------------- 1 | Start 2 | Fri Apr 17 14:36:03 BST 2015 3 | 4 | Testing the relative FM-index with assembled genomes 5 | 6 | Reference: human 7 | Target: paternal 8 | Threads: 8 9 | 10 | BWT construction 11 | Options: alphabet isa_sample_rate=64 lcp sa_sample_rate=17 12 | 13 | File: human 14 | Text size: 3095693981 15 | BWT built in 1194.14 seconds (2.47231 MB/s) 16 | Alphabet written to human.alpha 17 | BWT written to human.bwt 18 | Samples written to human.samples 19 | LCP array built in 807.314 seconds (3.65692 MB/s) 20 | LCP array written to human.lcp 21 | 22 | File: paternal 23 | Text size: 3036185259 24 | BWT built in 976.277 seconds (2.96589 MB/s) 25 | Alphabet written to paternal.alpha 26 | BWT written to paternal.bwt 27 | Samples written to paternal.samples 28 | LCP array built in 857.211 seconds (3.37785 MB/s) 29 | LCP array written to paternal.lcp 30 | 31 | Indexing the differential LCP array 32 | 33 | File: human 34 | LCP size: 3095693982 35 | DLCP width: 27 36 | SA width: 32 37 | Index built in 9574.21 seconds (0.308358 MB/s) 38 | 39 | Relative FM-index and LCP array builder 40 | Using OpenMP with 8 threads 41 | 42 | Algorithm: invariant 43 | Input format: plain 44 | SA sample rate: 257 45 | ISA sample rate: 512 46 | 47 | Reference: human 48 | 49 | BWT: 1120.49 MB (3.03628 bpc) 50 | SA samples: 694.655 MB (1.88235 bpc) 51 | ISA samples: 184.518 MB (0.5 bpc) 52 | Simple FM: 
1999.67 MB (5.41863 bpc) 53 | 54 | LCP array: 3862.42 MB (10.4663 bpc) 55 | 56 | 57 | Target: paternal 58 | Reference size: 3095693982 59 | Target size: 3036185260 60 | Built the merging bitvector in 3577.95 seconds 61 | Matched 3017950511 positions in 8566.81 seconds 62 | Found a common subsequence of length 2979954409 in 994.341 seconds 63 | Built the bwt_lcs bitvectors and samples in 1705.79 seconds 64 | Index built in 14889.7 seconds 65 | 66 | BWT: 1090.24 MB (3.01219 bpc) 67 | SA samples: 681.302 MB (1.88235 bpc) 68 | ISA samples: 180.971 MB (0.5 bpc) 69 | Simple FM: 1952.51 MB (5.39455 bpc) 70 | 71 | ref_minus_lcs: 43.4159 MB (0.119953 bpc) 72 | seq_minus_lcs: 19.2443 MB (0.0531696 bpc) 73 | bwt_lcs: 189.84 MB (0.524505 bpc) 74 | text_lcs: 126.483 MB (0.349458 bpc) 75 | SA samples: 45.0667 MB (0.124514 bpc) 76 | ISA samples: 22.6214 MB (0.0625 bpc) 77 | Relative FM: 446.672 MB (1.2341 bpc) 78 | 79 | The RLZ parsing of the LCP array consists of 136538721 phrases 80 | Relative LCP array built in 8496.36 seconds 81 | 82 | LCP array: 3689.75 MB (10.1943 bpc) 83 | 84 | Phrases: 520.854 MB (1.43906 bpc) 85 | Blocks: 124.642 MB (0.344372 bpc) 86 | Samples: 196.319 MB (0.542406 bpc) 87 | Tree: 257.415 MB (0.711207 bpc) 88 | Relative LCP: 1099.23 MB (3.03704 bpc) 89 | 90 | 91 | Memory usage: 101412 MB 92 | 93 | Query test 94 | 95 | Reference: human 96 | Sequence: paternal 97 | Patterns: patterns 98 | 99 | Read 2000000 patterns of total length 64000000 100 | 101 | 102 | SimpleFM: 1952.51 MB (5.39455 bpc) 103 | SimpleFM: Found 1955515 patterns with 254933457 occ in 27.9538 seconds (2.18343 MB/s) 104 | 105 | RFM: 446.672 MB (1.2341 bpc) 106 | RFM: Found 1955515 patterns with 254933457 occ in 289.121 seconds (0.211106 MB/s) 107 | 108 | 109 | Memory usage: 2809.92 MB 110 | 111 | Query test 112 | 113 | Reference: human 114 | Sequence: paternal 115 | Patterns: patterns 116 | 117 | Read 2000000 patterns of total length 64000000 118 | 119 | 120 | Executing locate() queries. 
121 | 122 | SimpleFM: 1952.51 MB (5.39455 bpc) 123 | SimpleFM: Found 1955515 patterns with 254933457 occ in 755.398 seconds (337483 occ/s) 124 | Hash of located positions: 8819409896968731320 125 | 126 | RFM: 446.672 MB (1.2341 bpc) 127 | RFM: Found 1955515 patterns with 254933457 occ in 2622.1 seconds (97224.8 occ/s) 128 | Hash of located positions: 8819409896968731320 129 | 130 | 131 | Memory usage: 2809.92 MB 132 | 133 | Query test 134 | 135 | Reference: human 136 | Sequence: paternal 137 | Patterns: patterns 138 | 139 | Read 2000000 patterns of total length 64000000 140 | 141 | 142 | Executing locate() queries. 143 | 144 | Verifying the results with extract() queries. 145 | 146 | SimpleFM: 1952.51 MB (5.39455 bpc) 147 | SimpleFM: Found 1955515 patterns with 254933457 occ in 1044.96 seconds (243964 occ/s) 148 | Hash of located positions: 8819409896968731320 149 | 150 | RFM: 446.672 MB (1.2341 bpc) 151 | RFM: Found 1955515 patterns with 254933457 occ in 3264.13 seconds (78101.5 occ/s) 152 | Hash of located positions: 8819409896968731320 153 | 154 | 155 | Memory usage: 2809.92 MB 156 | 157 | 158 | End 159 | Sat Apr 18 03:16:35 BST 2015 160 | 161 | 162 | ------------------------------------------------------------ 163 | Sender: LSF System 164 | Subject: Job 3891909: in cluster Done 165 | 166 | Job was submitted from host by user in cluster . 167 | Job was executed on host(s) <8*vr-4-1-10>, in queue , as user in cluster . 168 | was used as the home directory. 169 | was used as the working directory. 170 | Started at Fri Apr 17 14:36:03 2015 171 | Results reported at Sat Apr 18 03:16:35 2015 172 | 173 | Your job looked like: 174 | 175 | ------------------------------------------------------------ 176 | # LSBATCH: User input 177 | /nfs/users/nfs_j/js35/job_scripts/test_rfm human paternal 8 178 | ------------------------------------------------------------ 179 | 180 | Successfully completed. 181 | 182 | Resource usage summary: 183 | 184 | CPU time : 47835.60 sec. 
185 | Max Memory : 101419 MB 186 | Average Memory : 27632.69 MB 187 | Total Requested Memory : 131072.00 MB 188 | Delta Memory : 29653.00 MB 189 | (Delta: the difference between total requested memory and actual max usage.) 190 | Max Swap : 105526 MB 191 | 192 | Max Processes : 4 193 | Max Threads : 12 194 | 195 | The output (if any) is above this job summary. 196 | 197 | 198 | 199 | PS: 200 | 201 | Read file for stderr output of this job. 202 | 203 | -------------------------------------------------------------------------------- /logs_old_rcst/test_rfm_3876245.log: -------------------------------------------------------------------------------- 1 | Start 2 | Tue Apr 14 12:56:26 BST 2015 3 | 4 | Testing the relative FM-index with assembled genomes 5 | 6 | Reference: human 7 | Target: maternal 8 | Threads: 8 9 | 10 | BWT construction 11 | Options: alphabet isa_sample_rate=64 lcp sa_sample_rate=17 12 | 13 | File: human 14 | Text size: 3095693981 15 | BWT built in 898.555 seconds (3.28559 MB/s) 16 | Alphabet written to human.alpha 17 | BWT written to human.bwt 18 | Samples written to human.samples 19 | LCP array built in 716.785 seconds (4.11878 MB/s) 20 | LCP array written to human.lcp 21 | 22 | File: maternal 23 | Text size: 3036191207 24 | BWT built in 1017.47 seconds (2.84582 MB/s) 25 | Alphabet written to maternal.alpha 26 | BWT written to maternal.bwt 27 | Samples written to maternal.samples 28 | LCP array built in 680.301 seconds (4.25626 MB/s) 29 | LCP array written to maternal.lcp 30 | 31 | Indexing the differential LCP array 32 | 33 | File: human 34 | LCP size: 3095693982 35 | DLCP width: 27 36 | SA width: 32 37 | Index built in 7238.68 seconds (0.407848 MB/s) 38 | 39 | Relative FM-index and LCP array builder 40 | Using OpenMP with 8 threads 41 | 42 | Algorithm: invariant 43 | Input format: plain 44 | SA sample rate: 257 45 | ISA sample rate: 512 46 | 47 | Reference: human 48 | 49 | BWT: 1120.49 MB (3.03628 bpc) 50 | SA samples: 694.655 MB (1.88235 bpc) 51 
| ISA samples: 184.518 MB (0.5 bpc) 52 | Simple FM: 1999.67 MB (5.41863 bpc) 53 | 54 | LCP array: 3862.42 MB (10.4663 bpc) 55 | 56 | 57 | Target: maternal 58 | Reference size: 3095693982 59 | Target size: 3036191208 60 | Built the merging bitvector in 3475.35 seconds 61 | Matched 3032774943 positions in 8219.62 seconds 62 | Found a common subsequence of length 2979962648 in 778.584 seconds 63 | Built the bwt_lcs bitvectors and samples in 1636.81 seconds 64 | Index built in 14154.6 seconds 65 | 66 | BWT: 1090.24 MB (3.01219 bpc) 67 | SA samples: 681.303 MB (1.88235 bpc) 68 | ISA samples: 180.971 MB (0.5 bpc) 69 | Simple FM: 1952.51 MB (5.39455 bpc) 70 | 71 | ref_minus_lcs: 43.4075 MB (0.119929 bpc) 72 | seq_minus_lcs: 19.2637 MB (0.0532231 bpc) 73 | bwt_lcs: 189.846 MB (0.524521 bpc) 74 | text_lcs: 126.431 MB (0.349311 bpc) 75 | SA samples: 45.0667 MB (0.124514 bpc) 76 | ISA samples: 22.6214 MB (0.0625 bpc) 77 | Relative FM: 446.636 MB (1.234 bpc) 78 | 79 | The RLZ parsing of the LCP array consists of 136545925 phrases 80 | Relative LCP array built in 8609.61 seconds 81 | 82 | LCP array: 3689.77 MB (10.1944 bpc) 83 | 84 | Phrases: 520.881 MB (1.43913 bpc) 85 | Blocks: 124.647 MB (0.344384 bpc) 86 | Samples: 196.667 MB (0.543366 bpc) 87 | Tree: 257.812 MB (0.712303 bpc) 88 | Relative LCP: 1100.01 MB (3.03918 bpc) 89 | 90 | 91 | Memory usage: 101413 MB 92 | 93 | Query test 94 | 95 | Reference: human 96 | Sequence: maternal 97 | Patterns: patterns 98 | 99 | Read 2000000 patterns of total length 64000000 100 | 101 | 102 | SimpleFM: 1952.51 MB (5.39455 bpc) 103 | SimpleFM: Found 2000000 patterns with 254997642 occ in 28.2475 seconds (2.16072 MB/s) 104 | 105 | RFM: 446.636 MB (1.234 bpc) 106 | RFM: Found 2000000 patterns with 254997642 occ in 272.237 seconds (0.224198 MB/s) 107 | 108 | 109 | Memory usage: 2809.89 MB 110 | 111 | Query test 112 | 113 | Reference: human 114 | Sequence: maternal 115 | Patterns: patterns 116 | 117 | Read 2000000 patterns of total length 
64000000 118 | 119 | 120 | Executing locate() queries. 121 | 122 | SimpleFM: 1952.51 MB (5.39455 bpc) 123 | SimpleFM: Found 2000000 patterns with 254997642 occ in 702.53 seconds (362970 occ/s) 124 | Hash of located positions: 11653246869206397622 125 | 126 | RFM: 446.636 MB (1.234 bpc) 127 | RFM: Found 2000000 patterns with 254997642 occ in 2590.91 seconds (98420.3 occ/s) 128 | Hash of located positions: 11653246869206397622 129 | 130 | 131 | Memory usage: 2809.89 MB 132 | 133 | Query test 134 | 135 | Reference: human 136 | Sequence: maternal 137 | Patterns: patterns 138 | 139 | Read 2000000 patterns of total length 64000000 140 | 141 | 142 | Executing locate() queries. 143 | 144 | Verifying the results with extract() queries. 145 | 146 | SimpleFM: 1952.51 MB (5.39455 bpc) 147 | SimpleFM: Found 2000000 patterns with 254997642 occ in 1037.57 seconds (245764 occ/s) 148 | Hash of located positions: 11653246869206397622 149 | 150 | RFM: 446.636 MB (1.234 bpc) 151 | RFM: Found 2000000 patterns with 254997642 occ in 2936.15 seconds (86847.6 occ/s) 152 | Hash of located positions: 11653246869206397622 153 | 154 | 155 | Memory usage: 2809.89 MB 156 | 157 | 158 | End 159 | Wed Apr 15 00:30:41 BST 2015 160 | 161 | 162 | ------------------------------------------------------------ 163 | Sender: LSF System 164 | Subject: Job 3876245: in cluster Done 165 | 166 | Job was submitted from host by user in cluster . 167 | Job was executed on host(s) <8*vr-4-1-08>, in queue , as user in cluster . 168 | was used as the home directory. 169 | was used as the working directory. 170 | Started at Tue Apr 14 12:56:26 2015 171 | Results reported at Wed Apr 15 00:30:41 2015 172 | 173 | Your job looked like: 174 | 175 | ------------------------------------------------------------ 176 | # LSBATCH: User input 177 | /nfs/users/nfs_j/js35/job_scripts/test_rfm human maternal 8 178 | ------------------------------------------------------------ 179 | 180 | Successfully completed. 
181 | 182 | Resource usage summary: 183 | 184 | CPU time : 43720.10 sec. 185 | Max Memory : 101420 MB 186 | Average Memory : 25546.49 MB 187 | Total Requested Memory : 131072.00 MB 188 | Delta Memory : 29652.00 MB 189 | (Delta: the difference between total requested memory and actual max usage.) 190 | Max Swap : 105529 MB 191 | 192 | Max Processes : 4 193 | Max Threads : 12 194 | 195 | The output (if any) is above this job summary. 196 | 197 | 198 | 199 | PS: 200 | 201 | Read file for stderr output of this job. 202 | 203 | -------------------------------------------------------------------------------- /logs_old_rcst/test_rfm_3876247.log: -------------------------------------------------------------------------------- 1 | Start 2 | Tue Apr 14 12:56:40 BST 2015 3 | 4 | Testing the relative FM-index with assembled genomes 5 | 6 | Reference: female 7 | Target: maternal2 8 | Threads: 8 9 | 10 | BWT construction 11 | Options: alphabet isa_sample_rate=64 lcp sa_sample_rate=17 12 | 13 | File: female 14 | Text size: 3036320415 15 | BWT built in 1403.48 seconds (2.0632 MB/s) 16 | Alphabet written to female.alpha 17 | BWT written to female.bwt 18 | Samples written to female.samples 19 | LCP array built in 826.388 seconds (3.504 MB/s) 20 | LCP array written to female.lcp 21 | 22 | File: maternal2 23 | Text size: 3036191207 24 | BWT built in 1432.91 seconds (2.02074 MB/s) 25 | Alphabet written to maternal2.alpha 26 | BWT written to maternal2.bwt 27 | Samples written to maternal2.samples 28 | LCP array built in 786.793 seconds (3.68018 MB/s) 29 | LCP array written to maternal2.lcp 30 | 31 | Indexing the differential LCP array 32 | 33 | File: female 34 | LCP size: 3036320416 35 | DLCP width: 27 36 | SA width: 32 37 | Index built in 9654.12 seconds (0.29994 MB/s) 38 | 39 | Relative FM-index and LCP array builder 40 | Using OpenMP with 8 threads 41 | 42 | Algorithm: invariant 43 | Input format: plain 44 | SA sample rate: 257 45 | ISA sample rate: 512 46 | 47 | Reference: female 48 
| 49 | BWT: 1090.28 MB (3.01219 bpc) 50 | SA samples: 681.332 MB (1.88235 bpc) 51 | ISA samples: 180.979 MB (0.5 bpc) 52 | Simple FM: 1952.6 MB (5.39454 bpc) 53 | 54 | LCP array: 3690.1 MB (10.1948 bpc) 55 | 56 | 57 | Target: maternal2 58 | Reference size: 3036320416 59 | Target size: 3036191208 60 | Built the merging bitvector in 3862.13 seconds 61 | Matched 3001221512 positions in 9288.6 seconds 62 | Found a common subsequence of length 2980010799 in 1182.16 seconds 63 | Built the bwt_lcs bitvectors and samples in 1974.71 seconds 64 | Index built in 16348.7 seconds 65 | 66 | BWT: 1090.24 MB (3.01219 bpc) 67 | SA samples: 681.303 MB (1.88235 bpc) 68 | ISA samples: 180.971 MB (0.5 bpc) 69 | Simple FM: 1952.51 MB (5.39455 bpc) 70 | 71 | ref_minus_lcs: 19.2911 MB (0.053299 bpc) 72 | seq_minus_lcs: 19.2473 MB (0.0531778 bpc) 73 | bwt_lcs: 163.138 MB (0.450729 bpc) 74 | text_lcs: 125.532 MB (0.34683 bpc) 75 | SA samples: 45.0667 MB (0.124514 bpc) 76 | ISA samples: 22.6214 MB (0.0625 bpc) 77 | Relative FM: 394.897 MB (1.09105 bpc) 78 | 79 | The RLZ parsing of the LCP array consists of 105051043 phrases 80 | Relative LCP array built in 7532.23 seconds 81 | 82 | LCP array: 3689.77 MB (10.1944 bpc) 83 | 84 | Phrases: 400.738 MB (1.10719 bpc) 85 | Blocks: 97.8164 MB (0.270254 bpc) 86 | Samples: 109.513 MB (0.302572 bpc) 87 | Tree: 141.778 MB (0.391715 bpc) 88 | Relative LCP: 749.846 MB (2.07173 bpc) 89 | 90 | 91 | Memory usage: 99510 MB 92 | 93 | Query test 94 | 95 | Reference: female 96 | Sequence: maternal2 97 | Patterns: patterns 98 | 99 | Read 2000000 patterns of total length 64000000 100 | 101 | 102 | SimpleFM: 1952.51 MB (5.39455 bpc) 103 | SimpleFM: Found 2000000 patterns with 254997642 occ in 31.879 seconds (1.91459 MB/s) 104 | 105 | RFM: 394.897 MB (1.09105 bpc) 106 | RFM: Found 2000000 patterns with 254997642 occ in 297.747 seconds (0.20499 MB/s) 107 | 108 | 109 | Memory usage: 2702.03 MB 110 | 111 | Query test 112 | 113 | Reference: female 114 | Sequence: 
maternal2 115 | Patterns: patterns 116 | 117 | Read 2000000 patterns of total length 64000000 118 | 119 | 120 | Executing locate() queries. 121 | 122 | SimpleFM: 1952.51 MB (5.39455 bpc) 123 | SimpleFM: Found 2000000 patterns with 254997642 occ in 801.668 seconds (318084 occ/s) 124 | Hash of located positions: 11653246869206397622 125 | 126 | RFM: 394.897 MB (1.09105 bpc) 127 | RFM: Found 2000000 patterns with 254997642 occ in 2523.89 seconds (101033 occ/s) 128 | Hash of located positions: 11653246869206397622 129 | 130 | 131 | Memory usage: 2702.03 MB 132 | 133 | Query test 134 | 135 | Reference: female 136 | Sequence: maternal2 137 | Patterns: patterns 138 | 139 | Read 2000000 patterns of total length 64000000 140 | 141 | 142 | Executing locate() queries. 143 | 144 | Verifying the results with extract() queries. 145 | 146 | SimpleFM: 1952.51 MB (5.39455 bpc) 147 | SimpleFM: Found 2000000 patterns with 254997642 occ in 949.243 seconds (268633 occ/s) 148 | Hash of located positions: 11653246869206397622 149 | 150 | RFM: 394.897 MB (1.09105 bpc) 151 | RFM: Found 2000000 patterns with 254997642 occ in 3234.93 seconds (78826.4 occ/s) 152 | Hash of located positions: 11653246869206397622 153 | 154 | 155 | Memory usage: 2702.04 MB 156 | 157 | 158 | End 159 | Wed Apr 15 01:55:11 BST 2015 160 | 161 | 162 | ------------------------------------------------------------ 163 | Sender: LSF System 164 | Subject: Job 3876247: in cluster Done 165 | 166 | Job was submitted from host by user in cluster . 167 | Job was executed on host(s) <8*vr-2-3-16>, in queue , as user in cluster . 168 | was used as the home directory. 169 | was used as the working directory. 
170 | Started at Tue Apr 14 12:56:39 2015 171 | Results reported at Wed Apr 15 01:55:11 2015 172 | 173 | Your job looked like: 174 | 175 | ------------------------------------------------------------ 176 | # LSBATCH: User input 177 | /nfs/users/nfs_j/js35/job_scripts/test_rfm female maternal2 8 178 | ------------------------------------------------------------ 179 | 180 | Successfully completed. 181 | 182 | Resource usage summary: 183 | 184 | CPU time : 48577.51 sec. 185 | Max Memory : 99515 MB 186 | Average Memory : 27029.85 MB 187 | Total Requested Memory : 131072.00 MB 188 | Delta Memory : 31557.00 MB 189 | (Delta: the difference between total requested memory and actual max usage.) 190 | Max Swap : 103671 MB 191 | 192 | Max Processes : 4 193 | Max Threads : 12 194 | 195 | The output (if any) is above this job summary. 196 | 197 | 198 | 199 | PS: 200 | 201 | Read file for stderr output of this job. 202 | 203 | --------------------------------------------------------------------------------
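The size and timing figures in the test_rfm logs above can be cross-checked with a few lines of arithmetic. The sketch below is not part of the repository; the helper names are hypothetical, and it assumes that "MB" in the logs means 2^20 bytes and that "bpc" means bits per character relative to the reported "Target size". Those assumptions are inferred only from the fact that they appear to reproduce the logged values.

```python
# Sketch: re-derive figures printed in the test_rfm logs.
# Assumptions (not stated in the logs): "MB" is 2**20 bytes, and "bpc" is
# bits per character relative to the "Target size" value the log reports.

MEGABYTE = 2**20


def bits_per_char(size_mb: float, text_length: int) -> float:
    """Convert a size in MB to bits per character (hypothetical helper)."""
    return size_mb * MEGABYTE * 8 / text_length


def slowdown(baseline_seconds: float, other_seconds: float) -> float:
    """How many times slower `other` is than `baseline` (hypothetical helper)."""
    return other_seconds / baseline_seconds


if __name__ == "__main__":
    # Values copied from test_rfm_3876247.log (reference female, target maternal2).
    target_size = 3036191208
    print("Relative FM bpc:", round(bits_per_char(394.897, target_size), 4))
    print("Simple FM bpc:", round(bits_per_char(1952.51, target_size), 4))
    # First query test: SimpleFM took 31.879 s, RFM took 297.747 s.
    print("Query slowdown:", round(slowdown(31.879, 297.747), 2))
    # locate() test: SimpleFM took 801.668 s, RFM took 2523.89 s.
    print("locate() slowdown:", round(slowdown(801.668, 2523.89), 2))
```

Under these assumptions, the relative FM-index for maternal2 comes out at roughly a fifth of the size of the plain FM-index, while the first query test runs about nine times slower and locate() about three times slower.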