├── Cargo.toml ├── README.md ├── examples └── ribbon.rs ├── fig ├── example.100.png ├── example.1000.png ├── example.10000.png └── udon.op.svg └── src ├── builder.rs ├── decoder.rs ├── index.rs ├── insbin.rs ├── lib.rs ├── op.rs ├── scaler.rs └── utils.rs /Cargo.toml: -------------------------------------------------------------------------------- 1 | 2 | [package] 3 | name = "udon" 4 | version = "0.1.0" 5 | authors = ["Hajime Suzuki "] 6 | edition = "2021" 7 | 8 | [dependencies] 9 | bitfield = "0.13" 10 | log = "0.4" 11 | 12 | 13 | [[example]] 14 | name = "ribbon" 15 | crate-type = ["bin"] 16 | 17 | [dev-dependencies] 18 | bam = "0.1" 19 | env_logger = "0.7" 20 | image = "0.23" 21 | structopt = "0.3" 22 | 23 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Udon — caching BAM CIGAR strings for visualization 2 | 3 | **Udon** is a tiny library transcoding [BAM CIGAR / MD strings](https://samtools.github.io/hts-specs/) and query sequence into a single augmented compressed CIGAR object. The augmented data structure, along with an index to locate substring positions, assists quick drawing of alignment ribbon of arbitrary span with arbitrary scaling. It achieves ~1 bit / column compression density and ~2G columns / sec. (per alignment) decompression throughput on typical real-world Illumina and Nanopore datasets. 4 | 5 | ### What Udon can do are: 6 | 7 | * Converting a BAM alignment record (both CIGAR and MD strings required) into a monolithic `Udon` object. 8 | * Keeping all the information of mismatches, insertions, and deletions in the `Udon` object, including what the query-side bases for mismatches or insertions are. 
9 | * Slicing an alignment record at an arbitrary range, and decoding it into an alignment string (equivalent to a string like `"MMMDDMMMCMMMTC"`; encoded as `UdonOp`s) or an alignment ribbon (an array of RGB-alpha values). 10 | * Fetching the insertion sequence located at a specific position (indicated by the `UdonOp::Ins` flag), and returning it as an ASCII string. 11 | 12 | ## Examples 13 | 14 | ```Rust 15 | /* prepare scaler and color palette (10 columns (bases) per pixel) */ 16 | let scaler = UdonScaler::new(&UdonPalette::default(), 10.0); 17 | let base_color: [[[u8; 4]; 2]; 2] = [ 18 | [[255, 202, 191, 255], [255, 255, 255, 255]], 19 | [[191, 228, 255, 255], [255, 255, 255, 255]] 20 | ]; 21 | 22 | /* for each alignment... */ 23 | let mut record = Record::new(); 24 | while let Ok(true) = reader.read_into(&mut record) { 25 | if !record.flag().is_mapped() { continue; } 26 | 27 | /* construct indexed ribbon (udon) for the alignment record */ 28 | let cigar = record.cigar().raw(); 29 | let query = record.sequence().raw(); 30 | let mdstr = if let Some(TagValue::String(s, _)) = record.tags().get(b"MD") { s } else { panic!("") }; 31 | let udon = Udon::build(&cigar, &query, &mdstr).unwrap(); 32 | 33 | /* slice ribbon scaled */ 34 | let decode_range = Range::<usize> { start: 0, end: udon.reference_span() }; 35 | let mut ribbon = udon.decode_scaled(&decode_range, 0.0, &scaler).unwrap(); 36 | 37 | /* put forward / reverse color then apply gamma correction */ 38 | ribbon.append_on_basecolor(&base_color[record.flag().is_reverse_strand() as usize]).correct_gamma(); 39 | 40 | /* here we have obtained the alignment ribbon as an array of [RGBa8; 2] (= [[u8; 4]; 2]) */ 41 | do_something_with(ribbon); 42 | } 43 | ``` 44 | 45 | Pileups at different scales, drawn by [ribbon.rs](https://github.com/ocxtal/udon/blob/devel/examples/ribbon.rs): 46 | 47 | ![0.15625 columns / pixel](./fig/example.100.png) 48 | 49 | ![1.5625 columns / pixel](./fig/example.1000.png) 50 | 51 | ![15.625 columns / 
pixel](./fig/example.10000.png) 52 | 53 | *Figure 1: 100 (top), 1000 (middle), and 10000 (bottom) columns per 640 pixels.* 54 | 55 | ## Requirements 56 | 57 | * Rust >= **1.33.0** 58 | * **x86\_64** with SSE4.2 / AVX2 59 | 60 | *Note: The library is not yet published to crates.io. You need to tell cargo to fetch it directly from GitHub, e.g. with `udon = { git = "https://github.com/ocxtal/udon.git" }` in the `[dependencies]` section. See [Cargo.toml documentation](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#specifying-dependencies-from-git-repositories) for the details.* 61 | 62 | ## APIs 63 | 64 | ### Construction (transcode) 65 | 66 | ```rust 67 | impl<'i, 'o> Udon<'o> { 68 | pub fn build(cigar: &'i [u32], packed_query: &'i [u8], mdstr: &'i [u8]) -> Option<Box<Udon<'o>>>; 69 | pub fn build_alt(cigar: &'i [u32], packed_query_full: &'i [u8], mdstr: &'i [u8]) -> Option<Box<Udon<'o>>>; 70 | } 71 | ``` 72 | 73 | Builds a `Udon` object. We assume `cigar`, `packed_query`, and `mdstr` are those parsed by the [bam](https://docs.rs/bam/0.1.0/bam/) crate. The first function, `build`, is for complete BAM records with an appropriate query field, and the second function, `build_alt`, is for records that lack their query sequence. In the latter case, the `packed_query_full` argument should be the one parsed from the primary alignment, which has the full-length query sequence. See the [example](https://github.com/ocxtal/udon/blob/devel/examples/ribbon.rs) for a simple usage of the `build` function. 74 | 75 | *Note: The two functions behave exactly the same for a primary alignment record whose overhangs are encoded as soft clips. The difference between the two functions matters only when the clips are hard.* 76 | 77 | ### Retrieving metadata 78 | 79 | ```rust 80 | impl<'o> Udon<'o> { 81 | pub fn reference_span(&self) -> usize; 82 | } 83 | ``` 84 | 85 | Returns the reference-side span, excluding soft and hard clips at both ends. 
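For intuition, the reference-side span can also be derived directly from the raw BAM CIGAR array. The sketch below is only an illustration, not Udon's implementation: `reference_span_from_cigar` is a hypothetical helper name, and it assumes the standard BAM encoding (`len << 4 | op`), in which only M, D, N, =, and X operations consume reference bases. Whether Udon itself accepts every one of these operations is not stated in this README.

```rust
/// Hypothetical helper (not part of the Udon API): computes the reference-side
/// span of a raw BAM CIGAR, where each element is encoded as `len << 4 | op`.
pub fn reference_span_from_cigar(cigar: &[u32]) -> usize {
    cigar
        .iter()
        .map(|&c| {
            let op = c & 0x0f; // lower 4 bits: operation code
            let len = (c >> 4) as usize; // upper 28 bits: operation length
            match op {
                // M (0), D (2), N (3), = (7), X (8) advance on the reference
                0 | 2 | 3 | 7 | 8 => len,
                // I (1), S (4), H (5), P (6) do not consume reference bases
                _ => 0,
            }
        })
        .sum()
}
```

For example, the CIGAR `5S10M2D3I4M` spans 10 + 2 + 4 = 16 reference columns; the soft clip and the insertion contribute nothing.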
86 | 87 | ### Decode 88 | 89 | ```rust 90 | /* decode Udon into a raw `UdonOp` array */ 91 | #[repr(u8)] 92 | pub enum UdonOp { 93 | MisA = 0x04 | 0x00, 94 | MisC = 0x04 | 0x01, 95 | MisG = 0x04 | 0x02, 96 | MisT = 0x04 | 0x03, 97 | Del = 0x08, 98 | Ins = 0x10 /* OR-ed with one of the others when a column has two edits (Ins-Del or Ins-Mismatch) */ 99 | } 100 | impl<'o> Udon<'o> { 101 | pub fn decode_raw(&self, ref_range: &Range) -> Option>; 102 | } 103 | 104 | /* decode Udon into an `UdonOp` array then scale it into an alignment ribbon of two RGBalpha channels */ 105 | pub type UdonColorPair = [[u8; 4]; 2]; /* [(R, G, B, alpha); 2] */ 106 | impl<'o> Udon<'o> { 107 | pub fn decode_scaled(&self, ref_range: &Range, offset_in_pixels: f64, scaler: &UdonScaler) -> Option>; 108 | } 109 | ``` 110 | 111 | Decodes `Udon` into an array. The array consists of alignment edits (for `decode_raw`; see `UdonOp` for the definition of the edits), or is an alignment ribbon (an array of RGB-alpha pairs; for `decode_scaled`). 112 | 113 | For `decode_raw`, each column (one column per reference-side base) keeps one or two alignment edits. When it has two edits, the first one is an insertion of one or more bases. The other (and the only one in the single-edit case) is either a single mismatch or a deletion. When it expresses a mismatch, `UdonOp` distinguishes bases and reports what the query-side base was by a two-bit flag. 114 | 115 | For `decode_scaled`, each array element keeps the color for one or more alignment edits, depending on the value of the scaling factor (`columns_per_pixel` embedded in `UdonScaler`; see below for the details). For example, when `columns_per_pixel` is 3.0, each element is a blend of three edits. The `offset_in_pixels` argument adjusts the fractional position of the ribbon when the start position (the first column) is not aligned to a multiple of the scaling factor. 
When `columns_per_pixel` is 3.0 and the start position (offset from the left boundary; measured on the reference side) is 10.0, the `offset_in_pixels` should be 0.3333... (the fractional part of 10.0 / 3.0). 116 | 117 | *(Output of `decode_raw` should have been defined as `Option>`, and `UdonOp` as a bitfield, not an enum. I want to make this change for consistency and ease of use, before the first stable release of this library.)* 118 | 119 | #### Color handling in scaled decoder 120 | 121 | ```rust 122 | pub type UdonColorPair = [[u8; 4]; 2]; /* [(r, g, b, alpha); 2] */ 123 | 124 | pub struct UdonPalette { 125 | /* all in [(r, g, b, alpha); 2] form; the larger alpha value, the more transparent */ 126 | background: UdonColorPair, 127 | del: UdonColorPair, 128 | ins: UdonColorPair, 129 | mismatch: [UdonColorPair; 4] 130 | } 131 | 132 | impl UdonScaler { 133 | pub fn new(color: &UdonPalette, columns_per_pixel: f64) -> UdonScaler; 134 | } 135 | 136 | impl UdonUtils for [UdonColorPair] { 137 | fn append_on_basecolor(&mut self, basecolor: &UdonColorPair) -> &mut Self; 138 | fn correct_gamma(&mut self) -> &mut Self; 139 | } 140 | ``` 141 | 142 | The scaled decoding API requires a scaler object, `UdonScaler`, as the last argument; it holds the constants for scaling and coloring. The object is created by `UdonScaler::new()` with a palette and a scaling factor. The scaling factor is given in columns per pixel, and each color in the palette is in (R, G, B, -) form where R comes first. The output color array of the scaled decoder contains the **negated sum** of the column colors, and **should be overlaid onto a base color** using `append_on_basecolor`. Additionally, `UdonUtils` provides a gamma correction function (gamma = 2.2) so that the output array can be used directly for plotting. 143 | 144 | *Note: Since `Udon` object itself is not tied to any scaling factor or color, we can make use of multiple scalers for different scaling factors and colors. 
For example, when we provide 16 magnification pitches for a visualization app, a natural implementation would be keeping 16 immutable `UdonScaler`s on an array and picking an appropriate one each time when the scale is changed.* 145 | 146 | #### Drawing deletion bars 147 | 148 | The two color channels can be used for drawing "deletion bars", as in Figure 1. The first channel renders deletion with white and the second with gray. The other colors being the same, the resulting channels are different only at deletions. Putting the second channel at the center of the ribbon makes the deletion drawn as bars. 149 | 150 | *(I'm planning to add another channel for insertion markers. To make the channel configuration more elastic, I'm waiting for the const generics, which is to be stable in Rust 2020.)* 151 | 152 | ### Querying insertion 153 | 154 | ```rust 155 | impl<'o> Udon<'o> { 156 | pub fn get_ins(&self, pos: usize) -> Option>; 157 | } 158 | ``` 159 | 160 | Returns inserted sequence at `UdonOp::Ins` position. Output sequence is encoded in ASCII. 161 | 162 | ## Augmented CIGAR string: format and decompression algorithm 163 | 164 | Udon constructs an "augmented CIGAR string" from the original BAM CIGAR, MD tag, and query-side sequence. The augmented CIGAR string is an extension of the original CIGAR string that explicitly expresses query-side bases at mismatch and insertion positions so that bases different or missing from the reference sequence are not lost. 165 | 166 | Udon uses a run-length encoding to keep the augmented CIGAR string in a compressed way. Up to thirty "events" of match, mismatch, insertion, or deletion are packed into a single byte, called "op." The series of events packed into a single op is called "chunk." 167 | 168 | ### Op structure 169 | 170 | Op is designed for fastest decompression and smallest footprint. Actual op structure is a pair of (bit-) fields, 3 bits for **leading event** and 5 bits for **total span**. 
171 | 172 | * **Leading event** encodes one of mismatch, deletion, or insertion at the head of the chunk. The three most significant bits are assigned to this field. 173 | * **0b000** represents **insertion**; it only indicates that there is an inserted sequence between the chunk and its preceding chunk. The actual sequence is stored outside the op stream and retrieved by calling another API (`get_ins`). 174 | * **0b001 - 0b011** represent **deletion**; the value encodes the number of deleted columns: one to three bases. A deletion longer than three bases is divided into multiple ops. 175 | * **0b100 - 0b111** represent **mismatch**; the value encodes a single base on the query. Values 0x04 to 0x07 represent 'A', 'C', 'G', and 'T', respectively. 176 | * **Total span** is the reference sequence length that is covered by the chunk, including the leading event(s). The lower 5 bits are assigned to this field, expressing 0 to 30 columns. The value 31 is a special value for "continuing match", where the actual length is 30 and match events continue to the next chunk without any insertion, deletion, or mismatch. The value zero only appears when an insertion is followed by a mismatch or deletion. 177 | 178 | ![Transcoding process and Op structure](./fig/udon.op.svg) 179 | 180 | *Figure 2. Illustration of compression: the original (flat) augmented CIGAR string, built from the alignment, is divided into chunks, and each chunk is encoded as an op, a pair of leading event and total span.* 181 | 182 | ### Expanding ops into ribbon 183 | 184 | Decompression is the procedure that converts the op stream back to the original array of events. Since each op encodes up to 30 events in a run-length manner, the entire event array is obtained by concatenating successive results of op-to-chunk conversion. 
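The field layout described above can be sketched as a scalar helper. This is a minimal illustration that follows the prose description only; the names `Leading` and `split_op` are hypothetical and do not appear in the Udon sources, and the actual decoder works on SIMD registers rather than byte by byte.

```rust
/// Hypothetical representation of the 3-bit leading event (not a Udon type).
#[derive(Debug, PartialEq)]
pub enum Leading {
    /// 0b000: an insertion marker; the inserted bases live outside the op stream
    Ins,
    /// 0b001..=0b011: a deletion of one to three columns
    Del(usize),
    /// 0b100..=0b111: a mismatch, carrying the query-side base
    Mismatch(char),
}

/// Splits one op byte into (leading event, span in columns, continues-to-next-chunk).
pub fn split_op(op: u8) -> (Leading, usize, bool) {
    let lead = op >> 5; // upper 3 bits
    let raw_span = (op & 0x1f) as usize; // lower 5 bits
    let leading = match lead {
        0 => Leading::Ins,
        1..=3 => Leading::Del(lead as usize),
        _ => Leading::Mismatch(['A', 'C', 'G', 'T'][(lead & 0x03) as usize]),
    };
    // 31 is the "continuing match" marker: the chunk covers 30 columns
    // and the match run carries over into the next chunk.
    let continues = raw_span == 31;
    let span = if continues { 30 } else { raw_span };
    (leading, span, continues)
}
```

With this helper, `split_op(0x85)` yields a mismatch to `'A'` with a span of 5, and `split_op(0x60)` yields a three-column deletion with a span of 0.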
185 | 186 | The trick for fast decompression is to divide the conversion procedure into the following two disjoint subprocedures: 187 | 188 | * **Composing a 32-column-length event array:** A constant-length vector is constructed from the input op, with up to three leading "special" events and trailing (up to 31) match events. Since a match event is represented by 0x00 in the output array, placing the trailing match events is just clearing the columns. The entire operation is done on SIMD registers within ten or fewer instructions. 189 | * **Clipping the array to the chunk length:** Clipping the constant-length vector is done *after* storing it to the output array, by forwarding the output array pointer by the chunk length. 190 | 191 | Since the two procedures are independent, they are executed without waiting for each other. On most modern processors, the bottleneck is the vector construction process. Latencies for forwarding pointers and storing vectors are successfully hidden, because no further operation depends on them and they are absorbed by the large store queue and store buffer. Taken together, it achieves burst conversion at around 10 cycles per loop on typical modern processors like AMD Zen2. 192 | 193 | ### Scaled decompression 194 | 195 | Scaling of the alignment ribbon is essential for visualization. Udon provides a unified decompression-and-scaling API that requires no external intermediate buffer. The implementation is quite straightforward; it divides the queried range into multiple constant-length subranges (16 KB by default), and applies the decompression and scaling procedures one by one using a single intermediate buffer. The default subrange length was determined so that the buffer won't spill out of the last-level (L3) data cache. Since everything is done within the L3 cache, the conversion throughput is not impaired. 196 | 197 | ## Benchmark 198 | 199 | `todo!();` 200 | 201 | ## FAQ 202 | 203 | ### Can Udon change color for a particular motif? 204 | 205 | Currently No. 
One reason is that Udon itself omits information on reference-side bases, though it's stored in the MD string. If a motif can be detected only from mismatched bases, detection is possible on decoding. However, such functionality is not implemented yet. 206 | 207 | It might be possible to implement an auxiliary data structure, a motif array, for conditional colorization. It should be an array of the reference-side sequence, where the motif sequence is stored as is and the other bases are cleared. The motif array is compared to the decoded udon stream, and a "modification" flag is set for a series of columns that have a specific match-mismatch pattern. The detected motif region can be colored differently, by extending `UdonPalette` and `UdonScaler` to treat the modification flag properly. 208 | 209 | *(I would appreciate it if anyone implemented this feature.)* 210 | 211 | ### Can Udon draw a combined ribbon for paired-end reads? 212 | 213 | Currently No, but I'm thinking of adding an API to combine two or more ribbons after drawing. However, even if we have the API, the user needs to manage which reads are to be connected and how far apart the reads are. Udon itself remains a compression data structure and algorithm for a single alignment record, and is unaware of links between different ones. 214 | 215 | ## First impression on writing SIMD codes in Rust (2020/9) 216 | 217 | This is my study project on writing SIMD-intensive, optimized code in stable Rust. My first impression was that it was reasonably easy to write lean data structures and algorithms once I learned the basic syntax of Rust (though the compiler often complains that lifetimes don't match). Composing and refactoring control flows felt basically the same as in C, and sometimes Rust-specific syntax, such as `if let`, `while let`, and `break 'label`, made it a bit easier. 218 | 219 | However, I find tuning Rust code somewhat more difficult than tuning C for now. It's largely because I'm not familiar with compiler options yet. 
But I also found that the compiler sometimes messes up the main control flow and error handling paths, and places not-so-important paths among intensive blocks. It was easy to deal with such phenomena with C compilers, but it seems not to be in Rust (due to thick abstraction layers?). I don't have a good answer to this yet, but at least I'm sure `alias rustc-asm=rustc -C opt-level=3 -C target-cpu=native --crate-type lib --emit asm` was useful for digging deeper into the compiler behavior. 220 | 221 | It was pretty comfortable writing x86\_64 SIMD code in Rust. Thanks to the community, `core::arch::x86_64` is well maintained. I also found it really good to have methods such as `count_ones`, `trailing_zeros`, and `from_le_bytes` by default on primitive integer types. I also note that I didn't find any major flaw in code generation (I got what I expected for the innermost loops that have no `panic!` path). Taken together, I understood I could write basic SIMD code with some bit manipulation hacks in almost the same way as in C. 222 | 223 | Writing Arm AdvSIMD (NEON) code was much harder. It still requires the nightly channel, and a large proportion of intrinsics are yet to be implemented. I found that what I could do for now is to write SIMD code in C and call it via FFI, but it's not a good way anyway. The best way is to contribute to `core::arch::aarch64`, but I need to learn much more to do this. 224 | 225 | There are some more things that I want to explore further about SIMD-intensive code. The most important one is global dispatching. It's already easy to have multiple implementations, such as SSE4.2 and AVX2 variants, for a function using `#[cfg(target_feature)]`, but it's still unclear how to put a dispatcher at an arbitrary point, or how to compile the entire codebase for multiple different features and put a global dispatcher at the entry point. I believe I can do this by writing an appropriate macro (proc_macro?) or build.rs, but at this moment I can't imagine the details of how to implement it. 
Others include a margined allocator (for eliminating the need for a special scalar implementation at the tail of arrays) and a vectorized iterator (coupled with the margined allocator), but I found I need to learn the concepts and existing conventions about unsafe things before trying to implement them. 226 | 227 | ## Copyright and License 228 | 229 | Hajime Suzuki (2020), licensed under MIT. 230 | 231 | -------------------------------------------------------------------------------- /examples/ribbon.rs: -------------------------------------------------------------------------------- 1 | use bam::record::tags::TagValue; 2 | use bam::{BamReader, Record, RecordReader}; 3 | use image::png::PngEncoder; 4 | use image::ColorType; 5 | /** 6 | @file ribbon.rs 7 | @brief example for udon, creates simple pileup from command line. 8 | 9 | @author Hajime Suzuki 10 | @license MIT 11 | */ 12 | use std::fs::File; 13 | use std::io::Write; 14 | use std::ops::Range; 15 | use std::path::PathBuf; 16 | use udon::{Udon, UdonPalette, UdonScaler, UdonUtils}; 17 | 18 | /* argument parsing */ 19 | extern crate structopt; 20 | use structopt::StructOpt; 21 | 22 | #[derive(Debug, StructOpt)] 23 | #[structopt(name = "ribbon", about = "udon example")] 24 | struct RibbonOpt { 25 | /* plot range; (chrom name, spos, epos), half-inclusive */ 26 | #[structopt(short, long, default_value = "")] 27 | reference: String, 28 | #[structopt(short, long, default_value = "0")] 29 | start: usize, 30 | #[structopt(short, long, default_value = "1000000000000")] 31 | end: usize, 32 | 33 | /* image dimensions */ 34 | #[structopt(short, long, default_value = "640")] 35 | width: usize, 36 | #[structopt(short, long, default_value = "480")] 37 | height: usize, 38 | #[structopt(short, long, default_value = "10")] 39 | margin: usize, 40 | 41 | /* output PNG filename */ 42 | #[structopt(short, long, default_value = "")] 43 | output: PathBuf, 44 | 45 | /* input BAM filename (single positional argument) */ 46 | input: Option, 47 | } 48 | 49 
| /* utilities on ranges */ 50 | trait RangeUtils 51 | where 52 | Self: Sized, 53 | { 54 | fn has_overlap(&self, query: &Self) -> bool; 55 | fn clip(&self, query: &Self) -> Option; 56 | fn scale(&self, divisor: f64) -> (Self, f64); 57 | } 58 | 59 | impl RangeUtils for Range { 60 | fn has_overlap(&self, query: &Range) -> bool { 61 | if query.start > self.end { 62 | return false; 63 | } 64 | if self.start > query.end { 65 | return false; 66 | } 67 | return true; 68 | } 69 | 70 | fn clip(&self, query: &Range) -> Option> { 71 | /* clip query range by self (window). returns local range in the window */ 72 | if query.start > self.end { 73 | return None; 74 | } 75 | if self.start > query.end { 76 | return None; 77 | } 78 | 79 | Some(Range:: { 80 | start: query.start.saturating_sub(self.start), 81 | end: query.end.min(self.end).saturating_sub(self.start), 82 | }) 83 | } 84 | 85 | fn scale(&self, divisor: f64) -> (Range, f64) { 86 | let start = self.start as f64 / divisor; 87 | let offset = start.fract(); 88 | 89 | let range = Range:: { 90 | start: start as usize, 91 | end: (self.end as f64 / divisor).ceil() as usize, 92 | }; 93 | 94 | (range, offset) 95 | } 96 | } 97 | 98 | /* piling up alignment ribbons */ 99 | #[derive(Copy, Clone, Debug)] 100 | struct Dimension { 101 | x: usize, 102 | y: usize, 103 | } 104 | 105 | #[derive(Copy, Clone, Debug)] 106 | struct Border { 107 | thickness: usize, 108 | color: [u8; 4], 109 | } 110 | 111 | #[derive(Copy, Clone, Debug)] 112 | struct RibbonAttributes { 113 | height: usize, 114 | border: Border, 115 | } 116 | 117 | #[derive(Copy, Clone, Debug)] 118 | struct PileupParams { 119 | window: Dimension, 120 | margin: Dimension, 121 | border: Border, 122 | background: [u8; 4], 123 | fontsize: f64, 124 | ribbon: RibbonAttributes, 125 | } 126 | 127 | impl Default for PileupParams { 128 | fn default() -> PileupParams { 129 | PileupParams { 130 | window: Dimension { x: 640, y: 480 }, 131 | margin: Dimension { x: 5, y: 5 }, 132 | border: Border 
{ 133 | thickness: 1, 134 | color: [128, 128, 128, 0], 135 | }, 136 | background: [255, 255, 255, 0], 137 | fontsize: 9.0, 138 | ribbon: RibbonAttributes { 139 | height: 5, 140 | border: Border { 141 | thickness: 1, 142 | color: [32, 32, 32, 0], 143 | }, 144 | }, 145 | } 146 | } 147 | } 148 | 149 | struct Pileup { 150 | buf: Vec, 151 | height: usize, 152 | params: PileupParams, 153 | } 154 | 155 | impl Pileup { 156 | fn new(params: &PileupParams) -> Self { 157 | let mut this = Pileup { 158 | buf: Vec::::new(), 159 | height: 0, 160 | params: *params, 161 | }; 162 | 163 | for _ in 0..this.params.margin.y { 164 | this.append_margin_row(); 165 | } 166 | this.append_border_row(); 167 | this 168 | } 169 | 170 | fn push(&mut self, ribbon: &[[[u8; 4]; 2]], horizontal_offset: usize) -> Option<()> { 171 | let background = self.params.background; 172 | let border = self.params.border.color; 173 | 174 | let left_blank = horizontal_offset; 175 | let right_blank = self 176 | .params 177 | .window 178 | .x 179 | .saturating_sub(ribbon.len() + horizontal_offset); 180 | let ribbon_len = self.params.window.x - (left_blank + right_blank); 181 | 182 | // println!("{:?}, {:?}, {:?}", left_blank, right_blank, ribbon_len); 183 | 184 | for i in 0..self.params.ribbon.height { 185 | let idx = if i == self.params.ribbon.height / 2 { 186 | 1 187 | } else { 188 | 0 189 | }; 190 | 191 | /* left margin */ 192 | self.fill(&background, self.params.margin.x); 193 | self.fill(&border, self.params.border.thickness); 194 | self.fill(&background, left_blank); 195 | 196 | /* body */ 197 | for &x in &ribbon[..ribbon_len] { 198 | self.buf.write(&x[idx][..3]).ok()?; 199 | } 200 | 201 | /* right margin */ 202 | self.fill(&background, right_blank); 203 | self.fill(&border, self.params.border.thickness); 204 | self.fill(&background, self.params.margin.x); 205 | self.height += 1; 206 | } 207 | Some(()) 208 | } 209 | 210 | fn finalize(&mut self) -> Option<()> { 211 | /* fill blanks */ 212 | for _ in 
self.height..self.params.window.y { 213 | self.append_blank_row(); 214 | } 215 | 216 | /* bottom border and margin */ 217 | self.append_border_row()?; 218 | for _ in 0..self.params.margin.y { 219 | self.append_margin_row()?; 220 | } 221 | Some(()) 222 | } 223 | 224 | fn render(&self, filename: &PathBuf) -> Option<()> { 225 | let output = File::create(filename).ok()?; 226 | let encoder = PngEncoder::new(output); 227 | 228 | // debug!("{:?}, {:?}, {:?}, {:?}", self.total_width(), self.total_height(), self.height, self.buf.len()); 229 | 230 | encoder 231 | .encode( 232 | &self.buf[..3 * self.total_width() * self.total_height()], 233 | self.total_width() as u32, 234 | self.total_height() as u32, 235 | ColorType::Rgb8, 236 | ) 237 | .ok()?; 238 | Some(()) 239 | } 240 | 241 | /* internal utilities */ 242 | fn total_width(&self) -> usize { 243 | self.params.window.x + 2 * (self.params.margin.x + self.params.border.thickness) 244 | } 245 | 246 | fn total_height(&self) -> usize { 247 | self.params.window.y + 2 * (self.params.margin.y + self.params.border.thickness) 248 | } 249 | 250 | fn fill(&mut self, color: &[u8; 4], size: usize) -> Option { 251 | for _ in 0..size { 252 | self.buf.write(&color[..3]).ok()?; 253 | } 254 | Some(size) 255 | } 256 | 257 | fn append_margin_row(&mut self) -> Option { 258 | let len = self.total_width(); 259 | let background = self.params.background; 260 | self.fill(&background, len); 261 | Some(len) 262 | } 263 | 264 | fn append_blank_row(&mut self) -> Option { 265 | let background = self.params.background; 266 | let border = self.params.border.color; 267 | 268 | self.fill(&background, self.params.margin.x); 269 | self.fill(&border, self.params.border.thickness); 270 | self.fill(&background, self.params.window.x); 271 | self.fill(&border, self.params.border.thickness); 272 | self.fill(&background, self.params.margin.x); 273 | Some(self.total_width()) 274 | } 275 | 276 | fn append_border_row(&mut self) -> Option { 277 | let background = 
self.params.background; 278 | let border = self.params.border.color; 279 | 280 | self.fill(&background, self.params.margin.x); 281 | self.fill( 282 | &border, 283 | self.params.window.x + 2 * self.params.border.thickness, 284 | ); 285 | self.fill(&background, self.params.margin.x); 286 | Some(self.total_width()) 287 | } 288 | } 289 | 290 | fn main() { 291 | env_logger::init(); 292 | 293 | /* parse args */ 294 | let opt = RibbonOpt::from_args(); 295 | let filename = opt.input.unwrap(); 296 | 297 | /* then open file */ 298 | let mut reader = BamReader::from_path(&filename, 2).expect( 299 | format!( 300 | "Failed to open file `{:?}'. Please check the file exists.", 301 | &filename 302 | ) 303 | .as_str(), 304 | ); 305 | 306 | /* get reference sequence id, then extract valid range */ 307 | let id = reader.header().reference_id(&opt.reference).unwrap_or(0); 308 | let window = Range:: { 309 | start: opt.start, 310 | end: opt 311 | .end 312 | .min(reader.header().reference_len(id).unwrap() as usize), 313 | }; 314 | 315 | /* prepare ribbon scaler and color */ 316 | let columns_per_pixel = window.len() as f64 / opt.width as f64; 317 | let scaler = UdonScaler::new(&UdonPalette::default(), columns_per_pixel); 318 | let base_color: [[[u8; 4]; 2]; 2] = [ 319 | [[255, 202, 191, 255], [255, 202, 191, 255]], 320 | [[191, 228, 255, 255], [191, 228, 255, 255]], 321 | ]; 322 | 323 | /* everything successful; create PNG buffer */ 324 | let mut pileup = Pileup::new(&PileupParams { 325 | window: Dimension { 326 | x: opt.width, 327 | y: opt.height, 328 | }, 329 | ..PileupParams::default() 330 | }); 331 | 332 | /* for each alignment... 
*/ 333 | let mut record = Record::new(); 334 | while let Ok(true) = reader.read_into(&mut record) { 335 | if !record.flag().is_mapped() { 336 | continue; 337 | } 338 | 339 | /* construct indexed ribbon (udon) */ 340 | let udon = Udon::build( 341 | record.cigar().raw(), 342 | record.sequence().raw(), 343 | if let Some(TagValue::String(s, _)) = record.tags().get(b"MD") { s } else { 344 | panic!("Each BAM record must have MD string. Inspect `samtools calmd` for restoring missing MD strings.") 345 | } 346 | ).expect(&format!("Failed to create udon index. Would be a bug. ({:?})", &record.name())); 347 | 348 | /* compose span, skip if out of the window */ 349 | let range = Range:: { 350 | start: record.start() as usize, 351 | end: record.start() as usize + udon.reference_span(), 352 | }; 353 | if !window.has_overlap(&range) { 354 | continue; 355 | } 356 | 357 | /* compute local ranges */ 358 | let udon_range = range.clip(&window).unwrap(); 359 | let window_range = window.clip(&range).unwrap(); 360 | if 3 * window_range.len() < window.len() { 361 | continue; 362 | } 363 | 364 | let (window_range, offset_in_pixel) = window_range.scale(columns_per_pixel); 365 | 366 | /* slice ribbon scaled */ 367 | let mut ribbon = udon 368 | .decode_scaled(&udon_range, offset_in_pixel, &scaler) 369 | .expect(&format!( 370 | "Failed to decode udon ribbon. Would be a bug. ({:?})", 371 | &record.name() 372 | )); 373 | 374 | /* put forward / reverse color then do gamma correction */ 375 | ribbon 376 | .append_on_basecolor(&base_color[record.flag().is_reverse_strand() as usize]) 377 | .correct_gamma(); 378 | 379 | /* then pileup; break when buffer is full */ 380 | pileup.push(&ribbon, window_range.start); 381 | // println!("{:?}, {:?}, {}", udon_range, window_range, offset_in_pixel); 382 | } 383 | 384 | /* done!!! 
*/ 385 | pileup.finalize(); 386 | pileup 387 | .render(&opt.output) 388 | .expect(format!("failed to dump image to `{:?}'", &opt.output).as_str()); 389 | } 390 | -------------------------------------------------------------------------------- /fig/example.100.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ocxtal/udon/f48a1ed6885ee30ea4a763cfd1ce99314be26976/fig/example.100.png -------------------------------------------------------------------------------- /fig/example.1000.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ocxtal/udon/f48a1ed6885ee30ea4a763cfd1ce99314be26976/fig/example.1000.png -------------------------------------------------------------------------------- /fig/example.10000.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ocxtal/udon/f48a1ed6885ee30ea4a763cfd1ce99314be26976/fig/example.10000.png -------------------------------------------------------------------------------- /fig/udon.op.svg: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 0x03,  0x85, 0xC0, 0x04, 0x60, 0x24 41 | 42 | 43 | (Nothing, 3=), (A, 5=), (G, 0=), (2I, 4=), (^GCG, 0=), (^G, 4=) 44 | 45 | 46 | CAGTCATAAC--TTAA 47 | GCGG 48 | AGGA 49 | CAG 50 | A 51 | CATAA 52 | G 53 | AC 54 | TTAA----AGGA 55 | 56 | 57 | 3=A5=G2I4=^GCGG4= 58 | 59 | 60 | 61 | 62 | Augmented CIGAR 63 | 64 | 65 | Alignment 66 | 67 | 68 | Chunked augmented 69 | CIGAR 70 | one of (Nothing, Matches) 71 | (A mismatch, Matches) 72 | (Insertion, Matches) 73 | (Deletion up to 3, Matches) 74 | 75 | 76 | Encoded chunked augmented 77 | CIGAR (Udon) 78 | 79 | 80 | 81 | 82 | 83 | 84 | 85 | 
86 | 87 | -------------------------------------------------------------------------------- /src/builder.rs: -------------------------------------------------------------------------------- 1 | use super::index::{Block, Index, BLOCK_PITCH}; 2 | use super::insbin::InsTracker; 3 | use super::op::{op_len, Cigar, CigarOp, CompressMark, IntoOpsIterator}; 4 | use super::utils::{atoi_unchecked, transcode_base_unchecked, SlicePrecursor, Writer}; 5 | use std::io::Write; 6 | use std::mem::{align_of, forget, size_of, transmute}; 7 | use std::ops::Range; 8 | use std::ptr::copy_nonoverlapping; 9 | use std::slice::{Iter, IterMut}; 10 | // use std::str::from_utf8; 11 | 12 | /* builder APIs */ 13 | impl<'a, 'b> Index<'a> { 14 | pub(super) fn build( 15 | cigar: &'b [u32], 16 | packed_query: &'b [u8], 17 | is_full: bool, 18 | mdstr: &'b [u8], 19 | ) -> Option>> { 20 | /* 21 | Note: 22 | This function creates box for `Udon` but it has a flaw that the created box has 23 | a trailer for the object which is not visible from compiler. The current runtime 24 | does not cause problem around this because the current allocator (`GlobalAlloc`) 25 | ignores memory layout information. If the future allocator change this behavior 26 | to explicitly check the sanity of the layout, it collapses by something like 27 | "unmatched free size" error. 28 | 29 | (so what to do?) 
30 | */ 31 | 32 | /* size_of::() == size_of::(); for compatibility with bam streamed reader */ 33 | let cigar = unsafe { transmute::<&'b [u32], &'b [Cigar]>(cigar) }; 34 | 35 | Self::build_core(cigar, packed_query, is_full, mdstr) 36 | } 37 | 38 | fn build_core( 39 | cigar: &'b [Cigar], 40 | packed_query: &'b [u8], 41 | is_full: bool, 42 | mdstr: &'b [u8], 43 | ) -> Option>> { 44 | let mut buf = Vec::::new(); 45 | 46 | /* header always at the head */ 47 | let range = buf.reserve_to(1, |header: &mut [Index], _: &[u8]| { 48 | header[0] = Default::default(); 49 | 1 50 | }); 51 | assert!(range.start == 0); 52 | assert!(range.end == size_of::()); 53 | 54 | return match Precursor::build_core(buf, cigar, packed_query, is_full, mdstr) { 55 | (_, None) => None, 56 | (buf, Some(precursor)) => Some(Self::compose_box(buf, precursor)), 57 | }; 58 | } 59 | 60 | fn make_arena_aligned(buf: &mut Vec, align: usize) -> usize { 61 | let size = ((buf.len() + align - 1) / align) * align; 62 | buf.resize(size, 0); 63 | size 64 | } 65 | 66 | fn compose_box(buf: Vec, precursor: Precursor) -> Box> { 67 | let mut buf = buf; 68 | let size = Self::make_arena_aligned(&mut buf, align_of::()); 69 | 70 | /* compose pointer-adjusted header on stack */ 71 | let base: *mut u8 = buf.as_mut_ptr(); 72 | let header = unsafe { Self::from_precursor_raw(base as *const u8, &precursor) }; 73 | 74 | /* compose box and copy header into it */ 75 | let mut udon = unsafe { 76 | /* convert buffer (on heap) to box */ 77 | let mut udon = Box::>::from_raw(transmute::<*mut u8, *mut Index<'a>>(base)); 78 | 79 | /* copy header from stack to heap (box) */ 80 | let src = &header as *const Index<'a>; 81 | let dst = &mut udon as &mut Index<'a> as *mut Index<'a>; 82 | copy_nonoverlapping(src, dst, 1); 83 | 84 | udon 85 | }; 86 | 87 | /* save size for proper memory handling */ 88 | udon.size = size; 89 | 90 | /* heap block inside buf was moved to udon so we have to release buf */ 91 | forget(buf); /* note: this allowed outside 
unsafe */

        udon
    }

    #[allow(dead_code)]
    pub(super) unsafe fn from_precursor(buf: &Vec<u8>, precursor: Precursor) -> Index<'a> {
        let base: *const u8 = buf.as_ptr();

        Self::from_precursor_raw(base, &precursor)
    }

    unsafe fn from_precursor_raw(base: *const u8, precursor: &Precursor) -> Index<'a> {
        Index::<'a> {
            /* just copy */
            size: precursor.size,
            ref_span: precursor.ref_span,

            /* adjusting pointer */
            op: precursor.op.finalize_raw(base),
            block: precursor.block.finalize_raw(base),
            ins: precursor.ins.finalize_raw(base),
        }
    }
}

/* strip_clips

A BAM CIGAR string sometimes has clipping ops at the head and tail, and they need to be
removed before conversion to augmented CIGAR. This function does this, and returns the
CIGAR slice without clips along with clipped lengths.

Clipped length is defined as the number of bases to be removed from the matched query
sequence. Thus (head, tail) = (0, 0) is returned for soft-clipped alignment.
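
The clip stripping described above can be sketched on raw BAM CIGAR words. This is an illustration only: it assumes the standard BAM packing (`len << 4 | op`, with op codes 4 for soft clip and 5 for hard clip), works directly on `u32` instead of the crate's `Cigar` type, and returns a plain tuple where the code below returns a `Clip`. The names `decode` and `SOFT_CLIP` / `HARD_CLIP` are hypothetical.

```rust
const SOFT_CLIP: u32 = 4; // 'S' in the BAM op table
const HARD_CLIP: u32 = 5; // 'H' in the BAM op table

// A raw BAM CIGAR element packs the length above a 4-bit op code.
fn decode(c: u32) -> (u32, usize) {
    (c & 0x0f, (c >> 4) as usize)
}

fn is_clip(c: u32) -> bool {
    let (op, _) = decode(c);
    op == SOFT_CLIP || op == HARD_CLIP
}

// Returns (clip-free slice, whether the head clip is soft, head clip length).
// Returns None when nothing is left after removing the tail clip.
fn strip_clips(cigar: &[u32]) -> Option<(&[u32], bool, usize)> {
    // strip the tail first
    let cigar = match cigar.split_last()? {
        (last, rest) if is_clip(*last) => rest,
        _ => cigar,
    };

    // then the head
    let (first, rest) = cigar.split_first()?;
    if is_clip(*first) {
        let (op, len) = decode(*first);
        Some((rest, op == SOFT_CLIP, len))
    } else {
        Some((cigar, false, 0))
    }
}

fn main() {
    // 5S 10M 3H
    let cigar: [u32; 3] = [(5 << 4) | 4, 10 << 4, (3 << 4) | 5];
    let (body, is_soft, len) = strip_clips(&cigar).unwrap();
    assert_eq!(body.to_vec(), vec![10u32 << 4]);
    assert!(is_soft);
    assert_eq!(len, 5);
}
```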
125 | */ 126 | fn is_clip(c: Cigar) -> Option<(bool, usize)> { 127 | let op = c.op(); 128 | 129 | if op != CigarOp::SoftClip as u32 && op != CigarOp::HardClip as u32 { 130 | /* not a clip */ 131 | return None; 132 | } 133 | 134 | /* (is_clip, length) */ 135 | Some((op == CigarOp::SoftClip as u32, c.len() as usize)) 136 | } 137 | 138 | #[derive(Copy, Clone, Default)] 139 | struct Clip { 140 | is_soft: bool, 141 | len: usize, 142 | } 143 | 144 | fn strip_clips(cigar: &[Cigar]) -> Option<(&[Cigar], Clip)> { 145 | /* strip tail first */ 146 | if cigar.is_empty() { 147 | return None; 148 | } /* cigar length must be at least one */ 149 | let cigar = match is_clip(cigar[cigar.len() - 1]) { 150 | None => cigar, 151 | Some((_, _)) => cigar.split_last()?.1, 152 | }; 153 | 154 | /* strip head */ 155 | if cigar.is_empty() { 156 | return None; 157 | } /* check again */ 158 | match is_clip(cigar[0]) { 159 | None => Some(( 160 | cigar, 161 | Clip { 162 | is_soft: false, 163 | len: 0, 164 | }, 165 | )), 166 | Some(x) => Some(( 167 | cigar.split_first()?.1, 168 | Clip { 169 | is_soft: x.0, 170 | len: x.1, 171 | }, 172 | )), 173 | } 174 | } 175 | 176 | /* copy 4bit-encoded bases, for storing insertions */ 177 | fn copy_packed_nucl(src: &[u8], dst: &mut [u8], ofs: usize, len: usize) { 178 | // debug!("copy, ofs({}), len({})", ofs, len); 179 | 180 | /* very very very very very very very very very naive way though fine */ 181 | for i in 0..len { 182 | let pos = ofs + i; 183 | 184 | /* bam packed nucl is big endian */ 185 | let c = if (pos & 0x01) == 0 { 186 | src[pos / 2] >> 4 187 | } else { 188 | src[pos / 2] & 0x0f 189 | }; 190 | 191 | /* store to dst in little endian */ 192 | if (i & 0x01) == 0 { 193 | dst[i / 2] = c; 194 | } else { 195 | dst[i / 2] |= c << 4; 196 | } 197 | 198 | // debug!("push {} at {}, ({}, {})", c, i, pos, src[pos / 2]); 199 | } 200 | 201 | // debug!("{:?}", &dst[0 .. 
(len + 1) / 2]); 202 | } 203 | 204 | /* Precursor 205 | 206 | Precursor is type that has some invalid pointers (slices) but the others are all sane. 207 | The invalid pointer has an offset from a certain memory block, which may be reallocated 208 | along with the object construction. The offset is converted to valid pointer, by adding 209 | base pointer to the offset, at the very last of the object construction. 210 | */ 211 | #[repr(C)] 212 | #[derive(Copy, Clone, Debug, Default)] 213 | pub(super) struct Precursor { 214 | size: usize, 215 | ref_span: usize, 216 | op: SlicePrecursor, 217 | block: SlicePrecursor, 218 | ins: SlicePrecursor, 219 | } 220 | 221 | /* build Precursor using Builder internally */ 222 | impl<'a> Precursor { 223 | fn build_core( 224 | buf: Vec, 225 | cigar: &'a [Cigar], 226 | packed_query: &'a [u8], 227 | is_full: bool, 228 | mdstr: &'a [u8], 229 | ) -> (Vec, Option) { 230 | // debug!("start"); 231 | // debug!("{:?}", from_utf8(&mdstr)); 232 | 233 | /* save initial offset for unwinding */ 234 | let base_offset = buf.len(); 235 | 236 | /* compose working variables */ 237 | let (cigar, clip) = match strip_clips(cigar) { 238 | None => { 239 | return (buf, None); 240 | } /* just return buffer (nothing happened) */ 241 | Some((cigar, clip)) => (cigar, clip), 242 | }; 243 | 244 | /* 245 | initial offset for query sequence, zero for hard-clipped 246 | (or alignment starts from the very beginning of the query) 247 | 248 | qofs becomes non-zero when alignment is soft-clipped or query sequence is fetched from 249 | the primary record (the primary alignment generally keeps full-length query sequence) 250 | */ 251 | let qofs = if is_full || clip.is_soft { clip.len } else { 0 }; 252 | 253 | /* initialize state */ 254 | let mut state = Builder::<'a> { 255 | buf, /* move */ 256 | ins: Vec::new(), 257 | cigar: cigar.iter(), /* iterator for slice is a pair of pointers (ptr, tail) */ 258 | mdstr: mdstr.iter(), 259 | query: packed_query, 260 | qofs, 261 | }; 
262 | 263 | /* if error detected, unwind destination vector and return it (so that the vector won't lost) */ 264 | macro_rules! unwrap_or_unwind { ( $expr:expr ) => { 265 | match $expr { 266 | /* equivalent to tail error handling in C? (I'm not sure how rustc treat this...) */ 267 | None => { 268 | // debug!("unwinding"); 269 | let mut buf = state.buf; 270 | buf.resize(base_offset, 0); /* unwind buffer for cleaning up errored sequence */ 271 | return (buf, None); 272 | }, 273 | Some(val) => val 274 | } 275 | }} 276 | 277 | /* parse input op array */ 278 | let op = unwrap_or_unwind!(state.parse_cigar()); 279 | let ref_span = state.calc_reference_span(&op); 280 | 281 | /* pack ins vector */ 282 | let ins = unwrap_or_unwind!(state.pack_ins()); 283 | 284 | /* build index for op and ins arrays */ 285 | let block = unwrap_or_unwind!(state.build_index(ref_span, &op, &ins)); 286 | 287 | /* everything done; compose precursor */ 288 | let precursor = state.finalize(ref_span, &op, &ins, &block); 289 | 290 | /* return back ownership of the buffer */ 291 | let buf = state.buf; 292 | (buf, Some(precursor)) 293 | } 294 | 295 | #[allow(dead_code)] 296 | pub unsafe fn build( 297 | buf: Vec, 298 | cigar: &'a [u32], 299 | packed_query: &'a [u8], 300 | is_full: bool, 301 | mdstr: &'a [u8], 302 | ) -> (Vec, Option) { 303 | /* for compatibility with bam streamed reader */ 304 | let cigar = transmute::<&'a [u32], &'a [Cigar]>(cigar); 305 | 306 | let (buf, precursor) = Self::build_core(buf, cigar, packed_query, is_full, mdstr); 307 | /* 308 | match precursor { 309 | None => { debug!("None"); }, 310 | Some(_) => { debug!("Some"); } 311 | }; 312 | */ 313 | (buf, precursor) 314 | } 315 | } 316 | 317 | /* 318 | /* UdonVec builder, see the comment above */ 319 | struct UdonPrecursorVec { 320 | buf: Vec, 321 | precursors: Vec 322 | } 323 | struct UdonVec { 324 | buf: Vec, 325 | precursors: Vec 326 | } 327 | 328 | impl<'a, 'b> UdonVec<'a, 'b> { 329 | /* returns reference side span */ 330 | pub 
fn append(&mut self, cigar: &'b [u32], packed_query: &'b [u8], mdstr: &'b [u8]) -> Option { 331 | 332 | /* move buf to push precursor content */ 333 | let (buf, precursor) = unsafe { Precursor::build(self.buf, cigar, packed_query, mdstr) }; 334 | 335 | /* put back */ 336 | self.buf = buf; 337 | let precursor = precursor?; 338 | let ref_span = precursor.ref_span; 339 | self.precursors.push(precursor); 340 | 341 | return Some(ref_span); 342 | } 343 | 344 | unsafe fn from_precursor_vec(buf: Vec, precursors: Vec) -> Vec> { 345 | 346 | let base: *const u8 = buf.as_ptr(); 347 | for mut precursor in precursors.iter_mut() { /* consume? */ 348 | 349 | /* compose udon header on stack */ 350 | let header = Udon::<'a>::from_precursor_raw(base, &precursor); 351 | 352 | /* copy back udon on stack to vec */ 353 | let src = &header as *const Udon<'a>; 354 | let dst = &mut precursor as &mut Precursor as *mut Precursor; 355 | 356 | let dst = transmute::<*mut Precursor, *mut Udon<'a>>(dst); 357 | copy_nonoverlapping(src, dst, 1); 358 | } 359 | 360 | /* is there simpler way? 
*/ 361 | let ptr = buf.as_mut_ptr(); 362 | let len = buf.len(); 363 | let cap = buf.capacity(); 364 | 365 | Vec::>::from_raw_parts(ptr as *mut Udon<'a>, len, cap) 366 | 367 | } 368 | 369 | pub fn freeze(self) -> Vec> { 370 | 371 | /* get precursor count before merging two vectors */ 372 | let mut precursors = self.precursors; 373 | let count = precursors.len() / size_of::(); 374 | 375 | /* buf is consumed */ 376 | let buf = self.buf; 377 | (&mut precursors).reserve_to(buf.len(), |arr: &mut [u8], _: &[u8]| -> usize { 378 | arr.copy_from_slice(&buf); 379 | }); 380 | 381 | precursors.forget(); 382 | } 383 | 384 | } 385 | */ 386 | 387 | /* transcoder state object */ 388 | pub struct Builder<'a> { 389 | buf: Vec, 390 | ins: Vec, 391 | cigar: Iter<'a, Cigar>, 392 | mdstr: Iter<'a, u8>, 393 | query: &'a [u8], 394 | qofs: usize, 395 | } 396 | 397 | impl<'a> Builder<'a> { 398 | /* 399 | * input parsers 400 | */ 401 | 402 | /* imitate iterator on 4-bit packed nucleotide sequence */ 403 | fn next_base(&mut self) -> u32 { 404 | let ofs = self.qofs; 405 | self.qofs += 1; 406 | 407 | // debug!("qlen({}), ofs({})", self.query.len(), ofs); 408 | assert!( 409 | ofs / 2 < self.query.len(), 410 | "qlen({}), ofs({})", 411 | self.query.len(), 412 | ofs 413 | ); 414 | 415 | let c = self.query[ofs / 2]; 416 | if (ofs & 0x01) == 0x01 { 417 | return transcode_base_unchecked(c & 0x0f); 418 | } 419 | transcode_base_unchecked(c >> 4) 420 | } 421 | 422 | fn cigar_rem(&self) -> usize { 423 | self.cigar.as_slice().len() 424 | } 425 | 426 | /* 427 | We need cigar iterator peekable, but Iter::Peekable implementation is redundant 428 | for slice::Iter. What we really need is peeking the head by .as_slice()[0]. 
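
A minimal sketch of the peeking idea, for illustration: `std::slice::Iter::as_slice` exposes the not-yet-consumed part of the slice, so the head can be inspected without a `Peekable` wrapper. The `(op, len)` tuples and the helper names `peek` and `eat_run` are hypothetical stand-ins for `Cigar`, `peek_cigar_op`, and `eat_cigar`.

```rust
use std::slice::Iter;

// Look at the next element without advancing the iterator.
fn peek<T: Copy>(it: &Iter<'_, T>) -> Option<T> {
    it.as_slice().first().copied()
}

// Consume one op and merge the lengths of any directly following ops of the
// same kind, similar in spirit to how adjacent CIGAR ops are concatenated.
fn eat_run(it: &mut Iter<'_, (u32, usize)>) -> Option<(u32, usize)> {
    let &(op, mut len) = it.next()?;
    while let Some((next_op, next_len)) = peek(it) {
        if next_op != op {
            break;
        }
        it.next();
        len += next_len;
    }
    Some((op, len))
}

fn main() {
    let cigar = [(0u32, 3usize), (0, 2), (1, 4)];
    let mut it = cigar.iter();

    assert_eq!(eat_run(&mut it), Some((0, 5)));
    assert_eq!(peek(&it), Some((1, 4))); // peeking does not consume
    assert_eq!(eat_run(&mut it), Some((1, 4)));
    assert_eq!(eat_run(&mut it), None);
}
```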
429 | */ 430 | fn peek_cigar_op(&self) -> Option { 431 | if self.cigar_rem() == 0 { 432 | return None; 433 | } 434 | 435 | return Some(self.cigar.as_slice()[0].op()); 436 | } 437 | 438 | fn canonize_op(op: u32) -> u32 { 439 | if op >= CigarOp::Eq as u32 { 440 | return CigarOp::Match as u32; 441 | } 442 | op 443 | } 444 | 445 | fn eat_cigar(&mut self) -> Option<(u32, usize)> { 446 | let c = self.cigar.next()?; 447 | 448 | let op = Self::canonize_op(c.op()); 449 | let mut len = c.len() as usize; 450 | // debug!("first op({:?}), len({:?})", op, len); 451 | 452 | while self.cigar_rem() > 0 { 453 | let next_op = Self::canonize_op(self.peek_cigar_op()?); 454 | if next_op != op { 455 | break; 456 | } 457 | 458 | let c = self.cigar.next()?; 459 | len += c.len() as usize; 460 | 461 | // debug!("append op({:?}), len({:?})", c.op(), c.len()); 462 | } 463 | 464 | Some((op, len)) 465 | } 466 | 467 | /* MD string handling: forward pointer along with atoi */ 468 | fn strip_zero(&mut self) { 469 | let md = self.mdstr.as_slice(); 470 | if md.is_empty() || md[0] != b'0' { 471 | return; 472 | } 473 | 474 | let md = &mut self.mdstr; 475 | md.next(); 476 | } 477 | 478 | fn eat_md_del(&mut self, len: usize) -> Option<()> { 479 | self.mdstr.nth(len - 1)?; /* error if starved */ 480 | self.mdstr.next(); /* not regarded as error for the last element */ 481 | Some(()) 482 | } 483 | 484 | fn eat_md_eq(&mut self) -> usize { 485 | atoi_unchecked(&mut self.mdstr) as usize 486 | } 487 | 488 | /* make MD iterator peekable */ 489 | fn is_double_mismatch(&self) -> bool { 490 | let md = self.mdstr.as_slice(); 491 | 492 | /* if the last op is mismatch, MD string ends like "A0" */ 493 | if md.len() < 3 { 494 | return false; 495 | } 496 | 497 | /* 498 | note: deletion after mismatch is encoded as follows: 499 | 500 | ...A0^TTT... 
501 | ^ mismatch 502 | ^ deletion 503 | 504 | so the `is_double_mismatch` function must see if the char after 505 | zero is '^' or nucleotide to properly distinguish double mismatch 506 | from deletion-after-mismatch. (see `test_udon_build_mismatch_del`) 507 | */ 508 | md[1] == b'0' && md[2] != b'^' 509 | } 510 | 511 | /* writers, on top of impl Writer for Vec */ 512 | fn push_op(&mut self, match_len: usize, marker: u32) { 513 | assert!(marker < 0x08); 514 | assert!(match_len < 0x20); 515 | 516 | // debug!("push_op, len({}), marker({})", match_len, marker); 517 | 518 | let op = (marker << 5) as u8 | match_len as u8; 519 | // assert!(op != 0); 520 | 521 | self.buf.reserve_to(1, |arr: &mut [u8], _: &[u8]| -> usize { 522 | arr[0] = op; 523 | 1 524 | }); 525 | } 526 | 527 | fn push_match(&mut self, match_len: usize, last_op: u32) { 528 | let mut op = last_op; 529 | let mut rem = match_len; 530 | 531 | /* divide into 30-column chunks (maximum chunk length for an op is 30) */ 532 | while rem > 30 { 533 | self.push_op(31, op); /* 31 is alias for 30-column chunk with "continuous" flag */ 534 | op = 0; 535 | rem -= 30; 536 | } 537 | self.push_op(rem, op); /* push the last */ 538 | } 539 | 540 | /* forward both cigar and md strings */ 541 | fn eat_del(&mut self) -> Option { 542 | let (op, len) = self.eat_cigar()?; 543 | assert!(op == CigarOp::Del as u32); 544 | 545 | self.strip_zero(); 546 | self.eat_md_del(len)?; 547 | 548 | /* 3 columns at maximum for deletion per chunk */ 549 | let mut rem = len; 550 | while rem > 3 { 551 | self.push_op(3, 3); /* deletion length == 3, chunk length == 3 */ 552 | rem -= 3; 553 | } 554 | 555 | // debug!("eat_del, len({}), rem({})", len, rem); 556 | 557 | assert!(rem > 0); 558 | Some(rem as u32)/* remainder is concatenated to the next match into the next chunk */ 559 | } 560 | 561 | fn eat_ins(&mut self, xrem: usize) -> Option { 562 | assert!(self.qofs <= std::i32::MAX as usize); 563 | 564 | let (op, len) = self.eat_cigar()?; 565 | assert!(op 
== CigarOp::Ins as u32); 566 | 567 | /* forward query offset */ 568 | let ofs = self.qofs; 569 | self.qofs += len; 570 | // debug!("eat_ins, xrem({}), len({}), qofs({})", xrem, len, self.qofs); 571 | 572 | self.save_ins(ofs, len); /* use offset before forwarding */ 573 | Some(xrem)/* there might be remainder */ 574 | } 575 | 576 | /* forward match; eating both cigar and md strings */ 577 | fn eat_match(&mut self, xrem: usize, last_op: u32) -> Option { 578 | /* consumes only when the op is match. works as an escape route for del after ins or ins after del */ 579 | let op = Self::canonize_op(self.peek_cigar_op()?); 580 | let is_valid = op == CigarOp::Match as u32; 581 | // if is_valid { self.cigar.next()?; } 582 | 583 | /* forward qofs before adjusting xrem with the previous op */ 584 | self.qofs += xrem; 585 | 586 | /* adjust crem and xrem; last_op == 0 for ins, > 0 for del */ 587 | let mut crem = if is_valid { 588 | self.eat_cigar().unwrap().1 589 | } else { 590 | 0 591 | } + last_op as usize; 592 | let mut xrem = xrem + last_op as usize; /* might continues from the previous cigar op, possibly insertion */ 593 | let mut op = last_op; /* insertion, deletion, or mismatch */ 594 | 595 | // debug!("eat_match, crem({}), xrem({})", crem, xrem); 596 | 597 | while xrem < crem { 598 | /* xrem < crem indicates this op (cigar span) is interrupted by mismatch(es) at the middle */ 599 | self.push_match(xrem, op); 600 | crem -= xrem; 601 | // debug!("eat_match mismatch?, crem({}), xrem({}), qofs({})", crem, xrem, self.qofs); 602 | 603 | while self.is_double_mismatch() { 604 | // debug!("eat_match, crem({}), md({:?}, {:?})", crem, from_utf8(&self.mdstr.as_slice()[.. self.mdstr.as_slice().len().min(32)]), &self.mdstr.as_slice()[.. 
self.mdstr.as_slice().len().min(32)]); 605 | 606 | let c = self.next_base(); 607 | self.push_op(1, c); /* this chunk contains only a single mismatch */ 608 | self.mdstr.nth(1)?; 609 | crem -= 1; 610 | } 611 | // debug!("eat_match, crem({}), md({:?}, {:?})", crem, from_utf8(&self.mdstr.as_slice()[.. self.mdstr.as_slice().len().min(32)]), &self.mdstr.as_slice()[.. self.mdstr.as_slice().len().min(32)]); 612 | 613 | op = self.next_base(); /* we only have a single mismatch remaining, will be combined to succeeding matches */ 614 | self.mdstr.next()?; 615 | 616 | /* 617 | xrem for the next { match, insertion } region, including +1 for the last mismatch. 618 | adjustment is already done on crem, by not decrementing it for the last mismatch. 619 | */ 620 | xrem = self.eat_md_eq() + 1; 621 | self.qofs += xrem - 1; 622 | // debug!("eat_match, updated xrem, crem({}), xrem({})", crem, xrem); 623 | } 624 | 625 | self.push_match(crem, op); /* tail match; length is the remainder of crem */ 626 | xrem -= crem; 627 | self.qofs -= xrem; 628 | 629 | // debug!("eat_match, done, crem({}), xrem({}), qofs({}), cigar({})", crem, xrem, self.qofs, self.cigar_rem()); 630 | 631 | /* invariant condition: if match to reference (md) continues, an insertion must follow */ 632 | assert!(xrem == 0 || self.peek_cigar_op().unwrap() == CigarOp::Ins as u32); 633 | Some(xrem)/* nonzero if insertion follows */ 634 | } 635 | 636 | /* insertion bin handling */ 637 | fn save_ins(&mut self, qofs: usize, qlen: usize) { 638 | if qlen == 0 { 639 | return; 640 | } /* must be corrupted cigar, but safe to ignore */ 641 | 642 | /* copy subsequence; 4-bit packed */ 643 | let packet_size = (qlen + 1) / 2 + 1; 644 | let query = self.query; 645 | self.ins 646 | .reserve_to(packet_size, |arr: &mut [u8], _: &[u8]| -> usize { 647 | copy_packed_nucl(query, arr, qofs, qlen); 648 | arr[arr.len() - 1] = 0; 649 | 650 | packet_size 651 | }); 652 | } 653 | 654 | /* parse raw CIGAR into augumented CIGAR, takes clip-stripped raw 
CIGAR, which must be stored in Self. */ 655 | fn parse_cigar(&mut self) -> Option> { 656 | let base_offset = self.buf.len(); 657 | let mut xrem = 0; 658 | 659 | /* 660 | CIGAR might start with insertion (possible in short-read alignment).cigar 661 | in such case, we need a dummy ins marker at the head of op array, 662 | because dummy ins is placed for normal CIGAR (that begins with match) and is 663 | skipped on decoding. we have to tell the decoder the head insertion must not 664 | be ignored by adding one more insertion. 665 | */ 666 | let op = Self::canonize_op(self.peek_cigar_op()?); 667 | // debug!("c({}), rem({})", op, self.cigar_rem()); 668 | if op == CigarOp::Ins as u32 { 669 | xrem = self.eat_md_eq(); 670 | 671 | /* op iterator not forwarded here */ 672 | self.push_op(0, CompressMark::Ins as u32); 673 | self.ins.write_all(&[0]).ok()?; /* dummy insertion marker for this */ 674 | } else if op == CigarOp::Match as u32 { 675 | xrem = self.eat_md_eq(); 676 | 677 | /* 678 | the first op consists of dummy insertion and match 679 | (the insertion is real one for a CIGAR that starts with insertion. see above.) 680 | */ 681 | xrem = self.eat_match(xrem, CompressMark::Ins as u32)?; 682 | self.ins.write_all(&[0]).ok()?; /* dummy insertion marker at the head */ 683 | } 684 | 685 | 'outer: loop { 686 | macro_rules! 
peek_or_break { 687 | ( $self: expr ) => {{ 688 | match $self.peek_cigar_op() { 689 | None => { 690 | break 'outer; 691 | } 692 | Some(x) => Self::canonize_op(x), 693 | } 694 | }}; 695 | } 696 | 697 | /* deletion-match pair */ 698 | loop { 699 | let op = peek_or_break!(self); 700 | if op != CigarOp::Del as u32 { 701 | break; 702 | } 703 | // debug!("op({}), rem({})", op, self.cigar_rem()); 704 | 705 | /* the CIGAR ends with deletion; must be treated specially */ 706 | if self.cigar_rem() < 2 { 707 | break 'outer; 708 | } 709 | 710 | /* push deletion-match pair, then parse next eq length */ 711 | let op = self.eat_del()?; 712 | xrem = self.eat_md_eq(); 713 | // debug!("eat_del done, op({}), xrem({})", op, xrem); 714 | xrem = self.eat_match(xrem, op)?; 715 | } 716 | 717 | /* it's insertion-match pair when it appeared not be deletion-match */ 718 | let op = peek_or_break!(self); 719 | if op != CigarOp::Ins as u32 { 720 | return None; /* if not, we regard it broken */ 721 | } 722 | // debug!("op({}), remaining cigars({})", op, self.cigar_rem()); 723 | 724 | /* the CIGAR ends with insertion; must be treated specially */ 725 | if self.cigar_rem() < 2 { 726 | break 'outer; 727 | } 728 | 729 | /* push insertion-match pair, update eq length remainder */ 730 | xrem = self.eat_ins(xrem)?; 731 | xrem = self.eat_match(xrem, CompressMark::Ins as u32)?; 732 | } 733 | 734 | /* CIGAR ends with isolated insertion or deletion */ 735 | if self.cigar_rem() > 0 { 736 | let op = Self::canonize_op(self.peek_cigar_op()?); 737 | // debug!("c({})", op); 738 | 739 | if op == CigarOp::Del as u32 { 740 | let op = self.eat_del()?; 741 | self.push_op(op as usize, op); 742 | } else if op == CigarOp::Ins as u32 { 743 | self.eat_ins(xrem)?; 744 | } 745 | } 746 | 747 | /* range in output buffer, for composing precursor */ 748 | Some(Range:: { 749 | start: base_offset, 750 | end: self.buf.len(), 751 | }) 752 | } 753 | 754 | /* accumulate chunk lengths to compute reference span (for computing index 
table size) */ 755 | fn calc_reference_span(&self, op_range: &Range) -> usize { 756 | let ops = &self.buf[op_range.start..op_range.end]; 757 | 758 | ops.iter().fold(0, |a, &x| a + op_len(x)) 759 | } 760 | 761 | /* construct index for blocks */ 762 | fn push_block(dst: &mut IterMut, ins_offset: usize, op_offset: usize, op_skip: usize) { 763 | let bin = match dst.next() { 764 | None => { 765 | return; 766 | } 767 | Some(bin) => bin, 768 | }; 769 | bin.set_ins_offset(ins_offset as u64); 770 | bin.set_op_offset(op_offset as u64); 771 | bin.set_op_skip(op_skip as u64); 772 | } 773 | 774 | fn build_index( 775 | &mut self, 776 | ref_span: usize, 777 | op_range: &Range, 778 | ins_range: &Range, 779 | ) -> Option> { 780 | /* block pitch must be 2^n */ 781 | assert!(BLOCK_PITCH.next_power_of_two() == BLOCK_PITCH); 782 | 783 | /* rip buffer for disjoint ownership */ 784 | let buf = &mut self.buf; 785 | 786 | /* index for regular pitch on span */ 787 | let block_count = (ref_span + BLOCK_PITCH - 1) / BLOCK_PITCH; 788 | let range = buf.reserve_to(block_count, |block: &mut [Block], base: &[u8]| { 789 | /* src: both are &[u8] and placed within base */ 790 | let ops = (&base[op_range.start..op_range.end]).iter_ops(0); 791 | let mut ins = InsTracker::new(0, &base[ins_range.start..ins_range.end]); 792 | 793 | /* prepare dst. put head boundary info for simplicity */ 794 | let mut dst = block.iter_mut(); 795 | Self::push_block(&mut dst, 0, 0, 0); 796 | 797 | /* I want this loop be more lightweight... 
*/
            let mut rbnd: usize = BLOCK_PITCH;
            for (i, (x, rpos)) in ops.enumerate() {
                let rem = rbnd - (rpos - op_len(x));

                /* forward insertion array */
                let iofs = ins.get_offset();
                ins.forward(x);

                /* if forwarded rpos doesn't exceed the next boundary, just skip this op */
                if rpos <= rbnd {
                    continue;
                }
                rbnd += BLOCK_PITCH;

                /* boundary found; save block info */
                Self::push_block(&mut dst, iofs, i, rem);
            }

            let dst = dst.into_slice();
            assert!(dst.is_empty(), "len({})", dst.len());
            block.len()
        });
        Some(range)
    }

    fn pack_ins(&mut self) -> Option<Range<usize>> {
        /* rip vectors from self for ownership */
        let buf = &mut self.buf;
        let ins = &self.ins;

        /* just copy */
        let range = buf.reserve_to(ins.len(), |arr: &mut [u8], _: &[u8]| {
            arr.copy_from_slice(ins.as_slice());
            ins.len()
        });
        Some(range)
    }

    /* convert internal object (Builder) to Precursor */
    fn finalize(
        &self,
        ref_span: usize,
        op: &Range<usize>,
        ins: &Range<usize>,
        block: &Range<usize>,
    ) -> Precursor {
        assert!(op.end <= ins.start);
        assert!(ins.end <= block.start);

        assert!(((block.end - block.start) % size_of::<Block>()) == 0);

        /* entire range on buffer for this object */
        let range = Range::<usize> {
            start: op.start,
            end: ins.end,
        };

        /* convert range to slice */
        Precursor {
            /* just save */
            size: range.end - range.start,
            ref_span,

            /* compose fake slices (dereference causes SEGV) */
            op: SlicePrecursor::compose(op),
            block: SlicePrecursor::compose(block),
            ins: SlicePrecursor::compose(ins),
        }
    }
}

--------------------------------------------------------------------------------
/src/decoder.rs:
--------------------------------------------------------------------------------
use super::index::Index;
use super::op::{op_is_cont, op_len, op_marker, CompressMark};
use super::scaler::Scaler;
use super::utils::SimdAlignedU8;
use super::UdonOp;
use std::io::Write;
use std::ops::Range;

/* architecture-dependent stuffs */
#[cfg(all(target_arch = "x86_64", target_feature = "ssse3"))]
use core::arch::x86_64::*;

#[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
use core::arch::aarch64::*;

/* Udon builder and ribbon slicing APIs

We have two safe choices for building Udon objects: `Udon::build` and `UdonVec::append`.

The first function, `build`, has the simplest interface. It is a good one to start with,
since its signature, taking CIGAR, MD, and query sequence slices and returning a boxed
`Udon` object, looks quite natural for a Rust-native library.

The others are for memory-optimized applications. Boxing udon objects one by one and
creating lots of them fragments the heap, even if the allocator (GlobalAlloc / jemalloc)
does its best. Since the number of objects sometimes reaches several million in
visualization apps, it's important to prevent such fragmentation for smaller memory
consumption and better performance.

`UdonVec` is for this purpose, creating multiple udon objects in a single monolithic
region of memory. It has two states: mutable, for building objects in it, and immutable,
for using the objects. The mutable vec can be converted to an immutable one by the
`freeze` method.


We also provide unsafe APIs, `Precursor::build` and `Udon::from_precursor`. These two
methods were originally implemented for use in `UdonVec::append` and `UdonVec::freeze`,
respectively.
The first function creates a `Precursor` object, which is almost equivalent to `Udon`
except that the pointers in its slices are not valid. Each invalid pointer actually
holds an offset into the memory region. Being an offset, not an absolute pointer, it
allows reallocation (on-the-fly extension) of the memory region without breaking
consistency, which is important for creating multiple objects in a single region one by
one. The offsets are converted to valid slices all at once in the `freeze` procedure.

The raw unsafe APIs `Precursor::build` and `Udon::from_precursor` are useful if we
want to pack udon objects in a more complex way than a flat vector. They are especially
useful when we design a data structure with additional metadata for visualization,
keeping the `Udon` object as one member of a larger struct.

Please note that `Precursor::build` and `Udon::from_precursor` are unsafe APIs.
The unsafety comes from the requirement that the vector in which a precursor was built
and the vector against which the precursor is frozen be the same one, which can't be
enforced by the type system. If the two vectors don't match, the created `Udon` objects
are definitely broken and cause exceptions.
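
The offset-then-freeze idea above can be sketched in safe Rust, as an illustration rather than the crate's implementation. The types `OffsetRecord` and `Frozen` and the functions `append` and `freeze` are hypothetical stand-ins for `Precursor`, `Udon`, `UdonVec::append`, and `UdonVec::freeze`, and only one slice is modeled. Unlike the unsafe APIs, this version bounds-checks the offsets, which catches a too-short buffer but still cannot tell two equally long buffers apart.

```rust
use std::ops::Range;

// Offsets into a shared buffer; valid across reallocation of that buffer.
struct OffsetRecord {
    ops: Range<usize>,
}

// The same record after the buffer has stopped growing.
struct Frozen<'a> {
    ops: &'a [u8],
}

// Append one object's bytes and remember where they landed.
fn append(buf: &mut Vec<u8>, ops: &[u8]) -> OffsetRecord {
    let start = buf.len();
    buf.extend_from_slice(ops);
    OffsetRecord { ops: start..buf.len() }
}

// Convert offsets to a real slice; None if the offsets do not fit the buffer.
fn freeze<'a>(buf: &'a [u8], rec: &OffsetRecord) -> Option<Frozen<'a>> {
    let ops = buf.get(rec.ops.clone())?;
    Some(Frozen { ops })
}

fn main() {
    let mut buf = Vec::new();
    let a = append(&mut buf, &[0x03u8, 0x85]);
    let b = append(&mut buf, &[0xC0u8, 0x04, 0x60]); // may reallocate `buf`

    let fa = freeze(&buf, &a).unwrap();
    let fb = freeze(&buf, &b).unwrap();
    assert_eq!(fa.ops, [0x03u8, 0x85]);
    assert_eq!(fb.ops, [0xC0u8, 0x04, 0x60]);

    // Freezing against a different, shorter buffer is rejected here.
    assert!(freeze(&[0u8][..], &a).is_none());
}
```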
55 | */ 56 | 57 | /* decoder implementation */ 58 | impl<'a> Index<'a> { 59 | pub(super) fn decode_raw_into( 60 | &self, 61 | dst: &mut Vec, 62 | ref_span: &Range, 63 | ) -> Option { 64 | self.check_span(ref_span)?; 65 | self.decode_core(dst, ref_span) 66 | } 67 | 68 | pub(super) fn decode_raw(&self, ref_span: &Range) -> Option> { 69 | self.check_span(ref_span)?; 70 | 71 | let size = ref_span.end - ref_span.start; 72 | let mut buf = Vec::::with_capacity(size); 73 | 74 | let used = self.decode_core(&mut buf, ref_span)?; 75 | buf.resize(used, 0); 76 | 77 | Some(buf) 78 | } 79 | 80 | pub(super) fn decode_scaled_into( 81 | &self, 82 | dst: &mut Vec<[[u8; 4]; 2]>, 83 | ref_span: &Range, 84 | offset_in_pixels: f64, 85 | scaler: &Scaler, 86 | ) -> Option { 87 | self.check_span(ref_span)?; 88 | self.decode_scaled_core(dst, ref_span, offset_in_pixels, scaler) 89 | } 90 | 91 | pub(super) fn decode_scaled( 92 | &self, 93 | ref_span: &Range, 94 | offset_in_pixels: f64, 95 | scaler: &Scaler, 96 | ) -> Option> { 97 | self.check_span(ref_span)?; 98 | 99 | let span = ref_span.end - ref_span.start; 100 | let size = scaler.expected_size(span); 101 | let mut buf = Vec::<[[u8; 4]; 2]>::with_capacity(size); 102 | // debug!("ref_span({:?}), span({}), size({})", ref_span, span, size); 103 | 104 | let used = self.decode_scaled_core(&mut buf, ref_span, offset_in_pixels, scaler)?; 105 | buf.resize(used, [[0; 4]; 2]); 106 | // debug!("used({})", used); 107 | 108 | Some(buf) 109 | } 110 | 111 | const DEL_MASK: SimdAlignedU8 = { 112 | let mut x = [0u8; 16]; 113 | x[0] = UdonOp::Del as u8; 114 | x[1] = UdonOp::Del as u8; 115 | x[2] = UdonOp::Del as u8; 116 | SimdAlignedU8::new(&x) 117 | }; 118 | 119 | #[allow(dead_code)] 120 | const SCATTER_MASK: SimdAlignedU8 = { 121 | let mut x = [0x80u8; 16]; 122 | x[0] = 0; 123 | x[1] = 0; 124 | x[2] = 0; 125 | SimdAlignedU8::new(&x) 126 | }; 127 | 128 | const IS_DEL_THRESH: SimdAlignedU8 = { 129 | let mut x = [0xffu8; 16]; 130 | x[0] = 0x1f; 131 | x[1] = 
0x3f; 132 | x[2] = 0x5f; 133 | SimdAlignedU8::new(&x) 134 | }; 135 | 136 | #[cfg(all(target_arch = "x86_64", target_feature = "ssse3"))] 137 | unsafe fn decode_core_block(dst: &mut [u8], op: u8, ins: u32) -> (usize, u32) { 138 | /* load constants; expelled out of the innermost loop when inlined */ 139 | let del_mask = _mm_load_si128(Self::DEL_MASK.as_ref() as *const [u8; 16] as *const __m128i); 140 | let scatter_mask = 141 | _mm_load_si128(Self::SCATTER_MASK.as_ref() as *const [u8; 16] as *const __m128i); 142 | let is_del_thresh = 143 | _mm_load_si128(Self::IS_DEL_THRESH.as_ref() as *const [u8; 16] as *const __m128i); 144 | 145 | /* compute deletion mask */ 146 | let xop = _mm_cvtsi64_si128(op as i64); /* copy op to xmm */ 147 | let xop = _mm_shuffle_epi8(xop, scatter_mask); /* [op, op, op, 0, ...] */ 148 | let is_del = _mm_cmpgt_epi8(xop, is_del_thresh); /* signed comparison */ 149 | 150 | /* compute mismatch / insertion mask */ 151 | let marker = if op_marker(op) == CompressMark::Ins as u32 { 152 | ins 153 | } else { 154 | op_marker(op) 155 | }; 156 | let ins_mismatch_mask = _mm_cvtsi64_si128(marker as i64); 157 | 158 | /* merge deletion / insertion-mismatch vector */ 159 | let merged = _mm_blendv_epi8(ins_mismatch_mask, del_mask, is_del); 160 | 161 | _mm_storeu_si128(&mut dst[0] as *mut u8 as *mut __m128i, merged); 162 | _mm_storeu_si128(&mut dst[16] as *mut u8 as *mut __m128i, _mm_setzero_si128()); 163 | 164 | /* 165 | compute forward length; 31 is "continuous marker" 166 | rop-to-rop critical path length is 6 167 | */ 168 | let next_ins = if op_is_cont(op) { 169 | 0 170 | } else { 171 | UdonOp::Ins as u32 172 | }; /* next ins will be masked if 0x1f */ 173 | let adjusted_len = op_len(op); 174 | 175 | // debug!("{:#x}, {:#x}, {:#x}, {:#x}", op>>5, ins, marker, op_len(op)); 176 | // debug!("{}, {:?}", adjusted_len, transmute::<__m128i, [u8; 16]>(merged)); 177 | 178 | (adjusted_len, next_ins) 179 | } 180 | 181 | #[cfg(all(target_arch = "aarch64", target_feature = 
"neon"))] 182 | unsafe fn decode_core_block(dst: &mut [u8], op: u8, ins: u32) -> (usize, u32) { 183 | /* the following code correspond one-to-one to the x86_64 implementation above */ 184 | let del_mask = vld1q_u8(Self::DEL_MASK.as_ref() as *const u8); 185 | let is_del_thresh = vld1q_s8(Self::IS_DEL_THRESH.as_ref() as *const u8 as *const i8); 186 | 187 | let xop = [op, op, op, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]; 188 | let xop = vld1q_s8(&xop as *const u8 as *const i8); 189 | let is_del = vcgtq_s8(xop, is_del_thresh); /* signed comparison */ 190 | 191 | let marker = if op_marker(op) == CompressMark::Ins as u32 { 192 | ins 193 | } else { 194 | op_marker(op) 195 | }; 196 | let ins_mismatch_mask = vsetq_lane_u8(marker as u8, vmovq_n_u8(0), 0); 197 | 198 | let merged = vbslq_u8(is_del, del_mask, ins_mismatch_mask); 199 | let zero = vmovq_n_u8(0); 200 | vst1q_u8(&mut dst[0] as *mut u8, merged); 201 | vst1q_u8(&mut dst[16] as *mut u8, zero); 202 | 203 | let next_ins = if op_is_cont(op) { 204 | 0 205 | } else { 206 | UdonOp::Ins as u32 207 | }; /* next ins will be masked if 0x1f */ 208 | let adjusted_len = op_len(op); 209 | (adjusted_len, next_ins) 210 | } 211 | 212 | fn decode_core(&self, dst: &mut Vec, ref_span: &Range) -> Option { 213 | /* 214 | we suppose the ref_span is sane. 
215 | linear polling from block head, then decode until the end of the span 216 | */ 217 | let len = ref_span.end - ref_span.start; 218 | let (ops, offset) = self.scan_op_array(ref_span.start); 219 | 220 | /* working variables */ 221 | let mut buf: [u8; 96] = [0; 96]; 222 | let mut ops = ops.iter(); 223 | let mut ins = 0; // Op::Ins as u64; 224 | let mut ofs = offset; 225 | let mut rem = len; 226 | 227 | if rem > 32 { 228 | /* decode head block */ 229 | let op = ops.next()?; 230 | let (block_len, next_ins) = unsafe { Self::decode_core_block(&mut buf, *op, ins) }; 231 | 232 | let block_len = block_len - ofs; 233 | dst.write_all(&buf[ofs..ofs + block_len]).ok()?; /* I expect it never fails though... */ 234 | rem -= block_len; 235 | ins = next_ins; /* just copy to mutable variable for use in the loop */ 236 | ofs = 0; /* clear offset for tail */ 237 | 238 | /* decode body blocks */ 239 | while rem > 32 { 240 | let op = ops.next()?; 241 | let (block_len, next_ins) = unsafe { Self::decode_core_block(&mut buf, *op, ins) }; 242 | 243 | dst.write_all(&buf[0..block_len]).ok()?; 244 | rem -= block_len; 245 | ins = next_ins; 246 | } 247 | } 248 | 249 | /* tail */ 250 | { 251 | let end = ofs + rem; 252 | let mut ofs = 0; 253 | while ofs < end { 254 | let op = ops.next()?; 255 | let (block_len, next_ins) = 256 | unsafe { Self::decode_core_block(&mut buf[ofs..], *op, ins) }; 257 | 258 | ofs += block_len; 259 | ins = next_ins; 260 | } 261 | 262 | dst.write_all(&buf[end - rem..end]).ok()?; 263 | } 264 | 265 | Some(len) 266 | } 267 | 268 | fn decode_scaled_core( 269 | &self, 270 | dst: &mut Vec<[[u8; 4]; 2]>, 271 | ref_span: &Range<usize>, 272 | offset_in_pixels: f64, 273 | scaler: &Scaler, 274 | ) -> Option<usize> { 275 | /* states (working variables) */ 276 | let (mut offset, margin) = scaler.init(offset_in_pixels); 277 | let dst = dst; 278 | 279 | // println!("offset({}), margin({})", offset, margin); 280 | 281 | /* buffer */ 282 | let bulk_size: usize = 16 * 1024; /* 16KB */ 283 | let mut
buf = vec![0; margin]; 284 | buf.reserve(bulk_size); 285 | 286 | for spos in ref_span.clone().step_by(bulk_size) { 287 | /* decode a block */ 288 | let decoded = self.decode_core( 289 | &mut buf, 290 | &Range::<usize> { 291 | start: spos, 292 | end: ref_span.end.min(spos + bulk_size), 293 | }, 294 | )?; 295 | if decoded < bulk_size { 296 | buf.resize(buf.len() + margin + 1, 0); 297 | } 298 | 299 | /* rescale to dst array, forward offset */ 300 | let (next_offset, consumed) = scaler.scale(dst, &buf, offset)?; 301 | offset = next_offset; 302 | if buf.len() < consumed || consumed < buf.len() - consumed { 303 | continue; 304 | } 305 | 306 | let (base, tail) = buf.split_at_mut(consumed); 307 | let (base, _) = base.split_at_mut(tail.len()); 308 | base.copy_from_slice(tail); 309 | 310 | let next_margin = base.len(); 311 | buf.resize(next_margin, 0); 312 | 313 | // println!("margin({}), len({}), decoded({}), consumed({}), next_margin({})", margin, buf.len(), decoded, consumed, next_margin); 314 | } 315 | Some(dst.len()) 316 | } 317 | } 318 | -------------------------------------------------------------------------------- /src/index.rs: -------------------------------------------------------------------------------- 1 | use super::op::op_len; 2 | use super::utils::PeekFold; 3 | // use std::alloc::{ Layout, dealloc }; 4 | // use std::mem::align_of; 5 | use std::ops::Range; 6 | 7 | /* Transcoded CIGAR and index object container 8 | 9 | everything is laid out in a single flat memory block, thus constructed via 10 | some unsafe operations. see `Precursor` and `SlicePrecursor`. 11 | 12 | Impls are found in `decoder.rs` and `insbin.rs`. Builder implementation is in `builder.rs`. 13 | */ 14 | 15 | /* block pitch, see above */ 16 | pub(super) const BLOCK_PITCH: usize = 256; 17 | 18 | /* block object, see above */ 19 | bitfield!
{ 20 | #[derive(Copy, Clone, Debug, Default)] 21 | pub struct Block(u64); 22 | pub ins_offset, set_ins_offset: 28, 0; 23 | pub op_offset, set_op_offset: 58, 29; 24 | pub op_skip, set_op_skip: 63, 59; 25 | } 26 | 27 | #[derive(Default)] 28 | pub(super) struct Index<'a> { 29 | pub(super) size: usize, /* object size for copying */ 30 | pub(super) ref_span: usize, /* reference side span */ 31 | pub(super) op: &'a [u8], 32 | pub(super) block: &'a [Block], 33 | pub(super) ins: &'a [u8], 34 | } 35 | 36 | /* common; builders and decoders are found in `builder.rs`, `decoder.rs`, and `insbin.rs`. */ 37 | impl<'a> Index<'a> { 38 | /* API redirected */ 39 | pub(super) fn reference_span(&self) -> usize { 40 | self.ref_span 41 | } 42 | 43 | /* Check sanity of the span. If queried span (range) is out of the indexed one, return None */ 44 | pub(super) fn check_span(&self, range: &Range<usize>) -> Option<()> { 45 | // debug!("span({}, {}), ref_span({})", range.start, range.end, self.ref_span); 46 | 47 | if range.end < range.start { 48 | return None; 49 | } 50 | if range.end > self.ref_span { 51 | return None; 52 | } 53 | Some(()) 54 | } 55 | 56 | /* fetch block head for this pos.
*/ 57 | pub(super) fn fetch_ops_block(&self, pos: usize) -> (u8, &[u8], usize) { 58 | let block_index = pos / BLOCK_PITCH; 59 | let block_rem = pos & (BLOCK_PITCH - 1); 60 | let block = &self.block[block_index]; 61 | 62 | let op_offset = block.op_offset() as usize; 63 | let op_skip = block.op_skip() as usize; 64 | // let ins_offset = block.ins_offset() as usize; 65 | 66 | let ops = &self.op[op_offset..]; 67 | let last_op = if op_offset == 0 { 68 | 0 69 | } else { 70 | self.op[op_offset - 1] 71 | }; 72 | 73 | // debug!("pos({}), rem({}), skip({})", pos, block_rem, op_skip); 74 | (last_op, ops, block_rem + op_skip) 75 | } 76 | 77 | pub(super) fn scan_op_array(&self, pos: usize) -> (&[u8], usize) { 78 | /* get block head for this pos */ 79 | let (_, ops, rem) = self.fetch_ops_block(pos); 80 | // debug!("rem({}), ops({:?})", rem, ops); 81 | 82 | /* linear polling */ 83 | let mut ops = ops.iter(); 84 | let ofs = ops.peek_fold(0, |a, &x| { 85 | let len = a + op_len(x); 86 | 87 | /* continue if at least one column remaining */ 88 | if len >= rem { 89 | return None; 90 | } 91 | Some(len) 92 | }); 93 | 94 | // debug!("rem({}), ofs({})", rem, ofs); 95 | return (ops.as_slice(), rem - ofs); /* is this sound? (I'm not sure...) */ 96 | } 97 | } 98 | -------------------------------------------------------------------------------- /src/insbin.rs: -------------------------------------------------------------------------------- 1 | use super::index::{Index, BLOCK_PITCH}; 2 | use super::op::{op_is_cont, op_len, op_marker, CompressMark}; 3 | use super::utils::{PeekFold, Writer}; 4 | use std::ops::Range; 5 | use std::slice::Iter; 6 | 7 | /* Insertion marker tracker 8 | 9 | CompressMark::Ins should be ignored when the succeeding op length is 31, 10 | which actually indicates its chunk length is 30. The length 31 appears when 11 | longer matching region is divided into multiple chunks, so it shouldn't be 12 | regarded as insertion even if the marker is Ins. 
13 | 14 | InsTracker handles this double-meaning Ins marker. It keeps an "ignore" state, 15 | which indicates that the Ins marker of the next op is treated as a NOP. It also keeps 16 | track of the inserted sequence array, by consuming one variable-length 17 | substring from the array every time an Ins marker appears on the op stream. The 18 | current substring (inserted sequence for the current op with Ins marker) 19 | is available via the `as_slice()` method. 20 | */ 21 | pub(super) struct InsTracker<'a> { 22 | ignore_next: bool, 23 | len: usize, 24 | ins: Iter<'a, u8>, 25 | } 26 | 27 | impl<'a> InsTracker<'a> { 28 | pub(super) fn new(last_op: u8, ins: &'a [u8]) -> Self { 29 | InsTracker::<'a> { 30 | ignore_next: op_is_cont(last_op), 31 | len: ins.len(), 32 | ins: ins.iter(), 33 | } 34 | } 35 | 36 | pub(super) fn forward(&mut self, op: u8) -> bool { 37 | /* check if the current op has ins, and update state for the next op */ 38 | let is_ins = !self.ignore_next && op_marker(op) == CompressMark::Ins as u32; 39 | self.ignore_next = op_is_cont(op); 40 | 41 | /* if current does not have ins, just return it's not ins */ 42 | if !is_ins { 43 | return false; 44 | } 45 | 46 | /* has ins, forward iterator */ 47 | self.ins.try_fold(1, |a, &x| { 48 | /* consumes at least one byte */ 49 | let a = a - (x == 0) as usize; 50 | if a == 0 { 51 | return None; 52 | } 53 | Some(a) 54 | }); 55 | true 56 | } 57 | 58 | pub(super) fn get_offset(&self) -> usize { 59 | self.len - self.ins.as_slice().len() /* is there better way for tracking the offset? */ 60 | } 61 | 62 | pub(super) fn as_slice(&self) -> &'a [u8] { 63 | self.ins.as_slice() 64 | } 65 | } 66 | 67 | impl<'a> Index<'a> { 68 | fn fetch_ins_block(&self, pos: usize) -> &[u8] { 69 | let block_index = pos / BLOCK_PITCH; 70 | let block = &self.block[block_index]; 71 | 72 | let ins_offset = block.ins_offset() as usize; 73 | &self.ins[ins_offset..]
74 | } 75 | 76 | /* ins */ 77 | fn scan_ins_array(&self, pos: usize) -> Option<&[u8]> { 78 | let (last_op, ops, rem) = self.fetch_ops_block(pos); 79 | let ins = self.fetch_ins_block(pos); 80 | 81 | /* linear polling on op array */ 82 | let mut ops = ops.iter(); 83 | let mut ins = InsTracker::new(last_op, ins); 84 | let len = ops.peek_fold(0, |a, &x| { 85 | let len = a + op_len(x); 86 | if len > rem { 87 | return None; 88 | } 89 | ins.forward(x); 90 | Some(len) 91 | }); 92 | 93 | /* if the length doesn't match, it indicates the column doesn't have insertion (so the query is wrong) */ 94 | if len < rem { 95 | return None; 96 | } 97 | 98 | /* insertion found. determine the valid length of the slice */ 99 | let ins = ins.as_slice(); 100 | let len = ins.iter().peek_fold(0, |a, &x| { 101 | if x == 0 { 102 | return None; 103 | } 104 | Some(a + 1) 105 | }); 106 | Some(&ins[..len]) 107 | } 108 | 109 | fn get_ins_core(dst: &mut Vec<u8>, ins: &[u8]) -> usize { 110 | /* expand packed nucleotide to Vec<u8> */ 111 | let range = dst.reserve_to(ins.len() * 2, |arr: &mut [u8], _: &[u8]| -> usize { 112 | for (i, x) in arr.iter_mut().enumerate() { 113 | /* little endian */ 114 | *x = if (i & 0x01) == 0 { 115 | ins[i / 2] & 0x0f 116 | } else { 117 | ins[i / 2] >> 4 118 | }; 119 | } 120 | 121 | let remove_tail = arr[arr.len() - 1] == 0; 122 | // debug!("{}, {}, {}, {:?}", ins.len(), arr.len(), remove_tail, arr); 123 | 124 | arr.len() - remove_tail as usize 125 | }); 126 | 127 | dst.resize(range.end, 0); 128 | range.end - range.start 129 | } 130 | 131 | pub(super) fn get_ins_into(&self, dst: &mut Vec<u8>, pos: usize) -> Option<usize> { 132 | self.check_span(&Range::<usize> { start: 0, end: pos })?; 133 | 134 | let ins = self.scan_ins_array(pos)?; 135 | let len = Self::get_ins_core(dst, ins); 136 | 137 | Some(len) 138 | } 139 | 140 | pub(super) fn get_ins(&self, pos: usize) -> Option<Vec<u8>> { 141 | self.check_span(&Range::<usize> { start: 0, end: pos })?; 142 | 143 | /* fetch ins vector */ 144 | let ins =
self.scan_ins_array(pos)?; 145 | 146 | /* expand packed nucleotide to Vec */ 147 | let mut v = Vec::with_capacity(ins.len() * 2); 148 | Self::get_ins_core(&mut v, ins); 149 | 150 | Some(v) 151 | } 152 | } 153 | -------------------------------------------------------------------------------- /src/lib.rs: -------------------------------------------------------------------------------- 1 | /* 2 | @file udon/lib.rs 3 | @brief udon implementation 4 | 5 | @author Hajime Suzuki 6 | @license MIT 7 | 8 | @detail 9 | Udon is a transcoding and indexing data structure / algorithm for BAM CIGAR / MD strings. 10 | It enables fast querying of subalignments with arbitrary span, with optional scaling for visualization. 11 | 12 | See SAM / BAM specification for the definition of the CIGAR / MD strings: 13 | https://samtools.github.io/hts-specs/ 14 | 15 | Note: This file provides only APIs. See `builder.rs` and `decoder.rs` for the internals. 16 | */ 17 | 18 | mod builder; 19 | mod decoder; 20 | mod index; 21 | mod insbin; 22 | mod op; 23 | mod scaler; 24 | mod utils; 25 | 26 | use self::{index::Index, scaler::Scaler}; 27 | 28 | use std::ops::Range; 29 | 30 | #[macro_use] 31 | extern crate bitfield; 32 | 33 | /* logging */ 34 | extern crate log; 35 | 36 | /* Event for output (expanded) array. */ 37 | #[repr(u8)] 38 | pub enum UdonOp { 39 | MisA = 0x04, 40 | MisC = 0x04 | 0x01, 41 | MisG = 0x04 | 0x02, 42 | MisT = 0x04 | 0x03, 43 | Del = 0x08, 44 | Ins = 0x10, /* OR-ed with one of the others when a column has two edits (Ins-Del or Ins-Mismatch) */ 45 | } 46 | 47 | /* aliasing type for convenience */ 48 | pub type UdonColorPair = [[u8; 4]; 2]; 49 | 50 | /* udon object */ 51 | pub struct Udon<'o>(Index<'o>); 52 | 53 | /* wrapping is unnecessary; to keep this file looks simple... 
*/ 54 | impl<'i, 'o> Udon<'o> { 55 | /* builders */ 56 | pub fn build( 57 | cigar: &'i [u32], 58 | packed_query: &'i [u8], 59 | mdstr: &'i [u8], 60 | ) -> Option<Box<Udon<'o>>> { 61 | let index = Index::build(cigar, packed_query, false, mdstr)?; 62 | let udon = unsafe { Box::<Udon<'o>>::from_raw(Box::into_raw(index) as *mut Udon) }; 63 | Some(udon) 64 | } 65 | 66 | pub fn build_alt( 67 | cigar: &'i [u32], 68 | packed_query_primary: &'i [u8], 69 | mdstr: &'i [u8], 70 | ) -> Option<Box<Udon<'o>>> { 71 | let index = Index::build(cigar, packed_query_primary, true, mdstr)?; 72 | let udon = unsafe { Box::<Udon<'o>>::from_raw(Box::into_raw(index) as *mut Udon) }; 73 | Some(udon) 74 | } 75 | 76 | /* UdonPrecursor is not exported for now. 77 | pub unsafe fn from_precursor(buf: &Vec<u8>, precursor: UdonPrecursor) -> Udon<'a> { 78 | Index::from_precursor(buf, precursor) 79 | } 80 | */ 81 | 82 | /* reference side span */ 83 | pub fn reference_span(&self) -> usize { 84 | self.0.reference_span() 85 | } 86 | 87 | /* decoders, output is actually Vec<UdonOp> */ 88 | pub fn decode_raw_into(&self, dst: &mut Vec<u8>, ref_span: &Range<usize>) -> Option<usize> { 89 | self.0.decode_raw_into(dst, ref_span) 90 | } 91 | 92 | pub fn decode_raw(&self, ref_span: &Range<usize>) -> Option<Vec<u8>> { 93 | self.0.decode_raw(ref_span) 94 | } 95 | 96 | pub fn decode_scaled_into( 97 | &self, 98 | dst: &mut Vec<UdonColorPair>, 99 | ref_span: &Range<usize>, 100 | offset_in_pixels: f64, 101 | scaler: &UdonScaler, 102 | ) -> Option<usize> { 103 | self.0 104 | .decode_scaled_into(dst, ref_span, offset_in_pixels, &scaler.0) 105 | } 106 | 107 | pub fn decode_scaled( 108 | &self, 109 | ref_span: &Range<usize>, 110 | offset_in_pixels: f64, 111 | scaler: &UdonScaler, 112 | ) -> Option<Vec<UdonColorPair>> { 113 | self.0.decode_scaled(ref_span, offset_in_pixels, &scaler.0) 114 | } 115 | 116 | /* fetch insertion sequence at a specific position */ 117 | pub fn get_ins_into(&self, dst: &mut Vec<u8>, pos: usize) -> Option<usize> { 118 | self.0.get_ins_into(dst, pos) 119 | } 120 | 121 | pub fn get_ins(&self, pos: usize) -> Option<Vec<u8>> { 122 | self.0.get_ins(pos) 123 | }
124 | } 125 | 126 | /* UdonPalette 127 | 128 | Color table for scaled decoder. Required to build `UdonScaler`. 129 | `UdonColorPair` is defined as a pair of RGBa8. In the default color scheme, 130 | the first channel is used for normal ribbon, and the second channel is for 131 | insertion markers. The two channels are treated without any difference, 132 | they can be used for arbitrary purposes. 133 | 134 | See also: `UdonColorPair`, `UdonScaler`, and `Udon::decode_scaled`. 135 | */ 136 | #[derive(Copy, Clone, Debug)] 137 | pub struct UdonPalette { 138 | /* all in [(r, g, b, alpha); 2] form; the larger alpha value, the more transparent */ 139 | background: UdonColorPair, 140 | del: UdonColorPair, 141 | ins: UdonColorPair, 142 | mismatch: [UdonColorPair; 4], 143 | } 144 | 145 | impl Default for UdonPalette { 146 | fn default() -> UdonPalette { 147 | UdonPalette { 148 | background: [[255, 255, 255, 255], [255, 255, 255, 255]], 149 | del: [[255, 255, 255, 0], [0, 0, 0, 0]], /* white, black */ 150 | ins: [[153, 153, 153, 0], [153, 153, 153, 0]], /* lightgray */ 151 | mismatch: [ 152 | [[3, 175, 64, 0], [3, 175, 64, 0]], /* green */ 153 | [[0, 90, 255, 0], [0, 90, 255, 0]], /* blue */ 154 | [[80, 32, 64, 0], [80, 32, 64, 0]], /* dark brown */ 155 | [[255, 0, 0, 0], [255, 0, 0, 0]], /* yellow */ 156 | ], 157 | } 158 | } 159 | } 160 | 161 | /* UdonScaler 162 | 163 | Holds constants for the scaled decoder. Built from `UdonPalette`. 
164 | */ 165 | pub struct UdonScaler(Scaler); 166 | 167 | impl UdonScaler { 168 | pub fn new(color: &UdonPalette, columns_per_pixel: f64) -> UdonScaler { 169 | UdonScaler(Scaler::new(color, columns_per_pixel)) 170 | } 171 | } 172 | 173 | /* UdonUtils 174 | 175 | Provides utilities on ribbon (&mut [u32]): blending and gamma correction 176 | */ 177 | pub trait UdonUtils { 178 | fn append_on_basecolor(&mut self, basecolor: &UdonColorPair) -> &mut Self; 179 | fn correct_gamma(&mut self) -> &mut Self; 180 | } 181 | 182 | impl UdonUtils for [UdonColorPair] { 183 | fn append_on_basecolor(&mut self, basecolor: &UdonColorPair) -> &mut Self { 184 | for x in self.iter_mut() { 185 | let alpha0 = x[0][3] as u16; 186 | let alpha1 = x[1][3] as u16; 187 | 188 | for i in 0..4 { 189 | let base0 = basecolor[0][i] as u16; 190 | let base1 = basecolor[1][i] as u16; 191 | let color0 = (255 - x[0][i]) as u16; 192 | let color1 = (255 - x[1][i]) as u16; 193 | 194 | let color0 = base0 * (256 - alpha0) + color0 * alpha0; 195 | let color1 = base1 * (256 - alpha1) + color1 * alpha1; 196 | 197 | x[0][i] = (color0 >> 8) as u8; 198 | x[1][i] = (color1 >> 8) as u8; 199 | } 200 | } 201 | self 202 | } 203 | 204 | fn correct_gamma(&mut self) -> &mut Self { 205 | const GAMMA: [u8; 256] = [ 206 | 0, 12, 21, 28, 33, 38, 42, 46, 49, 52, 55, 58, 61, 63, 66, 68, 70, 73, 75, 77, 79, 81, 207 | 83, 84, 86, 88, 90, 91, 93, 94, 96, 97, 99, 100, 102, 103, 105, 106, 107, 109, 110, 208 | 111, 113, 114, 115, 116, 118, 119, 120, 121, 122, 123, 124, 126, 127, 128, 129, 130, 209 | 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 146, 210 | 147, 148, 149, 150, 151, 152, 153, 153, 154, 155, 156, 157, 158, 159, 159, 160, 161, 211 | 162, 163, 163, 164, 165, 166, 166, 167, 168, 169, 169, 170, 171, 172, 172, 173, 174, 212 | 175, 175, 176, 177, 178, 178, 179, 180, 180, 181, 182, 182, 183, 184, 184, 185, 186, 213 | 186, 187, 188, 188, 189, 190, 190, 191, 192, 192, 193, 194, 194, 195, 195, 196, 
197, 214 | 197, 198, 199, 199, 200, 200, 201, 202, 202, 203, 203, 204, 205, 205, 206, 206, 207, 215 | 207, 208, 209, 209, 210, 210, 211, 211, 212, 213, 213, 214, 214, 215, 215, 216, 216, 216 | 217, 218, 218, 219, 219, 220, 220, 221, 221, 222, 222, 223, 223, 224, 224, 225, 226, 217 | 226, 227, 227, 228, 228, 229, 229, 230, 230, 231, 231, 232, 232, 233, 233, 234, 234, 218 | 235, 235, 236, 236, 237, 237, 238, 238, 238, 239, 239, 240, 240, 241, 241, 242, 242, 219 | 243, 243, 244, 244, 245, 245, 246, 246, 246, 247, 247, 248, 248, 249, 249, 250, 250, 220 | 251, 251, 252, 252, 252, 253, 253, 254, 254, 255, 255, 221 | ]; 222 | 223 | for x in self.iter_mut() { 224 | for i in 0..4 { 225 | x[0][i] = GAMMA[x[0][i] as usize]; 226 | x[1][i] = GAMMA[x[1][i] as usize]; 227 | } 228 | } 229 | self 230 | } 231 | } 232 | 233 | #[cfg(test)] 234 | mod test { 235 | use std::ops::Range; 236 | use std::str::from_utf8; 237 | 238 | use super::index::BLOCK_PITCH; 239 | use super::op::CigarOp; 240 | use super::utils::{decode_base_unchecked, encode_base_unchecked}; 241 | #[allow(unused_imports)] 242 | use super::{Udon, UdonColorPair, UdonPalette, UdonScaler}; 243 | 244 | macro_rules! cigar { 245 | [ $( ( $op: ident, $len: expr ) ),* ] => ({ 246 | vec![ $( CigarOp::$op as u32 | (( $len as u32 )<<4) ),* ] 247 | }); 248 | } 249 | 250 | macro_rules! nucl { 251 | ( $st: expr ) => {{ 252 | let s = $st; 253 | // println!("{:?}", s); 254 | 255 | let mut v = Vec::::new(); 256 | let mut a: u8 = 0; 257 | for (i, c) in s.bytes().enumerate() { 258 | if (i & 0x01) == 0x01 { 259 | v.push(a | encode_base_unchecked(c as char)); 260 | } else { 261 | a = encode_base_unchecked(c as char) << 4; 262 | } 263 | } 264 | 265 | if (s.len() & 0x01) == 0x01 { 266 | v.push(a); 267 | } 268 | v 269 | }}; 270 | } 271 | 272 | macro_rules! 
encode_flat { 273 | ( $arr: expr ) => {{ 274 | let mut v = Vec::new(); 275 | for &x in $arr { 276 | let c = match x { 277 | 0x00 => 'M', 278 | 0x04 => 'A', 279 | 0x05 => 'C', 280 | 0x06 => 'G', 281 | 0x07 => 'T', 282 | 0x08 => 'D', 283 | 0x10 => 'M', 284 | 0x14 => 'A', 285 | 0x15 => 'C', 286 | 0x16 => 'G', 287 | 0x17 => 'T', 288 | 0x18 => 'D', 289 | _ => ' ', 290 | }; 291 | v.push(c as u8); 292 | } 293 | v 294 | }}; 295 | } 296 | 297 | macro_rules! encode_ins { 298 | ( $arr: expr ) => {{ 299 | let mut v = Vec::new(); 300 | for &x in $arr { 301 | v.push(if (x & 0x10) == 0x10 { 'I' } else { '-' } as u8); 302 | } 303 | v 304 | }}; 305 | } 306 | 307 | #[allow(unused_macros)] 308 | macro_rules! decode_nucl { 309 | ( $arr: expr ) => {{ 310 | let mut v = Vec::new(); 311 | for &x in $arr { 312 | v.push(decode_base_unchecked((x as u32) >> 4) as u8); 313 | if (x & 0x0f) == 0 { 314 | continue; 315 | } /* should be the last base */ 316 | 317 | v.push(decode_base_unchecked((x as u32) & 0x0f) as u8); 318 | } 319 | v 320 | }}; 321 | } 322 | 323 | macro_rules! compare { 324 | ( $build: expr, $cigar: expr, $nucl: expr, $mdstr: expr, $range: expr, $flat: expr, $ins: expr ) => {{ 325 | // let v = Vec::<u8>::new(); 326 | let c = $cigar; 327 | let n = $nucl; 328 | let m = $mdstr; 329 | let u = match $build(&c, &n, &m.as_bytes()) { 330 | None => { 331 | assert!(false, "failed to build index"); 332 | return; 333 | } 334 | Some(u) => u, 335 | }; 336 | let mut r: Range<usize> = $range; 337 | if r.start == 0 && r.end == 0 { 338 | r.end = u.reference_span(); 339 | } 340 | 341 | let a = u.decode_raw(&r).unwrap(); 342 | let f = encode_flat!(&a); 343 | let d = from_utf8(&f).unwrap(); 344 | assert!(d == $flat, "{:?}, {:?}", d, $flat); 345 | 346 | let j = encode_ins!(&a); 347 | let i = from_utf8(&j).unwrap(); 348 | assert!(i == $ins, "{:?}, {:?}", i, $ins); 349 | }}; 350 | } 351 | 352 | macro_rules!
compare_ins { 353 | ( $build: expr, $cigar: expr, $nucl: expr, $mdstr: expr, $pos: expr, $ins_seq: expr ) => {{ 354 | let c = $cigar; 355 | let n = $nucl; 356 | let m = $mdstr; 357 | let u = match $build(&c, &n, &m.as_bytes()) { 358 | None => { 359 | assert!(false, "failed to build index"); 360 | return; 361 | } 362 | Some(u) => u, 363 | }; 364 | 365 | let i = match u.get_ins($pos) { 366 | None => vec!['*' as u8], 367 | Some(v) => v 368 | .iter() 369 | .map(|x| decode_base_unchecked(*x as u32) as u8) 370 | .collect(), 371 | }; 372 | let i = from_utf8(&i).unwrap(); 373 | assert!(i == $ins_seq, "{:?}, {:?}", i, $ins_seq); 374 | }}; 375 | } 376 | 377 | const BG: [[u8; 4]; 2] = [[0xff, 0xff, 0xff, 0xff], [0xff, 0xff, 0xff, 0xff]]; 378 | const DEL: [[u8; 4]; 2] = [[0x00, 0x00, 0xff, 0x00], [0x00, 0x00, 0xff, 0x00]]; 379 | const INS: [[u8; 4]; 2] = [[0xff, 0x00, 0xff, 0x00], [0xff, 0x00, 0xff, 0x00]]; 380 | const MISA: [[u8; 4]; 2] = [[0x7f, 0x1f, 0x1f, 0x00], [0x7f, 0x1f, 0x1f, 0x00]]; 381 | const MISC: [[u8; 4]; 2] = [[0x00, 0x00, 0xff, 0x00], [0x00, 0x00, 0xff, 0x00]]; 382 | const MISG: [[u8; 4]; 2] = [[0x00, 0xff, 0x00, 0x00], [0x00, 0xff, 0x00, 0x00]]; 383 | const MIST: [[u8; 4]; 2] = [[0xff, 0x00, 0x00, 0x00], [0xff, 0x00, 0x00, 0x00]]; 384 | 385 | macro_rules! 
compare_color { 386 | ( $build: expr, $cigar: expr, $nucl: expr, $mdstr: expr, $range: expr, $offset: expr, $scale: expr, $ribbon: expr, $color_factor: expr, $alpha_factor: expr ) => {{ 387 | let c = $cigar; 388 | let n = $nucl; 389 | let m = $mdstr; 390 | let u = match $build(&c, &n, &m.as_bytes()) { 391 | None => { 392 | assert!(false, "failed to build index"); 393 | return; 394 | } 395 | Some(u) => u, 396 | }; 397 | let mut r: Range<usize> = $range; 398 | if r.start == 0 && r.end == 0 { 399 | r.end = u.reference_span(); 400 | } 401 | 402 | let s = UdonScaler::new( 403 | &UdonPalette { 404 | background: BG, 405 | del: DEL, 406 | ins: INS, 407 | mismatch: [MISA, MISC, MISG, MIST], 408 | }, 409 | $scale, 410 | ); 411 | let b = match u.decode_scaled(&r, $offset, &s) { 412 | None => Vec::<UdonColorPair>::new(), 413 | Some(v) => v, 414 | }; 415 | let n: Vec<UdonColorPair> = $ribbon 416 | .iter() 417 | .map(|x| { 418 | let mut x = *x; 419 | for i in 0..3 { 420 | x[0][i] = ((0xff - x[0][i]) as f64 * $color_factor) as u8; 421 | x[1][i] = ((0xff - x[1][i]) as f64 * $color_factor) as u8; 422 | } 423 | for i in 3..4 { 424 | x[0][i] = ((0xff - x[0][i]) as f64 * $alpha_factor) as u8; 425 | x[1][i] = ((0xff - x[1][i]) as f64 * $alpha_factor) as u8; 426 | } 427 | 428 | // u32::from_le_bytes(x) 429 | x 430 | }) 431 | .collect(); 432 | // println!("{:?}, {:?}", b, n); 433 | assert!(b == n, "{:?}, {:?}", b, n); 434 | }}; 435 | } 436 | 437 | #[test] 438 | fn test_udon_build_match() { 439 | compare!( 440 | Udon::build, 441 | cigar![(Match, 4)], 442 | nucl!("ACGT"), 443 | "4", 444 | Range { start: 0, end: 0 }, 445 | "MMMM", 446 | "----" 447 | ); 448 | compare!( 449 | Udon::build, 450 | cigar![(Match, 30)], 451 | nucl!("ACGTACGTACGTACGTACGTACGTACGTAC"), 452 | "30", 453 | Range { start: 0, end: 0 }, 454 | "MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM", 455 | "------------------------------" 456 | ); 457 | compare!( 458 | Udon::build, 459 | cigar![(Match, 31)], 460 | nucl!("ACGTACGTACGTACGTACGTACGTACGTACG"), 461 | "31", 462 | Range {
start: 0, end: 0 }, 463 | "MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM", 464 | "-------------------------------" 465 | ); 466 | compare!( 467 | Udon::build, 468 | cigar![(Match, 32)], 469 | nucl!("ACGTACGTACGTACGTACGTACGTACGTACGT"), 470 | "32", 471 | Range { start: 0, end: 0 }, 472 | "MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM", 473 | "--------------------------------" 474 | ); 475 | compare!(Udon::build, 476 | cigar![(Match, 128)], 477 | nucl!("ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT"), 478 | "128", 479 | Range { start: 0, end: 0 }, 480 | "MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM", 481 | "--------------------------------------------------------------------------------------------------------------------------------" 482 | ); 483 | } 484 | 485 | #[test] 486 | fn test_udon_build_del() { 487 | compare!( 488 | Udon::build, 489 | cigar![(Match, 4), (Del, 1), (Match, 4)], 490 | nucl!("ACGTACGT"), 491 | "4^A4", 492 | Range { start: 0, end: 0 }, 493 | "MMMMDMMMM", 494 | "---------" 495 | ); 496 | compare!( 497 | Udon::build, 498 | cigar![(Match, 4), (Del, 3), (Match, 4)], 499 | nucl!("ACGTACGT"), 500 | "4^AGG4", 501 | Range { start: 0, end: 0 }, 502 | "MMMMDDDMMMM", 503 | "-----------" 504 | ); 505 | compare!( 506 | Udon::build, 507 | cigar![(Match, 4), (Del, 4), (Match, 4)], 508 | nucl!("ACGTACGT"), 509 | "4^AAAC4", 510 | Range { start: 0, end: 0 }, 511 | "MMMMDDDDMMMM", 512 | "------------" 513 | ); 514 | compare!( 515 | Udon::build, 516 | cigar![(Match, 4), (Del, 11), (Match, 4)], 517 | nucl!("ACGTACGT"), 518 | "4^GATAGATAGGG4", 519 | Range { start: 0, end: 0 }, 520 | "MMMMDDDDDDDDDDDMMMM", 521 | "-------------------" 522 | ); 523 | } 524 | 525 | #[test] 526 | fn test_udon_build_ins() { 527 | compare!( 528 | Udon::build, 529 | cigar![(Match, 4), (Ins, 1), (Match, 4)], 530 | nucl!("ACGTACGTA"), 531 | "8", 532 | Range 
{ start: 0, end: 0 }, 533 | "MMMMMMMM", 534 | "----I---" 535 | ); 536 | compare!( 537 | Udon::build, 538 | cigar![(Match, 4), (Ins, 2), (Match, 4)], 539 | nucl!("ACGTACGTAC"), 540 | "8", 541 | Range { start: 0, end: 0 }, 542 | "MMMMMMMM", 543 | "----I---" 544 | ); 545 | } 546 | 547 | #[test] 548 | fn test_udon_build_mismatch() { 549 | compare!( 550 | Udon::build, 551 | cigar![(Match, 10)], 552 | nucl!("ACGTACGTAC"), 553 | "4T5", 554 | Range { start: 0, end: 0 }, 555 | "MMMMAMMMMM", 556 | "----------" 557 | ); 558 | compare!( 559 | Udon::build, 560 | cigar![(Match, 10)], 561 | nucl!("ACGTACGTAC"), 562 | "4T0C4", 563 | Range { start: 0, end: 0 }, 564 | "MMMMACMMMM", 565 | "----------" 566 | ); 567 | compare!( 568 | Udon::build, 569 | cigar![(Match, 10)], 570 | nucl!("ACGTACGTAC"), 571 | "4T1A0T2", 572 | Range { start: 0, end: 0 }, 573 | "MMMMAMGTMM", 574 | "----------" 575 | ); 576 | } 577 | 578 | #[test] 579 | fn test_udon_build_cont_mismatch() { 580 | /* continuous flag */ 581 | compare!( 582 | Udon::build, 583 | cigar![(Match, 34)], 584 | nucl!("ACGTACGTACGTACGTACGTACGTACGTACGTACGTAC"), 585 | "30T3", 586 | Range { start: 0, end: 0 }, 587 | "MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMGMMM", 588 | "----------------------------------" 589 | ); 590 | compare!( 591 | Udon::build, 592 | cigar![(Match, 64)], 593 | nucl!("ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT"), 594 | "60T3", 595 | Range { start: 0, end: 0 }, 596 | "MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMAMMM", 597 | "----------------------------------------------------------------" 598 | ); 599 | } 600 | 601 | #[test] 602 | fn test_udon_build_cont_del() { 603 | /* continuous flag */ 604 | compare!( 605 | Udon::build, 606 | cigar![(Match, 30), (Del, 4), (Match, 4)], 607 | nucl!("ACGTACGTACGTACGTACGTACGTACGTACGTACGTAC"), 608 | "30^ACGT4", 609 | Range { start: 0, end: 0 }, 610 | "MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMDDDDMMMM", 611 | "--------------------------------------" 612 | ); 613 | 
compare!( 614 | Udon::build, 615 | cigar![(Match, 60), (Del, 4), (Match, 4)], 616 | nucl!("ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT"), 617 | "60^ACGT4", 618 | Range { start: 0, end: 0 }, 619 | "MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMDDDDMMMM", 620 | "--------------------------------------------------------------------" 621 | ); 622 | } 623 | 624 | #[test] 625 | fn test_udon_build_cont_ins() { 626 | /* continuous flag */ 627 | compare!( 628 | Udon::build, 629 | cigar![(Match, 30), (Ins, 4), (Match, 4)], 630 | nucl!("ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTAC"), 631 | "34", 632 | Range { start: 0, end: 0 }, 633 | "MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM", 634 | "------------------------------I---" 635 | ); 636 | compare!( 637 | Udon::build, 638 | cigar![(Match, 60), (Ins, 4), (Match, 4)], 639 | nucl!("ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT"), 640 | "64", 641 | Range { start: 0, end: 0 }, 642 | "MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM", 643 | "------------------------------------------------------------I---" 644 | ); 645 | } 646 | 647 | #[test] 648 | fn test_udon_build_softclip() { 649 | compare!( 650 | Udon::build, 651 | cigar![(SoftClip, 10), (Match, 10)], 652 | nucl!("ACGTACGTACGTACGTACGT"), 653 | "4T5", 654 | Range { start: 0, end: 0 }, 655 | "MMMMGMMMMM", 656 | "----------" 657 | ); 658 | compare!( 659 | Udon::build, 660 | cigar![(Match, 10), (SoftClip, 10)], 661 | nucl!("ACGTACGTACGTACGTACGT"), 662 | "4T5", 663 | Range { start: 0, end: 0 }, 664 | "MMMMAMMMMM", 665 | "----------" 666 | ); 667 | compare!( 668 | Udon::build, 669 | cigar![(SoftClip, 10), (Match, 10), (SoftClip, 10)], 670 | nucl!("ACGTACGTACGTACGTACGTACGTACGTAC"), 671 | "4T5", 672 | Range { start: 0, end: 0 }, 673 | "MMMMGMMMMM", 674 | "----------" 675 | ); 676 | } 677 | 678 | #[test] 679 | fn test_udon_build_hardclip() { 680 | compare!( 681 | Udon::build, 682 | cigar![(HardClip, 10), (Match, 10)], 683 | 
nucl!("GTACGTACGT"), 684 | "4T5", 685 | Range { start: 0, end: 0 }, 686 | "MMMMGMMMMM", 687 | "----------" 688 | ); 689 | compare!( 690 | Udon::build, 691 | cigar![(Match, 10), (HardClip, 10)], 692 | nucl!("ACGTACGTAC"), 693 | "4T5", 694 | Range { start: 0, end: 0 }, 695 | "MMMMAMMMMM", 696 | "----------" 697 | ); 698 | compare!( 699 | Udon::build, 700 | cigar![(HardClip, 10), (Match, 10), (HardClip, 10)], 701 | nucl!("GTACGTACGT"), 702 | "4T5", 703 | Range { start: 0, end: 0 }, 704 | "MMMMGMMMMM", 705 | "----------" 706 | ); 707 | } 708 | 709 | #[test] 710 | fn test_udon_build_alt() { 711 | compare!( 712 | Udon::build_alt, 713 | cigar![(HardClip, 10), (Match, 10)], 714 | nucl!("ACGTACGTACGTACGTACGT"), 715 | "4T5", 716 | Range { start: 0, end: 0 }, 717 | "MMMMGMMMMM", 718 | "----------" 719 | ); 720 | compare!( 721 | Udon::build_alt, 722 | cigar![(Match, 10), (HardClip, 10)], 723 | nucl!("ACGTACGTACACGTACGTAC"), 724 | "4T5", 725 | Range { start: 0, end: 0 }, 726 | "MMMMAMMMMM", 727 | "----------" 728 | ); 729 | compare!( 730 | Udon::build_alt, 731 | cigar![(HardClip, 10), (Match, 10), (HardClip, 10)], 732 | nucl!("ACGTACGTACGTACGTACGTACGTACGTAC"), 733 | "4T5", 734 | Range { start: 0, end: 0 }, 735 | "MMMMGMMMMM", 736 | "----------" 737 | ); 738 | } 739 | 740 | #[test] 741 | fn test_udon_build_head_del() { 742 | /* accept both */ 743 | compare!( 744 | Udon::build, 745 | cigar![(Del, 4), (Match, 4)], 746 | nucl!("ACGT"), 747 | "^ACGT4", 748 | Range { start: 0, end: 0 }, 749 | "DDDDMMMM", 750 | "--------" 751 | ); 752 | compare!( 753 | Udon::build, 754 | cigar![(Del, 4), (Match, 4)], 755 | nucl!("ACGT"), 756 | "0^ACGT4", 757 | Range { start: 0, end: 0 }, 758 | "DDDDMMMM", 759 | "--------" 760 | ); 761 | } 762 | 763 | #[test] 764 | fn test_udon_build_head_ins() { 765 | compare!( 766 | Udon::build, 767 | cigar![(Ins, 4), (Match, 4)], 768 | nucl!("ACGTACGT"), 769 | "4", 770 | Range { start: 0, end: 0 }, 771 | "MMMM", 772 | "I---" 773 | ); 774 | } 775 | 776 | #[test] 777 
| fn test_udon_build_tail_del() { 778 | compare!( 779 | Udon::build, 780 | cigar![(Match, 4), (Del, 4)], 781 | nucl!("ACGT"), 782 | "4^ACGT", 783 | Range { start: 0, end: 0 }, 784 | "MMMMDDDD", 785 | "--------" 786 | ); 787 | } 788 | 789 | #[test] 790 | fn test_udon_build_tail_ins() { 791 | compare!( 792 | Udon::build, 793 | cigar![(Match, 4), (Ins, 4)], 794 | nucl!("ACGTACGT"), 795 | "4", 796 | Range { start: 0, end: 0 }, 797 | "MMMM", 798 | "----" 799 | ); 800 | } 801 | 802 | #[test] 803 | fn test_udon_build_head_mismatch() { 804 | /* we regard both encodings valid */ 805 | compare!( 806 | Udon::build, 807 | cigar![(Match, 4)], 808 | nucl!("ACGT"), 809 | "T3", 810 | Range { start: 0, end: 0 }, 811 | "AMMM", 812 | "----" 813 | ); 814 | compare!( 815 | Udon::build, 816 | cigar![(Match, 4)], 817 | nucl!("ACGT"), 818 | "0T3", 819 | Range { start: 0, end: 0 }, 820 | "AMMM", 821 | "----" 822 | ); 823 | } 824 | 825 | #[test] 826 | fn test_udon_build_tail_mismatch() { 827 | /* we regard both encodings valid */ 828 | compare!( 829 | Udon::build, 830 | cigar![(Match, 4)], 831 | nucl!("ACGT"), 832 | "3A", 833 | Range { start: 0, end: 0 }, 834 | "MMMT", 835 | "----" 836 | ); 837 | compare!( 838 | Udon::build, 839 | cigar![(Match, 4)], 840 | nucl!("ACGT"), 841 | "3A0", 842 | Range { start: 0, end: 0 }, 843 | "MMMT", 844 | "----" 845 | ); 846 | } 847 | 848 | #[test] 849 | fn test_udon_build_del_ins() { 850 | /* not natural as CIGAR string but sometimes appear in real data */ 851 | compare!( 852 | Udon::build, 853 | cigar![(Match, 4), (Del, 4), (Ins, 4), (Match, 4)], 854 | nucl!("ACGTGGGGACGT"), 855 | "4^CCCC4", 856 | Range { start: 0, end: 0 }, 857 | "MMMMDDDDMMMM", 858 | "--------I---" 859 | ); 860 | compare!( 861 | Udon::build, 862 | cigar![ 863 | (Match, 4), 864 | (Del, 4), 865 | (Ins, 4), 866 | (Del, 4), 867 | (Ins, 4), 868 | (Match, 4) 869 | ], 870 | nucl!("ACGTGGGGAAAAACGT"), 871 | "4^CCCC0^TTTT4", /* is this correct? 
*/ 872 | Range { start: 0, end: 0 }, 873 | "MMMMDDDDDDDDMMMM", 874 | "------------I---" /* FIXME: insertion marker lost */ 875 | ); 876 | } 877 | 878 | #[test] 879 | fn test_udon_build_ins_del() { 880 | /* also not natural */ 881 | compare!( 882 | Udon::build, 883 | cigar![(Match, 4), (Ins, 4), (Del, 4), (Match, 4)], 884 | nucl!("ACGTGGGGACGT"), 885 | "4^CCCC4", 886 | Range { start: 0, end: 0 }, 887 | "MMMMDDDDMMMM", 888 | "------------" /* insertion marker lost */ 889 | ); 890 | compare!( 891 | Udon::build, 892 | cigar![ 893 | (Match, 4), 894 | (Ins, 4), 895 | (Del, 4), 896 | (Ins, 4), 897 | (Del, 4), 898 | (Match, 4) 899 | ], 900 | nucl!("ACGTGGGGACGT"), 901 | "4^CCCC0^AAAA4", 902 | Range { start: 0, end: 0 }, 903 | "MMMMDDDDDDDDMMMM", 904 | "----------------" /* again, lost */ 905 | ); 906 | } 907 | 908 | #[test] 909 | fn test_udon_build_mismatch_del() { 910 | compare!( 911 | Udon::build, 912 | cigar![(Match, 4), (Del, 4), (Match, 4)], 913 | nucl!("ACGTGGGGACGT"), 914 | "3A0^CCCC4", 915 | Range { start: 0, end: 0 }, 916 | "MMMTDDDDMMMM", 917 | "------------" 918 | ); 919 | } 920 | 921 | #[test] 922 | fn test_udon_build_mismatch_ins() { 923 | compare!( 924 | Udon::build, 925 | cigar![(Match, 4), (Ins, 4), (Match, 4)], 926 | nucl!("ACGTACGT"), 927 | "3A4", 928 | Range { start: 0, end: 0 }, 929 | "MMMTMMMM", 930 | "----I---" 931 | ); 932 | } 933 | 934 | #[test] 935 | fn test_udon_build_complex() { 936 | compare!( 937 | Udon::build, 938 | cigar![ 939 | (SoftClip, 7), 940 | (Match, 4), 941 | (Ins, 1), 942 | (Match, 4), 943 | (Del, 1), 944 | (Match, 2), 945 | (Del, 7), 946 | (Match, 40), 947 | (HardClip, 15) 948 | ], 949 | nucl!("TTTTTTTACGTACGTACGACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT"), 950 | "8^A2^ACGTACG4T9A0C0G23", 951 | Range { start: 0, end: 0 }, 952 | "MMMMMMMMDMMDDDDDDDMMMMAMMMMMMMMMGTAMMMMMMMMMMMMMMMMMMMMMMM", 953 | "----I-----------------------------------------------------" 954 | ); 955 | } 956 | 957 | #[test] 958 | fn test_udon_build_extended() { 959 | 
compare!( 960 | Udon::build, 961 | cigar![(Eq, 4)], 962 | nucl!("ACGT"), 963 | "4", 964 | Range { start: 0, end: 0 }, 965 | "MMMM", 966 | "----" 967 | ); 968 | compare!( 969 | Udon::build, 970 | cigar![(Eq, 30)], 971 | nucl!("ACGTACGTACGTACGTACGTACGTACGTAC"), 972 | "30", 973 | Range { start: 0, end: 0 }, 974 | "MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM", 975 | "------------------------------" 976 | ); 977 | compare!( 978 | Udon::build, 979 | cigar![(Eq, 4), (Mismatch, 1), (Eq, 4)], 980 | nucl!("ACGTACGTA"), 981 | "4T4", 982 | Range { start: 0, end: 0 }, 983 | "MMMMAMMMM", 984 | "---------" 985 | ); 986 | } 987 | 988 | #[test] 989 | fn test_udon_build_squash() { 990 | compare!( 991 | Udon::build, 992 | cigar![(Match, 2), (Match, 2)], 993 | nucl!("ACGT"), 994 | "4", 995 | Range { start: 0, end: 0 }, 996 | "MMMM", 997 | "----" 998 | ); 999 | compare!( 1000 | Udon::build, 1001 | cigar![(Eq, 5), (Mismatch, 1), (Match, 1), (Eq, 2)], 1002 | nucl!("ACGTACGTA"), 1003 | "5T3", 1004 | Range { start: 0, end: 0 }, 1005 | "MMMMMCMMM", 1006 | "---------" 1007 | ); 1008 | compare!( 1009 | Udon::build, 1010 | cigar![(Eq, 5), (Mismatch, 1), (Mismatch, 1), (Eq, 2)], 1011 | nucl!("ACGTACGTA"), 1012 | "5T0T2", 1013 | Range { start: 0, end: 0 }, 1014 | "MMMMMCGMM", 1015 | "---------" 1016 | ); 1017 | compare!( 1018 | Udon::build, 1019 | cigar![(Ins, 2), (Ins, 2), (Match, 4)], 1020 | nucl!("ACGTACGT"), 1021 | "4", 1022 | Range { start: 0, end: 0 }, 1023 | "MMMM", 1024 | "I---" 1025 | ); 1026 | compare!( 1027 | Udon::build, 1028 | cigar![(Del, 2), (Del, 2), (Match, 4)], 1029 | nucl!("ACGT"), 1030 | "^ACGT4", 1031 | Range { start: 0, end: 0 }, 1032 | "DDDDMMMM", 1033 | "--------" 1034 | ); 1035 | compare!( 1036 | Udon::build, 1037 | cigar![(Del, 2), (Del, 2), (Match, 4)], 1038 | nucl!("ACGT"), 1039 | "0^ACGT4", 1040 | Range { start: 0, end: 0 }, 1041 | "DDDDMMMM", 1042 | "--------" 1043 | ); 1044 | } 1045 | 1046 | #[test] 1047 | fn test_udon_decode_match() { 1048 | compare!( 1049 | Udon::build, 1050 
| cigar![(Match, 8)], 1051 | nucl!("ACGTACGT"), 1052 | "8", 1053 | Range { start: 2, end: 6 }, 1054 | "MMMM", 1055 | "----" 1056 | ); 1057 | } 1058 | 1059 | #[test] 1060 | fn test_udon_decode_mismatch() { 1061 | compare!( 1062 | Udon::build, 1063 | cigar![(Match, 8)], 1064 | nucl!("ACGTACGT"), 1065 | "4T3", 1066 | Range { start: 2, end: 6 }, 1067 | "MMAM", 1068 | "----" 1069 | ); 1070 | } 1071 | 1072 | #[test] 1073 | fn test_udon_decode_del() { 1074 | compare!( 1075 | Udon::build, 1076 | cigar![(Match, 4), (Del, 1), (Match, 4)], 1077 | nucl!("ACGTACGT"), 1078 | "4^T4", 1079 | Range { start: 2, end: 7 }, 1080 | "MMDMM", 1081 | "-----" 1082 | ); 1083 | } 1084 | 1085 | #[test] 1086 | fn test_udon_decode_ins() { 1087 | compare!( 1088 | Udon::build, 1089 | cigar![(Match, 4), (Ins, 1), (Match, 4)], 1090 | nucl!("ACGTACGTA"), 1091 | "8", 1092 | Range { start: 2, end: 6 }, 1093 | "MMMM", 1094 | "--I-" 1095 | ); 1096 | } 1097 | 1098 | #[test] 1099 | fn test_udon_decode_cont_mismatch() { 1100 | /* mismatch on boundary */ 1101 | compare!( 1102 | Udon::build, 1103 | cigar![(Match, 34)], 1104 | nucl!("ACGTACGTACGTACGTACGTACGTACGTACGTACGTAC"), 1105 | "30T3", 1106 | Range { start: 30, end: 34 }, 1107 | "GMMM", 1108 | "----" 1109 | ); 1110 | compare!( 1111 | Udon::build, 1112 | cigar![(Match, 64)], 1113 | nucl!("ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT"), 1114 | "60T3", 1115 | Range { start: 60, end: 64 }, 1116 | "AMMM", 1117 | "----" 1118 | ); 1119 | 1120 | /* skip one */ 1121 | compare!( 1122 | Udon::build, 1123 | cigar![(Match, 34)], 1124 | nucl!("ACGTACGTACGTACGTACGTACGTACGTACGTACGTAC"), 1125 | "30T3", 1126 | Range { start: 31, end: 34 }, 1127 | "MMM", 1128 | "---" 1129 | ); 1130 | } 1131 | 1132 | #[test] 1133 | fn test_udon_decode_cont_del() { 1134 | /* deletion on boundary */ 1135 | compare!( 1136 | Udon::build, 1137 | cigar![(Match, 30), (Del, 4), (Match, 4)], 1138 | nucl!("ACGTACGTACGTACGTACGTACGTACGTACGTACGTAC"), 1139 | "30^ACGT4", 1140 | Range { 
start: 30, end: 38 }, 1141 | "DDDDMMMM", 1142 | "--------" 1143 | ); 1144 | compare!( 1145 | Udon::build, 1146 | cigar![(Match, 60), (Del, 4), (Match, 4)], 1147 | nucl!("ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT"), 1148 | "60^ACGT4", 1149 | Range { start: 60, end: 68 }, 1150 | "DDDDMMMM", 1151 | "--------" 1152 | ); 1153 | 1154 | /* skipping one */ 1155 | compare!( 1156 | Udon::build, 1157 | cigar![(Match, 30), (Del, 4), (Match, 4)], 1158 | nucl!("ACGTACGTACGTACGTACGTACGTACGTACGTACGTAC"), 1159 | "30^ACGT4", 1160 | Range { start: 31, end: 38 }, 1161 | "DDDMMMM", 1162 | "-------" 1163 | ); 1164 | 1165 | /* leaving one */ 1166 | compare!( 1167 | Udon::build, 1168 | cigar![(Match, 30), (Del, 4), (Match, 4)], 1169 | nucl!("ACGTACGTACGTACGTACGTACGTACGTACGTACGTAC"), 1170 | "30^ACGT4", 1171 | Range { start: 29, end: 33 }, 1172 | "MDDD", 1173 | "----" 1174 | ); 1175 | } 1176 | 1177 | #[test] 1178 | fn test_udon_decode_cont_ins() { 1179 | /* insertion on boundary */ 1180 | compare!( 1181 | Udon::build, 1182 | cigar![(Match, 30), (Ins, 4), (Match, 4)], 1183 | nucl!("ACGTACGTACGTACGTACGTACGTACGTACGTACGTAC"), 1184 | "34", 1185 | Range { start: 30, end: 34 }, 1186 | "MMMM", 1187 | "I---" 1188 | ); 1189 | compare!( 1190 | Udon::build, 1191 | cigar![(Match, 60), (Ins, 4), (Match, 4)], 1192 | nucl!("ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT"), 1193 | "64", 1194 | Range { start: 60, end: 64 }, 1195 | "MMMM", 1196 | "I---" 1197 | ); 1198 | /* skip one */ 1199 | compare!( 1200 | Udon::build, 1201 | cigar![(Match, 30), (Ins, 4), (Match, 4)], 1202 | nucl!("ACGTACGTACGTACGTACGTACGTACGTACGTACGTAC"), 1203 | "34", 1204 | Range { start: 31, end: 34 }, 1205 | "MMM", 1206 | "---" 1207 | ); 1208 | } 1209 | #[test] 1210 | fn test_udon_decode_poll_match() { 1211 | /* test block polling */ 1212 | compare!( 1213 | Udon::build, 1214 | cigar![(Match, BLOCK_PITCH + 8)], 1215 | nucl!(format!( 1216 | "{}{}", 1217 | from_utf8(&['A' as u8; 
BLOCK_PITCH]).unwrap(), 1218 | "ACGTACGT" 1219 | )), 1220 | format!("{}", BLOCK_PITCH + 8), 1221 | Range { 1222 | start: BLOCK_PITCH - 2, 1223 | end: BLOCK_PITCH + 6 1224 | }, 1225 | "MMMMMMMM", 1226 | "--------" 1227 | ); 1228 | compare!( 1229 | Udon::build, 1230 | cigar![(Match, BLOCK_PITCH + 8)], 1231 | nucl!(format!( 1232 | "{}{}", 1233 | from_utf8(&['A' as u8; BLOCK_PITCH]).unwrap(), 1234 | "ACGTACGT" 1235 | )), 1236 | format!("{}", BLOCK_PITCH + 8), 1237 | Range { 1238 | start: BLOCK_PITCH + 2, 1239 | end: BLOCK_PITCH + 6 1240 | }, 1241 | "MMMM", 1242 | "----" 1243 | ); 1244 | 1245 | /* longer */ 1246 | compare!( 1247 | Udon::build, 1248 | cigar![(Match, 21 * BLOCK_PITCH + 8)], 1249 | nucl!(format!( 1250 | "{}{}", 1251 | from_utf8(&['A' as u8; 21 * BLOCK_PITCH]).unwrap(), 1252 | "ACGTACGT" 1253 | )), 1254 | format!("{}", 21 * BLOCK_PITCH + 8), 1255 | Range { 1256 | start: 21 * BLOCK_PITCH - 2, 1257 | end: 21 * BLOCK_PITCH + 6 1258 | }, 1259 | "MMMMMMMM", 1260 | "--------" 1261 | ); 1262 | compare!( 1263 | Udon::build, 1264 | cigar![(Match, 21 * BLOCK_PITCH + 8)], 1265 | nucl!(format!( 1266 | "{}{}", 1267 | from_utf8(&['A' as u8; 21 * BLOCK_PITCH]).unwrap(), 1268 | "ACGTACGT" 1269 | )), 1270 | format!("{}", 21 * BLOCK_PITCH + 8), 1271 | Range { 1272 | start: 21 * BLOCK_PITCH + 2, 1273 | end: 21 * BLOCK_PITCH + 6 1274 | }, 1275 | "MMMM", 1276 | "----" 1277 | ); 1278 | } 1279 | 1280 | #[test] 1281 | fn test_udon_decode_poll_mismatch() { 1282 | compare!( 1283 | Udon::build, 1284 | cigar![(Match, BLOCK_PITCH + 8)], 1285 | nucl!(format!( 1286 | "{}{}", 1287 | from_utf8(&['A' as u8; BLOCK_PITCH]).unwrap(), 1288 | "ACGTACGT" 1289 | )), 1290 | format!("{}T3", BLOCK_PITCH + 4), 1291 | Range { 1292 | start: BLOCK_PITCH - 2, 1293 | end: BLOCK_PITCH + 6 1294 | }, 1295 | "MMMMMMAM", 1296 | "--------" 1297 | ); 1298 | compare!( 1299 | Udon::build, 1300 | cigar![(Match, BLOCK_PITCH + 8)], 1301 | nucl!(format!( 1302 | "{}{}", 1303 | from_utf8(&['A' as u8; 
BLOCK_PITCH]).unwrap(), 1304 | "ACGTACGT" 1305 | )), 1306 | format!("{}T3", BLOCK_PITCH + 4), 1307 | Range { 1308 | start: BLOCK_PITCH + 2, 1309 | end: BLOCK_PITCH + 6 1310 | }, 1311 | "MMAM", 1312 | "----" 1313 | ); 1314 | 1315 | /* mismatch on block boundary */ 1316 | compare!( 1317 | Udon::build, 1318 | cigar![(Match, BLOCK_PITCH + 8)], 1319 | nucl!(format!( 1320 | "{}{}", 1321 | from_utf8(&['A' as u8; BLOCK_PITCH]).unwrap(), 1322 | "ACGTACGT" 1323 | )), 1324 | format!("{}T7", BLOCK_PITCH), 1325 | Range { 1326 | start: BLOCK_PITCH - 2, 1327 | end: BLOCK_PITCH + 2 1328 | }, 1329 | "MMAM", 1330 | "----" 1331 | ); 1332 | /* mismatch right before block boundary */ 1333 | compare!( 1334 | Udon::build, 1335 | cigar![(Match, BLOCK_PITCH + 8)], 1336 | nucl!(format!( 1337 | "{}{}", 1338 | from_utf8(&['A' as u8; BLOCK_PITCH]).unwrap(), 1339 | "ACGTACGT" 1340 | )), 1341 | format!("{}T8", BLOCK_PITCH - 1), 1342 | Range { 1343 | start: BLOCK_PITCH - 2, 1344 | end: BLOCK_PITCH + 2 1345 | }, 1346 | "MAMM", 1347 | "----" 1348 | ); 1349 | } 1350 | 1351 | #[test] 1352 | fn test_udon_decode_poll_mismatch_long() { 1353 | /* much longer */ 1354 | compare!( 1355 | Udon::build, 1356 | cigar![(Match, 321 * BLOCK_PITCH + 8)], 1357 | nucl!(format!( 1358 | "{}{}", 1359 | from_utf8(&['A' as u8; 321 * BLOCK_PITCH]).unwrap(), 1360 | "ACGTACGT" 1361 | )), 1362 | format!("{}T3", 321 * BLOCK_PITCH + 4), 1363 | Range { 1364 | start: 321 * BLOCK_PITCH - 2, 1365 | end: 321 * BLOCK_PITCH + 6 1366 | }, 1367 | "MMMMMMAM", 1368 | "--------" 1369 | ); 1370 | compare!( 1371 | Udon::build, 1372 | cigar![(Match, 321 * BLOCK_PITCH + 8)], 1373 | nucl!(format!( 1374 | "{}{}", 1375 | from_utf8(&['A' as u8; 321 * BLOCK_PITCH]).unwrap(), 1376 | "ACGTACGT" 1377 | )), 1378 | format!("{}T3", 321 * BLOCK_PITCH + 4), 1379 | Range { 1380 | start: 321 * BLOCK_PITCH + 2, 1381 | end: 321 * BLOCK_PITCH + 6 1382 | }, 1383 | "MMAM", 1384 | "----" 1385 | ); 1386 | compare!( 1387 | Udon::build, 1388 | cigar![(Match, 321 * 
BLOCK_PITCH + 8)], 1389 | nucl!(format!( 1390 | "{}{}", 1391 | from_utf8(&['A' as u8; 321 * BLOCK_PITCH]).unwrap(), 1392 | "ACGTACGT" 1393 | )), 1394 | format!("{}T7", 321 * BLOCK_PITCH), 1395 | Range { 1396 | start: 321 * BLOCK_PITCH - 2, 1397 | end: 321 * BLOCK_PITCH + 2 1398 | }, 1399 | "MMAM", 1400 | "----" 1401 | ); 1402 | } 1403 | 1404 | #[test] 1405 | fn test_udon_decode_poll_del() { 1406 | /* boundary on boundary */ 1407 | compare!( 1408 | Udon::build, 1409 | cigar![(Match, BLOCK_PITCH), (Del, 4), (Match, 4)], 1410 | nucl!(format!( 1411 | "{}{}", 1412 | from_utf8(&['A' as u8; BLOCK_PITCH]).unwrap(), 1413 | "ACGTACGT" 1414 | )), 1415 | format!("{}^ACGT4", BLOCK_PITCH), 1416 | Range { 1417 | start: BLOCK_PITCH - 2, 1418 | end: BLOCK_PITCH + 6 1419 | }, 1420 | "MMDDDDMM", 1421 | "--------" 1422 | ); 1423 | compare!( 1424 | Udon::build, 1425 | cigar![(Match, BLOCK_PITCH), (Del, 4), (Match, 4)], 1426 | nucl!(format!( 1427 | "{}{}", 1428 | from_utf8(&['A' as u8; BLOCK_PITCH]).unwrap(), 1429 | "ACGTACGT" 1430 | )), 1431 | format!("{}^ACGT4", BLOCK_PITCH), 1432 | Range { 1433 | start: BLOCK_PITCH, 1434 | end: BLOCK_PITCH + 6 1435 | }, 1436 | "DDDDMM", 1437 | "------" 1438 | ); 1439 | } 1440 | 1441 | #[test] 1442 | fn test_udon_decode_poll_del2() { 1443 | /* over boundary */ 1444 | compare!( 1445 | Udon::build, 1446 | cigar![(Match, BLOCK_PITCH - 2), (Del, 4), (Match, 6)], 1447 | nucl!(format!( 1448 | "{}{}", 1449 | from_utf8(&['A' as u8; BLOCK_PITCH]).unwrap(), 1450 | "ACGTACGT" 1451 | )), 1452 | format!("{}^ACGT6", BLOCK_PITCH - 2), 1453 | Range { 1454 | start: BLOCK_PITCH - 2, 1455 | end: BLOCK_PITCH + 6 1456 | }, 1457 | "DDDDMMMM", 1458 | "--------" 1459 | ); 1460 | compare!( 1461 | Udon::build, 1462 | cigar![(Match, BLOCK_PITCH - 2), (Del, 4), (Match, 6)], 1463 | nucl!(format!( 1464 | "{}{}", 1465 | from_utf8(&['A' as u8; BLOCK_PITCH]).unwrap(), 1466 | "ACGTACGT" 1467 | )), 1468 | format!("{}^ACGT6", BLOCK_PITCH - 2), 1469 | Range { 1470 | start: BLOCK_PITCH, 
1471 | end: BLOCK_PITCH + 6 1472 | }, 1473 | "DDMMMM", 1474 | "------" 1475 | ); 1476 | compare!( 1477 | Udon::build, 1478 | cigar![(Match, BLOCK_PITCH - 2), (Del, 4), (Match, 6)], 1479 | nucl!(format!( 1480 | "{}{}", 1481 | from_utf8(&['A' as u8; BLOCK_PITCH]).unwrap(), 1482 | "ACGTACGT" 1483 | )), 1484 | format!("{}^ACGT6", BLOCK_PITCH - 2), 1485 | Range { 1486 | start: BLOCK_PITCH + 2, 1487 | end: BLOCK_PITCH + 6 1488 | }, 1489 | "MMMM", 1490 | "----" 1491 | ); 1492 | } 1493 | 1494 | #[test] 1495 | fn test_udon_decode_poll_ins() { 1496 | /* boundary on boundary */ 1497 | compare!( 1498 | Udon::build, 1499 | cigar![(Match, BLOCK_PITCH - 2), (Ins, 4), (Match, 6)], 1500 | nucl!(format!( 1501 | "{}{}", 1502 | from_utf8(&['A' as u8; BLOCK_PITCH]).unwrap(), 1503 | "ACGTACGT" 1504 | )), 1505 | format!("{}", BLOCK_PITCH + 4), 1506 | Range { 1507 | start: BLOCK_PITCH - 2, 1508 | end: BLOCK_PITCH + 2 1509 | }, 1510 | "MMMM", 1511 | "I---" 1512 | ); 1513 | compare!( 1514 | Udon::build, 1515 | cigar![(Match, BLOCK_PITCH - 2), (Ins, 4), (Match, 6)], 1516 | nucl!(format!( 1517 | "{}{}", 1518 | from_utf8(&['A' as u8; BLOCK_PITCH]).unwrap(), 1519 | "ACGTACGT" 1520 | )), 1521 | format!("{}", BLOCK_PITCH + 4), 1522 | Range { 1523 | start: BLOCK_PITCH, 1524 | end: BLOCK_PITCH + 4 1525 | }, 1526 | "MMMM", 1527 | "----" 1528 | ); 1529 | } 1530 | 1531 | #[test] 1532 | fn test_udon_decode_query_ins() { 1533 | compare_ins!( 1534 | Udon::build, 1535 | cigar![(Match, 4), (Ins, 4), (Match, 4)], 1536 | nucl!("ACGTACGTACGT"), 1537 | "8", 1538 | 4, 1539 | "ACGT" 1540 | ); 1541 | compare_ins!( 1542 | Udon::build, 1543 | cigar![(Match, 4), (Ins, 4), (Match, 4)], 1544 | nucl!("ACGTACGTACGT"), 1545 | "8", 1546 | 3, 1547 | "*" 1548 | ); 1549 | compare_ins!( 1550 | Udon::build, 1551 | cigar![(Match, 4), (Ins, 4), (Match, 4)], 1552 | nucl!("ACGTACGTACGT"), 1553 | "8", 1554 | 5, 1555 | "*" 1556 | ); 1557 | } 1558 | 1559 | #[test] 1560 | fn test_udon_decode_query_ins_double() { 1561 | compare_ins!( 
1562 | Udon::build, 1563 | cigar![(Match, 4), (Ins, 4), (Match, 4), (Ins, 4), (Match, 4)], 1564 | nucl!("ACGTACGTACGTGGGGACGT"), 1565 | "12", 1566 | 8, 1567 | "GGGG" 1568 | ); 1569 | compare_ins!( 1570 | Udon::build, 1571 | cigar![(Match, 4), (Ins, 4), (Match, 4), (Ins, 4), (Match, 4)], 1572 | nucl!("ACGTACGTACGTGGGGACGT"), 1573 | "12", 1574 | 7, 1575 | "*" 1576 | ); 1577 | compare_ins!( 1578 | Udon::build, 1579 | cigar![(Match, 4), (Ins, 4), (Match, 4), (Ins, 4), (Match, 4)], 1580 | nucl!("ACGTACGTACGTGGGGACGT"), 1581 | "12", 1582 | 9, 1583 | "*" 1584 | ); 1585 | } 1586 | 1587 | #[test] 1588 | fn test_udon_decode_query_ins_head() { 1589 | compare_ins!( 1590 | Udon::build, 1591 | cigar![(Ins, 4), (Match, 4), (Ins, 4), (Match, 4)], 1592 | nucl!("CCCCACGTACGTACGT"), 1593 | "8", 1594 | 0, 1595 | "CCCC" 1596 | ); 1597 | compare_ins!( 1598 | Udon::build, 1599 | cigar![(Ins, 4), (Match, 4), (Ins, 4), (Match, 4)], 1600 | nucl!("CCCCACGTACGTACGT"), 1601 | "8", 1602 | 1, 1603 | "*" 1604 | ); 1605 | compare_ins!( 1606 | Udon::build, 1607 | cigar![(Ins, 4), (Match, 4), (Ins, 4), (Match, 4)], 1608 | nucl!("CCCCACGTGGGGACGT"), 1609 | "8", 1610 | 4, 1611 | "GGGG" 1612 | ); 1613 | } 1614 | 1615 | #[test] 1616 | fn test_udon_decode_query_ins_poll() { 1617 | compare_ins!( 1618 | Udon::build, 1619 | cigar![(Match, BLOCK_PITCH + 4), (Ins, 4), (Match, 4)], 1620 | nucl!(format!( 1621 | "{}{}", 1622 | from_utf8(&['A' as u8; BLOCK_PITCH]).unwrap(), 1623 | "ACGTACGTACGT" 1624 | )), 1625 | format!("{}", BLOCK_PITCH + 8), 1626 | BLOCK_PITCH + 4, 1627 | "ACGT" 1628 | ); 1629 | compare_ins!( 1630 | Udon::build, 1631 | cigar![(Match, BLOCK_PITCH + 4), (Ins, 4), (Match, 4)], 1632 | nucl!(format!( 1633 | "{}{}", 1634 | from_utf8(&['A' as u8; BLOCK_PITCH]).unwrap(), 1635 | "ACGTACGTACGT" 1636 | )), 1637 | format!("{}", BLOCK_PITCH + 8), 1638 | BLOCK_PITCH + 3, 1639 | "*" 1640 | ); 1641 | compare_ins!( 1642 | Udon::build, 1643 | cigar![(Match, BLOCK_PITCH + 4), (Ins, 4), (Match, 4)], 1644 | 
nucl!(format!( 1645 | "{}{}", 1646 | from_utf8(&['A' as u8; BLOCK_PITCH]).unwrap(), 1647 | "ACGTACGTACGT" 1648 | )), 1649 | format!("{}", BLOCK_PITCH + 8), 1650 | BLOCK_PITCH + 5, 1651 | "*" 1652 | ); 1653 | } 1654 | 1655 | #[test] 1656 | fn test_udon_decode_query_ins_double_poll() { 1657 | compare_ins!( 1658 | Udon::build, 1659 | cigar![ 1660 | (Match, BLOCK_PITCH + 4), 1661 | (Ins, 4), 1662 | (Match, 4), 1663 | (Ins, 4), 1664 | (Match, 4) 1665 | ], 1666 | nucl!(format!( 1667 | "{}{}", 1668 | from_utf8(&['A' as u8; BLOCK_PITCH]).unwrap(), 1669 | "ACGTACGTACGTGGGGACGT" 1670 | )), 1671 | format!("{}", BLOCK_PITCH + 12), 1672 | BLOCK_PITCH + 8, 1673 | "GGGG" 1674 | ); 1675 | compare_ins!( 1676 | Udon::build, 1677 | cigar![ 1678 | (Match, BLOCK_PITCH + 4), 1679 | (Ins, 4), 1680 | (Match, 4), 1681 | (Ins, 4), 1682 | (Match, 4) 1683 | ], 1684 | nucl!(format!( 1685 | "{}{}", 1686 | from_utf8(&['A' as u8; BLOCK_PITCH]).unwrap(), 1687 | "ACGTACGTACGTGGGGACGT" 1688 | )), 1689 | format!("{}", BLOCK_PITCH + 12), 1690 | BLOCK_PITCH + 7, 1691 | "*" 1692 | ); 1693 | compare_ins!( 1694 | Udon::build, 1695 | cigar![ 1696 | (Match, BLOCK_PITCH + 4), 1697 | (Ins, 4), 1698 | (Match, 4), 1699 | (Ins, 4), 1700 | (Match, 4) 1701 | ], 1702 | nucl!(format!( 1703 | "{}{}", 1704 | from_utf8(&['A' as u8; BLOCK_PITCH]).unwrap(), 1705 | "ACGTACGTACGTGGGGACGT" 1706 | )), 1707 | format!("{}", BLOCK_PITCH + 12), 1708 | BLOCK_PITCH + 9, 1709 | "*" 1710 | ); 1711 | } 1712 | 1713 | #[test] 1714 | fn test_udon_decode_query_ins_double_poll2() { 1715 | compare_ins!( 1716 | Udon::build, 1717 | cigar![ 1718 | (Match, 4), 1719 | (Ins, 4), 1720 | (Match, BLOCK_PITCH - 8), 1721 | (Ins, 4), 1722 | (Match, 4), 1723 | (Ins, 4), 1724 | (Match, 4) 1725 | ], 1726 | nucl!(format!( 1727 | "{}{}{}", 1728 | "ACGTCCCC", 1729 | from_utf8(&['A' as u8; BLOCK_PITCH - 8]).unwrap(), 1730 | "TTTTACGTGGGGACGT" 1731 | )), 1732 | format!("{}", BLOCK_PITCH + 4), 1733 | 4, 1734 | "CCCC" 1735 | ); 1736 | compare_ins!( 1737 | 
Udon::build, 1738 | cigar![ 1739 | (Match, 4), 1740 | (Ins, 4), 1741 | (Match, BLOCK_PITCH - 8), 1742 | (Ins, 4), 1743 | (Match, 4), 1744 | (Ins, 4), 1745 | (Match, 4) 1746 | ], 1747 | nucl!(format!( 1748 | "{}{}{}", 1749 | "ACGTCCCC", 1750 | from_utf8(&['A' as u8; BLOCK_PITCH - 8]).unwrap(), 1751 | "TTTTACGTGGGGACGT" 1752 | )), 1753 | format!("{}", BLOCK_PITCH + 4), 1754 | BLOCK_PITCH - 4, 1755 | "TTTT" 1756 | ); 1757 | compare_ins!( 1758 | Udon::build, 1759 | cigar![ 1760 | (Match, 4), 1761 | (Ins, 4), 1762 | (Match, BLOCK_PITCH - 8), 1763 | (Ins, 4), 1764 | (Match, 4), 1765 | (Ins, 4), 1766 | (Match, 4) 1767 | ], 1768 | nucl!(format!( 1769 | "{}{}{}", 1770 | "ACGTCCCC", 1771 | from_utf8(&['A' as u8; BLOCK_PITCH - 8]).unwrap(), 1772 | "TTTTACGTGGGGACGT" 1773 | )), 1774 | format!("{}", BLOCK_PITCH + 4), 1775 | BLOCK_PITCH, 1776 | "GGGG" 1777 | ); 1778 | } 1779 | 1780 | #[test] 1781 | fn test_udon_decode_query_ins_head_double_poll() { 1782 | compare_ins!( 1783 | Udon::build, 1784 | cigar![ 1785 | (Ins, 4), 1786 | (Match, 4), 1787 | (Ins, 4), 1788 | (Match, BLOCK_PITCH - 8), 1789 | (Ins, 4), 1790 | (Match, 4), 1791 | (Ins, 4), 1792 | (Match, 4) 1793 | ], 1794 | nucl!(format!( 1795 | "{}{}{}", 1796 | "GGGGACGTCCCC", 1797 | from_utf8(&['A' as u8; BLOCK_PITCH - 8]).unwrap(), 1798 | "TTTTACGTGGGGACGT" 1799 | )), 1800 | format!("{}", BLOCK_PITCH + 4), 1801 | 0, 1802 | "GGGG" 1803 | ); 1804 | compare_ins!( 1805 | Udon::build, 1806 | cigar![ 1807 | (Ins, 4), 1808 | (Match, 4), 1809 | (Ins, 4), 1810 | (Match, BLOCK_PITCH - 8), 1811 | (Ins, 4), 1812 | (Match, 4), 1813 | (Ins, 4), 1814 | (Match, 4) 1815 | ], 1816 | nucl!(format!( 1817 | "{}{}{}", 1818 | "GGGGACGTCCCC", 1819 | from_utf8(&['A' as u8; BLOCK_PITCH - 8]).unwrap(), 1820 | "TTTTACGTGGGGACGT" 1821 | )), 1822 | format!("{}", BLOCK_PITCH + 4), 1823 | 4, 1824 | "CCCC" 1825 | ); 1826 | compare_ins!( 1827 | Udon::build, 1828 | cigar![ 1829 | (Ins, 4), 1830 | (Match, 4), 1831 | (Ins, 4), 1832 | (Match, BLOCK_PITCH - 8), 
1833 | (Ins, 4), 1834 | (Match, 4), 1835 | (Ins, 4), 1836 | (Match, 4) 1837 | ], 1838 | nucl!(format!( 1839 | "{}{}{}", 1840 | "GGGGACGTCCCC", 1841 | from_utf8(&['A' as u8; BLOCK_PITCH - 8]).unwrap(), 1842 | "TTTTACGTGGGGACGT" 1843 | )), 1844 | format!("{}", BLOCK_PITCH + 4), 1845 | BLOCK_PITCH - 4, 1846 | "TTTT" 1847 | ); 1848 | compare_ins!( 1849 | Udon::build, 1850 | cigar![ 1851 | (Ins, 4), 1852 | (Match, 4), 1853 | (Ins, 4), 1854 | (Match, BLOCK_PITCH - 8), 1855 | (Ins, 4), 1856 | (Match, 4), 1857 | (Ins, 4), 1858 | (Match, 4) 1859 | ], 1860 | nucl!(format!( 1861 | "{}{}{}", 1862 | "GGGGACGTCCCC", 1863 | from_utf8(&['A' as u8; BLOCK_PITCH - 8]).unwrap(), 1864 | "TTTTACGTGGGGACGT" 1865 | )), 1866 | format!("{}", BLOCK_PITCH + 4), 1867 | BLOCK_PITCH, 1868 | "GGGG" 1869 | ); 1870 | } 1871 | 1872 | #[test] 1873 | fn test_udon_decode_query_ins_head_double_poll2() { 1874 | compare_ins!( 1875 | Udon::build, 1876 | cigar![ 1877 | (Ins, 4), 1878 | (Match, 4), 1879 | (Ins, 4), 1880 | (Match, BLOCK_PITCH - 8), 1881 | (Ins, 4), 1882 | (Match, 8), 1883 | (Ins, 4), 1884 | (Match, 4) 1885 | ], 1886 | nucl!(format!( 1887 | "{}{}{}", 1888 | "GGGGACGTCCCC", 1889 | from_utf8(&['A' as u8; BLOCK_PITCH - 8]).unwrap(), 1890 | "TTTTACGTGGGGACGTACGT" 1891 | )), 1892 | format!("{}", BLOCK_PITCH + 8), 1893 | BLOCK_PITCH + 4, 1894 | "ACGT" 1895 | ); 1896 | compare_ins!( 1897 | Udon::build, 1898 | cigar![ 1899 | (Ins, 4), 1900 | (Match, 4), 1901 | (Ins, 4), 1902 | (Match, BLOCK_PITCH - 8), 1903 | (Ins, 4), 1904 | (Match, 2), 1905 | (Del, 4), 1906 | (Match, 2), 1907 | (Ins, 4), 1908 | (Match, 4) 1909 | ], 1910 | nucl!(format!( 1911 | "{}{}{}", 1912 | "GGGGACGTCCCC", 1913 | from_utf8(&['A' as u8; BLOCK_PITCH - 8]).unwrap(), 1914 | "TTTTACGTGGGGACGT" 1915 | )), 1916 | format!("{}^ACGT6", BLOCK_PITCH - 2), 1917 | BLOCK_PITCH + 4, 1918 | "GGGG" 1919 | ); 1920 | compare_ins!( 1921 | Udon::build, 1922 | cigar![ 1923 | (Ins, 4), 1924 | (Match, 4), 1925 | (Ins, 4), 1926 | (Match, BLOCK_PITCH - 8), 
1927 | (Ins, 4), 1928 | (Del, 8), 1929 | (Ins, 4), 1930 | (Match, 4) 1931 | ], 1932 | nucl!(format!( 1933 | "{}{}{}", 1934 | "GGGGACGTCCCC", 1935 | from_utf8(&['A' as u8; BLOCK_PITCH - 8]).unwrap(), 1936 | "TTTTGGGGACGT" 1937 | )), 1938 | format!("{}^ACGTACGT4", BLOCK_PITCH - 4), 1939 | BLOCK_PITCH + 4, 1940 | "GGGG" 1941 | ); 1942 | } 1943 | 1944 | #[test] 1945 | fn test_udon_decode_scaled() { 1946 | compare_color!( 1947 | Udon::build, 1948 | cigar![(Match, 4), (Del, 1), (Match, 4)], 1949 | nucl!("ACGTACGT"), 1950 | "4^A4", 1951 | Range { start: 0, end: 0 }, 1952 | 0.0, 1953 | 1.0, 1954 | vec![BG, BG, BG, BG, DEL, BG, BG, BG, BG], 1955 | 1.0 / (1.0f64.log(3.5).max(1.0) + 1.0f64 / 10.0), 1956 | 1.0 / (1.0f64.log(2.5).max(1.0) + 1.0f64 / 5.0) 1957 | ); 1958 | 1959 | compare_color!( 1960 | Udon::build, 1961 | cigar![(Match, 4), (Del, 1), (Match, 4)], 1962 | nucl!("ACGTACGT"), 1963 | "4^A4", 1964 | Range { start: 0, end: 0 }, 1965 | 0.0, 1966 | 3.0, 1967 | vec![BG, DEL, BG], 1968 | 1.0 / (3.0f64.log(3.5).max(1.0) + 3.0f64 / 10.0), 1969 | 1.0 / (3.0f64.log(2.5).max(1.0) + 3.0f64 / 5.0) 1970 | ); 1971 | 1972 | /* we need more tests, but it is unclear how to write them */ 1973 | /* 1974 | compare_color!(Udon::build, 1975 | cigar![(Match, 9)], 1976 | nucl!("ACGTACGTA"), 1977 | "G0T0A0C0G0T0A0C0G", 1978 | Range { start: 0, end: 0 }, 1979 | 1.0 / 3.0, 1.5, 1980 | vec![BG, DEL.map(|x| x / 3), BG] 1981 | ); 1982 | */ 1983 | } 1984 | } 1985 | -------------------------------------------------------------------------------- /src/op.rs: -------------------------------------------------------------------------------- 1 | use std::slice::Iter; 2 | 3 | /* bam CIGAR (raw CIGAR) related structs: 4 | Transcoder API takes an array of ops, defined as the following: 5 | 6 | ``` 7 | struct { 8 | uint32_t op : 4; 9 | uint32_t len : 28; 10 | }; 11 | ``` 12 | 13 | The detailed specification is found in the SAM/BAM spec PDF. 
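As a rough sketch of the layout above (not part of this crate's API): `split_raw_cigar` is a hypothetical helper, and it assumes the op code sits in the low 4 bits and the length in the upper 28 bits, which matches the `Cigar` bitfield defined below. Under those assumptions a raw element could be unpacked like this:

```rust
// Hypothetical helper (illustration only): split one raw BAM CIGAR element
// into (op, len). Assumes op occupies bits 3..0 and len occupies bits 31..4.
pub fn split_raw_cigar(raw: u32) -> (u32, u32) {
    let op = raw & 0x0f; // low 4 bits: operation code (0 = Match, 1 = Ins, 2 = Del, ...)
    let len = raw >> 4; // upper 28 bits: run length of the operation
    (op, len)
}
```

For instance, a `10M` element is stored as `10 << 4`, so `split_raw_cigar(10 << 4)` gives `(0, 10)`.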
14 | */ 15 | 16 | /* bam CIGAR operation code */ 17 | #[repr(u32)] 18 | #[allow(dead_code)] 19 | pub(super) enum CigarOp { 20 | Match = 0, 21 | Ins = 0x01, 22 | Del = 0x02, 23 | Unknown = 0x03, 24 | SoftClip = 0x04, 25 | HardClip = 0x05, 26 | Pad = 0x06, 27 | Eq = 0x07, 28 | Mismatch = 0x08, 29 | } 30 | 31 | /* bam CIGAR element definition */ 32 | bitfield! { 33 | #[derive(Copy, Clone, Default)] 34 | pub struct Cigar(u32); 35 | pub op, _: 3, 0; 36 | pub len, _: 31, 4; 37 | } 38 | 39 | /* transcoded CIGAR and index 40 | 41 | Transcoded CIGAR is encoded in a run-length manner, as in the original CIGAR string. 42 | Each chunk for the run-length compression is represented by an op, which is 8-bit unsigned. 43 | 44 | Op consists of two bitfields: total chunk length (columns) on reference sequence (5bit), 45 | and leading event(s) (3bit). The leading event is one of { insertion, deletion, mismatch } 46 | and the remainder till the end of the chunk is filled with matches. That is, any chunk 47 | has one of the following compositions: 48 | 49 | 1. insertion of arbitrary length, and trailing matches up to 30 bases. 50 | 2. deletion up to three bases, and trailing matches up to 30 - #deletions. 51 | 3. a mismatch, and trailing matches up to 29 bases. 52 | 53 | 54 | The 3-bit leading field is defined to hold the events above as follows: 55 | 56 | * for insertion, place a marker (0b000) indicating that there are inserted bases between the chunk 57 | and the previous chunk. 58 | * for deletion, encode the number of deleted bases as 0b001 ~ 0b011 (1 to 3). 59 | * for mismatch, encode a base on the query sequence as 0b100 (A) ~ 0b111 (T). 60 | 61 | 62 | The transcoded CIGAR stream is indexed at reference-side positions at regular intervals. 63 | The interval is defined as `BLOCK_PITCH`, which is 256 by default. Pointers to the op array 64 | and the insertion sequence array for each block are stored in `Block`. 
Since block boundary 65 | is not always aligned to chunk (op) boundary, an offset within chunk from the block 66 | boundary is also stored in `op_skip` field. 67 | */ 68 | 69 | /* op trailing events */ 70 | #[repr(u32)] 71 | pub(super) enum CompressMark { 72 | Ins = 0x00, 73 | 74 | /* mismatch base for 'A': 0x04, .., 'T': 0x07 */ 75 | Mismatch = 0x04, 76 | } 77 | 78 | /* 79 | op -> len conversion 80 | */ 81 | pub(super) fn op_len(x: u8) -> usize { 82 | let len = x as usize & 0x1f; 83 | len - (len == 31) as usize 84 | } 85 | 86 | pub(super) fn op_marker(x: u8) -> u32 { 87 | (x >> 5) as u32 88 | } 89 | 90 | pub(super) fn op_is_cont(x: u8) -> bool { 91 | (x & 0x1f) == 0x1f 92 | } 93 | 94 | /* OpsIterator 95 | 96 | Iterates over op array (&[u8]), accumulating reference-side offset. 97 | */ 98 | pub(super) struct OpsIter<'a> { 99 | iter: Iter<'a, u8>, 100 | rofs: usize, 101 | } 102 | 103 | pub(super) trait IntoOpsIterator<'a> { 104 | fn iter_ops(self, base_rofs: usize) -> OpsIter<'a>; 105 | } 106 | 107 | impl<'a> IntoOpsIterator<'a> for &'a [u8] { 108 | fn iter_ops(self, base_rofs: usize) -> OpsIter<'a> { 109 | OpsIter { 110 | iter: self.iter(), 111 | rofs: base_rofs, 112 | } 113 | } 114 | } 115 | 116 | impl<'a> Iterator for OpsIter<'a> { 117 | type Item = (u8, usize); 118 | 119 | fn next(&mut self) -> Option { 120 | let x = self.iter.next()?; 121 | self.rofs += op_len(*x); 122 | Some((*x, self.rofs)) 123 | } 124 | } 125 | -------------------------------------------------------------------------------- /src/scaler.rs: -------------------------------------------------------------------------------- 1 | use super::{UdonOp, UdonPalette}; 2 | use std::f64::consts::PI; 3 | use std::ops::{Add, AddAssign, Mul, MulAssign, Range, Sub}; 4 | 5 | /* Scaler 6 | 7 | Scales decoded op array into a color bar. Taking scaling factor (`columns_per_pixel`) and 8 | output vector of `Udon::decode`, it outputs Vec whose length is `input.len() / columns_per_pixel`. 
9 | Each element of the output vector is a negated 8-bit RGB color where R is placed at the least 10 | significant byte. The negated color is then overlaid on a base color (typically pink for forward 11 | and light blue for reverse) by `UdonUtils::append_on_basecolor`, and gamma correction is then applied by 12 | `UdonUtils::correct_gamma`. 13 | 14 | Scaler implements an approximated Lanczos filter (sinc interpolation) for an arbitrary scaling 15 | factor. It precomputes interpolation coefficient tables on `new()` for sampling input columns for 16 | an output bin. A table is prepared for each 1/16 step of the column offset fraction so that arbitrary combinations of 17 | scaling factor and drawing offset can be handled. 18 | */ 19 | #[derive(Copy, Clone, Debug, Default)] 20 | struct Color { 21 | v: [i32; 8], 22 | } 23 | 24 | impl From<&[[u8; 4]; 2]> for Color { 25 | fn from(val: &[[u8; 4]; 2]) -> Color { 26 | let mut x = Color::default(); 27 | for i in 0..4 { 28 | x.v[i] = val[0][i] as i32; 29 | } 30 | for i in 0..4 { 31 | x.v[i + 4] = val[1][i] as i32; 32 | } 33 | x 34 | } 35 | } 36 | 37 | impl From<&Color> for [[u8; 4]; 2] { 38 | fn from(val: &Color) -> [[u8; 4]; 2] { 39 | let mut x: [[u8; 4]; 2] = Default::default(); 40 | for i in 0..4 { 41 | x[0][i] = val.v[i].min(255).max(0) as u8; 42 | x[1][i] = val.v[i + 4].min(255).max(0) as u8; 43 | } 44 | x 45 | } 46 | } 47 | 48 | /* 49 | impl From<&Color> for (u32, u32) { 50 | fn from(val: &Color) -> (u32, u32) { 51 | let mut x: [[u8; 4]; 2] = Default::default(); 52 | for i in 0 ..
4 { 53 | x[0][i] = val.v[i ].min(255).max(0) as u8; 54 | x[1][i] = val.v[i + 4].min(255).max(0) as u8; 55 | } 56 | (u32::from_le_bytes(x[0]), u32::from_le_bytes(x[1])) 57 | } 58 | } 59 | */ 60 | 61 | impl Add for Color { 62 | type Output = Color; 63 | fn add(self, other: Color) -> Color { 64 | let mut x = self; 65 | for i in 0..8 { 66 | x.v[i] += other.v[i]; 67 | } 68 | x 69 | } 70 | } 71 | 72 | impl Sub for Color { 73 | type Output = Color; 74 | fn sub(self, other: Color) -> Color { 75 | let mut x = self; 76 | for i in 0..8 { 77 | x.v[i] -= other.v[i]; 78 | } 79 | x 80 | } 81 | } 82 | 83 | impl Mul for Color { 84 | type Output = Color; 85 | fn mul(self, other: Color) -> Color { 86 | let mut x = self; 87 | for i in 0..8 { 88 | /* upcast, multiply, then downcast. (expect pmuludq) */ 89 | let n = x.v[i] as i64 * other.v[i] as i64; 90 | x.v[i] = (n >> 24) as i32; /* I expect this won't overflow but no guarantee */ 91 | } 92 | x 93 | } 94 | } 95 | 96 | impl Add for Color { 97 | type Output = Color; 98 | fn add(self, other: i32) -> Color { 99 | let mut x = self; 100 | for i in 0..8 { 101 | x.v[i] += other; 102 | } 103 | x 104 | } 105 | } 106 | 107 | impl Mul for Color { 108 | type Output = Color; 109 | fn mul(self, other: i32) -> Color { 110 | let mut x = self; 111 | for i in 0..8 { 112 | let n = x.v[i] as i64 * other as i64; 113 | x.v[i] = (n >> 24) as i32; 114 | } 115 | x 116 | } 117 | } 118 | 119 | /* arithmetic assign */ 120 | impl AddAssign for Color { 121 | fn add_assign(&mut self, other: Color) { 122 | *self = self.add(other); 123 | } 124 | } 125 | 126 | impl MulAssign for Color { 127 | fn mul_assign(&mut self, other: Color) { 128 | *self = self.mul(other); 129 | } 130 | } 131 | 132 | /* Scaler second impl */ 133 | #[derive(Default)] 134 | pub(super) struct Scaler { 135 | columns_per_pixel: f64, 136 | window: f64, 137 | offset: f64, 138 | normalizer: Color, 139 | color: [Color; 12], 140 | table: [Vec; 17], 141 | } 142 | 143 | fn sinc(rad: f64) -> f64 { 144 | if 
!rad.is_normal() { 145 | return 1.0; 146 | } 147 | rad.sin() / rad 148 | } 149 | 150 | fn sincx(x: f64, order: f64) -> f64 { 151 | sinc(PI * x) * sinc(PI * (x / order).max(-1.0).min(1.0)) 152 | } 153 | 154 | fn clip(x: f64, window: f64) -> f64 { 155 | if x > 0.0 { 156 | if x < window { 0.0 } else { x - window } 157 | } else if x > -window { 0.0 } else { x + window } 158 | } 159 | 160 | impl Scaler { 161 | /* column -> color index */ 162 | const INDEX: [u8; 32] = { 163 | let mut index = [0; 32]; 164 | index[UdonOp::MisA as usize] = 1; 165 | index[UdonOp::MisC as usize] = 2; 166 | index[UdonOp::MisG as usize] = 3; 167 | index[UdonOp::MisT as usize] = 4; 168 | index[UdonOp::Del as usize] = 5; 169 | index[UdonOp::Ins as usize | UdonOp::MisA as usize] = 6; 170 | index[UdonOp::Ins as usize | UdonOp::MisC as usize] = 7; 171 | index[UdonOp::Ins as usize | UdonOp::MisG as usize] = 8; 172 | index[UdonOp::Ins as usize | UdonOp::MisT as usize] = 9; 173 | index[UdonOp::Ins as usize | UdonOp::Del as usize] = 10; 174 | index 175 | }; 176 | 177 | fn index(column: u8) -> usize { 178 | assert!(column < 32, "{}", column); 179 | 180 | Self::INDEX[column as usize] as usize 181 | } 182 | 183 | fn pick_color(&self, column: u8) -> Color { 184 | self.color[Self::index(column)] 185 | } 186 | 187 | fn build_color_table(color: &UdonPalette) -> [Color; 12] { 188 | let ff = Color::from(&[[0xff, 0xff, 0xff, 0xff], [0xff, 0xff, 0xff, 0xff]]); 189 | let mismatch: [Color; 4] = [ 190 | ff - Color::from(&color.mismatch[0]), 191 | ff - Color::from(&color.mismatch[1]), 192 | ff - Color::from(&color.mismatch[2]), 193 | ff - Color::from(&color.mismatch[3]), 194 | ]; 195 | let del = ff - Color::from(&color.del); 196 | let ins = ff - Color::from(&color.ins); 197 | let bg = ff - Color::from(&color.background); 198 | 199 | let mut x = [bg; 12]; 200 | x[Self::index(UdonOp::MisA as u8)] = mismatch[0]; 201 | x[Self::index(UdonOp::MisC as u8)] = mismatch[1]; 202 | x[Self::index(UdonOp::MisG as u8)] = 
mismatch[2]; 203 | x[Self::index(UdonOp::MisT as u8)] = mismatch[3]; 204 | x[Self::index(UdonOp::Del as u8)] = del; 205 | x[Self::index(UdonOp::Ins as u8 + UdonOp::MisA as u8)] = 206 | ins * 0x800000 + mismatch[0] * 0x800000; 207 | x[Self::index(UdonOp::Ins as u8 + UdonOp::MisC as u8)] = 208 | ins * 0x800000 + mismatch[1] * 0x800000; 209 | x[Self::index(UdonOp::Ins as u8 + UdonOp::MisG as u8)] = 210 | ins * 0x800000 + mismatch[2] * 0x800000; 211 | x[Self::index(UdonOp::Ins as u8 + UdonOp::MisT as u8)] = 212 | ins * 0x800000 + mismatch[3] * 0x800000; 213 | x[Self::index(UdonOp::Ins as u8 + UdonOp::Del as u8)] = ins * 0x800000 + del * 0x800000; 214 | 215 | x 216 | } 217 | 218 | const WINDOW: f64 = 1.0; 219 | 220 | fn build_coef_table(v: &mut Vec, i: usize, scale: f64, pitch: f64, width: f64) { 221 | let span = 2 * ((0.5 * scale).ceil() as usize); 222 | let offset = (i as f64 - 8.0) / 16.0; 223 | 224 | /* FIXME */ 225 | let center = pitch * pitch * offset + span as f64 / 2.0; 226 | 227 | for j in 0..=span { 228 | let dist = center - j as f64; 229 | let coef = sincx(clip(dist, width) / pitch, Self::WINDOW); 230 | 231 | /* FIXME */ 232 | let coef = coef.max(0.0); 233 | v.push((0x01000000 as f64 * coef) as i32); 234 | } 235 | } 236 | 237 | pub(super) fn new(color: &UdonPalette, columns_per_pixel: f64) -> Scaler { 238 | let scale = columns_per_pixel.max(1.0); 239 | let pitch = columns_per_pixel / scale; 240 | let width = 0.5 * (scale - 1.0); 241 | let color_coef = 1.0 / (columns_per_pixel.log(3.5).max(1.0) + columns_per_pixel / 10.0); 242 | let alpha_coef = 1.0 / (columns_per_pixel.log(2.5).max(1.0) + columns_per_pixel / 5.0); 243 | 244 | let mut x = Scaler { 245 | columns_per_pixel, 246 | window: scale.ceil() + 1.0, 247 | offset: (scale.ceil() + 1.0) / 2.0, 248 | normalizer: Color { 249 | v: [ 250 | (0x01000000 as f64 * color_coef) as i32, 251 | (0x01000000 as f64 * color_coef) as i32, 252 | (0x01000000 as f64 * color_coef) as i32, 253 | (0x01000000 as f64 * alpha_coef) 
as i32, 254 | (0x01000000 as f64 * color_coef) as i32, 255 | (0x01000000 as f64 * color_coef) as i32, 256 | (0x01000000 as f64 * color_coef) as i32, 257 | (0x01000000 as f64 * alpha_coef) as i32, 258 | ], 259 | }, 260 | color: Self::build_color_table(color), 261 | table: Default::default(), 262 | }; 263 | 264 | for i in 0..17 { 265 | Self::build_coef_table(&mut x.table[i], i, scale, pitch, width); 266 | } 267 | x 268 | } 269 | 270 | /* 271 | fn calc_coef(dist: f64, shoulder: f64, magnifier: f64) -> f64 { 272 | let width = magnifier - 1.0; 273 | 274 | if dist.abs() < magnifier / 4.0 { 275 | return 1.0; 276 | } 277 | 0.5 * ( 278 | sincx((dist + shoulder) * magnifier, Self::WINDOW) 279 | + sincx((dist - shoulder) * magnifier, Self::WINDOW) 280 | ) 281 | } 282 | 283 | fn build_coef_table(v: &mut Vec, i: usize, magnifier: f64, shoulder: f64) { 284 | 285 | // println!("columns_per_pixel({:?}), scale({:?}), width({:?}), shoulder({:?})", columns_per_pixel, scale, pitch - 1.0, shoulder); 286 | let offset = i as f64 / 16.0; 287 | let center = offset + Self::WINDOW * magnifier; 288 | 289 | // let start = offset; 290 | let end = offset + 2.0 * Self::WINDOW * magnifier; 291 | let span = end.ceil() as usize; 292 | 293 | // let mut x = Vec::new(); 294 | for j in 0 ..= span { 295 | 296 | let dist = center - j as f64; 297 | let coef = Self::calc_coef(dist, shoulder, magnifier); 298 | // v.push((j, dist, format!("{:#.3}", coef))); 299 | v.push((0x01000000 as f64 * coef) as i32); 300 | } 301 | // println!("offset({:#.05}), center({:#.05}), r({:#.3}, {:#.3}), r({:#.3}, {:#.3}), {:?}", offset, center, start, end, start.floor() as i64, end.ceil() as i64, x); 302 | } 303 | 304 | pub fn new(color: &UdonPalette, columns_per_pixel: f64) -> Scaler { 305 | let scale = columns_per_pixel.max(1.0); 306 | let magnifier = scale / columns_per_pixel; 307 | 308 | let mut x = Scaler { 309 | columns_per_pixel: columns_per_pixel, 310 | window: (2.0 * Self::WINDOW) * magnifier, 311 | color: 
Self::build_color_table(&color), 312 | table: Default::default() 313 | }; 314 | 315 | for i in 0 .. 17 { 316 | Self::build_coef_table(&mut x.table[i], 317 | i, magnifier, 0.25 / columns_per_pixel 318 | ); 319 | } 320 | x 321 | } 322 | */ 323 | 324 | pub(super) fn expected_size(&self, span: usize) -> usize { 325 | (span as f64 / self.columns_per_pixel) as usize + 2 326 | } 327 | 328 | pub(super) fn init(&self, offset_in_pixels: f64) -> (f64, usize) { 329 | let offset = (offset_in_pixels + 0.5) * self.columns_per_pixel; 330 | let margin = (/* offset + */self.offset) as usize; 331 | // println!("init, offset({}, {}), margin({})", offset, self.offset, margin); 332 | 333 | (offset, margin) 334 | } 335 | 336 | pub(super) fn scale( 337 | &self, 338 | dst: &mut Vec<[[u8; 4]; 2]>, 339 | src: &[u8], 340 | offset: f64, 341 | ) -> Option<(f64, usize)> { 342 | // println!("scale, offset({})", offset); 343 | for i in 0.. { 344 | /* 345 | offset := offset_in_columns 346 | */ 347 | let base = offset + (i as f64 * self.columns_per_pixel); 348 | let range = Range:: { 349 | start: base as usize, 350 | end: (base + self.window + 1.0).ceil() as usize, 351 | }; 352 | // println!("base({}), range({:?})", base, range); 353 | 354 | if range.end > src.len() { 355 | return Some((base.fract(), range.start)); 356 | } 357 | 358 | let table = &self.table[(base.fract() * 16.0) as usize]; 359 | // println!("frac({}), {:?}", base.fract(), table); 360 | 361 | let mut a = Color::default(); 362 | for (&coef, &column) in table.iter().zip(src[range].iter()) { 363 | // println!("col({}), color({:?}), coef({})", column, self.pick_color(column), coef); 364 | a += self.pick_color(column) * coef; 365 | } 366 | 367 | a *= self.normalizer; 368 | // println!("acc({:?})", <[[u8; 4]; 2]>::from(&a)); 369 | dst.push(<[[u8; 4]; 2]>::from(&a)); 370 | } 371 | 372 | None 373 | } 374 | } 375 | 376 | /* 377 | /* Scaler and its impl */ 378 | struct Scaler { 379 | accum: Color, 380 | prev: Color, 381 | color: [Color; 12], 
382 | normalizer: u32 383 | } 384 | 385 | impl Scaler { 386 | 387 | /* column -> color index */ 388 | const INDEX: [u8; 32] = { 389 | let mut index = [0; 32]; 390 | index[Op::MisA as usize] = 1; 391 | index[Op::MisC as usize] = 2; 392 | index[Op::MisG as usize] = 3; 393 | index[Op::MisT as usize] = 4; 394 | index[Op::Del as usize] = 5; 395 | index[Op::Ins as usize | Op::MisA as usize] = 6; 396 | index[Op::Ins as usize | Op::MisC as usize] = 7; 397 | index[Op::Ins as usize | Op::MisG as usize] = 8; 398 | index[Op::Ins as usize | Op::MisT as usize] = 9; 399 | index[Op::Ins as usize | Op::Del as usize] = 10; 400 | index 401 | }; 402 | fn index(column: u8) -> usize { 403 | assert!(column < 32, "{}", column); 404 | 405 | Self::INDEX[column as usize] as usize 406 | } 407 | 408 | /* fraction in margin -> coefficient */ 409 | const COEF: [u32; 33] = [ 410 | 0x01000000, 411 | 0x00ff47f5, 412 | 0x00fd2228, 413 | 0x00f99580, 414 | 0x00f4ad63, 415 | 0x00ee7987, 416 | 0x00e70db5, 417 | 0x00de8179, 418 | 0x00d4efc8, 419 | 0x00ca7697, 420 | 0x00bf3666, 421 | 0x00b351bf, 422 | 0x00a6ecb7, 423 | 0x009a2c61, 424 | 0x008d3640, 425 | 0x00802fba, 426 | 0x00733d90, 427 | 0x00668354, 428 | 0x005a22e9, 429 | 0x004e3c09, 430 | 0x0042ebda, 431 | 0x00384c89, 432 | 0x002e74f8, 433 | 0x00257873, 434 | 0x001d6681, 435 | 0x00164ab8, 436 | 0x00102ca9, 437 | 0x000b0fdf, 438 | 0x0006f3e8, 439 | 0x0003d473, 440 | 0x0001a97f, 441 | 0x00006790, 442 | 0x00000000 443 | ]; 444 | fn coef(frac: f64) -> (u32, u32) { 445 | 446 | /* 447 | let index = (frac * 32.0) as usize; 448 | assert!(index < 33, "frac({}), index({})", frac, index); 449 | 450 | (Self::COEF[index], Self::COEF[32 - index]) 451 | */ 452 | 453 | let coef = (0x01000000 as f64 * frac) as u32; 454 | (coef, 0x01000000 - coef) 455 | 456 | } 457 | 458 | 459 | fn new(color: &UdonPalette, pitch: f64) -> Self { 460 | // let normalizer = (0x01000000 as f64 * (pitch + 1.0).log(2.7) + 1.0) as u32; 461 | let normalizer = (0x01000000 as f64 * pitch) as u32; 
462 | debug!("pitch({}), normalizer({})", pitch, (pitch + 1.0).log(2.7)); 463 | 464 | Scaler { 465 | accum: Color::default(), 466 | prev: Color::default(), 467 | normalizer: normalizer, 468 | color: Self::build_color_table(&color) 469 | } 470 | } 471 | 472 | fn build_color_table(color: &UdonPalette) -> [Color; 12] { 473 | 474 | let mismatch: [Color; 4] = [ 475 | Color::from(&color.mismatch[0]), 476 | Color::from(&color.mismatch[1]), 477 | Color::from(&color.mismatch[2]), 478 | Color::from(&color.mismatch[3]) 479 | ]; 480 | let del = Color::from(&color.del); 481 | let ins = Color::from(&color.ins); 482 | let bg = Color::from(&color.background); 483 | 484 | let mut x = [bg; 12]; 485 | x[Self::index(Op::MisA as u8)] = mismatch[0]; 486 | x[Self::index(Op::MisC as u8)] = mismatch[1]; 487 | x[Self::index(Op::MisG as u8)] = mismatch[2]; 488 | x[Self::index(Op::MisT as u8)] = mismatch[3]; 489 | x[Self::index(Op::Del as u8)] = del; 490 | x[Self::index(Op::MisA as u8 + Op::Ins as u8)] = ins * 0x800000 + mismatch[0] * 0x800000; 491 | x[Self::index(Op::MisC as u8 + Op::Ins as u8)] = ins * 0x800000 + mismatch[1] * 0x800000; 492 | x[Self::index(Op::MisG as u8 + Op::Ins as u8)] = ins * 0x800000 + mismatch[2] * 0x800000; 493 | x[Self::index(Op::MisT as u8 + Op::Ins as u8)] = ins * 0x800000 + mismatch[3] * 0x800000; 494 | x[Self::index(Op::Del as u8 + Op::Ins as u8)] = ins * 0x800000 + del * 0x800000; 495 | 496 | debug!("{:?}", x); 497 | x 498 | } 499 | 500 | fn accumulate(&mut self, column: u8) { 501 | let color = &self.color[Self::index(column)]; 502 | 503 | /* let compiler vectorize this! */ 504 | self.accum += *color; 505 | 506 | debug!("accumulate: color({:?}), accum({:?})", color, self.accum); 507 | 508 | /* copy to save */ 509 | self.prev = *color; 510 | } 511 | 512 | fn interpolate(&mut self, column: u8, frac: f64) { 513 | 514 | let curr = &self.color[Self::index(column)]; 515 | let (c0, c1) = Self::coef(frac); 516 | 517 | /* let compiler vectorize this!! 
*/ 518 | self.accum += *curr * c0; 519 | self.accum += self.prev * c1; 520 | 521 | debug!("interpolate: color({:?}, {:?}), frac({}), c({}, {}), accum({:?})", self.prev, curr, frac, c0, c1, self.accum); 522 | 523 | /* copy to save */ 524 | self.prev = *curr; 525 | } 526 | 527 | fn flush(&mut self, dst: &mut Vec, cnt: usize) -> Option<()> { 528 | assert!(cnt > 0, "{}", cnt); 529 | 530 | /* I hope this conversions are automatically vectorized!!!! */ 531 | let body = u32::from(&self.prev); 532 | let tail = u32::from(&(self.accum * self.normalizer)); 533 | 534 | debug!("flush: body({:?}), tail({:?})", self.prev, self.accum * self.normalizer); 535 | 536 | for _ in 0 .. cnt - 1 { dst.push(body); } /* broadcast for scale < 1.0 */ 537 | dst.push(tail); 538 | 539 | /* clear all */ 540 | self.accum = Color::default(); 541 | return Some(()); 542 | } 543 | 544 | fn scale(&mut self, dst: &mut Vec, src: &[u8], offset: f64, pitch: f64, margin: f64) -> Option { 545 | 546 | /* returns new offset on success, None on failure */ 547 | 548 | debug!("offset({}), pitch({}), margin({})", offset, pitch, margin); 549 | debug!("{:?}", src); 550 | 551 | assert!(!(offset < 0.0)); 552 | assert!(pitch > 0.0); 553 | assert!(margin > 0.0); 554 | 555 | let rev_pitch = 1.0 / pitch; 556 | let rev_margin = 1.0 / margin; 557 | let mut last_bin = 0; 558 | let mut dst = dst; 559 | 560 | for (i, &column) in src.iter().enumerate() { 561 | let pos = i as f64 + offset; 562 | let bin = (pos * rev_pitch).floor(); 563 | let thresh = bin * pitch + 1.0; 564 | 565 | debug!("i({}), pos({}), bin({}), frac({}), thresh({}), column({}), index({}), color({:?})", i, pos, bin, (pos - bin * pitch) * rev_margin, thresh, column, Self::index(column), &self.color[Self::index(column)]); 566 | 567 | /* flush if needed */ { 568 | let bin = bin as usize; 569 | if bin != last_bin { 570 | self.flush(&mut dst, bin - last_bin)?; 571 | } 572 | last_bin = bin; 573 | } 574 | 575 | /* do interpolation if on margin */ 576 | if pos < thresh 
{ 577 | /* relative position in margin: [0.0, 1.0) */ 578 | self.interpolate(column, (pos - bin * pitch) * rev_margin); 579 | continue; 580 | } 581 | 582 | /* just accumulate otherwise */ 583 | self.accumulate(column); 584 | } 585 | self.flush(&mut dst, 1); 586 | 587 | /* compute new offset */ 588 | let last_pos = src.len() as f64 + offset; 589 | let last_bin = (last_pos / pitch).floor(); 590 | return Some(last_pos - last_bin * pitch); 591 | } 592 | } 593 | */ 594 | -------------------------------------------------------------------------------- /src/utils.rs: -------------------------------------------------------------------------------- 1 | use super::op::CompressMark; 2 | use std::convert::AsRef; 3 | use std::marker::PhantomData; 4 | use std::mem::size_of; 5 | use std::ops::Range; 6 | use std::slice::{from_raw_parts, from_raw_parts_mut, Iter}; 7 | 8 | /* Precursor 9 | 10 | Keeps offsets into a Vec, then converts them into a slice of the Vec. 11 | (Is there a better (safer) way? This implementation can't force the user to use the same 12 | vector when creating and transforming it. It would be better if we could add a lifetime parameter 13 | to match it to that of the vec.) 14 | */ 15 | #[derive(Copy, Clone, Debug, Default)] 16 | pub(super) struct SlicePrecursor<T> { 17 | /* just equivalent to Range<usize> */ 18 | ofs: usize, 19 | len: usize, 20 | _marker: PhantomData<T>, /* not effectively used for now. what to do with this?
*/ 21 | } 22 | 23 | #[allow(dead_code)] 24 | impl<'a, T> SlicePrecursor<T> { 25 | pub(super) fn compose(range: &Range<usize>) -> Self { 26 | SlicePrecursor::<T> { 27 | ofs: range.start, 28 | len: range.end - range.start, 29 | _marker: PhantomData::<T>, 30 | } 31 | } 32 | 33 | pub(super) fn finalize_raw(&self, base: *const u8) -> &'a [T] { 34 | let ptr = base.wrapping_add(self.ofs) as *const T; 35 | let cnt = self.len / size_of::<T>(); 36 | unsafe { from_raw_parts(ptr, cnt) } 37 | } 38 | 39 | pub(super) fn finalize(&self, v: &'a Vec<u8>) -> &'a [T] { 40 | let base = v.as_ptr() as *const u8; 41 | self.finalize_raw(base) 42 | } 43 | } 44 | 45 | /* SimdAlignedU8 46 | 47 | 16-byte aligned array for SSE and NEON 48 | */ 49 | #[repr(align(16))] 50 | pub(super) struct SimdAlignedU8 { 51 | v: [u8; 16], 52 | } 53 | 54 | impl SimdAlignedU8 { 55 | pub(super) const fn new(v: &[u8; 16]) -> Self { 56 | SimdAlignedU8 { v: *v } 57 | } 58 | } 59 | 60 | impl AsRef<[u8; 16]> for SimdAlignedU8 { 61 | fn as_ref(&self) -> &[u8; 16] { 62 | &self.v 63 | } 64 | } 65 | 66 | /* Writer and reserve_to 67 | 68 | This trait and its function are for vectorized array storing. The `reserve_to` function 69 | allocates `count` elements in the writer `T`, which is typically `Vec<u8>`, and passes 70 | them to a closure for use as a destination array. 71 | 72 | The return value `usize` of the closure is the actual number of elements stored to 73 | the array. It is used for shrinking the base buffer to the exact length of the valid 74 | elements stored. 75 | */ 76 | pub(super) trait Writer { 77 | type T; 78 | 79 | /* 80 | `func: FnOnce(&mut [U], &[T]) -> usize` is supposed to take a working buffer of 81 | size `count` and the corresponding base array at the head, do the work, and 82 | return the object count that is actually used in the procedure.
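The reserve, fill, then shrink contract described above can be sketched without any type punning. This is a simplified illustration under the assumption that both the working area and the base array are plain bytes; `reserve_bytes` is a hypothetical name and is not the trait method itself:

```rust
use std::ops::Range;

// Hypothetical, simplified variant of the reserve-then-shrink pattern.
fn reserve_bytes<F>(buf: &mut Vec<u8>, count: usize, func: F) -> Range<usize>
where
    F: FnOnce(&mut [u8], &[u8]) -> usize,
{
    // remember where the already-stored (base) part ends
    let base_offset = buf.len();

    // extend the buffer so that the closure gets `count` writable elements
    buf.resize(base_offset + count, 0);

    // split into the immutable base and the mutable working area
    let (base, work) = buf.split_at_mut(base_offset);
    let used = func(work, &*base);

    // drop the unused tail so that only the valid elements remain
    assert!(used <= count);
    buf.truncate(base_offset + used);

    base_offset..base_offset + used
}
```

The returned range plays the same role as the byte range returned by `reserve_to`: it locates the newly stored elements inside the buffer.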
83 | */ 84 | fn reserve_to(&mut self, count: usize, func: F) -> Range 85 | where 86 | U: Sized, 87 | F: FnOnce(&mut [U], &[T]) -> usize; 88 | } 89 | 90 | /* implementation for Vec */ 91 | impl Writer for Vec { 92 | type T = u8; 93 | 94 | fn reserve_to(&mut self, count: usize, func: F) -> Range 95 | where 96 | U: Sized, 97 | F: FnOnce(&mut [U], &[T]) -> usize, 98 | { 99 | let base_offset = (self.len() + size_of::() - 1) / size_of::() * size_of::(); 100 | let request_size = size_of::() * count; 101 | // println!("base({:?}), request({:?})", base_offset, request_size); 102 | 103 | /* extend array to hold requested count of U */ 104 | self.resize(base_offset + request_size, T::default()); /* can be mem::uninitialized() ? */ 105 | 106 | /* split resized buffer into base (immutable) and working area (mutable) */ 107 | let (base, work) = self.split_at_mut(base_offset); 108 | 109 | /* convert &mut [u8] -> &mut [U] */ 110 | let ptr = work.as_mut_ptr(); 111 | let work = unsafe { from_raw_parts_mut(ptr as *mut U, count) }; 112 | 113 | /* do the work */ 114 | let consumed_count = func(work, base); 115 | 116 | /* save the length */ 117 | assert!(consumed_count <= count); 118 | let consumed_size = size_of::() * consumed_count; 119 | self.resize(base_offset + consumed_size, T::default()); 120 | 121 | /* returns base_offset (count in bytes) for composing packed data structure */ 122 | Range:: { 123 | start: base_offset, 124 | end: base_offset + consumed_size, 125 | } 126 | } 127 | } 128 | 129 | /* PeekFold iterator 130 | 131 | Similar to `Iterator::try_fold`, but it doesn't consume the last-peeked element. 132 | The iteration can be resumed from the element of the last failure. 
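The resumable behavior can be sketched as a free function. This is an illustration only, specialized to byte iterators; `peek_fold_u8` and `parse_number` are hypothetical names, not part of this module:

```rust
use std::slice::Iter;

// Hypothetical free-function variant of the peek-and-fold idea, for bytes only.
fn peek_fold_u8<A, F>(it: &mut Iter<'_, u8>, init: A, mut func: F) -> A
where
    A: Copy,
    F: FnMut(A, &u8) -> Option<A>,
{
    let mut accum = init;
    // `as_slice().first()` looks at the head element without consuming it
    while let Some(x) = it.as_slice().first() {
        match func(accum, x) {
            Some(next) => {
                accum = next;
                it.next(); // accept the element and move forward
            }
            None => break, // leave the rejected element in the iterator
        }
    }
    accum
}

// Parse a run of ASCII digits, stopping at (and keeping) the first non-digit.
fn parse_number(it: &mut Iter<'_, u8>) -> u64 {
    peek_fold_u8(it, 0u64, |a, &x| {
        let m = (x as u64).wrapping_sub(b'0' as u64);
        if m >= 10 {
            None
        } else {
            Some(10 * a + m)
        }
    })
}
```

With an iterator over `b"10M11"`, the first call stops in front of `M` and leaves it in the iterator, so the caller can inspect the separator before parsing the next number.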
133 | */ 134 | pub trait PeekFold { 135 | fn peek_fold(&mut self, init: A, func: F) -> A 136 | where 137 | Self: Sized, 138 | A: Copy, 139 | F: FnMut(A, &T) -> Option; 140 | } 141 | 142 | /* this implementation uses `as_slice()` for peeking the head element, assuming T being slice::Iter */ 143 | impl<'a, T: Sized> PeekFold for Iter<'a, T> { 144 | fn peek_fold(&mut self, init: A, mut func: F) -> A 145 | where 146 | Self: Sized, 147 | A: Copy, 148 | F: FnMut(A, &T) -> Option, 149 | { 150 | let mut accum: A = init; 151 | loop { 152 | /* peek */ 153 | let x = match self.as_slice().first() { 154 | None => { 155 | return accum; 156 | } 157 | Some(x) => x, 158 | }; 159 | 160 | /* updated accumulator is discarded when func returned false */ 161 | let next_accum = match func(accum, x) { 162 | None => { 163 | return accum; 164 | } 165 | Some(x) => x, 166 | }; 167 | 168 | /* if continuous flag is true, overwrite (update) accumulator and forward iterator */ 169 | self.next(); 170 | accum = next_accum; 171 | } 172 | } 173 | } 174 | 175 | /* convert ascii-printed number to u64, overflow is ignored. */ 176 | pub(super) fn atoi_unchecked(v: &mut Iter) -> u64 { 177 | v.peek_fold(0, |a, x| { 178 | let m = (*x as u64).wrapping_sub('0' as u64); 179 | if m >= 10 { 180 | return None; 181 | } 182 | 183 | Some(10 * a + m) 184 | }) 185 | } 186 | 187 | #[allow(dead_code)] 188 | pub(super) fn isnum(c: u8) -> bool { 189 | (c as u64).wrapping_sub('0' as u64) < 10_u64 190 | } 191 | 192 | /* 193 | transcode { 'A', 'C', 'G', 'T', 'a', 'c', 'g', 't' } -> { 0x0, 0x1, 0x2, 0x3, 0x0, 0x1, 0x2, 0x3 } 194 | and then mark mismatch. 
195 | */ 196 | pub(super) fn transcode_base_unchecked(c: u8) -> u32 { 197 | let c = c as u32; 198 | let b2 = ((c >> 1) - (c >> 3)) & 0x03; 199 | CompressMark::Mismatch as u32 + b2 200 | } 201 | 202 | #[allow(dead_code)] 203 | pub(super) fn encode_base_unchecked(c: char) -> u8 { 204 | match c { 205 | 'A' | 'a' => 0x01, 206 | 'C' | 'c' => 0x02, 207 | 'G' | 'g' => 0x04, 208 | 'T' | 't' => 0x08, 209 | _ => 0x00, 210 | } 211 | } 212 | 213 | #[allow(dead_code)] 214 | pub(super) fn decode_base_unchecked(c: u32) -> char { 215 | /* one-hot encoding */ 216 | match "-AC-G---T-------".as_bytes().get(c as usize).copied() { 217 | None => '-', 218 | Some(x) => x as char, 219 | } 220 | } 221 | 222 | /* everything below is unit tests */ 223 | #[cfg(test)] 224 | mod test_utils { 225 | use super::{atoi_unchecked, isnum, PeekFold}; 226 | 227 | macro_rules! test_atoi_unchecked_impl { 228 | ( $str: expr, $( $num: expr ),* ) => ({ 229 | let s = $str; 230 | let mut it = s.as_bytes().iter(); 231 | $({ 232 | let n = atoi_unchecked(&mut it); 233 | assert_eq!(n, $num); 234 | 235 | (&mut it).peek_fold(0, |_, &x| { if isnum(x) { None } else { Some(0) } }); 236 | })* 237 | }) 238 | } 239 | 240 | #[test] 241 | fn test_atoi_unchecked() { 242 | test_atoi_unchecked_impl!("0", 0); 243 | test_atoi_unchecked_impl!("10", 10); 244 | test_atoi_unchecked_impl!("-10", 0); 245 | 246 | /* the following also work as tests for the PeekFold iterator */ 247 | test_atoi_unchecked_impl!("10M11", 10, 11); 248 | test_atoi_unchecked_impl!("X0M1X222222MMMMMMM1234XXXX", 0, 0, 1, 222222, 1234); 249 | } 250 | 251 | #[test] 252 | fn test_isnum() { 253 | /* trivial */ 254 | assert_eq!(isnum('0' as u8), true); 255 | assert_eq!(isnum('9' as u8), true); 256 | assert_eq!(isnum('-' as u8), false); 257 | assert_eq!(isnum(' ' as u8), false); 258 | } 259 | } 260 | --------------------------------------------------------------------------------