├── .gitignore ├── .gitmodules ├── LICENSE ├── README.md ├── resources ├── 1kg.superpopulation ├── 1kgp.indiv_population ├── GRCh38.chrom_map └── GRCh38.length_map ├── snakemake ├── README.md ├── Snakefile ├── config.yaml └── shared │ ├── alignment_paired_end.Snakefile │ ├── alignment_single_end.Snakefile │ ├── functions.Snakefile │ ├── lift_and_sort.Snakefile │ ├── prepare_pop_genome.Snakefile │ └── prepare_standard_genome.Snakefile ├── src ├── Makefile ├── add_aux.cpp ├── add_aux.hpp ├── download_1kg_pop_table.sh ├── download_1kg_vcf.sh ├── download_genome.sh ├── download_prebuilt_indexes.sh ├── list_indiv_from_pop.py ├── merge_incremental.py ├── merge_sam.cpp ├── merge_sam.hpp ├── refflow_utils.cpp ├── refflow_utils.hpp ├── split_sam.cpp ├── split_sam.hpp ├── split_sam_by_mapq.py ├── update_genome.py └── utils.py └── test └── SRR622457_1-1k.fastq /.gitignore: -------------------------------------------------------------------------------- 1 | snakemake/run/ 2 | **/*.vcf.gz 3 | src/*.o 4 | resources/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna* 5 | resources/20130606_g1k.ped 6 | -------------------------------------------------------------------------------- /.gitmodules: -------------------------------------------------------------------------------- 1 | [submodule "levioSAM"] 2 | path = levioSAM 3 | url = git@github.com:alshai/levioSAM.git 4 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 langmead-lab 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software 
is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Reference Flow 2 | 3 | Reference flow is a computational genomics pipeline to align short sequencing 4 | reads with a number of reference genomes built with population genomic information included. 5 | 6 | The preprint ["Reducing reference bias using multiple population reference genomes"](https://doi.org/10.1101/2020.03.03.975219) is available on bioRxiv. 7 | 8 | The experiments we've done are provided in the [Reference Flow Experiments repository](https://github.com/langmead-lab/reference_flow-experiments). 9 | 10 | ## Preparation 11 | 12 | ### Install Snakemake 13 | 14 | Reference flow is written using [Snakemake](https://snakemake.readthedocs.io/en/stable/index.html) and it can be installed using conda: 15 | 16 | ``` 17 | conda install -c bioconda -c conda-forge snakemake 18 | ``` 19 | 20 | Other installation approaches are also provided in the [Snakemake installation page](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html). 
21 | 22 | ### Download the 1000 Genomes Project population table 23 | 24 | [Population information for the 1000 Genomes Project individuals](https://www.internationalgenome.org/faq/which-samples-are-you-sequencing/) can be downloaded by running the script: 25 | 26 | ``` 27 | sh src/download_1kg_pop_table.sh 28 | ``` 29 | 30 | ### LevioSAM 31 | 32 | LevioSAM is used to perform coordinate system mapping between population reference genomes and a standard reference genome such as GRCh38. 33 | The levioSAM software tracks the "edits" between a pair of genomes using a VCF file. 34 | Please refer to the [levioSAM GitHub page](https://github.com/alshai/levioSAM) to install the software. 35 | 36 | 37 | ## Running reference flow with pre-built RandFlow-LD or RandFlow-LD-26 indexes 38 | 39 | We host pre-built [RandFlow-LD (22GB)](https://genome-idx.s3.amazonaws.com/bt/flow/randflow_ld.tar.gz) and [RandFlow-LD-26 (96GB)](https://genome-idx.s3.amazonaws.com/bt/flow/randflow_ld_26.tar.gz) indexes. 40 | The script `src/download_prebuilt_indexes.sh` downloads the RandFlow-LD indexes, including an indexed major-allele reference and 5 indexed superpopulation references. 41 | 42 | ``` 43 | sh src/download_prebuilt_indexes.sh 44 | # Run `sh src/download_prebuilt_indexes.sh randflow_ld_26` to download pre-built RandFlow-LD-26 indexes. 45 | cd snakemake 46 | snakemake -j 32 47 | ``` 48 | 49 | Before executing the Snakemake pipeline, it is recommended to perform a dry run with `snakemake -np` to preview the scheduling plan. 50 | The option `-j` sets the maximum number of CPU cores Snakemake may use in parallel, which is 32 in the above example. 51 | 52 | By default, a directory called `run` will be created under `snakemake` and all the results will be written under it. 53 | The alignment results are aggregated into a single SAM file, which uses the GRCh38 coordinate system by default. 
54 | The final SAM file will be written to `snakemake/run/experiments/test/thrds0_S1_b1000_ld1/wg-refflow-10-thrds0_S1_b1000_ld1-liftover.sam`. 55 | 56 | To use another read set, please change the `READS1` parameter in `snakemake/config.yaml`, 57 | or run the command: 58 | 59 | ``` 60 | snakemake -j 32 --config READS1= 61 | ``` 62 | 63 | ## Running complete reference flow pipeline 64 | 65 | By setting the `USE_PREBUILT` option to `False`, users can run the complete reference flow pipeline. 66 | 67 | ### Download Reference genome 68 | 69 | The GRCh38 reference genome can be downloaded from NCBI using the following script: 70 | 71 | ``` 72 | sh src/download_genome.sh 73 | ``` 74 | 75 | Users may choose other reference genomes of interest by changing the `GENOME` parameter in `snakemake/config.yaml`. 76 | 77 | ### Download GRCh38 call sets from the 1000 Genomes Project 78 | 79 | [The 1000 Genomes Project GRCh38 call sets with SNVs and indels](https://www.internationalgenome.org/announcements/Variant-calls-from-1000-Genomes-Project-data-on-the-GRCh38-reference-assemlby/) can be downloaded using the script: 80 | 81 | ``` 82 | sh src/download_1kg_vcf.sh 83 | ``` 84 | 85 | Users may switch to the 1KG phase-3 call set by using the corresponding VCFs and changing the parameters `DIR_VCF`, `VCF_PREFIX`, and `VCF_SUFFIX` in the configuration file. 86 | So far we have only tested call sets provided by the 1000 Genomes Project. 87 | Call sets provided by other studies may differ slightly in population labelling and VCF format, which may cause the pipeline to fail. 88 | 89 | ## Setting reference flow configuration 90 | 91 | Parameters for reference flow are specified in `snakemake/config.yaml` and described [here](snakemake/README.md). 92 | 93 | We recommend users try a dry run with `snakemake -np` to check if the pipeline is ready. 
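The dry-run-then-run sequence recommended in this README can also be scripted. The following is only a minimal sketch, not part of the repository: the function names `build_snakemake_cmd` and `run_pipeline` are hypothetical, and it assumes `snakemake` is on the `PATH` and that the working directory is the repository root. It uses only the options already shown in this README (`-np`, `-j`, and `--config READS1=...`).

```python
import shlex
import subprocess
from typing import List, Optional


def build_snakemake_cmd(threads: int = 32,
                        reads1: Optional[str] = None,
                        dry_run: bool = False) -> List[str]:
    """Assemble a snakemake command line (hypothetical helper).

    Only the options shown in the README are used: `-np` for a dry run,
    `-j` for the core count, and `--config READS1=...` for the read set.
    """
    if dry_run:
        cmd = ["snakemake", "-np"]
    else:
        cmd = ["snakemake", "-j", str(threads)]
    if reads1 is not None:
        # Override the READS1 value from config.yaml on the command line.
        cmd += ["--config", f"READS1={reads1}"]
    return cmd


def run_pipeline(workdir: str = "snakemake",
                 threads: int = 32,
                 reads1: Optional[str] = None) -> None:
    """Run a dry run first; start the real run only if the dry run succeeds."""
    for dry_run in (True, False):
        cmd = build_snakemake_cmd(threads, reads1, dry_run=dry_run)
        print("+", shlex.join(cmd))
        # check=True raises CalledProcessError on failure, so a failed
        # dry run stops the loop before the real run begins.
        subprocess.run(cmd, cwd=workdir, check=True)
```

Calling `run_pipeline(threads=32)` from the repository root would then mirror the manual steps: preview the plan, and proceed to the full run only when the preview exits cleanly.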
94 | 95 | When all set, run 96 | 97 | ``` 98 | snakemake -j 32 99 | ``` 100 | -------------------------------------------------------------------------------- /resources/1kg.superpopulation: -------------------------------------------------------------------------------- 1 | CHB Han Chinese in Beijing, China EAS 1 1 1 2 | JPT Japanese in Tokyo, Japan EAS 1 1 1 3 | CHS Southern Han Chinese EAS 1 1 1 4 | CDX Chinese Dai in Xishuangbanna, China EAS 1 1 1 5 | KHV Kinh in Ho Chi Minh City, Vietnam EAS 1 1 1 6 | CEU Utah Residents (CEPH) with Northern and Western European Ancestry EUR 1 1 1 7 | TSI Toscani in Italia EUR 1 1 1 8 | FIN Finnish in Finland EUR 1 1 1 9 | GBR British in England and Scotland EUR 1 1 1 10 | IBS Iberian Population in Spain EUR 1 1 1 11 | YRI Yoruba in Ibadan, Nigeria AFR 1 1 1 12 | LWK Luhya in Webuye, Kenya AFR 1 1 1 13 | GWD Gambian in Western Divisions in the Gambia AFR 1 1 1 14 | MSL Mende in Sierra Leone AFR 1 1 1 15 | ESN Esan in Nigeria AFR 1 1 1 16 | ASW Americans of African Ancestry in SW USA AFR 1 1 1 17 | ACB African Caribbeans in Barbados AFR 1 1 1 18 | MXL Mexican Ancestry from Los Angeles USA AMR 1 1 1 19 | PUR Puerto Ricans from Puerto Rico AMR 1 1 1 20 | CLM Colombians from Medellin, Colombia AMR 1 1 1 21 | PEL Peruvians from Lima, Peru AMR 1 1 1 22 | GIH Gujarati Indian from Houston, Texas SAS 1 1 1 23 | PJL Punjabi from Lahore, Pakistan SAS 1 1 1 24 | BEB Bengali from Bangladesh SAS 1 1 1 25 | STU Sri Lankan Tamil from the UK SAS 1 1 1 26 | ITU Indian Telugu from the UK SAS 1 1 1 -------------------------------------------------------------------------------- /resources/1kgp.indiv_population: -------------------------------------------------------------------------------- 1 | HG01879 AFR ACB 2 | HG01880 AFR ACB 3 | HG01881 AFR ACB 4 | HG01882 AFR ACB 5 | HG01883 AFR ACB 6 | HG01888 AFR ACB 7 | HG01884 AFR ACB 8 | HG01885 AFR ACB 9 | HG01956 AFR ACB 10 | HG01886 AFR ACB 11 | HG01887 AFR ACB 12 | HG02014 AFR ACB 13 | HG01889 AFR 
ACB 14 | HG01890 AFR ACB 15 | HG01891 AFR ACB 16 | HG01894 AFR ACB 17 | HG01895 AFR ACB 18 | HG01912 AFR ACB 19 | HG01896 AFR ACB 20 | HG01897 AFR ACB 21 | HG02013 AFR ACB 22 | HG01985 AFR ACB 23 | HG01914 AFR ACB 24 | HG01915 AFR ACB 25 | HG01916 AFR ACB 26 | HG01958 AFR ACB 27 | HG01959 AFR ACB 28 | HG01986 AFR ACB 29 | HG01960 AFR ACB 30 | HG01988 AFR ACB 31 | HG01989 AFR ACB 32 | HG01987 AFR ACB 33 | HG01990 AFR ACB 34 | HG02012 AFR ACB 35 | HG02009 AFR ACB 36 | HG02010 AFR ACB 37 | HG02011 AFR ACB 38 | HG02051 AFR ACB 39 | HG02052 AFR ACB 40 | HG02053 AFR ACB 41 | HG02054 AFR ACB 42 | HG02055 AFR ACB 43 | HG02095 AFR ACB 44 | HG02332 AFR ACB 45 | HG02107 AFR ACB 46 | HG02108 AFR ACB 47 | HG02111 AFR ACB 48 | HG02143 AFR ACB 49 | HG02144 AFR ACB 50 | HG02145 AFR ACB 51 | HG02255 AFR ACB 52 | HG02256 AFR ACB 53 | HG02257 AFR ACB 54 | HG02258 AFR ACB 55 | HG02314 AFR ACB 56 | HG02315 AFR ACB 57 | HG02280 AFR ACB 58 | HG02307 AFR ACB 59 | HG02308 AFR ACB 60 | HG02281 AFR ACB 61 | HG02282 AFR ACB 62 | HG02321 AFR ACB 63 | HG02283 AFR ACB 64 | HG02309 AFR ACB 65 | HG02284 AFR ACB 66 | HG02316 AFR ACB 67 | HG02317 AFR ACB 68 | HG02318 AFR ACB 69 | HG02322 AFR ACB 70 | HG02323 AFR ACB 71 | HG02325 AFR ACB 72 | HG02337 AFR ACB 73 | HG02334 AFR ACB 74 | HG02330 AFR ACB 75 | HG02339 AFR ACB 76 | HG02343 AFR ACB 77 | HG02419 AFR ACB 78 | HG02420 AFR ACB 79 | HG02427 AFR ACB 80 | HG02428 AFR ACB 81 | HG02429 AFR ACB 82 | HG02436 AFR ACB 83 | HG02433 AFR ACB 84 | HG02439 AFR ACB 85 | HG02442 AFR ACB 86 | HG02445 AFR ACB 87 | HG02489 AFR ACB 88 | HG02449 AFR ACB 89 | HG02450 AFR ACB 90 | HG02451 AFR ACB 91 | HG02455 AFR ACB 92 | HG02470 AFR ACB 93 | HG02471 AFR ACB 94 | HG02476 AFR ACB 95 | HG02477 AFR ACB 96 | HG02478 AFR ACB 97 | HG02479 AFR ACB 98 | HG02480 AFR ACB 99 | HG02481 AFR ACB 100 | HG02484 AFR ACB 101 | HG02485 AFR ACB 102 | HG02486 AFR ACB 103 | HG02511 AFR ACB 104 | HG02496 AFR ACB 105 | HG02497 AFR ACB 106 | HG02501 AFR ACB 107 | HG02502 AFR ACB 108 | HG02505 
AFR ACB 109 | HG02508 AFR ACB 110 | HG02536 AFR ACB 111 | HG02537 AFR ACB 112 | HG02541 AFR ACB 113 | HG02545 AFR ACB 114 | HG02546 AFR ACB 115 | HG02547 AFR ACB 116 | HG02549 AFR ACB 117 | HG02554 AFR ACB 118 | HG02555 AFR ACB 119 | HG02557 AFR ACB 120 | HG02558 AFR ACB 121 | HG02559 AFR ACB 122 | HG02577 AFR ACB 123 | HG02580 AFR ACB 124 | NA19625 AFR ASW 125 | NA20274 AFR ASW 126 | NA19700 AFR ASW 127 | NA19701 AFR ASW 128 | NA19702 AFR ASW 129 | NA19703 AFR ASW 130 | NA19704 AFR ASW 131 | NA19705 AFR ASW 132 | NA19707 AFR ASW 133 | NA19708 AFR ASW 134 | NA19711 AFR ASW 135 | NA19712 AFR ASW 136 | NA19818 AFR ASW 137 | NA19819 AFR ASW 138 | NA19828 AFR ASW 139 | NA19834 AFR ASW 140 | NA19835 AFR ASW 141 | NA19836 AFR ASW 142 | NA19900 AFR ASW 143 | NA19901 AFR ASW 144 | NA19902 AFR ASW 145 | NA19904 AFR ASW 146 | NA19905 AFR ASW 147 | NA19913 AFR ASW 148 | NA19908 AFR ASW 149 | NA19909 AFR ASW 150 | NA19919 AFR ASW 151 | NA19914 AFR ASW 152 | NA19915 AFR ASW 153 | NA19916 AFR ASW 154 | NA19917 AFR ASW 155 | NA19918 AFR ASW 156 | NA19920 AFR ASW 157 | NA19921 AFR ASW 158 | NA20129 AFR ASW 159 | NA19922 AFR ASW 160 | NA19923 AFR ASW 161 | NA19924 AFR ASW 162 | NA19713 AFR ASW 163 | NA19982 AFR ASW 164 | NA19983 AFR ASW 165 | NA19714 AFR ASW 166 | NA19985 AFR ASW 167 | NA19984 AFR ASW 168 | NA20126 AFR ASW 169 | NA20127 AFR ASW 170 | NA20128 AFR ASW 171 | NA20276 AFR ASW 172 | NA20277 AFR ASW 173 | NA20278 AFR ASW 174 | NA20279 AFR ASW 175 | NA20282 AFR ASW 176 | NA20284 AFR ASW 177 | NA20285 AFR ASW 178 | NA20281 AFR ASW 179 | NA20287 AFR ASW 180 | NA20288 AFR ASW 181 | NA20289 AFR ASW 182 | NA20290 AFR ASW 183 | NA20291 AFR ASW 184 | NA20292 AFR ASW 185 | NA20294 AFR ASW 186 | NA20295 AFR ASW 187 | NA20296 AFR ASW 188 | NA20297 AFR ASW 189 | NA20299 AFR ASW 190 | NA20300 AFR ASW 191 | NA20298 AFR ASW 192 | NA20301 AFR ASW 193 | NA20302 AFR ASW 194 | NA20312 AFR ASW 195 | NA20313 AFR ASW 196 | NA20314 AFR ASW 197 | NA20316 AFR ASW 198 | NA20317 AFR ASW 199 | 
NA20319 AFR ASW 200 | NA20318 AFR ASW 201 | NA20321 AFR ASW 202 | NA20322 AFR ASW 203 | NA20320 AFR ASW 204 | NA20332 AFR ASW 205 | NA20333 AFR ASW 206 | NA20334 AFR ASW 207 | NA20335 AFR ASW 208 | NA20355 AFR ASW 209 | NA20336 AFR ASW 210 | NA20337 AFR ASW 211 | NA20339 AFR ASW 212 | NA20340 AFR ASW 213 | NA20341 AFR ASW 214 | NA20342 AFR ASW 215 | NA20343 AFR ASW 216 | NA20344 AFR ASW 217 | NA20345 AFR ASW 218 | NA20346 AFR ASW 219 | NA20347 AFR ASW 220 | NA20348 AFR ASW 221 | NA20349 AFR ASW 222 | NA20350 AFR ASW 223 | NA20351 AFR ASW 224 | NA20356 AFR ASW 225 | NA20357 AFR ASW 226 | NA20358 AFR ASW 227 | NA20359 AFR ASW 228 | NA20360 AFR ASW 229 | NA20361 AFR ASW 230 | NA20362 AFR ASW 231 | NA20363 AFR ASW 232 | NA20364 AFR ASW 233 | NA20412 AFR ASW 234 | NA20413 AFR ASW 235 | NA20414 AFR ASW 236 | HG03006 SAS BEB 237 | HG03007 SAS BEB 238 | HG03008 SAS BEB 239 | HG03589 SAS BEB 240 | HG03590 SAS BEB 241 | HG03600 SAS BEB 242 | HG03602 SAS BEB 243 | HG03603 SAS BEB 244 | HG03604 SAS BEB 245 | HG03605 SAS BEB 246 | HG03606 SAS BEB 247 | HG03607 SAS BEB 248 | HG03611 SAS BEB 249 | HG03615 SAS BEB 250 | HG03616 SAS BEB 251 | HG03617 SAS BEB 252 | HG03793 SAS BEB 253 | HG03794 SAS BEB 254 | HG03795 SAS BEB 255 | HG03796 SAS BEB 256 | HG03797 SAS BEB 257 | HG03798 SAS BEB 258 | HG03799 SAS BEB 259 | HG03800 SAS BEB 260 | HG03801 SAS BEB 261 | HG03802 SAS BEB 262 | HG03803 SAS BEB 263 | HG03804 SAS BEB 264 | HG03805 SAS BEB 265 | HG03806 SAS BEB 266 | HG03807 SAS BEB 267 | HG03808 SAS BEB 268 | HG03809 SAS BEB 269 | HG03811 SAS BEB 270 | HG03813 SAS BEB 271 | HG03814 SAS BEB 272 | HG03815 SAS BEB 273 | HG03816 SAS BEB 274 | HG03817 SAS BEB 275 | HG03821 SAS BEB 276 | HG03822 SAS BEB 277 | HG03823 SAS BEB 278 | HG03824 SAS BEB 279 | HG03825 SAS BEB 280 | HG03826 SAS BEB 281 | HG03829 SAS BEB 282 | HG03830 SAS BEB 283 | HG03831 SAS BEB 284 | HG03832 SAS BEB 285 | HG03833 SAS BEB 286 | HG03834 SAS BEB 287 | HG03901 SAS BEB 288 | HG03903 SAS BEB 289 | HG03012 SAS BEB 290 
| HG03904 SAS BEB 291 | HG03905 SAS BEB 292 | HG03906 SAS BEB 293 | HG03907 SAS BEB 294 | HG03908 SAS BEB 295 | HG03909 SAS BEB 296 | HG03910 SAS BEB 297 | HG03911 SAS BEB 298 | HG03913 SAS BEB 299 | HG03914 SAS BEB 300 | HG03915 SAS BEB 301 | HG03917 SAS BEB 302 | HG03919 SAS BEB 303 | HG03920 SAS BEB 304 | HG03922 SAS BEB 305 | HG03924 SAS BEB 306 | HG03925 SAS BEB 307 | HG03926 SAS BEB 308 | HG03927 SAS BEB 309 | HG03928 SAS BEB 310 | HG03929 SAS BEB 311 | HG03930 SAS BEB 312 | HG03931 SAS BEB 313 | HG03585 SAS BEB 314 | HG03587 SAS BEB 315 | HG03934 SAS BEB 316 | HG03937 SAS BEB 317 | HG03939 SAS BEB 318 | HG03940 SAS BEB 319 | HG03941 SAS BEB 320 | HG03942 SAS BEB 321 | HG04128 SAS BEB 322 | HG04131 SAS BEB 323 | HG04132 SAS BEB 324 | HG04133 SAS BEB 325 | HG04134 SAS BEB 326 | HG04135 SAS BEB 327 | HG04136 SAS BEB 328 | HG04140 SAS BEB 329 | HG04141 SAS BEB 330 | HG04142 SAS BEB 331 | HG04144 SAS BEB 332 | HG04146 SAS BEB 333 | HG04147 SAS BEB 334 | HG04148 SAS BEB 335 | HG04149 SAS BEB 336 | HG04150 SAS BEB 337 | HG04151 SAS BEB 338 | HG04152 SAS BEB 339 | HG04153 SAS BEB 340 | HG04155 SAS BEB 341 | HG04156 SAS BEB 342 | HG04157 SAS BEB 343 | HG04158 SAS BEB 344 | HG04159 SAS BEB 345 | HG04160 SAS BEB 346 | HG04161 SAS BEB 347 | HG04162 SAS BEB 348 | HG04164 SAS BEB 349 | HG04171 SAS BEB 350 | HG04173 SAS BEB 351 | HG04174 SAS BEB 352 | HG04175 SAS BEB 353 | HG04176 SAS BEB 354 | HG04177 SAS BEB 355 | HG03593 SAS BEB 356 | HG04180 SAS BEB 357 | HG04181 SAS BEB 358 | HG04182 SAS BEB 359 | HG04183 SAS BEB 360 | HG04184 SAS BEB 361 | HG04185 SAS BEB 362 | HG04186 SAS BEB 363 | HG04187 SAS BEB 364 | HG04188 SAS BEB 365 | HG04189 SAS BEB 366 | HG04191 SAS BEB 367 | HG04192 SAS BEB 368 | HG04193 SAS BEB 369 | HG04194 SAS BEB 370 | HG04195 SAS BEB 371 | HG03594 SAS BEB 372 | HG03595 SAS BEB 373 | HG03596 SAS BEB 374 | HG03598 SAS BEB 375 | HG03599 SAS BEB 376 | HG03009 SAS BEB 377 | HG03812 SAS BEB 378 | HG03902 SAS BEB 379 | HG03916 SAS BEB 380 | HG00866 EAS CDX 
381 | HG00867 EAS CDX 382 | HG02371 EAS CDX 383 | HG02372 EAS CDX 384 | HG00759 EAS CDX 385 | HG00766 EAS CDX 386 | HG00844 EAS CDX 387 | HG00851 EAS CDX 388 | HG00864 EAS CDX 389 | HG00879 EAS CDX 390 | HG00881 EAS CDX 391 | HG00956 EAS CDX 392 | HG00978 EAS CDX 393 | HG00982 EAS CDX 394 | HG00983 EAS CDX 395 | HG01028 EAS CDX 396 | HG01029 EAS CDX 397 | HG01031 EAS CDX 398 | HG01046 EAS CDX 399 | HG01794 EAS CDX 400 | HG01795 EAS CDX 401 | HG01796 EAS CDX 402 | HG01797 EAS CDX 403 | HG01798 EAS CDX 404 | HG01799 EAS CDX 405 | HG01800 EAS CDX 406 | HG01801 EAS CDX 407 | HG01802 EAS CDX 408 | HG01804 EAS CDX 409 | HG01805 EAS CDX 410 | HG01806 EAS CDX 411 | HG01807 EAS CDX 412 | HG01808 EAS CDX 413 | HG01809 EAS CDX 414 | HG01810 EAS CDX 415 | HG01811 EAS CDX 416 | HG01812 EAS CDX 417 | HG01813 EAS CDX 418 | HG01815 EAS CDX 419 | HG01816 EAS CDX 420 | HG01817 EAS CDX 421 | HG02151 EAS CDX 422 | HG02152 EAS CDX 423 | HG02153 EAS CDX 424 | HG02154 EAS CDX 425 | HG02155 EAS CDX 426 | HG02156 EAS CDX 427 | HG02164 EAS CDX 428 | HG02165 EAS CDX 429 | HG02166 EAS CDX 430 | HG02168 EAS CDX 431 | HG02169 EAS CDX 432 | HG02170 EAS CDX 433 | HG02173 EAS CDX 434 | HG02176 EAS CDX 435 | HG02178 EAS CDX 436 | HG02179 EAS CDX 437 | HG02180 EAS CDX 438 | HG02181 EAS CDX 439 | HG02182 EAS CDX 440 | HG02184 EAS CDX 441 | HG02185 EAS CDX 442 | HG02186 EAS CDX 443 | HG02187 EAS CDX 444 | HG02188 EAS CDX 445 | HG02189 EAS CDX 446 | HG02190 EAS CDX 447 | HG02250 EAS CDX 448 | HG02351 EAS CDX 449 | HG02353 EAS CDX 450 | HG02355 EAS CDX 451 | HG02356 EAS CDX 452 | HG02358 EAS CDX 453 | HG02360 EAS CDX 454 | HG02363 EAS CDX 455 | HG02364 EAS CDX 456 | HG02367 EAS CDX 457 | HG02373 EAS CDX 458 | HG02374 EAS CDX 459 | HG02375 EAS CDX 460 | HG02377 EAS CDX 461 | HG02379 EAS CDX 462 | HG02380 EAS CDX 463 | HG02381 EAS CDX 464 | HG02382 EAS CDX 465 | HG02383 EAS CDX 466 | HG02384 EAS CDX 467 | HG02385 EAS CDX 468 | HG02386 EAS CDX 469 | HG02387 EAS CDX 470 | HG02388 EAS CDX 471 | HG02389 EAS 
CDX 472 | HG02390 EAS CDX 473 | HG02391 EAS CDX 474 | HG02392 EAS CDX 475 | HG02394 EAS CDX 476 | HG02395 EAS CDX 477 | HG02396 EAS CDX 478 | HG02397 EAS CDX 479 | HG02398 EAS CDX 480 | HG02399 EAS CDX 481 | HG02401 EAS CDX 482 | HG02402 EAS CDX 483 | HG02405 EAS CDX 484 | HG02406 EAS CDX 485 | HG02407 EAS CDX 486 | HG02408 EAS CDX 487 | HG02409 EAS CDX 488 | HG02410 EAS CDX 489 | NA06984 EUR CEU 490 | NA06989 EUR CEU 491 | NA12329 EUR CEU 492 | NA12344 EUR CEU 493 | NA12347 EUR CEU 494 | NA12348 EUR CEU 495 | NA06986 EUR CEU 496 | NA06995 EUR CEU 497 | NA06997 EUR CEU 498 | NA07037 EUR CEU 499 | NA07045 EUR CEU 500 | NA07435 EUR CEU 501 | NA07014 EUR CEU 502 | NA07031 EUR CEU 503 | NA07051 EUR CEU 504 | NA12335 EUR CEU 505 | NA12336 EUR CEU 506 | NA12340 EUR CEU 507 | NA12341 EUR CEU 508 | NA12342 EUR CEU 509 | NA12343 EUR CEU 510 | NA07340 EUR CEU 511 | NA10846 EUR CEU 512 | NA10847 EUR CEU 513 | NA12144 EUR CEU 514 | NA12145 EUR CEU 515 | NA12146 EUR CEU 516 | NA12239 EUR CEU 517 | NA06994 EUR CEU 518 | NA07000 EUR CEU 519 | NA07019 EUR CEU 520 | NA07022 EUR CEU 521 | NA07029 EUR CEU 522 | NA07056 EUR CEU 523 | NA06985 EUR CEU 524 | NA06991 EUR CEU 525 | NA06993 EUR CEU 526 | NA07034 EUR CEU 527 | NA07048 EUR CEU 528 | NA07055 EUR CEU 529 | NA10850 EUR CEU 530 | NA10851 EUR CEU 531 | NA12056 EUR CEU 532 | NA12057 EUR CEU 533 | NA12058 EUR CEU 534 | NA07345 EUR CEU 535 | NA07346 EUR CEU 536 | NA07347 EUR CEU 537 | NA07348 EUR CEU 538 | NA07349 EUR CEU 539 | NA07357 EUR CEU 540 | NA10852 EUR CEU 541 | NA10857 EUR CEU 542 | NA12043 EUR CEU 543 | NA12044 EUR CEU 544 | NA12045 EUR CEU 545 | NA12046 EUR CEU 546 | NA10859 EUR CEU 547 | NA11881 EUR CEU 548 | NA11882 EUR CEU 549 | NA10853 EUR CEU 550 | NA10854 EUR CEU 551 | NA11839 EUR CEU 552 | NA11840 EUR CEU 553 | NA11843 EUR CEU 554 | NA10855 EUR CEU 555 | NA10856 EUR CEU 556 | NA11829 EUR CEU 557 | NA11830 EUR CEU 558 | NA11831 EUR CEU 559 | NA11832 EUR CEU 560 | NA12375 EUR CEU 561 | NA12376 EUR CEU 562 | NA12383 
EUR CEU 563 | NA12489 EUR CEU 564 | NA12546 EUR CEU 565 | NA12386 EUR CEU 566 | NA12399 EUR CEU 567 | NA12400 EUR CEU 568 | NA12413 EUR CEU 569 | NA12414 EUR CEU 570 | NA12485 EUR CEU 571 | NA12707 EUR CEU 572 | NA12708 EUR CEU 573 | NA12716 EUR CEU 574 | NA12717 EUR CEU 575 | NA12718 EUR CEU 576 | NA10860 EUR CEU 577 | NA10861 EUR CEU 578 | NA11992 EUR CEU 579 | NA11993 EUR CEU 580 | NA11994 EUR CEU 581 | NA11995 EUR CEU 582 | NA10863 EUR CEU 583 | NA12234 EUR CEU 584 | NA12264 EUR CEU 585 | NA10864 EUR CEU 586 | NA10865 EUR CEU 587 | NA11891 EUR CEU 588 | NA11892 EUR CEU 589 | NA11893 EUR CEU 590 | NA11894 EUR CEU 591 | NA10830 EUR CEU 592 | NA10831 EUR CEU 593 | NA12154 EUR CEU 594 | NA12155 EUR CEU 595 | NA12156 EUR CEU 596 | NA12236 EUR CEU 597 | NA10835 EUR CEU 598 | NA12248 EUR CEU 599 | NA12249 EUR CEU 600 | NA10836 EUR CEU 601 | NA10837 EUR CEU 602 | NA12272 EUR CEU 603 | NA12273 EUR CEU 604 | NA12274 EUR CEU 605 | NA12275 EUR CEU 606 | NA10838 EUR CEU 607 | NA10839 EUR CEU 608 | NA12003 EUR CEU 609 | NA12004 EUR CEU 610 | NA12005 EUR CEU 611 | NA12006 EUR CEU 612 | NA10840 EUR CEU 613 | NA12286 EUR CEU 614 | NA12287 EUR CEU 615 | NA12282 EUR CEU 616 | NA12283 EUR CEU 617 | NA10842 EUR CEU 618 | NA10843 EUR CEU 619 | NA11917 EUR CEU 620 | NA11918 EUR CEU 621 | NA11919 EUR CEU 622 | NA11920 EUR CEU 623 | NA10845 EUR CEU 624 | NA11930 EUR CEU 625 | NA11931 EUR CEU 626 | NA11932 EUR CEU 627 | NA11933 EUR CEU 628 | NA12739 EUR CEU 629 | NA12740 EUR CEU 630 | NA12748 EUR CEU 631 | NA12749 EUR CEU 632 | NA12750 EUR CEU 633 | NA12751 EUR CEU 634 | NA12752 EUR CEU 635 | NA12753 EUR CEU 636 | NA12760 EUR CEU 637 | NA12761 EUR CEU 638 | NA12762 EUR CEU 639 | NA12763 EUR CEU 640 | NA12766 EUR CEU 641 | NA12767 EUR CEU 642 | NA12775 EUR CEU 643 | NA12776 EUR CEU 644 | NA12777 EUR CEU 645 | NA12778 EUR CEU 646 | NA12801 EUR CEU 647 | NA12802 EUR CEU 648 | NA12812 EUR CEU 649 | NA12813 EUR CEU 650 | NA12814 EUR CEU 651 | NA12815 EUR CEU 652 | NA12817 EUR CEU 653 | 
NA12818 EUR CEU 654 | NA12827 EUR CEU 655 | NA12828 EUR CEU 656 | NA12829 EUR CEU 657 | NA12830 EUR CEU 658 | NA12832 EUR CEU 659 | NA12842 EUR CEU 660 | NA12843 EUR CEU 661 | NA12864 EUR CEU 662 | NA12865 EUR CEU 663 | NA12872 EUR CEU 664 | NA12873 EUR CEU 665 | NA12874 EUR CEU 666 | NA12875 EUR CEU 667 | NA12877 EUR CEU 668 | NA12878 EUR CEU 669 | NA12889 EUR CEU 670 | NA12890 EUR CEU 671 | NA12891 EUR CEU 672 | NA12892 EUR CEU 673 | NA18525 EAS CHB 674 | NA18526 EAS CHB 675 | NA18527 EAS CHB 676 | NA18528 EAS CHB 677 | NA18530 EAS CHB 678 | NA18531 EAS CHB 679 | NA18532 EAS CHB 680 | NA18533 EAS CHB 681 | NA18534 EAS CHB 682 | NA18535 EAS CHB 683 | NA18536 EAS CHB 684 | NA18537 EAS CHB 685 | NA18538 EAS CHB 686 | NA18539 EAS CHB 687 | NA18541 EAS CHB 688 | NA18542 EAS CHB 689 | NA18543 EAS CHB 690 | NA18544 EAS CHB 691 | NA18545 EAS CHB 692 | NA18546 EAS CHB 693 | NA18547 EAS CHB 694 | NA18548 EAS CHB 695 | NA18549 EAS CHB 696 | NA18550 EAS CHB 697 | NA18552 EAS CHB 698 | NA18553 EAS CHB 699 | NA18555 EAS CHB 700 | NA18557 EAS CHB 701 | NA18558 EAS CHB 702 | NA18559 EAS CHB 703 | NA18560 EAS CHB 704 | NA18561 EAS CHB 705 | NA18562 EAS CHB 706 | NA18563 EAS CHB 707 | NA18564 EAS CHB 708 | NA18565 EAS CHB 709 | NA18566 EAS CHB 710 | NA18567 EAS CHB 711 | NA18570 EAS CHB 712 | NA18571 EAS CHB 713 | NA18572 EAS CHB 714 | NA18573 EAS CHB 715 | NA18574 EAS CHB 716 | NA18576 EAS CHB 717 | NA18577 EAS CHB 718 | NA18579 EAS CHB 719 | NA18582 EAS CHB 720 | NA18591 EAS CHB 721 | NA18592 EAS CHB 722 | NA18593 EAS CHB 723 | NA18595 EAS CHB 724 | NA18596 EAS CHB 725 | NA18597 EAS CHB 726 | NA18599 EAS CHB 727 | NA18602 EAS CHB 728 | NA18603 EAS CHB 729 | NA18605 EAS CHB 730 | NA18606 EAS CHB 731 | NA18608 EAS CHB 732 | NA18609 EAS CHB 733 | NA18610 EAS CHB 734 | NA18611 EAS CHB 735 | NA18612 EAS CHB 736 | NA18613 EAS CHB 737 | NA18614 EAS CHB 738 | NA18615 EAS CHB 739 | NA18616 EAS CHB 740 | NA18617 EAS CHB 741 | NA18618 EAS CHB 742 | NA18619 EAS CHB 743 | NA18620 EAS CHB 744 
| NA18621 EAS CHB 745 | NA18622 EAS CHB 746 | NA18623 EAS CHB 747 | NA18624 EAS CHB 748 | NA18625 EAS CHB 749 | NA18626 EAS CHB 750 | NA18627 EAS CHB 751 | NA18628 EAS CHB 752 | NA18629 EAS CHB 753 | NA18630 EAS CHB 754 | NA18631 EAS CHB 755 | NA18632 EAS CHB 756 | NA18633 EAS CHB 757 | NA18634 EAS CHB 758 | NA18635 EAS CHB 759 | NA18636 EAS CHB 760 | NA18637 EAS CHB 761 | NA18638 EAS CHB 762 | NA18639 EAS CHB 763 | NA18640 EAS CHB 764 | NA18641 EAS CHB 765 | NA18642 EAS CHB 766 | NA18643 EAS CHB 767 | NA18644 EAS CHB 768 | NA18645 EAS CHB 769 | NA18646 EAS CHB 770 | NA18647 EAS CHB 771 | NA18648 EAS CHB 772 | NA18740 EAS CHB 773 | NA18745 EAS CHB 774 | NA18747 EAS CHB 775 | NA18748 EAS CHB 776 | NA18749 EAS CHB 777 | NA18757 EAS CHB 778 | NA18791 EAS CHB 779 | NA18794 EAS CHB 780 | NA18795 EAS CHB 781 | HG00403 EAS CHS 782 | HG00404 EAS CHS 783 | HG00405 EAS CHS 784 | HG00406 EAS CHS 785 | HG00407 EAS CHS 786 | HG00408 EAS CHS 787 | HG00409 EAS CHS 788 | HG00410 EAS CHS 789 | HG00411 EAS CHS 790 | HG00418 EAS CHS 791 | HG00419 EAS CHS 792 | HG00420 EAS CHS 793 | HG00421 EAS CHS 794 | HG00422 EAS CHS 795 | HG00423 EAS CHS 796 | HG00427 EAS CHS 797 | HG00428 EAS CHS 798 | HG00429 EAS CHS 799 | HG00436 EAS CHS 800 | HG00437 EAS CHS 801 | HG00438 EAS CHS 802 | HG00442 EAS CHS 803 | HG00443 EAS CHS 804 | HG00444 EAS CHS 805 | HG00445 EAS CHS 806 | HG00446 EAS CHS 807 | HG00447 EAS CHS 808 | HG00448 EAS CHS 809 | HG00449 EAS CHS 810 | HG00450 EAS CHS 811 | HG00451 EAS CHS 812 | HG00452 EAS CHS 813 | HG00453 EAS CHS 814 | HG00457 EAS CHS 815 | HG00458 EAS CHS 816 | HG00459 EAS CHS 817 | HG00463 EAS CHS 818 | HG00464 EAS CHS 819 | HG00465 EAS CHS 820 | HG00472 EAS CHS 821 | HG00473 EAS CHS 822 | HG00474 EAS CHS 823 | HG00475 EAS CHS 824 | HG00476 EAS CHS 825 | HG00477 EAS CHS 826 | HG00478 EAS CHS 827 | HG00479 EAS CHS 828 | HG00480 EAS CHS 829 | HG00500 EAS CHS 830 | HG00501 EAS CHS 831 | HG00502 EAS CHS 832 | HG00512 EAS CHS 833 | HG00513 EAS CHS 834 | HG00514 EAS CHS 
835 | HG00524 EAS CHS 836 | HG00525 EAS CHS 837 | HG00526 EAS CHS 838 | HG00530 EAS CHS 839 | HG00531 EAS CHS 840 | HG00532 EAS CHS 841 | HG00533 EAS CHS 842 | HG00534 EAS CHS 843 | HG00535 EAS CHS 844 | HG00536 EAS CHS 845 | HG00537 EAS CHS 846 | HG00538 EAS CHS 847 | HG00542 EAS CHS 848 | HG00543 EAS CHS 849 | HG00544 EAS CHS 850 | HG00556 EAS CHS 851 | HG00557 EAS CHS 852 | HG00558 EAS CHS 853 | HG00559 EAS CHS 854 | HG00560 EAS CHS 855 | HG00561 EAS CHS 856 | HG00565 EAS CHS 857 | HG00566 EAS CHS 858 | HG00567 EAS CHS 859 | HG00577 EAS CHS 860 | HG00578 EAS CHS 861 | HG00579 EAS CHS 862 | HG00580 EAS CHS 863 | HG00581 EAS CHS 864 | HG00582 EAS CHS 865 | HG00583 EAS CHS 866 | HG00584 EAS CHS 867 | HG00585 EAS CHS 868 | HG00589 EAS CHS 869 | HG00590 EAS CHS 870 | HG00591 EAS CHS 871 | HG00592 EAS CHS 872 | HG00593 EAS CHS 873 | HG00594 EAS CHS 874 | HG00595 EAS CHS 875 | HG00596 EAS CHS 876 | HG00597 EAS CHS 877 | HG00598 EAS CHS 878 | HG00599 EAS CHS 879 | HG00600 EAS CHS 880 | HG00607 EAS CHS 881 | HG00608 EAS CHS 882 | HG00609 EAS CHS 883 | HG00610 EAS CHS 884 | HG00611 EAS CHS 885 | HG00612 EAS CHS 886 | HG00613 EAS CHS 887 | HG00614 EAS CHS 888 | HG00615 EAS CHS 889 | HG00619 EAS CHS 890 | HG00620 EAS CHS 891 | HG00621 EAS CHS 892 | HG00622 EAS CHS 893 | HG00623 EAS CHS 894 | HG00624 EAS CHS 895 | HG00625 EAS CHS 896 | HG00626 EAS CHS 897 | HG00627 EAS CHS 898 | HG00628 EAS CHS 899 | HG00629 EAS CHS 900 | HG00630 EAS CHS 901 | HG00631 EAS CHS 902 | HG00632 EAS CHS 903 | HG00633 EAS CHS 904 | HG00634 EAS CHS 905 | HG00635 EAS CHS 906 | HG00636 EAS CHS 907 | HG00650 EAS CHS 908 | HG00651 EAS CHS 909 | HG00652 EAS CHS 910 | HG00653 EAS CHS 911 | HG00654 EAS CHS 912 | HG00655 EAS CHS 913 | HG00658 EAS CHS 914 | HG00662 EAS CHS 915 | HG00663 EAS CHS 916 | HG00664 EAS CHS 917 | HG00671 EAS CHS 918 | HG00672 EAS CHS 919 | HG00673 EAS CHS 920 | HG00674 EAS CHS 921 | HG00675 EAS CHS 922 | HG00676 EAS CHS 923 | HG00683 EAS CHS 924 | HG00684 EAS CHS 925 | HG00685 EAS 
CHS 926 | HG00689 EAS CHS 927 | HG00690 EAS CHS 928 | HG00691 EAS CHS 929 | HG00692 EAS CHS 930 | HG00693 EAS CHS 931 | HG00694 EAS CHS 932 | HG00698 EAS CHS 933 | HG00699 EAS CHS 934 | HG00700 EAS CHS 935 | HG00656 EAS CHS 936 | HG00657 EAS CHS 937 | HG00701 EAS CHS 938 | HG00702 EAS CHS 939 | HG00703 EAS CHS 940 | HG00704 EAS CHS 941 | HG00705 EAS CHS 942 | HG00706 EAS CHS 943 | HG00707 EAS CHS 944 | HG00708 EAS CHS 945 | HG00709 EAS CHS 946 | HG00716 EAS CHS 947 | HG00717 EAS CHS 948 | HG00718 EAS CHS 949 | HG00728 EAS CHS 950 | HG00729 EAS CHS 951 | HG00730 EAS CHS 952 | HG01119 AMR CLM 953 | HG01121 AMR CLM 954 | HG01122 AMR CLM 955 | HG01123 AMR CLM 956 | HG01112 AMR CLM 957 | HG01113 AMR CLM 958 | HG01114 AMR CLM 959 | HG01124 AMR CLM 960 | HG01125 AMR CLM 961 | HG01126 AMR CLM 962 | HG01130 AMR CLM 963 | HG01131 AMR CLM 964 | HG01133 AMR CLM 965 | HG01134 AMR CLM 966 | HG01135 AMR CLM 967 | HG01136 AMR CLM 968 | HG01137 AMR CLM 969 | HG01138 AMR CLM 970 | HG01139 AMR CLM 971 | HG01140 AMR CLM 972 | HG01141 AMR CLM 973 | HG01142 AMR CLM 974 | HG01148 AMR CLM 975 | HG01149 AMR CLM 976 | HG01150 AMR CLM 977 | HG01250 AMR CLM 978 | HG01251 AMR CLM 979 | HG01252 AMR CLM 980 | HG01253 AMR CLM 981 | HG01254 AMR CLM 982 | HG01255 AMR CLM 983 | HG01256 AMR CLM 984 | HG01257 AMR CLM 985 | HG01258 AMR CLM 986 | HG01259 AMR CLM 987 | HG01260 AMR CLM 988 | HG01261 AMR CLM 989 | HG01269 AMR CLM 990 | HG01271 AMR CLM 991 | HG01272 AMR CLM 992 | HG01273 AMR CLM 993 | HG01274 AMR CLM 994 | HG01275 AMR CLM 995 | HG01276 AMR CLM 996 | HG01277 AMR CLM 997 | HG01278 AMR CLM 998 | HG01279 AMR CLM 999 | HG01280 AMR CLM 1000 | HG01281 AMR CLM 1001 | HG01284 AMR CLM 1002 | HG01341 AMR CLM 1003 | HG01342 AMR CLM 1004 | HG01343 AMR CLM 1005 | HG01344 AMR CLM 1006 | HG01345 AMR CLM 1007 | HG01346 AMR CLM 1008 | HG01347 AMR CLM 1009 | HG01348 AMR CLM 1010 | HG01349 AMR CLM 1011 | HG01350 AMR CLM 1012 | HG01351 AMR CLM 1013 | HG01352 AMR CLM 1014 | HG01353 AMR CLM 1015 | HG01354 AMR CLM 
1016 | HG01355 AMR CLM 1017 | HG01356 AMR CLM 1018 | HG01357 AMR CLM 1019 | HG01358 AMR CLM 1020 | HG01359 AMR CLM 1021 | HG01360 AMR CLM 1022 | HG01361 AMR CLM 1023 | HG01362 AMR CLM 1024 | HG01363 AMR CLM 1025 | HG01364 AMR CLM 1026 | HG01365 AMR CLM 1027 | HG01366 AMR CLM 1028 | HG01367 AMR CLM 1029 | HG01369 AMR CLM 1030 | HG01372 AMR CLM 1031 | HG01374 AMR CLM 1032 | HG01375 AMR CLM 1033 | HG01376 AMR CLM 1034 | HG01377 AMR CLM 1035 | HG01378 AMR CLM 1036 | HG01379 AMR CLM 1037 | HG01383 AMR CLM 1038 | HG01384 AMR CLM 1039 | HG01385 AMR CLM 1040 | HG01389 AMR CLM 1041 | HG01390 AMR CLM 1042 | HG01391 AMR CLM 1043 | HG01431 AMR CLM 1044 | HG01432 AMR CLM 1045 | HG01433 AMR CLM 1046 | HG01435 AMR CLM 1047 | HG01437 AMR CLM 1048 | HG01438 AMR CLM 1049 | HG01439 AMR CLM 1050 | HG01441 AMR CLM 1051 | HG01442 AMR CLM 1052 | HG01440 AMR CLM 1053 | HG01443 AMR CLM 1054 | HG01444 AMR CLM 1055 | HG01445 AMR CLM 1056 | HG01447 AMR CLM 1057 | HG01452 AMR CLM 1058 | HG01453 AMR CLM 1059 | HG01454 AMR CLM 1060 | HG01455 AMR CLM 1061 | HG01456 AMR CLM 1062 | HG01457 AMR CLM 1063 | HG01459 AMR CLM 1064 | HG01461 AMR CLM 1065 | HG01462 AMR CLM 1066 | HG01463 AMR CLM 1067 | HG01464 AMR CLM 1068 | HG01465 AMR CLM 1069 | HG01466 AMR CLM 1070 | HG01468 AMR CLM 1071 | HG01471 AMR CLM 1072 | HG01473 AMR CLM 1073 | HG01474 AMR CLM 1074 | HG01477 AMR CLM 1075 | HG01479 AMR CLM 1076 | HG01480 AMR CLM 1077 | HG01481 AMR CLM 1078 | HG01482 AMR CLM 1079 | HG01483 AMR CLM 1080 | HG01484 AMR CLM 1081 | HG01485 AMR CLM 1082 | HG01486 AMR CLM 1083 | HG01487 AMR CLM 1084 | HG01488 AMR CLM 1085 | HG01489 AMR CLM 1086 | HG01490 AMR CLM 1087 | HG01491 AMR CLM 1088 | HG01492 AMR CLM 1089 | HG01493 AMR CLM 1090 | HG01494 AMR CLM 1091 | HG01495 AMR CLM 1092 | HG01496 AMR CLM 1093 | HG01497 AMR CLM 1094 | HG01498 AMR CLM 1095 | HG01499 AMR CLM 1096 | HG01550 AMR CLM 1097 | HG01551 AMR CLM 1098 | HG01552 AMR CLM 1099 | HG01556 AMR CLM 1100 | HG03171 AFR ESN 1101 | HG02922 AFR ESN 1102 | HG02923 AFR 
ESN 1103 | HG02924 AFR ESN 1104 | HG03493 AFR ESN 1105 | HG03499 AFR ESN 1106 | HG03508 AFR ESN 1107 | HG03510 AFR ESN 1108 | HG03511 AFR ESN 1109 | HG03513 AFR ESN 1110 | HG03514 AFR ESN 1111 | HG03515 AFR ESN 1112 | HG03516 AFR ESN 1113 | HG03517 AFR ESN 1114 | HG03518 AFR ESN 1115 | HG03519 AFR ESN 1116 | HG03520 AFR ESN 1117 | HG03521 AFR ESN 1118 | HG03522 AFR ESN 1119 | HG02938 AFR ESN 1120 | HG02939 AFR ESN 1121 | HG02941 AFR ESN 1122 | HG02942 AFR ESN 1123 | HG02943 AFR ESN 1124 | HG02944 AFR ESN 1125 | HG02945 AFR ESN 1126 | HG02946 AFR ESN 1127 | HG02947 AFR ESN 1128 | HG02948 AFR ESN 1129 | HG02952 AFR ESN 1130 | HG02953 AFR ESN 1131 | HG02954 AFR ESN 1132 | HG02964 AFR ESN 1133 | HG02965 AFR ESN 1134 | HG02966 AFR ESN 1135 | HG02968 AFR ESN 1136 | HG02969 AFR ESN 1137 | HG02970 AFR ESN 1138 | HG02971 AFR ESN 1139 | HG02972 AFR ESN 1140 | HG02973 AFR ESN 1141 | HG02974 AFR ESN 1142 | HG02975 AFR ESN 1143 | HG02976 AFR ESN 1144 | HG02977 AFR ESN 1145 | HG02978 AFR ESN 1146 | HG02979 AFR ESN 1147 | HG02980 AFR ESN 1148 | HG02981 AFR ESN 1149 | HG03099 AFR ESN 1150 | HG03100 AFR ESN 1151 | HG03101 AFR ESN 1152 | HG03103 AFR ESN 1153 | HG03104 AFR ESN 1154 | HG03105 AFR ESN 1155 | HG03107 AFR ESN 1156 | HG03108 AFR ESN 1157 | HG03109 AFR ESN 1158 | HG03110 AFR ESN 1159 | HG03111 AFR ESN 1160 | HG03112 AFR ESN 1161 | HG03113 AFR ESN 1162 | HG03114 AFR ESN 1163 | HG03115 AFR ESN 1164 | HG03116 AFR ESN 1165 | HG03117 AFR ESN 1166 | HG03118 AFR ESN 1167 | HG03119 AFR ESN 1168 | HG03120 AFR ESN 1169 | HG03121 AFR ESN 1170 | HG03122 AFR ESN 1171 | HG03123 AFR ESN 1172 | HG03124 AFR ESN 1173 | HG03125 AFR ESN 1174 | HG03126 AFR ESN 1175 | HG03127 AFR ESN 1176 | HG03128 AFR ESN 1177 | HG03129 AFR ESN 1178 | HG03130 AFR ESN 1179 | HG03131 AFR ESN 1180 | HG03132 AFR ESN 1181 | HG03133 AFR ESN 1182 | HG03134 AFR ESN 1183 | HG03135 AFR ESN 1184 | HG03136 AFR ESN 1185 | HG03137 AFR ESN 1186 | HG03139 AFR ESN 1187 | HG03140 AFR ESN 1188 | HG03157 AFR ESN 1189 | HG03158 
AFR ESN 1190 | HG03159 AFR ESN 1191 | HG03160 AFR ESN 1192 | HG03161 AFR ESN 1193 | HG03162 AFR ESN 1194 | HG03163 AFR ESN 1195 | HG03164 AFR ESN 1196 | HG03166 AFR ESN 1197 | HG03167 AFR ESN 1198 | HG03168 AFR ESN 1199 | HG03169 AFR ESN 1200 | HG03170 AFR ESN 1201 | HG03172 AFR ESN 1202 | HG03173 AFR ESN 1203 | HG03175 AFR ESN 1204 | HG03176 AFR ESN 1205 | HG03189 AFR ESN 1206 | HG03190 AFR ESN 1207 | HG03191 AFR ESN 1208 | HG03193 AFR ESN 1209 | HG03194 AFR ESN 1210 | HG03195 AFR ESN 1211 | HG03196 AFR ESN 1212 | HG03197 AFR ESN 1213 | HG03198 AFR ESN 1214 | HG03199 AFR ESN 1215 | HG03200 AFR ESN 1216 | HG03202 AFR ESN 1217 | HG03203 AFR ESN 1218 | HG03265 AFR ESN 1219 | HG03266 AFR ESN 1220 | HG03267 AFR ESN 1221 | HG03268 AFR ESN 1222 | HG03269 AFR ESN 1223 | HG03270 AFR ESN 1224 | HG03271 AFR ESN 1225 | HG03272 AFR ESN 1226 | HG03279 AFR ESN 1227 | HG03280 AFR ESN 1228 | HG03291 AFR ESN 1229 | HG03293 AFR ESN 1230 | HG03294 AFR ESN 1231 | HG03295 AFR ESN 1232 | HG03296 AFR ESN 1233 | HG03297 AFR ESN 1234 | HG03298 AFR ESN 1235 | HG03299 AFR ESN 1236 | HG03300 AFR ESN 1237 | HG03301 AFR ESN 1238 | HG03302 AFR ESN 1239 | HG03303 AFR ESN 1240 | HG03304 AFR ESN 1241 | HG03305 AFR ESN 1242 | HG03306 AFR ESN 1243 | HG03307 AFR ESN 1244 | HG03308 AFR ESN 1245 | HG03309 AFR ESN 1246 | HG03310 AFR ESN 1247 | HG03311 AFR ESN 1248 | HG03312 AFR ESN 1249 | HG03313 AFR ESN 1250 | HG03314 AFR ESN 1251 | HG03339 AFR ESN 1252 | HG03341 AFR ESN 1253 | HG03342 AFR ESN 1254 | HG03343 AFR ESN 1255 | HG03344 AFR ESN 1256 | HG03350 AFR ESN 1257 | HG03351 AFR ESN 1258 | HG03352 AFR ESN 1259 | HG03354 AFR ESN 1260 | HG03361 AFR ESN 1261 | HG03362 AFR ESN 1262 | HG03363 AFR ESN 1263 | HG03365 AFR ESN 1264 | HG03366 AFR ESN 1265 | HG03367 AFR ESN 1266 | HG03368 AFR ESN 1267 | HG03369 AFR ESN 1268 | HG03370 AFR ESN 1269 | HG03371 AFR ESN 1270 | HG03372 AFR ESN 1271 | HG03373 AFR ESN 1272 | HG03374 AFR ESN 1273 | HG00171 EUR FIN 1274 | HG00173 EUR FIN 1275 | HG00174 EUR FIN 1276 | 
HG00176 EUR FIN 1277 | HG00177 EUR FIN 1278 | HG00178 EUR FIN 1279 | HG00179 EUR FIN 1280 | HG00180 EUR FIN 1281 | HG00181 EUR FIN 1282 | HG00182 EUR FIN 1283 | HG00183 EUR FIN 1284 | HG00185 EUR FIN 1285 | HG00186 EUR FIN 1286 | HG00187 EUR FIN 1287 | HG00188 EUR FIN 1288 | HG00189 EUR FIN 1289 | HG00190 EUR FIN 1290 | HG00266 EUR FIN 1291 | HG00267 EUR FIN 1292 | HG00268 EUR FIN 1293 | HG00269 EUR FIN 1294 | HG00270 EUR FIN 1295 | HG00271 EUR FIN 1296 | HG00272 EUR FIN 1297 | HG00273 EUR FIN 1298 | HG00274 EUR FIN 1299 | HG00275 EUR FIN 1300 | HG00276 EUR FIN 1301 | HG00277 EUR FIN 1302 | HG00278 EUR FIN 1303 | HG00280 EUR FIN 1304 | HG00281 EUR FIN 1305 | HG00282 EUR FIN 1306 | HG00284 EUR FIN 1307 | HG00285 EUR FIN 1308 | HG00288 EUR FIN 1309 | HG00290 EUR FIN 1310 | HG00302 EUR FIN 1311 | HG00303 EUR FIN 1312 | HG00304 EUR FIN 1313 | HG00306 EUR FIN 1314 | HG00308 EUR FIN 1315 | HG00309 EUR FIN 1316 | HG00310 EUR FIN 1317 | HG00311 EUR FIN 1318 | HG00312 EUR FIN 1319 | HG00313 EUR FIN 1320 | HG00315 EUR FIN 1321 | HG00318 EUR FIN 1322 | HG00319 EUR FIN 1323 | HG00320 EUR FIN 1324 | HG00321 EUR FIN 1325 | HG00323 EUR FIN 1326 | HG00324 EUR FIN 1327 | HG00325 EUR FIN 1328 | HG00326 EUR FIN 1329 | HG00327 EUR FIN 1330 | HG00328 EUR FIN 1331 | HG00329 EUR FIN 1332 | HG00330 EUR FIN 1333 | HG00331 EUR FIN 1334 | HG00332 EUR FIN 1335 | HG00334 EUR FIN 1336 | HG00335 EUR FIN 1337 | HG00336 EUR FIN 1338 | HG00337 EUR FIN 1339 | HG00338 EUR FIN 1340 | HG00339 EUR FIN 1341 | HG00341 EUR FIN 1342 | HG00342 EUR FIN 1343 | HG00343 EUR FIN 1344 | HG00344 EUR FIN 1345 | HG00345 EUR FIN 1346 | HG00346 EUR FIN 1347 | HG00349 EUR FIN 1348 | HG00350 EUR FIN 1349 | HG00351 EUR FIN 1350 | HG00353 EUR FIN 1351 | HG00355 EUR FIN 1352 | HG00356 EUR FIN 1353 | HG00357 EUR FIN 1354 | HG00358 EUR FIN 1355 | HG00359 EUR FIN 1356 | HG00360 EUR FIN 1357 | HG00361 EUR FIN 1358 | HG00362 EUR FIN 1359 | HG00364 EUR FIN 1360 | HG00365 EUR FIN 1361 | HG00366 EUR FIN 1362 | HG00367 EUR FIN 1363 
| HG00368 EUR FIN 1364 | HG00369 EUR FIN 1365 | HG00371 EUR FIN 1366 | HG00372 EUR FIN 1367 | HG00373 EUR FIN 1368 | HG00375 EUR FIN 1369 | HG00376 EUR FIN 1370 | HG00377 EUR FIN 1371 | HG00378 EUR FIN 1372 | HG00379 EUR FIN 1373 | HG00380 EUR FIN 1374 | HG00381 EUR FIN 1375 | HG00382 EUR FIN 1376 | HG00383 EUR FIN 1377 | HG00384 EUR FIN 1378 | HG00144 EUR GBR 1379 | HG00155 EUR GBR 1380 | HG00146 EUR GBR 1381 | HG00147 EUR GBR 1382 | HG00153 EUR GBR 1383 | HG00158 EUR GBR 1384 | HG00247 EUR GBR 1385 | HG00248 EUR GBR 1386 | HG00096 EUR GBR 1387 | HG00097 EUR GBR 1388 | HG00098 EUR GBR 1389 | HG00099 EUR GBR 1390 | HG00100 EUR GBR 1391 | HG00101 EUR GBR 1392 | HG00102 EUR GBR 1393 | HG00103 EUR GBR 1394 | HG00104 EUR GBR 1395 | HG00105 EUR GBR 1396 | HG00106 EUR GBR 1397 | HG00107 EUR GBR 1398 | HG00108 EUR GBR 1399 | HG00109 EUR GBR 1400 | HG00110 EUR GBR 1401 | HG00111 EUR GBR 1402 | HG00112 EUR GBR 1403 | HG00113 EUR GBR 1404 | HG00114 EUR GBR 1405 | HG00115 EUR GBR 1406 | HG00116 EUR GBR 1407 | HG00117 EUR GBR 1408 | HG00118 EUR GBR 1409 | HG00119 EUR GBR 1410 | HG00120 EUR GBR 1411 | HG00121 EUR GBR 1412 | HG00122 EUR GBR 1413 | HG00123 EUR GBR 1414 | HG00124 EUR GBR 1415 | HG00125 EUR GBR 1416 | HG00126 EUR GBR 1417 | HG00127 EUR GBR 1418 | HG00128 EUR GBR 1419 | HG00129 EUR GBR 1420 | HG00130 EUR GBR 1421 | HG00131 EUR GBR 1422 | HG00132 EUR GBR 1423 | HG00133 EUR GBR 1424 | HG00134 EUR GBR 1425 | HG00135 EUR GBR 1426 | HG00136 EUR GBR 1427 | HG00137 EUR GBR 1428 | HG00138 EUR GBR 1429 | HG00139 EUR GBR 1430 | HG00140 EUR GBR 1431 | HG00141 EUR GBR 1432 | HG00142 EUR GBR 1433 | HG00143 EUR GBR 1434 | HG00145 EUR GBR 1435 | HG00148 EUR GBR 1436 | HG00149 EUR GBR 1437 | HG00150 EUR GBR 1438 | HG00151 EUR GBR 1439 | HG00152 EUR GBR 1440 | HG00154 EUR GBR 1441 | HG00156 EUR GBR 1442 | HG00157 EUR GBR 1443 | HG00159 EUR GBR 1444 | HG00160 EUR GBR 1445 | HG00231 EUR GBR 1446 | HG00232 EUR GBR 1447 | HG00233 EUR GBR 1448 | HG00234 EUR GBR 1449 | HG00235 EUR GBR 
1450 | HG00236 EUR GBR 1451 | HG00237 EUR GBR 1452 | HG00238 EUR GBR 1453 | HG00239 EUR GBR 1454 | HG00240 EUR GBR 1455 | HG00242 EUR GBR 1456 | HG00243 EUR GBR 1457 | HG00244 EUR GBR 1458 | HG00245 EUR GBR 1459 | HG00246 EUR GBR 1460 | HG00249 EUR GBR 1461 | HG00250 EUR GBR 1462 | HG00251 EUR GBR 1463 | HG00252 EUR GBR 1464 | HG00253 EUR GBR 1465 | HG00254 EUR GBR 1466 | HG00255 EUR GBR 1467 | HG00256 EUR GBR 1468 | HG00257 EUR GBR 1469 | HG00258 EUR GBR 1470 | HG00259 EUR GBR 1471 | HG00260 EUR GBR 1472 | HG00261 EUR GBR 1473 | HG00262 EUR GBR 1474 | HG00263 EUR GBR 1475 | HG00264 EUR GBR 1476 | HG00265 EUR GBR 1477 | HG01334 EUR GBR 1478 | HG01789 EUR GBR 1479 | HG01790 EUR GBR 1480 | HG01791 EUR GBR 1481 | HG02215 EUR GBR 1482 | HG04301 EUR GBR 1483 | HG04302 EUR GBR 1484 | HG04303 EUR GBR 1485 | NA20868 SAS GIH 1486 | NA20871 SAS GIH 1487 | NA20886 SAS GIH 1488 | NA20898 SAS GIH 1489 | NA20909 SAS GIH 1490 | NA20910 SAS GIH 1491 | NA20845 SAS GIH 1492 | NA20846 SAS GIH 1493 | NA20847 SAS GIH 1494 | NA20849 SAS GIH 1495 | NA20850 SAS GIH 1496 | NA20851 SAS GIH 1497 | NA20852 SAS GIH 1498 | NA20853 SAS GIH 1499 | NA20854 SAS GIH 1500 | NA20856 SAS GIH 1501 | NA20858 SAS GIH 1502 | NA20859 SAS GIH 1503 | NA20861 SAS GIH 1504 | NA20862 SAS GIH 1505 | NA20863 SAS GIH 1506 | NA20864 SAS GIH 1507 | NA20866 SAS GIH 1508 | NA20867 SAS GIH 1509 | NA20869 SAS GIH 1510 | NA20870 SAS GIH 1511 | NA20872 SAS GIH 1512 | NA20873 SAS GIH 1513 | NA20874 SAS GIH 1514 | NA20875 SAS GIH 1515 | NA20876 SAS GIH 1516 | NA20877 SAS GIH 1517 | NA20878 SAS GIH 1518 | NA20879 SAS GIH 1519 | NA20881 SAS GIH 1520 | NA20882 SAS GIH 1521 | NA20883 SAS GIH 1522 | NA20884 SAS GIH 1523 | NA20885 SAS GIH 1524 | NA20887 SAS GIH 1525 | NA20888 SAS GIH 1526 | NA20889 SAS GIH 1527 | NA20890 SAS GIH 1528 | NA20891 SAS GIH 1529 | NA20892 SAS GIH 1530 | NA20893 SAS GIH 1531 | NA20894 SAS GIH 1532 | NA20895 SAS GIH 1533 | NA20896 SAS GIH 1534 | NA20897 SAS GIH 1535 | NA20899 SAS GIH 1536 | NA20900 SAS 
GIH 1537 | NA20901 SAS GIH 1538 | NA20902 SAS GIH 1539 | NA20903 SAS GIH 1540 | NA20904 SAS GIH 1541 | NA20905 SAS GIH 1542 | NA20906 SAS GIH 1543 | NA20907 SAS GIH 1544 | NA20908 SAS GIH 1545 | NA20911 SAS GIH 1546 | NA21086 SAS GIH 1547 | NA21087 SAS GIH 1548 | NA21088 SAS GIH 1549 | NA21089 SAS GIH 1550 | NA21090 SAS GIH 1551 | NA21091 SAS GIH 1552 | NA21092 SAS GIH 1553 | NA21093 SAS GIH 1554 | NA21094 SAS GIH 1555 | NA21095 SAS GIH 1556 | NA21097 SAS GIH 1557 | NA21098 SAS GIH 1558 | NA21099 SAS GIH 1559 | NA21100 SAS GIH 1560 | NA21101 SAS GIH 1561 | NA21102 SAS GIH 1562 | NA21103 SAS GIH 1563 | NA21104 SAS GIH 1564 | NA21105 SAS GIH 1565 | NA21106 SAS GIH 1566 | NA21107 SAS GIH 1567 | NA21108 SAS GIH 1568 | NA21109 SAS GIH 1569 | NA21110 SAS GIH 1570 | NA21111 SAS GIH 1571 | NA21112 SAS GIH 1572 | NA21113 SAS GIH 1573 | NA21114 SAS GIH 1574 | NA21115 SAS GIH 1575 | NA21116 SAS GIH 1576 | NA21117 SAS GIH 1577 | NA21118 SAS GIH 1578 | NA21119 SAS GIH 1579 | NA21120 SAS GIH 1580 | NA21121 SAS GIH 1581 | NA21122 SAS GIH 1582 | NA21123 SAS GIH 1583 | NA21124 SAS GIH 1584 | NA21125 SAS GIH 1585 | NA21126 SAS GIH 1586 | NA21127 SAS GIH 1587 | NA21128 SAS GIH 1588 | NA21129 SAS GIH 1589 | NA21130 SAS GIH 1590 | NA21133 SAS GIH 1591 | NA21134 SAS GIH 1592 | NA21135 SAS GIH 1593 | NA21137 SAS GIH 1594 | NA21141 SAS GIH 1595 | NA21142 SAS GIH 1596 | NA21143 SAS GIH 1597 | NA21144 SAS GIH 1598 | HG03024 AFR GWD 1599 | HG03025 AFR GWD 1600 | HG03026 AFR GWD 1601 | HG03027 AFR GWD 1602 | HG03028 AFR GWD 1603 | HG03029 AFR GWD 1604 | HG03033 AFR GWD 1605 | HG03034 AFR GWD 1606 | HG03035 AFR GWD 1607 | HG03039 AFR GWD 1608 | HG03040 AFR GWD 1609 | HG03041 AFR GWD 1610 | HG03045 AFR GWD 1611 | HG03046 AFR GWD 1612 | HG03047 AFR GWD 1613 | HG03048 AFR GWD 1614 | HG03049 AFR GWD 1615 | HG03050 AFR GWD 1616 | HG03240 AFR GWD 1617 | HG03241 AFR GWD 1618 | HG03242 AFR GWD 1619 | HG03246 AFR GWD 1620 | HG03247 AFR GWD 1621 | HG03248 AFR GWD 1622 | HG03249 AFR GWD 1623 | HG03250 
AFR GWD 1624 | HG03251 AFR GWD 1625 | HG03258 AFR GWD 1626 | HG03259 AFR GWD 1627 | HG03260 AFR GWD 1628 | HG03538 AFR GWD 1629 | HG03539 AFR GWD 1630 | HG03540 AFR GWD 1631 | HG02461 AFR GWD 1632 | HG02462 AFR GWD 1633 | HG02463 AFR GWD 1634 | HG02464 AFR GWD 1635 | HG02465 AFR GWD 1636 | HG02466 AFR GWD 1637 | HG02561 AFR GWD 1638 | HG02562 AFR GWD 1639 | HG02563 AFR GWD 1640 | HG02567 AFR GWD 1641 | HG02568 AFR GWD 1642 | HG02569 AFR GWD 1643 | HG02570 AFR GWD 1644 | HG02571 AFR GWD 1645 | HG02572 AFR GWD 1646 | HG02573 AFR GWD 1647 | HG02574 AFR GWD 1648 | HG02575 AFR GWD 1649 | HG02582 AFR GWD 1650 | HG02583 AFR GWD 1651 | HG02584 AFR GWD 1652 | HG02585 AFR GWD 1653 | HG02586 AFR GWD 1654 | HG02587 AFR GWD 1655 | HG02588 AFR GWD 1656 | HG02589 AFR GWD 1657 | HG02590 AFR GWD 1658 | HG02594 AFR GWD 1659 | HG02595 AFR GWD 1660 | HG02596 AFR GWD 1661 | HG02610 AFR GWD 1662 | HG02611 AFR GWD 1663 | HG02612 AFR GWD 1664 | HG02613 AFR GWD 1665 | HG02614 AFR GWD 1666 | HG02615 AFR GWD 1667 | HG02620 AFR GWD 1668 | HG02621 AFR GWD 1669 | HG02622 AFR GWD 1670 | HG02623 AFR GWD 1671 | HG02624 AFR GWD 1672 | HG02625 AFR GWD 1673 | HG02628 AFR GWD 1674 | HG02629 AFR GWD 1675 | HG02630 AFR GWD 1676 | HG02634 AFR GWD 1677 | HG02635 AFR GWD 1678 | HG02636 AFR GWD 1679 | HG02642 AFR GWD 1680 | HG02643 AFR GWD 1681 | HG02644 AFR GWD 1682 | HG02645 AFR GWD 1683 | HG02646 AFR GWD 1684 | HG02647 AFR GWD 1685 | HG02666 AFR GWD 1686 | HG02667 AFR GWD 1687 | HG02668 AFR GWD 1688 | HG02675 AFR GWD 1689 | HG02676 AFR GWD 1690 | HG02677 AFR GWD 1691 | HG02678 AFR GWD 1692 | HG02679 AFR GWD 1693 | HG02680 AFR GWD 1694 | HG02702 AFR GWD 1695 | HG02703 AFR GWD 1696 | HG02704 AFR GWD 1697 | HG02715 AFR GWD 1698 | HG02716 AFR GWD 1699 | HG02717 AFR GWD 1700 | HG02721 AFR GWD 1701 | HG02722 AFR GWD 1702 | HG02723 AFR GWD 1703 | HG02756 AFR GWD 1704 | HG02757 AFR GWD 1705 | HG02758 AFR GWD 1706 | HG02759 AFR GWD 1707 | HG02760 AFR GWD 1708 | HG02761 AFR GWD 1709 | HG02762 AFR GWD 1710 | 
HG02763 AFR GWD 1711 | HG02764 AFR GWD 1712 | HG02768 AFR GWD 1713 | HG02769 AFR GWD 1714 | HG02770 AFR GWD 1715 | HG02771 AFR GWD 1716 | HG02772 AFR GWD 1717 | HG02773 AFR GWD 1718 | HG02798 AFR GWD 1719 | HG02799 AFR GWD 1720 | HG02800 AFR GWD 1721 | HG02804 AFR GWD 1722 | HG02805 AFR GWD 1723 | HG02806 AFR GWD 1724 | HG02807 AFR GWD 1725 | HG02808 AFR GWD 1726 | HG02809 AFR GWD 1727 | HG02810 AFR GWD 1728 | HG02811 AFR GWD 1729 | HG02812 AFR GWD 1730 | HG02813 AFR GWD 1731 | HG02814 AFR GWD 1732 | HG02815 AFR GWD 1733 | HG02816 AFR GWD 1734 | HG02817 AFR GWD 1735 | HG02818 AFR GWD 1736 | HG02819 AFR GWD 1737 | HG02820 AFR GWD 1738 | HG02821 AFR GWD 1739 | HG02836 AFR GWD 1740 | HG02837 AFR GWD 1741 | HG02838 AFR GWD 1742 | HG02839 AFR GWD 1743 | HG02840 AFR GWD 1744 | HG02841 AFR GWD 1745 | HG02851 AFR GWD 1746 | HG02852 AFR GWD 1747 | HG02853 AFR GWD 1748 | HG02854 AFR GWD 1749 | HG02855 AFR GWD 1750 | HG02856 AFR GWD 1751 | HG02860 AFR GWD 1752 | HG02861 AFR GWD 1753 | HG02862 AFR GWD 1754 | HG02869 AFR GWD 1755 | HG02870 AFR GWD 1756 | HG02871 AFR GWD 1757 | HG02878 AFR GWD 1758 | HG02879 AFR GWD 1759 | HG02880 AFR GWD 1760 | HG02881 AFR GWD 1761 | HG02882 AFR GWD 1762 | HG02883 AFR GWD 1763 | HG02884 AFR GWD 1764 | HG02885 AFR GWD 1765 | HG02886 AFR GWD 1766 | HG02887 AFR GWD 1767 | HG02888 AFR GWD 1768 | HG02889 AFR GWD 1769 | HG02890 AFR GWD 1770 | HG02891 AFR GWD 1771 | HG02892 AFR GWD 1772 | HG02895 AFR GWD 1773 | HG02896 AFR GWD 1774 | HG02897 AFR GWD 1775 | HG02982 AFR GWD 1776 | HG02983 AFR GWD 1777 | HG02984 AFR GWD 1778 | HG01500 EUR IBS 1779 | HG01501 EUR IBS 1780 | HG01502 EUR IBS 1781 | HG01503 EUR IBS 1782 | HG01504 EUR IBS 1783 | HG01505 EUR IBS 1784 | HG01506 EUR IBS 1785 | HG01507 EUR IBS 1786 | HG01508 EUR IBS 1787 | HG01509 EUR IBS 1788 | HG01510 EUR IBS 1789 | HG01511 EUR IBS 1790 | HG01512 EUR IBS 1791 | HG01513 EUR IBS 1792 | HG01514 EUR IBS 1793 | HG01515 EUR IBS 1794 | HG01516 EUR IBS 1795 | HG01517 EUR IBS 1796 | HG01518 EUR IBS 1797 
| HG01519 EUR IBS 1798 | HG01520 EUR IBS 1799 | HG01521 EUR IBS 1800 | HG01522 EUR IBS 1801 | HG01523 EUR IBS 1802 | HG01524 EUR IBS 1803 | HG01525 EUR IBS 1804 | HG01526 EUR IBS 1805 | HG01527 EUR IBS 1806 | HG01528 EUR IBS 1807 | HG01529 EUR IBS 1808 | HG01530 EUR IBS 1809 | HG01531 EUR IBS 1810 | HG01532 EUR IBS 1811 | HG01536 EUR IBS 1812 | HG01537 EUR IBS 1813 | HG01538 EUR IBS 1814 | HG01631 EUR IBS 1815 | HG01632 EUR IBS 1816 | HG01633 EUR IBS 1817 | HG01628 EUR IBS 1818 | HG01629 EUR IBS 1819 | HG01630 EUR IBS 1820 | HG01625 EUR IBS 1821 | HG01626 EUR IBS 1822 | HG01627 EUR IBS 1823 | HG01622 EUR IBS 1824 | HG01623 EUR IBS 1825 | HG01624 EUR IBS 1826 | HG01619 EUR IBS 1827 | HG01620 EUR IBS 1828 | HG01621 EUR IBS 1829 | HG01616 EUR IBS 1830 | HG01617 EUR IBS 1831 | HG01618 EUR IBS 1832 | HG01613 EUR IBS 1833 | HG01614 EUR IBS 1834 | HG01615 EUR IBS 1835 | HG01610 EUR IBS 1836 | HG01611 EUR IBS 1837 | HG01612 EUR IBS 1838 | HG01607 EUR IBS 1839 | HG01608 EUR IBS 1840 | HG01609 EUR IBS 1841 | HG01604 EUR IBS 1842 | HG01605 EUR IBS 1843 | HG01606 EUR IBS 1844 | HG01601 EUR IBS 1845 | HG01602 EUR IBS 1846 | HG01603 EUR IBS 1847 | HG01667 EUR IBS 1848 | HG01668 EUR IBS 1849 | HG01669 EUR IBS 1850 | HG01670 EUR IBS 1851 | HG01671 EUR IBS 1852 | HG01672 EUR IBS 1853 | HG01673 EUR IBS 1854 | HG01674 EUR IBS 1855 | HG01675 EUR IBS 1856 | HG01676 EUR IBS 1857 | HG01677 EUR IBS 1858 | HG01678 EUR IBS 1859 | HG01679 EUR IBS 1860 | HG01680 EUR IBS 1861 | HG01681 EUR IBS 1862 | HG01682 EUR IBS 1863 | HG01683 EUR IBS 1864 | HG01684 EUR IBS 1865 | HG01685 EUR IBS 1866 | HG01686 EUR IBS 1867 | HG01687 EUR IBS 1868 | HG01694 EUR IBS 1869 | HG01695 EUR IBS 1870 | HG01696 EUR IBS 1871 | HG01697 EUR IBS 1872 | HG01698 EUR IBS 1873 | HG01699 EUR IBS 1874 | HG01700 EUR IBS 1875 | HG01701 EUR IBS 1876 | HG01702 EUR IBS 1877 | HG01703 EUR IBS 1878 | HG01704 EUR IBS 1879 | HG01705 EUR IBS 1880 | HG01706 EUR IBS 1881 | HG01707 EUR IBS 1882 | HG01708 EUR IBS 1883 | HG01709 EUR IBS 
1884 | HG01710 EUR IBS 1885 | HG01711 EUR IBS 1886 | HG01746 EUR IBS 1887 | HG01747 EUR IBS 1888 | HG01748 EUR IBS 1889 | HG01755 EUR IBS 1890 | HG01756 EUR IBS 1891 | HG01757 EUR IBS 1892 | HG01761 EUR IBS 1893 | HG01762 EUR IBS 1894 | HG01763 EUR IBS 1895 | HG01764 EUR IBS 1896 | HG01765 EUR IBS 1897 | HG01766 EUR IBS 1898 | HG01767 EUR IBS 1899 | HG01768 EUR IBS 1900 | HG01769 EUR IBS 1901 | HG01770 EUR IBS 1902 | HG01771 EUR IBS 1903 | HG01772 EUR IBS 1904 | HG01773 EUR IBS 1905 | HG01774 EUR IBS 1906 | HG01775 EUR IBS 1907 | HG01776 EUR IBS 1908 | HG01777 EUR IBS 1909 | HG01778 EUR IBS 1910 | HG01779 EUR IBS 1911 | HG01780 EUR IBS 1912 | HG01781 EUR IBS 1913 | HG01782 EUR IBS 1914 | HG01783 EUR IBS 1915 | HG01784 EUR IBS 1916 | HG01785 EUR IBS 1917 | HG01786 EUR IBS 1918 | HG01787 EUR IBS 1919 | HG02217 EUR IBS 1920 | HG02218 EUR IBS 1921 | HG02219 EUR IBS 1922 | HG02220 EUR IBS 1923 | HG02221 EUR IBS 1924 | HG02222 EUR IBS 1925 | HG02223 EUR IBS 1926 | HG02224 EUR IBS 1927 | HG02225 EUR IBS 1928 | HG02229 EUR IBS 1929 | HG02230 EUR IBS 1930 | HG02231 EUR IBS 1931 | HG02232 EUR IBS 1932 | HG02233 EUR IBS 1933 | HG02234 EUR IBS 1934 | HG02235 EUR IBS 1935 | HG02236 EUR IBS 1936 | HG02237 EUR IBS 1937 | HG02238 EUR IBS 1938 | HG02239 EUR IBS 1939 | HG02240 EUR IBS 1940 | HG03871 SAS ITU 1941 | HG04206 SAS ITU 1942 | HG04239 SAS ITU 1943 | HG03713 SAS ITU 1944 | HG03715 SAS ITU 1945 | HG03719 SAS ITU 1946 | HG03722 SAS ITU 1947 | HG03725 SAS ITU 1948 | HG03721 SAS ITU 1949 | HG03727 SAS ITU 1950 | HG03732 SAS ITU 1951 | HG03772 SAS ITU 1952 | HG03773 SAS ITU 1953 | HG03874 SAS ITU 1954 | HG03876 SAS ITU 1955 | HG03879 SAS ITU 1956 | HG04018 SAS ITU 1957 | HG03714 SAS ITU 1958 | HG03716 SAS ITU 1959 | HG03717 SAS ITU 1960 | HG03718 SAS ITU 1961 | HG03730 SAS ITU 1962 | HG03723 SAS ITU 1963 | HG03729 SAS ITU 1964 | HG03731 SAS ITU 1965 | HG03742 SAS ITU 1966 | HG03770 SAS ITU 1967 | HG03771 SAS ITU 1968 | HG03781 SAS ITU 1969 | HG03779 SAS ITU 1970 | HG03774 SAS 
ITU 1971 | HG03775 SAS ITU 1972 | HG03777 SAS ITU 1973 | HG03784 SAS ITU 1974 | HG03786 SAS ITU 1975 | HG03785 SAS ITU 1976 | HG03790 SAS ITU 1977 | HG03792 SAS ITU 1978 | HG03787 SAS ITU 1979 | HG03788 SAS ITU 1980 | HG03789 SAS ITU 1981 | HG03782 SAS ITU 1982 | HG03783 SAS ITU 1983 | HG03866 SAS ITU 1984 | HG03873 SAS ITU 1985 | HG03861 SAS ITU 1986 | HG03863 SAS ITU 1987 | HG03864 SAS ITU 1988 | HG03862 SAS ITU 1989 | HG03869 SAS ITU 1990 | HG03872 SAS ITU 1991 | HG03870 SAS ITU 1992 | HG03867 SAS ITU 1993 | HG03780 SAS ITU 1994 | HG03882 SAS ITU 1995 | HG03963 SAS ITU 1996 | HG03960 SAS ITU 1997 | HG03968 SAS ITU 1998 | HG03969 SAS ITU 1999 | HG03977 SAS ITU 2000 | HG03978 SAS ITU 2001 | HG03971 SAS ITU 2002 | HG03972 SAS ITU 2003 | HG03974 SAS ITU 2004 | HG03973 SAS ITU 2005 | HG03976 SAS ITU 2006 | HG03720 SAS ITU 2007 | HG04001 SAS ITU 2008 | HG04002 SAS ITU 2009 | HG04019 SAS ITU 2010 | HG04024 SAS ITU 2011 | HG04023 SAS ITU 2012 | HG04025 SAS ITU 2013 | HG04014 SAS ITU 2014 | HG04020 SAS ITU 2015 | HG04015 SAS ITU 2016 | HG04017 SAS ITU 2017 | HG04026 SAS ITU 2018 | HG04022 SAS ITU 2019 | HG04050 SAS ITU 2020 | HG04053 SAS ITU 2021 | HG04055 SAS ITU 2022 | HG04054 SAS ITU 2023 | HG04056 SAS ITU 2024 | HG04061 SAS ITU 2025 | HG04062 SAS ITU 2026 | HG04060 SAS ITU 2027 | HG04063 SAS ITU 2028 | HG04058 SAS ITU 2029 | HG04059 SAS ITU 2030 | HG04076 SAS ITU 2031 | HG04080 SAS ITU 2032 | HG04090 SAS ITU 2033 | HG04093 SAS ITU 2034 | HG04094 SAS ITU 2035 | HG04096 SAS ITU 2036 | HG04098 SAS ITU 2037 | HG03778 SAS ITU 2038 | HG04118 SAS ITU 2039 | HG04209 SAS ITU 2040 | HG04198 SAS ITU 2041 | HG04200 SAS ITU 2042 | HG04217 SAS ITU 2043 | HG04202 SAS ITU 2044 | HG04211 SAS ITU 2045 | HG04212 SAS ITU 2046 | HG04222 SAS ITU 2047 | HG04214 SAS ITU 2048 | HG04216 SAS ITU 2049 | HG04219 SAS ITU 2050 | HG04225 SAS ITU 2051 | HG04235 SAS ITU 2052 | HG04238 SAS ITU 2053 | HG03875 SAS ITU 2054 | HG03965 SAS ITU 2055 | HG04070 SAS ITU 2056 | HG03868 SAS ITU 2057 | HG03967 
SAS ITU 2058 | NA18939 EAS JPT 2059 | NA18940 EAS JPT 2060 | NA18941 EAS JPT 2061 | NA18942 EAS JPT 2062 | NA18943 EAS JPT 2063 | NA18944 EAS JPT 2064 | NA18945 EAS JPT 2065 | NA18946 EAS JPT 2066 | NA18947 EAS JPT 2067 | NA18948 EAS JPT 2068 | NA18949 EAS JPT 2069 | NA18950 EAS JPT 2070 | NA18951 EAS JPT 2071 | NA18952 EAS JPT 2072 | NA18953 EAS JPT 2073 | NA18954 EAS JPT 2074 | NA18955 EAS JPT 2075 | NA18956 EAS JPT 2076 | NA18957 EAS JPT 2077 | NA18959 EAS JPT 2078 | NA18960 EAS JPT 2079 | NA18961 EAS JPT 2080 | NA18962 EAS JPT 2081 | NA18963 EAS JPT 2082 | NA18964 EAS JPT 2083 | NA18965 EAS JPT 2084 | NA18966 EAS JPT 2085 | NA18967 EAS JPT 2086 | NA18968 EAS JPT 2087 | NA18969 EAS JPT 2088 | NA18970 EAS JPT 2089 | NA18971 EAS JPT 2090 | NA18972 EAS JPT 2091 | NA18973 EAS JPT 2092 | NA18974 EAS JPT 2093 | NA18975 EAS JPT 2094 | NA18976 EAS JPT 2095 | NA18977 EAS JPT 2096 | NA18978 EAS JPT 2097 | NA18979 EAS JPT 2098 | NA18980 EAS JPT 2099 | NA18981 EAS JPT 2100 | NA18982 EAS JPT 2101 | NA18983 EAS JPT 2102 | NA18984 EAS JPT 2103 | NA18985 EAS JPT 2104 | NA18986 EAS JPT 2105 | NA18987 EAS JPT 2106 | NA18988 EAS JPT 2107 | NA18989 EAS JPT 2108 | NA18990 EAS JPT 2109 | NA18991 EAS JPT 2110 | NA18992 EAS JPT 2111 | NA18993 EAS JPT 2112 | NA18994 EAS JPT 2113 | NA18995 EAS JPT 2114 | NA18997 EAS JPT 2115 | NA18998 EAS JPT 2116 | NA18999 EAS JPT 2117 | NA19000 EAS JPT 2118 | NA19001 EAS JPT 2119 | NA19002 EAS JPT 2120 | NA19003 EAS JPT 2121 | NA19004 EAS JPT 2122 | NA19005 EAS JPT 2123 | NA19006 EAS JPT 2124 | NA19007 EAS JPT 2125 | NA19009 EAS JPT 2126 | NA19010 EAS JPT 2127 | NA19011 EAS JPT 2128 | NA19012 EAS JPT 2129 | NA19054 EAS JPT 2130 | NA19055 EAS JPT 2131 | NA19056 EAS JPT 2132 | NA19057 EAS JPT 2133 | NA19058 EAS JPT 2134 | NA19059 EAS JPT 2135 | NA19060 EAS JPT 2136 | NA19062 EAS JPT 2137 | NA19063 EAS JPT 2138 | NA19064 EAS JPT 2139 | NA19065 EAS JPT 2140 | NA19066 EAS JPT 2141 | NA19067 EAS JPT 2142 | NA19068 EAS JPT 2143 | NA19070 EAS JPT 2144 | 
NA19072 EAS JPT 2145 | NA19074 EAS JPT 2146 | NA19075 EAS JPT 2147 | NA19076 EAS JPT 2148 | NA19077 EAS JPT 2149 | NA19078 EAS JPT 2150 | NA19079 EAS JPT 2151 | NA19080 EAS JPT 2152 | NA19081 EAS JPT 2153 | NA19082 EAS JPT 2154 | NA19083 EAS JPT 2155 | NA19084 EAS JPT 2156 | NA19085 EAS JPT 2157 | NA19086 EAS JPT 2158 | NA19087 EAS JPT 2159 | NA19088 EAS JPT 2160 | NA19089 EAS JPT 2161 | NA19090 EAS JPT 2162 | NA19091 EAS JPT 2163 | HG01595 EAS KHV 2164 | HG01596 EAS KHV 2165 | HG01597 EAS KHV 2166 | HG01598 EAS KHV 2167 | HG01599 EAS KHV 2168 | HG01600 EAS KHV 2169 | HG01840 EAS KHV 2170 | HG01841 EAS KHV 2171 | HG01842 EAS KHV 2172 | HG01843 EAS KHV 2173 | HG01844 EAS KHV 2174 | HG01845 EAS KHV 2175 | HG01846 EAS KHV 2176 | HG01847 EAS KHV 2177 | HG01848 EAS KHV 2178 | HG01849 EAS KHV 2179 | HG01850 EAS KHV 2180 | HG01851 EAS KHV 2181 | HG01852 EAS KHV 2182 | HG01853 EAS KHV 2183 | HG01855 EAS KHV 2184 | HG01857 EAS KHV 2185 | HG01858 EAS KHV 2186 | HG01859 EAS KHV 2187 | HG01860 EAS KHV 2188 | HG01861 EAS KHV 2189 | HG01862 EAS KHV 2190 | HG01863 EAS KHV 2191 | HG01864 EAS KHV 2192 | HG01865 EAS KHV 2193 | HG01866 EAS KHV 2194 | HG01867 EAS KHV 2195 | HG01868 EAS KHV 2196 | HG01869 EAS KHV 2197 | HG01870 EAS KHV 2198 | HG01871 EAS KHV 2199 | HG01872 EAS KHV 2200 | HG01873 EAS KHV 2201 | HG01874 EAS KHV 2202 | HG01878 EAS KHV 2203 | HG02015 EAS KHV 2204 | HG02016 EAS KHV 2205 | HG02017 EAS KHV 2206 | HG02018 EAS KHV 2207 | HG02019 EAS KHV 2208 | HG02020 EAS KHV 2209 | HG02021 EAS KHV 2210 | HG02023 EAS KHV 2211 | HG02024 EAS KHV 2212 | HG02025 EAS KHV 2213 | HG02026 EAS KHV 2214 | HG02027 EAS KHV 2215 | HG02028 EAS KHV 2216 | HG02029 EAS KHV 2217 | HG02030 EAS KHV 2218 | HG02031 EAS KHV 2219 | HG02032 EAS KHV 2220 | HG02035 EAS KHV 2221 | HG02040 EAS KHV 2222 | HG02046 EAS KHV 2223 | HG02047 EAS KHV 2224 | HG02048 EAS KHV 2225 | HG02049 EAS KHV 2226 | HG02050 EAS KHV 2227 | HG02056 EAS KHV 2228 | HG02057 EAS KHV 2229 | HG02058 EAS KHV 2230 | HG02059 EAS KHV 2231 
| HG02060 EAS KHV 2232 | HG02061 EAS KHV 2233 | HG02064 EAS KHV 2234 | HG02067 EAS KHV 2235 | HG02068 EAS KHV 2236 | HG02069 EAS KHV 2237 | HG02070 EAS KHV 2238 | HG02071 EAS KHV 2239 | HG02072 EAS KHV 2240 | HG02073 EAS KHV 2241 | HG02074 EAS KHV 2242 | HG02075 EAS KHV 2243 | HG02076 EAS KHV 2244 | HG02077 EAS KHV 2245 | HG02078 EAS KHV 2246 | HG02079 EAS KHV 2247 | HG02080 EAS KHV 2248 | HG02081 EAS KHV 2249 | HG02082 EAS KHV 2250 | HG02083 EAS KHV 2251 | HG02084 EAS KHV 2252 | HG02085 EAS KHV 2253 | HG02086 EAS KHV 2254 | HG02087 EAS KHV 2255 | HG02088 EAS KHV 2256 | HG02113 EAS KHV 2257 | HG02116 EAS KHV 2258 | HG02120 EAS KHV 2259 | HG02121 EAS KHV 2260 | HG02122 EAS KHV 2261 | HG02126 EAS KHV 2262 | HG02127 EAS KHV 2263 | HG02128 EAS KHV 2264 | HG02129 EAS KHV 2265 | HG02130 EAS KHV 2266 | HG02131 EAS KHV 2267 | HG02132 EAS KHV 2268 | HG02133 EAS KHV 2269 | HG02134 EAS KHV 2270 | HG02135 EAS KHV 2271 | HG02136 EAS KHV 2272 | HG02137 EAS KHV 2273 | HG02138 EAS KHV 2274 | HG02139 EAS KHV 2275 | HG02140 EAS KHV 2276 | HG02141 EAS KHV 2277 | HG02142 EAS KHV 2278 | HG02512 EAS KHV 2279 | HG02513 EAS KHV 2280 | HG02514 EAS KHV 2281 | HG02521 EAS KHV 2282 | HG02522 EAS KHV 2283 | HG02523 EAS KHV 2284 | HG02524 EAS KHV 2285 | HG02525 EAS KHV 2286 | HG02526 EAS KHV 2287 | NA19313 AFR LWK 2288 | NA19331 AFR LWK 2289 | NA19381 AFR LWK 2290 | NA19382 AFR LWK 2291 | NA19432 AFR LWK 2292 | NA19434 AFR LWK 2293 | NA19444 AFR LWK 2294 | NA19445 AFR LWK 2295 | NA19453 AFR LWK 2296 | NA19469 AFR LWK 2297 | NA19470 AFR LWK 2298 | NA19017 AFR LWK 2299 | NA19019 AFR LWK 2300 | NA19020 AFR LWK 2301 | NA19023 AFR LWK 2302 | NA19024 AFR LWK 2303 | NA19025 AFR LWK 2304 | NA19026 AFR LWK 2305 | NA19027 AFR LWK 2306 | NA19028 AFR LWK 2307 | NA19030 AFR LWK 2308 | NA19031 AFR LWK 2309 | NA19035 AFR LWK 2310 | NA19036 AFR LWK 2311 | NA19037 AFR LWK 2312 | NA19038 AFR LWK 2313 | NA19041 AFR LWK 2314 | NA19042 AFR LWK 2315 | NA19043 AFR LWK 2316 | NA19044 AFR LWK 2317 | NA19046 AFR LWK 
2318 | NA19307 AFR LWK 2319 | NA19308 AFR LWK 2320 | NA19309 AFR LWK 2321 | NA19310 AFR LWK 2322 | NA19311 AFR LWK 2323 | NA19312 AFR LWK 2324 | NA19314 AFR LWK 2325 | NA19315 AFR LWK 2326 | NA19316 AFR LWK 2327 | NA19317 AFR LWK 2328 | NA19318 AFR LWK 2329 | NA19319 AFR LWK 2330 | NA19320 AFR LWK 2331 | NA19321 AFR LWK 2332 | NA19323 AFR LWK 2333 | NA19324 AFR LWK 2334 | NA19327 AFR LWK 2335 | NA19328 AFR LWK 2336 | NA19332 AFR LWK 2337 | NA19334 AFR LWK 2338 | NA19338 AFR LWK 2339 | NA19346 AFR LWK 2340 | NA19347 AFR LWK 2341 | NA19350 AFR LWK 2342 | NA19351 AFR LWK 2343 | NA19352 AFR LWK 2344 | NA19355 AFR LWK 2345 | NA19359 AFR LWK 2346 | NA19360 AFR LWK 2347 | NA19371 AFR LWK 2348 | NA19372 AFR LWK 2349 | NA19373 AFR LWK 2350 | NA19374 AFR LWK 2351 | NA19375 AFR LWK 2352 | NA19376 AFR LWK 2353 | NA19377 AFR LWK 2354 | NA19378 AFR LWK 2355 | NA19379 AFR LWK 2356 | NA19380 AFR LWK 2357 | NA19383 AFR LWK 2358 | NA19384 AFR LWK 2359 | NA19385 AFR LWK 2360 | NA19390 AFR LWK 2361 | NA19391 AFR LWK 2362 | NA19393 AFR LWK 2363 | NA19394 AFR LWK 2364 | NA19395 AFR LWK 2365 | NA19396 AFR LWK 2366 | NA19397 AFR LWK 2367 | NA19398 AFR LWK 2368 | NA19399 AFR LWK 2369 | NA19401 AFR LWK 2370 | NA19403 AFR LWK 2371 | NA19404 AFR LWK 2372 | NA19428 AFR LWK 2373 | NA19429 AFR LWK 2374 | NA19430 AFR LWK 2375 | NA19431 AFR LWK 2376 | NA19435 AFR LWK 2377 | NA19436 AFR LWK 2378 | NA19437 AFR LWK 2379 | NA19438 AFR LWK 2380 | NA19439 AFR LWK 2381 | NA19440 AFR LWK 2382 | NA19443 AFR LWK 2383 | NA19446 AFR LWK 2384 | NA19448 AFR LWK 2385 | NA19449 AFR LWK 2386 | NA19451 AFR LWK 2387 | NA19452 AFR LWK 2388 | NA19454 AFR LWK 2389 | NA19455 AFR LWK 2390 | NA19456 AFR LWK 2391 | NA19457 AFR LWK 2392 | NA19461 AFR LWK 2393 | NA19462 AFR LWK 2394 | NA19463 AFR LWK 2395 | NA19466 AFR LWK 2396 | NA19467 AFR LWK 2397 | NA19468 AFR LWK 2398 | NA19471 AFR LWK 2399 | NA19472 AFR LWK 2400 | NA19473 AFR LWK 2401 | NA19474 AFR LWK 2402 | NA19475 AFR LWK 2403 | HG03057 AFR MSL 2404 | HG03060 AFR 
MSL 2405 | HG03074 AFR MSL 2406 | HG03077 AFR MSL 2407 | HG03078 AFR MSL 2408 | HG03084 AFR MSL 2409 | HG03085 AFR MSL 2410 | HG03086 AFR MSL 2411 | HG03378 AFR MSL 2412 | HG03383 AFR MSL 2413 | HG03393 AFR MSL 2414 | HG03397 AFR MSL 2415 | HG03398 AFR MSL 2416 | HG03431 AFR MSL 2417 | HG03432 AFR MSL 2418 | HG03433 AFR MSL 2419 | HG03436 AFR MSL 2420 | HG03437 AFR MSL 2421 | HG03442 AFR MSL 2422 | HG03445 AFR MSL 2423 | HG03446 AFR MSL 2424 | HG03457 AFR MSL 2425 | HG03460 AFR MSL 2426 | HG03461 AFR MSL 2427 | HG03462 AFR MSL 2428 | HG03472 AFR MSL 2429 | HG03547 AFR MSL 2430 | HG03548 AFR MSL 2431 | HG03549 AFR MSL 2432 | HG03556 AFR MSL 2433 | HG03557 AFR MSL 2434 | HG03558 AFR MSL 2435 | HG03565 AFR MSL 2436 | HG03566 AFR MSL 2437 | HG03567 AFR MSL 2438 | HG03052 AFR MSL 2439 | HG03053 AFR MSL 2440 | HG03054 AFR MSL 2441 | HG03055 AFR MSL 2442 | HG03056 AFR MSL 2443 | HG03058 AFR MSL 2444 | HG03059 AFR MSL 2445 | HG03061 AFR MSL 2446 | HG03062 AFR MSL 2447 | HG03063 AFR MSL 2448 | HG03064 AFR MSL 2449 | HG03065 AFR MSL 2450 | HG03066 AFR MSL 2451 | HG03069 AFR MSL 2452 | HG03072 AFR MSL 2453 | HG03073 AFR MSL 2454 | HG03076 AFR MSL 2455 | HG03079 AFR MSL 2456 | HG03080 AFR MSL 2457 | HG03081 AFR MSL 2458 | HG03082 AFR MSL 2459 | HG03088 AFR MSL 2460 | HG03091 AFR MSL 2461 | HG03095 AFR MSL 2462 | HG03096 AFR MSL 2463 | HG03097 AFR MSL 2464 | HG03098 AFR MSL 2465 | HG03209 AFR MSL 2466 | HG03212 AFR MSL 2467 | HG03224 AFR MSL 2468 | HG03225 AFR MSL 2469 | HG03376 AFR MSL 2470 | HG03380 AFR MSL 2471 | HG03382 AFR MSL 2472 | HG03384 AFR MSL 2473 | HG03399 AFR MSL 2474 | HG03385 AFR MSL 2475 | HG03388 AFR MSL 2476 | HG03390 AFR MSL 2477 | HG03391 AFR MSL 2478 | HG03381 AFR MSL 2479 | HG03394 AFR MSL 2480 | HG03401 AFR MSL 2481 | HG03402 AFR MSL 2482 | HG03408 AFR MSL 2483 | HG03410 AFR MSL 2484 | HG03411 AFR MSL 2485 | HG03419 AFR MSL 2486 | HG03428 AFR MSL 2487 | HG03439 AFR MSL 2488 | HG03449 AFR MSL 2489 | HG03450 AFR MSL 2490 | HG03451 AFR MSL 2491 | HG03452 
AFR MSL 2492 | HG03453 AFR MSL 2493 | HG03466 AFR MSL 2494 | HG03468 AFR MSL 2495 | HG03454 AFR MSL 2496 | HG03455 AFR MSL 2497 | HG03456 AFR MSL 2498 | HG03458 AFR MSL 2499 | HG03459 AFR MSL 2500 | HG03464 AFR MSL 2501 | HG03465 AFR MSL 2502 | HG03438 AFR MSL 2503 | HG03470 AFR MSL 2504 | HG03469 AFR MSL 2505 | HG03471 AFR MSL 2506 | HG03473 AFR MSL 2507 | HG03474 AFR MSL 2508 | HG03476 AFR MSL 2509 | HG03477 AFR MSL 2510 | HG03478 AFR MSL 2511 | HG03479 AFR MSL 2512 | HG03480 AFR MSL 2513 | HG03484 AFR MSL 2514 | HG03485 AFR MSL 2515 | HG03486 AFR MSL 2516 | HG03559 AFR MSL 2517 | HG03563 AFR MSL 2518 | HG03564 AFR MSL 2519 | HG03569 AFR MSL 2520 | HG03571 AFR MSL 2521 | HG03572 AFR MSL 2522 | HG03574 AFR MSL 2523 | HG03575 AFR MSL 2524 | HG03576 AFR MSL 2525 | HG03577 AFR MSL 2526 | HG03578 AFR MSL 2527 | HG03579 AFR MSL 2528 | HG03582 AFR MSL 2529 | HG03583 AFR MSL 2530 | HG03584 AFR MSL 2531 | NA19672 AMR MXL 2532 | NA19674 AMR MXL 2533 | NA19734 AMR MXL 2534 | NA19735 AMR MXL 2535 | NA19737 AMR MXL 2536 | NA19738 AMR MXL 2537 | NA19740 AMR MXL 2538 | NA19741 AMR MXL 2539 | NA19742 AMR MXL 2540 | NA19752 AMR MXL 2541 | NA19753 AMR MXL 2542 | NA19754 AMR MXL 2543 | NA19764 AMR MXL 2544 | NA19766 AMR MXL 2545 | NA19792 AMR MXL 2546 | NA19797 AMR MXL 2547 | NA19798 AMR MXL 2548 | NA19648 AMR MXL 2549 | NA19649 AMR MXL 2550 | NA19650 AMR MXL 2551 | NA19669 AMR MXL 2552 | NA19670 AMR MXL 2553 | NA19671 AMR MXL 2554 | NA19675 AMR MXL 2555 | NA19676 AMR MXL 2556 | NA19677 AMR MXL 2557 | NA19651 AMR MXL 2558 | NA19652 AMR MXL 2559 | NA19653 AMR MXL 2560 | NA19654 AMR MXL 2561 | NA19655 AMR MXL 2562 | NA19656 AMR MXL 2563 | NA19657 AMR MXL 2564 | NA19658 AMR MXL 2565 | NA19659 AMR MXL 2566 | NA19662 AMR MXL 2567 | NA19678 AMR MXL 2568 | NA19679 AMR MXL 2569 | NA19680 AMR MXL 2570 | NA19681 AMR MXL 2571 | NA19682 AMR MXL 2572 | NA19683 AMR MXL 2573 | NA19660 AMR MXL 2574 | NA19661 AMR MXL 2575 | NA19684 AMR MXL 2576 | NA19685 AMR MXL 2577 | NA19686 AMR MXL 2578 | 
NA19663 AMR MXL 2579 | NA19664 AMR MXL 2580 | NA19665 AMR MXL 2581 | NA19716 AMR MXL 2582 | NA19717 AMR MXL 2583 | NA19718 AMR MXL 2584 | NA19719 AMR MXL 2585 | NA19720 AMR MXL 2586 | NA19721 AMR MXL 2587 | NA19722 AMR MXL 2588 | NA19723 AMR MXL 2589 | NA19724 AMR MXL 2590 | NA19725 AMR MXL 2591 | NA19726 AMR MXL 2592 | NA19727 AMR MXL 2593 | NA19728 AMR MXL 2594 | NA19729 AMR MXL 2595 | NA19730 AMR MXL 2596 | NA19731 AMR MXL 2597 | NA19732 AMR MXL 2598 | NA19733 AMR MXL 2599 | NA19746 AMR MXL 2600 | NA19747 AMR MXL 2601 | NA19748 AMR MXL 2602 | NA19749 AMR MXL 2603 | NA19750 AMR MXL 2604 | NA19751 AMR MXL 2605 | NA19755 AMR MXL 2606 | NA19756 AMR MXL 2607 | NA19757 AMR MXL 2608 | NA19758 AMR MXL 2609 | NA19759 AMR MXL 2610 | NA19760 AMR MXL 2611 | NA19761 AMR MXL 2612 | NA19762 AMR MXL 2613 | NA19763 AMR MXL 2614 | NA19770 AMR MXL 2615 | NA19771 AMR MXL 2616 | NA19772 AMR MXL 2617 | NA19785 AMR MXL 2618 | NA19786 AMR MXL 2619 | NA19787 AMR MXL 2620 | NA19773 AMR MXL 2621 | NA19774 AMR MXL 2622 | NA19775 AMR MXL 2623 | NA19776 AMR MXL 2624 | NA19777 AMR MXL 2625 | NA19778 AMR MXL 2626 | NA19779 AMR MXL 2627 | NA19780 AMR MXL 2628 | NA19781 AMR MXL 2629 | NA19782 AMR MXL 2630 | NA19783 AMR MXL 2631 | NA19784 AMR MXL 2632 | NA19788 AMR MXL 2633 | NA19789 AMR MXL 2634 | NA19790 AMR MXL 2635 | NA19794 AMR MXL 2636 | NA19795 AMR MXL 2637 | NA19796 AMR MXL 2638 | HG01565 AMR PEL 2639 | HG01566 AMR PEL 2640 | HG01567 AMR PEL 2641 | HG01571 AMR PEL 2642 | HG01572 AMR PEL 2643 | HG01573 AMR PEL 2644 | HG01577 AMR PEL 2645 | HG01578 AMR PEL 2646 | HG01579 AMR PEL 2647 | HG01917 AMR PEL 2648 | HG01918 AMR PEL 2649 | HG01919 AMR PEL 2650 | HG01892 AMR PEL 2651 | HG01893 AMR PEL 2652 | HG01898 AMR PEL 2653 | HG01920 AMR PEL 2654 | HG01921 AMR PEL 2655 | HG01922 AMR PEL 2656 | HG01923 AMR PEL 2657 | HG01924 AMR PEL 2658 | HG01925 AMR PEL 2659 | HG01926 AMR PEL 2660 | HG01927 AMR PEL 2661 | HG01928 AMR PEL 2662 | HG01932 AMR PEL 2663 | HG01933 AMR PEL 2664 | HG01934 AMR PEL 2665 
| HG01935 AMR PEL 2666 | HG01936 AMR PEL 2667 | HG01937 AMR PEL 2668 | HG01938 AMR PEL 2669 | HG01939 AMR PEL 2670 | HG01940 AMR PEL 2671 | HG01941 AMR PEL 2672 | HG01942 AMR PEL 2673 | HG01943 AMR PEL 2674 | HG01944 AMR PEL 2675 | HG01945 AMR PEL 2676 | HG01946 AMR PEL 2677 | HG01947 AMR PEL 2678 | HG01948 AMR PEL 2679 | HG01949 AMR PEL 2680 | HG01950 AMR PEL 2681 | HG01951 AMR PEL 2682 | HG01952 AMR PEL 2683 | HG01953 AMR PEL 2684 | HG01954 AMR PEL 2685 | HG01955 AMR PEL 2686 | HG01967 AMR PEL 2687 | HG01968 AMR PEL 2688 | HG01969 AMR PEL 2689 | HG01970 AMR PEL 2690 | HG01971 AMR PEL 2691 | HG01972 AMR PEL 2692 | HG01961 AMR PEL 2693 | HG01965 AMR PEL 2694 | HG01973 AMR PEL 2695 | HG01974 AMR PEL 2696 | HG01975 AMR PEL 2697 | HG01976 AMR PEL 2698 | HG01977 AMR PEL 2699 | HG01978 AMR PEL 2700 | HG01979 AMR PEL 2701 | HG01980 AMR PEL 2702 | HG01981 AMR PEL 2703 | HG01982 AMR PEL 2704 | HG01983 AMR PEL 2705 | HG01984 AMR PEL 2706 | HG01991 AMR PEL 2707 | HG01992 AMR PEL 2708 | HG01993 AMR PEL 2709 | HG01995 AMR PEL 2710 | HG01997 AMR PEL 2711 | HG01998 AMR PEL 2712 | HG02008 AMR PEL 2713 | HG02002 AMR PEL 2714 | HG02003 AMR PEL 2715 | HG02004 AMR PEL 2716 | HG02006 AMR PEL 2717 | HG02089 AMR PEL 2718 | HG02090 AMR PEL 2719 | HG02091 AMR PEL 2720 | HG02102 AMR PEL 2721 | HG02104 AMR PEL 2722 | HG02105 AMR PEL 2723 | HG02106 AMR PEL 2724 | HG02146 AMR PEL 2725 | HG02147 AMR PEL 2726 | HG02148 AMR PEL 2727 | HG02150 AMR PEL 2728 | HG02252 AMR PEL 2729 | HG02253 AMR PEL 2730 | HG02254 AMR PEL 2731 | HG02259 AMR PEL 2732 | HG02260 AMR PEL 2733 | HG02261 AMR PEL 2734 | HG02262 AMR PEL 2735 | HG02265 AMR PEL 2736 | HG02266 AMR PEL 2737 | HG02267 AMR PEL 2738 | HG02271 AMR PEL 2739 | HG02272 AMR PEL 2740 | HG02273 AMR PEL 2741 | HG02274 AMR PEL 2742 | HG02275 AMR PEL 2743 | HG02276 AMR PEL 2744 | HG02277 AMR PEL 2745 | HG02278 AMR PEL 2746 | HG02279 AMR PEL 2747 | HG02285 AMR PEL 2748 | HG02286 AMR PEL 2749 | HG02287 AMR PEL 2750 | HG02288 AMR PEL 2751 | HG02291 AMR PEL 
2752 | HG02292 AMR PEL 2753 | HG02293 AMR PEL 2754 | HG02298 AMR PEL 2755 | HG02299 AMR PEL 2756 | HG02300 AMR PEL 2757 | HG02301 AMR PEL 2758 | HG02302 AMR PEL 2759 | HG02303 AMR PEL 2760 | HG02304 AMR PEL 2761 | HG02312 AMR PEL 2762 | HG02344 AMR PEL 2763 | HG02345 AMR PEL 2764 | HG02347 AMR PEL 2765 | HG02348 AMR PEL 2766 | HG02415 AMR PEL 2767 | HG02425 AMR PEL 2768 | HG03022 SAS PJL 2769 | HG01583 SAS PJL 2770 | HG01586 SAS PJL 2771 | HG01589 SAS PJL 2772 | HG01590 SAS PJL 2773 | HG01593 SAS PJL 2774 | HG01594 SAS PJL 2775 | HG02490 SAS PJL 2776 | HG02491 SAS PJL 2777 | HG02492 SAS PJL 2778 | HG02493 SAS PJL 2779 | HG02494 SAS PJL 2780 | HG02495 SAS PJL 2781 | HG02597 SAS PJL 2782 | HG02599 SAS PJL 2783 | HG02600 SAS PJL 2784 | HG02601 SAS PJL 2785 | HG02602 SAS PJL 2786 | HG02603 SAS PJL 2787 | HG02604 SAS PJL 2788 | HG02605 SAS PJL 2789 | HG02648 SAS PJL 2790 | HG02649 SAS PJL 2791 | HG02650 SAS PJL 2792 | HG02651 SAS PJL 2793 | HG02652 SAS PJL 2794 | HG02653 SAS PJL 2795 | HG02654 SAS PJL 2796 | HG02655 SAS PJL 2797 | HG02656 SAS PJL 2798 | HG02657 SAS PJL 2799 | HG02658 SAS PJL 2800 | HG02659 SAS PJL 2801 | HG02660 SAS PJL 2802 | HG02661 SAS PJL 2803 | HG02662 SAS PJL 2804 | HG02681 SAS PJL 2805 | HG02682 SAS PJL 2806 | HG02683 SAS PJL 2807 | HG02684 SAS PJL 2808 | HG02685 SAS PJL 2809 | HG02686 SAS PJL 2810 | HG02687 SAS PJL 2811 | HG02688 SAS PJL 2812 | HG02689 SAS PJL 2813 | HG02690 SAS PJL 2814 | HG02691 SAS PJL 2815 | HG02692 SAS PJL 2816 | HG02694 SAS PJL 2817 | HG02696 SAS PJL 2818 | HG02697 SAS PJL 2819 | HG02698 SAS PJL 2820 | HG02699 SAS PJL 2821 | HG02700 SAS PJL 2822 | HG02701 SAS PJL 2823 | HG02724 SAS PJL 2824 | HG02725 SAS PJL 2825 | HG02726 SAS PJL 2826 | HG02727 SAS PJL 2827 | HG02728 SAS PJL 2828 | HG02729 SAS PJL 2829 | HG02731 SAS PJL 2830 | HG02733 SAS PJL 2831 | HG02734 SAS PJL 2832 | HG02735 SAS PJL 2833 | HG02736 SAS PJL 2834 | HG02737 SAS PJL 2835 | HG02738 SAS PJL 2836 | HG02774 SAS PJL 2837 | HG02775 SAS PJL 2838 | HG02776 SAS 
PJL 2839 | HG02778 SAS PJL 2840 | HG02779 SAS PJL 2841 | HG02780 SAS PJL 2842 | HG02781 SAS PJL 2843 | HG02783 SAS PJL 2844 | HG02784 SAS PJL 2845 | HG02785 SAS PJL 2846 | HG02786 SAS PJL 2847 | HG02787 SAS PJL 2848 | HG02788 SAS PJL 2849 | HG02789 SAS PJL 2850 | HG02790 SAS PJL 2851 | HG02791 SAS PJL 2852 | HG02792 SAS PJL 2853 | HG02793 SAS PJL 2854 | HG02794 SAS PJL 2855 | HG03015 SAS PJL 2856 | HG03016 SAS PJL 2857 | HG03017 SAS PJL 2858 | HG03018 SAS PJL 2859 | HG03019 SAS PJL 2860 | HG03021 SAS PJL 2861 | HG03023 SAS PJL 2862 | HG03228 SAS PJL 2863 | HG03229 SAS PJL 2864 | HG03230 SAS PJL 2865 | HG03234 SAS PJL 2866 | HG03235 SAS PJL 2867 | HG03236 SAS PJL 2868 | HG03237 SAS PJL 2869 | HG03238 SAS PJL 2870 | HG03239 SAS PJL 2871 | HG03487 SAS PJL 2872 | HG03488 SAS PJL 2873 | HG03489 SAS PJL 2874 | HG03490 SAS PJL 2875 | HG03491 SAS PJL 2876 | HG03492 SAS PJL 2877 | HG03618 SAS PJL 2878 | HG03619 SAS PJL 2879 | HG03620 SAS PJL 2880 | HG03621 SAS PJL 2881 | HG03624 SAS PJL 2882 | HG03625 SAS PJL 2883 | HG03626 SAS PJL 2884 | HG03629 SAS PJL 2885 | HG03631 SAS PJL 2886 | HG03633 SAS PJL 2887 | HG03634 SAS PJL 2888 | HG03635 SAS PJL 2889 | HG03636 SAS PJL 2890 | HG03638 SAS PJL 2891 | HG03639 SAS PJL 2892 | HG03640 SAS PJL 2893 | HG03641 SAS PJL 2894 | HG03649 SAS PJL 2895 | HG03650 SAS PJL 2896 | HG03651 SAS PJL 2897 | HG03652 SAS PJL 2898 | HG03653 SAS PJL 2899 | HG03654 SAS PJL 2900 | HG03656 SAS PJL 2901 | HG03657 SAS PJL 2902 | HG03660 SAS PJL 2903 | HG03663 SAS PJL 2904 | HG03667 SAS PJL 2905 | HG03668 SAS PJL 2906 | HG03669 SAS PJL 2907 | HG03699 SAS PJL 2908 | HG03700 SAS PJL 2909 | HG03701 SAS PJL 2910 | HG03702 SAS PJL 2911 | HG03703 SAS PJL 2912 | HG03704 SAS PJL 2913 | HG03705 SAS PJL 2914 | HG03706 SAS PJL 2915 | HG03707 SAS PJL 2916 | HG03708 SAS PJL 2917 | HG03709 SAS PJL 2918 | HG03710 SAS PJL 2919 | HG03761 SAS PJL 2920 | HG03762 SAS PJL 2921 | HG03763 SAS PJL 2922 | HG03765 SAS PJL 2923 | HG03766 SAS PJL 2924 | HG03767 SAS PJL 2925 | HG03769 
SAS PJL 2926 | HG00551 AMR PUR 2927 | HG00552 AMR PUR 2928 | HG01075 AMR PUR 2929 | HG00553 AMR PUR 2930 | HG00554 AMR PUR 2931 | HG00555 AMR PUR 2932 | HG00637 AMR PUR 2933 | HG00638 AMR PUR 2934 | HG00639 AMR PUR 2935 | HG00640 AMR PUR 2936 | HG00641 AMR PUR 2937 | HG00642 AMR PUR 2938 | HG00731 AMR PUR 2939 | HG00732 AMR PUR 2940 | HG00733 AMR PUR 2941 | HG00734 AMR PUR 2942 | HG00735 AMR PUR 2943 | HG01047 AMR PUR 2944 | HG00736 AMR PUR 2945 | HG00737 AMR PUR 2946 | HG00738 AMR PUR 2947 | HG01048 AMR PUR 2948 | HG01049 AMR PUR 2949 | HG01050 AMR PUR 2950 | HG00739 AMR PUR 2951 | HG00740 AMR PUR 2952 | HG00741 AMR PUR 2953 | HG00742 AMR PUR 2954 | HG00743 AMR PUR 2955 | HG01051 AMR PUR 2956 | HG01052 AMR PUR 2957 | HG01053 AMR PUR 2958 | HG01054 AMR PUR 2959 | HG01055 AMR PUR 2960 | HG01056 AMR PUR 2961 | HG01058 AMR PUR 2962 | HG01060 AMR PUR 2963 | HG01061 AMR PUR 2964 | HG01062 AMR PUR 2965 | HG01063 AMR PUR 2966 | HG01064 AMR PUR 2967 | HG01066 AMR PUR 2968 | HG01067 AMR PUR 2969 | HG01068 AMR PUR 2970 | HG01069 AMR PUR 2971 | HG01070 AMR PUR 2972 | HG01071 AMR PUR 2973 | HG01072 AMR PUR 2974 | HG01073 AMR PUR 2975 | HG01074 AMR PUR 2976 | HG01101 AMR PUR 2977 | HG01102 AMR PUR 2978 | HG01103 AMR PUR 2979 | HG01077 AMR PUR 2980 | HG01079 AMR PUR 2981 | HG01080 AMR PUR 2982 | HG01081 AMR PUR 2983 | HG01241 AMR PUR 2984 | HG01242 AMR PUR 2985 | HG01243 AMR PUR 2986 | HG01170 AMR PUR 2987 | HG01171 AMR PUR 2988 | HG01172 AMR PUR 2989 | HG01104 AMR PUR 2990 | HG01105 AMR PUR 2991 | HG01106 AMR PUR 2992 | HG01085 AMR PUR 2993 | HG01086 AMR PUR 2994 | HG01087 AMR PUR 2995 | HG01088 AMR PUR 2996 | HG01089 AMR PUR 2997 | HG01090 AMR PUR 2998 | HG01082 AMR PUR 2999 | HG01083 AMR PUR 3000 | HG01084 AMR PUR 3001 | HG01094 AMR PUR 3002 | HG01095 AMR PUR 3003 | HG01096 AMR PUR 3004 | HG01092 AMR PUR 3005 | HG01097 AMR PUR 3006 | HG01098 AMR PUR 3007 | HG01099 AMR PUR 3008 | HG01100 AMR PUR 3009 | HG01247 AMR PUR 3010 | HG01248 AMR PUR 3011 | HG01107 AMR PUR 3012 | 
HG01108 AMR PUR 3013 | HG01109 AMR PUR 3014 | HG01110 AMR PUR 3015 | HG01111 AMR PUR 3016 | HG01249 AMR PUR 3017 | HG01161 AMR PUR 3018 | HG01162 AMR PUR 3019 | HG01173 AMR PUR 3020 | HG01174 AMR PUR 3021 | HG01175 AMR PUR 3022 | HG01164 AMR PUR 3023 | HG01187 AMR PUR 3024 | HG01188 AMR PUR 3025 | HG01189 AMR PUR 3026 | HG01167 AMR PUR 3027 | HG01168 AMR PUR 3028 | HG01169 AMR PUR 3029 | HG01190 AMR PUR 3030 | HG01191 AMR PUR 3031 | HG01192 AMR PUR 3032 | HG01176 AMR PUR 3033 | HG01177 AMR PUR 3034 | HG01178 AMR PUR 3035 | HG01182 AMR PUR 3036 | HG01183 AMR PUR 3037 | HG01184 AMR PUR 3038 | HG01195 AMR PUR 3039 | HG01197 AMR PUR 3040 | HG01198 AMR PUR 3041 | HG01199 AMR PUR 3042 | HG01200 AMR PUR 3043 | HG01204 AMR PUR 3044 | HG01205 AMR PUR 3045 | HG01206 AMR PUR 3046 | HG01286 AMR PUR 3047 | HG01301 AMR PUR 3048 | HG01302 AMR PUR 3049 | HG01303 AMR PUR 3050 | HG01305 AMR PUR 3051 | HG01308 AMR PUR 3052 | HG01311 AMR PUR 3053 | HG01312 AMR PUR 3054 | HG01411 AMR PUR 3055 | HG01322 AMR PUR 3056 | HG01323 AMR PUR 3057 | HG01324 AMR PUR 3058 | HG01325 AMR PUR 3059 | HG01326 AMR PUR 3060 | HG01327 AMR PUR 3061 | HG01392 AMR PUR 3062 | HG01393 AMR PUR 3063 | HG01394 AMR PUR 3064 | HG01395 AMR PUR 3065 | HG01396 AMR PUR 3066 | HG01397 AMR PUR 3067 | HG01398 AMR PUR 3068 | HG01402 AMR PUR 3069 | HG01403 AMR PUR 3070 | HG01404 AMR PUR 3071 | HG01405 AMR PUR 3072 | HG01412 AMR PUR 3073 | HG01413 AMR PUR 3074 | HG01414 AMR PUR 3075 | HG01415 AMR PUR 3076 | HG03646 SAS STU 3077 | HG03645 SAS STU 3078 | HG03643 SAS STU 3079 | HG03644 SAS STU 3080 | HG03642 SAS STU 3081 | HG03679 SAS STU 3082 | HG04204 SAS STU 3083 | HG04215 SAS STU 3084 | HG03999 SAS STU 3085 | HG03672 SAS STU 3086 | HG03680 SAS STU 3087 | HG03673 SAS STU 3088 | HG03948 SAS STU 3089 | HG04040 SAS STU 3090 | HG03681 SAS STU 3091 | HG03682 SAS STU 3092 | HG03683 SAS STU 3093 | HG03684 SAS STU 3094 | HG03685 SAS STU 3095 | HG03951 SAS STU 3096 | HG03686 SAS STU 3097 | HG03687 SAS STU 3098 | HG03688 SAS STU 3099 
| HG03885 SAS STU 3100 | HG03886 SAS STU 3101 | HG03689 SAS STU 3102 | HG03690 SAS STU 3103 | HG03691 SAS STU 3104 | HG03692 SAS STU 3105 | HG03693 SAS STU 3106 | HG03995 SAS STU 3107 | HG03694 SAS STU 3108 | HG03695 SAS STU 3109 | HG03696 SAS STU 3110 | HG03835 SAS STU 3111 | HG03840 SAS STU 3112 | HG03697 SAS STU 3113 | HG03698 SAS STU 3114 | HG03711 SAS STU 3115 | HG03740 SAS STU 3116 | HG03741 SAS STU 3117 | HG03884 SAS STU 3118 | HG03733 SAS STU 3119 | HG03736 SAS STU 3120 | HG03738 SAS STU 3121 | HG03743 SAS STU 3122 | HG03757 SAS STU 3123 | HG03744 SAS STU 3124 | HG03745 SAS STU 3125 | HG03746 SAS STU 3126 | HG03888 SAS STU 3127 | HG03750 SAS STU 3128 | HG03754 SAS STU 3129 | HG03755 SAS STU 3130 | HG03756 SAS STU 3131 | HG03752 SAS STU 3132 | HG03753 SAS STU 3133 | HG03760 SAS STU 3134 | HG03836 SAS STU 3135 | HG03842 SAS STU 3136 | HG03844 SAS STU 3137 | HG04033 SAS STU 3138 | HG03838 SAS STU 3139 | HG04006 SAS STU 3140 | HG03837 SAS STU 3141 | HG03846 SAS STU 3142 | HG03845 SAS STU 3143 | HG03849 SAS STU 3144 | HG03850 SAS STU 3145 | HG03847 SAS STU 3146 | HG03848 SAS STU 3147 | HG03851 SAS STU 3148 | HG03858 SAS STU 3149 | HG03950 SAS STU 3150 | HG03857 SAS STU 3151 | HG03982 SAS STU 3152 | HG03856 SAS STU 3153 | HG03854 SAS STU 3154 | HG04035 SAS STU 3155 | HG03887 SAS STU 3156 | HG03986 SAS STU 3157 | HG03895 SAS STU 3158 | HG03890 SAS STU 3159 | HG03894 SAS STU 3160 | HG03896 SAS STU 3161 | HG04115 SAS STU 3162 | HG03898 SAS STU 3163 | HG03899 SAS STU 3164 | HG03897 SAS STU 3165 | HG03900 SAS STU 3166 | HG03943 SAS STU 3167 | HG03944 SAS STU 3168 | HG03992 SAS STU 3169 | HG04036 SAS STU 3170 | HG04067 SAS STU 3171 | HG03945 SAS STU 3172 | HG03947 SAS STU 3173 | HG03955 SAS STU 3174 | HG04114 SAS STU 3175 | HG03953 SAS STU 3176 | HG03949 SAS STU 3177 | HG03990 SAS STU 3178 | HG03991 SAS STU 3179 | HG03985 SAS STU 3180 | HG03988 SAS STU 3181 | HG03989 SAS STU 3182 | HG03998 SAS STU 3183 | HG04003 SAS STU 3184 | HG04029 SAS STU 3185 | HG04197 SAS STU 
3186 | HG04037 SAS STU 3187 | HG04038 SAS STU 3188 | HG04042 SAS STU 3189 | HG04039 SAS STU 3190 | HG04047 SAS STU 3191 | HG04106 SAS STU 3192 | HG04107 SAS STU 3193 | HG04208 SAS STU 3194 | HG04210 SAS STU 3195 | HG04075 SAS STU 3196 | HG04122 SAS STU 3197 | HG04127 SAS STU 3198 | HG04099 SAS STU 3199 | HG04100 SAS STU 3200 | HG04199 SAS STU 3201 | HG04227 SAS STU 3202 | HG04228 SAS STU 3203 | HG04229 SAS STU 3204 | NA20502 EUR TSI 3205 | NA20503 EUR TSI 3206 | NA20504 EUR TSI 3207 | NA20505 EUR TSI 3208 | NA20506 EUR TSI 3209 | NA20507 EUR TSI 3210 | NA20508 EUR TSI 3211 | NA20509 EUR TSI 3212 | NA20510 EUR TSI 3213 | NA20511 EUR TSI 3214 | NA20512 EUR TSI 3215 | NA20513 EUR TSI 3216 | NA20514 EUR TSI 3217 | NA20515 EUR TSI 3218 | NA20516 EUR TSI 3219 | NA20517 EUR TSI 3220 | NA20518 EUR TSI 3221 | NA20519 EUR TSI 3222 | NA20520 EUR TSI 3223 | NA20521 EUR TSI 3224 | NA20522 EUR TSI 3225 | NA20524 EUR TSI 3226 | NA20525 EUR TSI 3227 | NA20526 EUR TSI 3228 | NA20527 EUR TSI 3229 | NA20528 EUR TSI 3230 | NA20529 EUR TSI 3231 | NA20530 EUR TSI 3232 | NA20531 EUR TSI 3233 | NA20532 EUR TSI 3234 | NA20533 EUR TSI 3235 | NA20534 EUR TSI 3236 | NA20535 EUR TSI 3237 | NA20536 EUR TSI 3238 | NA20537 EUR TSI 3239 | NA20538 EUR TSI 3240 | NA20539 EUR TSI 3241 | NA20540 EUR TSI 3242 | NA20541 EUR TSI 3243 | NA20542 EUR TSI 3244 | NA20543 EUR TSI 3245 | NA20544 EUR TSI 3246 | NA20581 EUR TSI 3247 | NA20582 EUR TSI 3248 | NA20585 EUR TSI 3249 | NA20586 EUR TSI 3250 | NA20587 EUR TSI 3251 | NA20588 EUR TSI 3252 | NA20589 EUR TSI 3253 | NA20752 EUR TSI 3254 | NA20753 EUR TSI 3255 | NA20754 EUR TSI 3256 | NA20755 EUR TSI 3257 | NA20756 EUR TSI 3258 | NA20757 EUR TSI 3259 | NA20758 EUR TSI 3260 | NA20759 EUR TSI 3261 | NA20760 EUR TSI 3262 | NA20761 EUR TSI 3263 | NA20762 EUR TSI 3264 | NA20763 EUR TSI 3265 | NA20764 EUR TSI 3266 | NA20765 EUR TSI 3267 | NA20766 EUR TSI 3268 | NA20767 EUR TSI 3269 | NA20768 EUR TSI 3270 | NA20769 EUR TSI 3271 | NA20770 EUR TSI 3272 | NA20771 EUR 
TSI 3273 | NA20772 EUR TSI 3274 | NA20773 EUR TSI 3275 | NA20774 EUR TSI 3276 | NA20775 EUR TSI 3277 | NA20778 EUR TSI 3278 | NA20783 EUR TSI 3279 | NA20785 EUR TSI 3280 | NA20786 EUR TSI 3281 | NA20787 EUR TSI 3282 | NA20790 EUR TSI 3283 | NA20792 EUR TSI 3284 | NA20795 EUR TSI 3285 | NA20796 EUR TSI 3286 | NA20797 EUR TSI 3287 | NA20798 EUR TSI 3288 | NA20799 EUR TSI 3289 | NA20800 EUR TSI 3290 | NA20801 EUR TSI 3291 | NA20802 EUR TSI 3292 | NA20803 EUR TSI 3293 | NA20804 EUR TSI 3294 | NA20805 EUR TSI 3295 | NA20806 EUR TSI 3296 | NA20807 EUR TSI 3297 | NA20808 EUR TSI 3298 | NA20809 EUR TSI 3299 | NA20810 EUR TSI 3300 | NA20811 EUR TSI 3301 | NA20812 EUR TSI 3302 | NA20813 EUR TSI 3303 | NA20814 EUR TSI 3304 | NA20815 EUR TSI 3305 | NA20816 EUR TSI 3306 | NA20818 EUR TSI 3307 | NA20819 EUR TSI 3308 | NA20821 EUR TSI 3309 | NA20822 EUR TSI 3310 | NA20826 EUR TSI 3311 | NA20827 EUR TSI 3312 | NA20828 EUR TSI 3313 | NA20829 EUR TSI 3314 | NA20831 EUR TSI 3315 | NA20832 EUR TSI 3316 | NA18484 AFR YRI 3317 | NA18486 AFR YRI 3318 | NA18488 AFR YRI 3319 | NA18485 AFR YRI 3320 | NA18487 AFR YRI 3321 | NA18489 AFR YRI 3322 | NA18497 AFR YRI 3323 | NA18498 AFR YRI 3324 | NA18499 AFR YRI 3325 | NA18500 AFR YRI 3326 | NA18501 AFR YRI 3327 | NA18502 AFR YRI 3328 | NA18503 AFR YRI 3329 | NA18504 AFR YRI 3330 | NA18505 AFR YRI 3331 | NA19107 AFR YRI 3332 | NA19108 AFR YRI 3333 | NA19109 AFR YRI 3334 | NA18867 AFR YRI 3335 | NA18868 AFR YRI 3336 | NA18869 AFR YRI 3337 | NA18506 AFR YRI 3338 | NA18507 AFR YRI 3339 | NA18508 AFR YRI 3340 | NA18509 AFR YRI 3341 | NA18511 AFR YRI 3342 | NA18510 AFR YRI 3343 | NA18858 AFR YRI 3344 | NA18859 AFR YRI 3345 | NA18860 AFR YRI 3346 | NA18515 AFR YRI 3347 | NA18516 AFR YRI 3348 | NA18517 AFR YRI 3349 | NA18518 AFR YRI 3350 | NA18519 AFR YRI 3351 | NA18520 AFR YRI 3352 | NA18521 AFR YRI 3353 | NA18522 AFR YRI 3354 | NA18523 AFR YRI 3355 | NA18870 AFR YRI 3356 | NA18871 AFR YRI 3357 | NA18872 AFR YRI 3358 | NA18852 AFR YRI 3359 | NA18853 
AFR YRI 3360 | NA18854 AFR YRI 3361 | NA18873 AFR YRI 3362 | NA18874 AFR YRI 3363 | NA18875 AFR YRI 3364 | NA18876 AFR YRI 3365 | NA18877 AFR YRI 3366 | NA18906 AFR YRI 3367 | NA18878 AFR YRI 3368 | NA18879 AFR YRI 3369 | NA18881 AFR YRI 3370 | NA18855 AFR YRI 3371 | NA18856 AFR YRI 3372 | NA18857 AFR YRI 3373 | NA18861 AFR YRI 3374 | NA18862 AFR YRI 3375 | NA18863 AFR YRI 3376 | NA19105 AFR YRI 3377 | NA18907 AFR YRI 3378 | NA19252 AFR YRI 3379 | NA18908 AFR YRI 3380 | NA18864 AFR YRI 3381 | NA18865 AFR YRI 3382 | NA18909 AFR YRI 3383 | NA18910 AFR YRI 3384 | NA18911 AFR YRI 3385 | NA18912 AFR YRI 3386 | NA18913 AFR YRI 3387 | NA18914 AFR YRI 3388 | NA18915 AFR YRI 3389 | NA18916 AFR YRI 3390 | NA18917 AFR YRI 3391 | NA18930 AFR YRI 3392 | NA18923 AFR YRI 3393 | NA18924 AFR YRI 3394 | NA18925 AFR YRI 3395 | NA19197 AFR YRI 3396 | NA19198 AFR YRI 3397 | NA19199 AFR YRI 3398 | NA18933 AFR YRI 3399 | NA18934 AFR YRI 3400 | NA18935 AFR YRI 3401 | NA19184 AFR YRI 3402 | NA19185 AFR YRI 3403 | NA19186 AFR YRI 3404 | NA19092 AFR YRI 3405 | NA19093 AFR YRI 3406 | NA19094 AFR YRI 3407 | NA19095 AFR YRI 3408 | NA19096 AFR YRI 3409 | NA19097 AFR YRI 3410 | NA19101 AFR YRI 3411 | NA19102 AFR YRI 3412 | NA19103 AFR YRI 3413 | NA19137 AFR YRI 3414 | NA19138 AFR YRI 3415 | NA19139 AFR YRI 3416 | NA19175 AFR YRI 3417 | NA19176 AFR YRI 3418 | NA19177 AFR YRI 3419 | NA19200 AFR YRI 3420 | NA19201 AFR YRI 3421 | NA19202 AFR YRI 3422 | NA19171 AFR YRI 3423 | NA19172 AFR YRI 3424 | NA19173 AFR YRI 3425 | NA19203 AFR YRI 3426 | NA19204 AFR YRI 3427 | NA19205 AFR YRI 3428 | NA19166 AFR YRI 3429 | NA19209 AFR YRI 3430 | NA19210 AFR YRI 3431 | NA19211 AFR YRI 3432 | NA19206 AFR YRI 3433 | NA19207 AFR YRI 3434 | NA19208 AFR YRI 3435 | NA19159 AFR YRI 3436 | NA19160 AFR YRI 3437 | NA19161 AFR YRI 3438 | NA19224 AFR YRI 3439 | NA19225 AFR YRI 3440 | NA19226 AFR YRI 3441 | NA19221 AFR YRI 3442 | NA19222 AFR YRI 3443 | NA19223 AFR YRI 3444 | NA19116 AFR YRI 3445 | NA19119 AFR YRI 3446 | 
NA19120 AFR YRI 3447 | NA19121 AFR YRI 3448 | NA19122 AFR YRI 3449 | NA19123 AFR YRI 3450 | NA19254 AFR YRI 3451 | NA19140 AFR YRI 3452 | NA19141 AFR YRI 3453 | NA19142 AFR YRI 3454 | NA19152 AFR YRI 3455 | NA19153 AFR YRI 3456 | NA19154 AFR YRI 3457 | NA19149 AFR YRI 3458 | NA19150 AFR YRI 3459 | NA19151 AFR YRI 3460 | NA19143 AFR YRI 3461 | NA19144 AFR YRI 3462 | NA19145 AFR YRI 3463 | NA19146 AFR YRI 3464 | NA19147 AFR YRI 3465 | NA19148 AFR YRI 3466 | NA19127 AFR YRI 3467 | NA19128 AFR YRI 3468 | NA19129 AFR YRI 3469 | NA19113 AFR YRI 3470 | NA19114 AFR YRI 3471 | NA19115 AFR YRI 3472 | NA19104 AFR YRI 3473 | NA19256 AFR YRI 3474 | NA19257 AFR YRI 3475 | NA19258 AFR YRI 3476 | NA19117 AFR YRI 3477 | NA19118 AFR YRI 3478 | NA19174 AFR YRI 3479 | NA19130 AFR YRI 3480 | NA19131 AFR YRI 3481 | NA19132 AFR YRI 3482 | NA19098 AFR YRI 3483 | NA19099 AFR YRI 3484 | NA19100 AFR YRI 3485 | NA19195 AFR YRI 3486 | NA19196 AFR YRI 3487 | NA19213 AFR YRI 3488 | NA19214 AFR YRI 3489 | NA19215 AFR YRI 3490 | NA19189 AFR YRI 3491 | NA19190 AFR YRI 3492 | NA19191 AFR YRI 3493 | NA19235 AFR YRI 3494 | NA19236 AFR YRI 3495 | NA19237 AFR YRI 3496 | NA19238 AFR YRI 3497 | NA19239 AFR YRI 3498 | NA19240 AFR YRI 3499 | NA19247 AFR YRI 3500 | NA19248 AFR YRI 3501 | NA19249 AFR YRI 3502 | -------------------------------------------------------------------------------- /resources/GRCh38.chrom_map: -------------------------------------------------------------------------------- 1 | 1 chr1 2 | 2 chr2 3 | 3 chr3 4 | 4 chr4 5 | 5 chr5 6 | 6 chr6 7 | 7 chr7 8 | 8 chr8 9 | 9 chr9 10 | 10 chr10 11 | 11 chr11 12 | 12 chr12 13 | 13 chr13 14 | 14 chr14 15 | 15 chr15 16 | 16 chr16 17 | 17 chr17 18 | 18 chr18 19 | 19 chr19 20 | 20 chr20 21 | 21 chr21 22 | 22 chr22 23 | X chrX 24 | Y chrY 25 | M chrM 26 | -------------------------------------------------------------------------------- /resources/GRCh38.length_map: -------------------------------------------------------------------------------- 1 | 
chr1 248956422 2 | chr2 242193529 3 | chr3 198295559 4 | chr4 190214555 5 | chr5 181538259 6 | chr6 170805979 7 | chr7 159345973 8 | chr8 145138636 9 | chr9 138394717 10 | chr10 133797422 11 | chr11 135086622 12 | chr12 133275309 13 | chr13 114364328 14 | chr14 107043718 15 | chr15 101991189 16 | chr16 90338345 17 | chr17 83257441 18 | chr18 80373285 19 | chr19 58617616 20 | chr20 64444167 21 | chr21 46709983 22 | chr22 50818468 23 | chrX 156040895 24 | chrY 57227415 25 | chrM 16569 26 | -------------------------------------------------------------------------------- /snakemake/README.md: -------------------------------------------------------------------------------- 1 | # Reference Flow parameters 2 | 3 | ### Data-dependent parameters 4 | For users interested in the default RandFlow-LD pipeline based on GRCh38 and the 1KG GRCh38 call set, 5 | changing the parameters in this section is sufficient. 6 | 7 | - `READS1` : reads to align 8 | 9 | - `INDIV` : name of the tested sample. This parameter is used to name directories and files. 10 | 11 | - `DIR` : directory where the outputs will be put 12 | 13 | - `THREADS` : max number of threads used for each rule. 14 | This is different from the number specified by `snakemake -j`, which is the number of threads for the entire Snakemake pipeline. That is, multiple rules can run at the same time, each using up to `THREADS` threads, until their sum reaches the `-j` limit. 15 | 16 | 17 | ### Pipeline options 18 | - `USE_PREBUILT` : whether to use the RandFlow-LD pre-built indexes 19 | 20 | - `SORT_SAM` : whether to sort the final SAM output 21 | 22 | - `ALN_MAPQ_THRSD` : mapping quality cutoff used to split reads into committed and deferred groups 23 | 24 | 25 | ### Dataset parameters 26 | - `GENOME` : reference genome; usually a standard GRC genome 27 | 28 | - `DIR_VCF` : directory where the VCFs are put 29 | 30 | - `VCF_PREFIX`, `VCF_SUFFIX` : prefix and suffix of a VCF file.
31 | For example, we set `VCF_PREFIX` to "ALL.chr" and 32 | `VCF_SUFFIX` to ".shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz" 33 | for the 1KG GRCh38 call set, where the chromosome name sits between the prefix and the suffix, e.g. the chromosome 1 VCF is named 34 | "ALL.chr1.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz" 35 | 36 | - `CHROM` : a list of the chromosomes to include. 37 | Reference flow finds VCFs corresponding to the specified chromosomes and builds population genomes for them. 38 | 39 | - `GROUP` : population groups used to build second-pass population genomes 40 | 41 | 42 | ### Second-pass genome parameters 43 | - `POP_THRSD` : allele frequency threshold. 0: do not filter by frequency; 0.5: only use major alleles 44 | 45 | - `POP_STOCHASTIC` : whether to use stochastic update. 1: stochastic update; 0: deterministic 46 | 47 | - `POP_BLOCK_SIZE` : size of phase-preserving LD blocks. 48 | Set to 1 when doing independent sampling. Only applicable to call sets providing phase information. 49 | 50 | - `POP_USE_LD` : whether to include local LD information. 1: phase-preserving; 0: independent-sampling 51 | 52 | Configurations for `MajorFlow`, `RandFlow` and `RandFlow-LD` are provided in `config.yaml`. 53 | Users may uncomment the setting of interest. 54 | 55 | 56 | ### Individual-population mappings 57 | - `FAMILY` : mapping between individuals and populations 58 | - `SPOP` : mapping between populations and super populations 59 | 60 | Currently we only support call sets from the 1000 Genomes Project, including the phase-3 and GRCh38 call sets. 61 | Default mappings are provided under the `../resources` directory. 62 | For users interested in using other call sets, 63 | changes to the pipeline may be needed depending on the labelling and VCF format adopted by the call set. 64 | `FAMILY` and `SPOP` are used in the `prepare_pop_indiv` rule in `shared/prepare_pop_genome.Snakefile` and 65 | the `build_dict_pop_to_spop` rule in `shared/functions.Snakefile`.
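To illustrate what the individual-population mappings carry, here is a minimal Python sketch. It assumes a three-column, whitespace-separated layout (individual, super population, population) like the sample listing under `resources`. The function names `parse_indiv_table` and `group_by_population` are hypothetical and not part of the pipeline, and the file that `FAMILY` points to may use a different layout, so treat this as a sketch rather than the pipeline's own parsing logic.

```python
from collections import defaultdict


def parse_indiv_table(lines):
    """Map each individual ID to a (super population, population) pair.

    Assumes each line looks like 'NA19672 AMR MXL'; other lines are skipped.
    """
    indiv_to_pop = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        indiv, spop, pop = fields
        indiv_to_pop[indiv] = (spop, pop)
    return indiv_to_pop


def group_by_population(indiv_to_pop):
    """Build population -> individuals and population -> super population maps."""
    pop_to_indivs = defaultdict(list)
    pop_to_spop = {}
    for indiv, (spop, pop) in indiv_to_pop.items():
        pop_to_indivs[pop].append(indiv)
        pop_to_spop[pop] = spop
    return dict(pop_to_indivs), pop_to_spop


# A few rows in the assumed layout
example = [
    "NA19672 AMR MXL",
    "NA19674 AMR MXL",
    "HG01565 AMR PEL",
    "NA20502 EUR TSI",
]
indiv_to_pop = parse_indiv_table(example)
pop_to_indivs, pop_to_spop = group_by_population(indiv_to_pop)
```

With tables like these, selecting the individuals for a `GROUP` entry such as `EUR` amounts to collecting every population whose super population is `EUR` and concatenating their individual lists.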
66 | 67 | 68 | ### Chromosomal mappings 69 | - `LENGTH_MAP` : lengths for each chromosome 70 | - `CHROM_MAP` : mapping from `1` to `chr1`, etc 71 | 72 | 73 | ### Paths of software 74 | - `BCFTOOLS` : we used bcftools 1.9-206-g4694164. 75 | There is a `bcftools consensus` issue with the major release so we used a development version. 76 | 77 | - `SAMTOOLS` : we used samtools v1.9 78 | 79 | - `LIFTOVER` : `../liftover/liftover` 80 | 81 | - `PYTHON`: we developed Reference flow using Python3.7 82 | 83 | - `DIR_SCRIPTS` : `../src` 84 | 85 | ### Randomness control 86 | - `RAND_SEED`: random seed used in the reference flow stochastic reference genome update process and for aligner 87 | -------------------------------------------------------------------------------- /snakemake/Snakefile: -------------------------------------------------------------------------------- 1 | import os 2 | import pandas as pd 3 | 4 | # configfile: 'test_pe.yaml' 5 | # configfile: 'test_se.yaml' 6 | # configfile: "config_local.yaml" 7 | configfile: "config.yaml" 8 | # configfile: "config_mouse.yaml" 9 | 10 | ''' Load from config ''' 11 | CHROM = config['CHROM'] 12 | INDIV = config['INDIV'] 13 | EXP_LABEL = config['EXP_LABEL'] 14 | ALN_MODE = config['ALN_MODE'] 15 | assert ALN_MODE in ['single-end', 'paired-end'] 16 | READS1 = config['READS1'] 17 | READS2 = config['READS2'] 18 | GROUP = config['GROUP'] 19 | POP_LEVEL = config['POP_LEVEL'] 20 | ALN_MAPQ_THRSD = config['ALN_MAPQ_THRSD'] 21 | POP_THRSD = config['POP_THRSD'] 22 | POP_STOCHASTIC = config['POP_STOCHASTIC'] 23 | POP_BLOCK_SIZE = config['POP_BLOCK_SIZE'] 24 | POP_USE_LD = config['POP_USE_LD'] 25 | 26 | USE_PREBUILT = config['USE_PREBUILT'] 27 | SORT_SAM = config['SORT_SAM'] 28 | 29 | DIR = config['DIR'] 30 | GENOME = config['GENOME'] 31 | DIR_VCF = config['DIR_VCF'] 32 | VCF_PREFIX = config['VCF_PREFIX'] 33 | VCF_SUFFIX = config['VCF_SUFFIX'] 34 | CHR_PREFIX = config['CHR_PREFIX'] 35 | LENGTH_MAP = config['LENGTH_MAP'] 36 | CHROM_MAP = 
config['CHROM_MAP'] 37 | 38 | FAMILY = config['FAMILY'] 39 | SPOP = config['SPOP'] 40 | BCFTOOLS = config['BCFTOOLS'] 41 | SAMTOOLS = config['SAMTOOLS'] 42 | LEVIOSAM = config['LEVIOSAM'] 43 | PYTHON = config['PYTHON'] 44 | DIR_SCRIPTS = config['DIR_SCRIPTS'] 45 | 46 | THREADS = config['THREADS'] 47 | RAND_SEED = config['RAND_SEED'] 48 | '''''' 49 | 50 | # Bowtie 2 index extensions 51 | IDX_ITEMS = ['1', '2', '3', '4', 'rev.1', 'rev.2'] 52 | 53 | # Prefixes and directory paths for major-allele reference construction and indexing 54 | PREFIX_MAJOR_F = os.path.join(DIR, 'major/{CHROM}_filtered_major') 55 | PREFIX_MAJOR = os.path.join(DIR, 'major/chr{CHROM}_maj') 56 | DIR_MAJOR = os.path.join(DIR, 'major') 57 | 58 | # Prefixes and directory paths for population reference construction and indexing 59 | DIR_POP_GENOME = os.path.join(DIR, 'pop_genome/') 60 | POP_DIRNAME = 'thrds{0}_S{1}_b{2}_ld{3}'.format(POP_THRSD, POP_STOCHASTIC, POP_BLOCK_SIZE, POP_USE_LD) 61 | WG_POP_GENOME_SUFFIX = EXP_LABEL + '-' + POP_LEVEL + '_{GROUP}_' + POP_DIRNAME 62 | DIR_POP_GENOME_BLOCK = os.path.join(DIR_POP_GENOME, POP_DIRNAME + '/') 63 | DIR_POP_GENOME_BLOCK_IDX = os.path.join(DIR_POP_GENOME_BLOCK, 'indexes/') 64 | 65 | # Prefix and directory paths for experiments 66 | DIR_FIRST_PASS = os.path.join(DIR, 'experiments/{INDIV}/') 67 | DIR_SECOND_PASS = os.path.join(DIR, 'experiments/{INDIV}/' + POP_DIRNAME) 68 | PREFIX_SECOND_PASS = os.path.join(DIR_SECOND_PASS, EXP_LABEL + '-major-' + ALN_MAPQ_THRSD + '-{GROUP}-' + POP_DIRNAME) 69 | 70 | # Bias results directory 71 | DIR_RESULTS_BIAS = os.path.join(DIR, 'results/bias') 72 | 73 | ''' Snakemake modules ''' 74 | # Functions 75 | include: 'shared/functions.Snakefile' 76 | 77 | if not USE_PREBUILT: 78 | # Prepare pop genome and indexes 79 | # check: 'prepare_pop_genome.done' 80 | include: 'shared/prepare_pop_genome.Snakefile' 81 | 82 | # Prepare grc and major genome and indexes 83 | # check: 'prepare_standard_genome.done' 84 | include:
'shared/prepare_standard_genome.Snakefile' 85 | 86 | # Align reads. 87 | # check: 'alignment_refflow.done' 88 | # include: 'shared/alignment.Snakefile' 89 | if ALN_MODE == 'single-end': 90 | include: 'shared/alignment_single_end.Snakefile' 91 | elif ALN_MODE == 'paired-end': 92 | include: 'shared/alignment_paired_end.Snakefile' 93 | 94 | # Lift and sort reads 95 | # check: 'leviosam.done', 'sort.done' 96 | include: 'shared/lift_and_sort.Snakefile' 97 | 98 | TODO_LIST = ['alignment_refflow.done', 'leviosam.done'] 99 | if not USE_PREBUILT: 100 | TODO_LIST.append('prepare_pop_genome.done') 101 | TODO_LIST.append('prepare_standard_genome.done') 102 | if SORT_SAM: 103 | TODO_LIST.append('sort.done') 104 | 105 | ''' Snakemake rules ''' 106 | rule all: 107 | input: 108 | expand(os.path.join(DIR, '{task}'), task = TODO_LIST) 109 | 110 | rule filter_vcf: 111 | input: 112 | vcf = os.path.join(DIR_VCF, VCF_PREFIX + '{CHROM}' + VCF_SUFFIX), 113 | chrom_map = CHROM_MAP 114 | output: 115 | vcf = temp(os.path.join(DIR, '{CHROM}_filtered.vcf')) 116 | shell: 117 | # Take PASS variants 118 | # Does not remove mnps, since they will be needed for constructing personalized reference genome, 119 | # and will be removed when building major and refflow references. 
120 | '{BCFTOOLS} view -r {wildcards.CHROM} -c 1 -f PASS {input.vcf} | {BCFTOOLS} annotate --rename-chrs {input.chrom_map} -o {output.vcf}' 121 | 122 | rule aggregate_vcf: 123 | input: 124 | vcf = expand(os.path.join(DIR, '{CHROM}_filtered.vcf'), CHROM = CHROM) 125 | output: 126 | vcf = os.path.join(DIR, EXP_LABEL + '_filtered.vcf.gz') 127 | shell: 128 | '{BCFTOOLS} concat -O z -o {output.vcf} {input.vcf}' 129 | -------------------------------------------------------------------------------- /snakemake/config.yaml: -------------------------------------------------------------------------------- 1 | # Genomic reads to process 2 | ALN_MODE : 'single-end' 3 | READS1 : '../test/SRR622457_1-1k.fastq' 4 | READS2 : '' 5 | 6 | # Name of the tested sample 7 | INDIV : 'test' 8 | 9 | # Experiment label that will become prefixes for many files 10 | # For example, 'wg' (stands for whole-genome), 'chr1' 11 | EXP_LABEL : 'wg' 12 | 13 | # Directory where the outputs will be put 14 | DIR: 'run' 15 | 16 | # Max number of threads used for each rule. 17 | # Note: the total number of threads used is specified by snakemake -j 18 | THREADS : 16 19 | 20 | # Whether to use pre-built indexes based on the RandFlow-LD method based on GRCh38 and 1KG 21 | USE_PREBUILT : True 22 | 23 | # Whether to sort the output SAM 24 | SORT_SAM : False 25 | 26 | # Reference genome; usually a standard GRC genome 27 | GENOME : '../resources/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna' 28 | 29 | # Chromosomes included 30 | CHROM : ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', 'X', 'Y'] 31 | # Prefix of a chromosome. 
Usually set to 'chr' (GRCh38) or '' (hg19) 32 | CHR_PREFIX : 'chr' 33 | 34 | # Directory where the 1KG VCFs are stored 35 | DIR_VCF : '../resources/1kg_vcf' 36 | 37 | # Set prefix and suffix for VCFs 38 | VCF_PREFIX : 'ALL.chr' 39 | VCF_SUFFIX : '.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz' 40 | 41 | # 1KG super populations used to build second-pass population genomes 42 | GROUP : ['EUR', 'AMR', 'EAS', 'SAS', 'AFR'] 43 | POP_LEVEL : 'superpop' 44 | # Replace with the below if using a second-pass reference set based on 1KG populations 45 | # GROUP : ['CHB', 'JPT', 'CHS', 'CDX', 'KHV', 'CEU', 'TSI', 'FIN', 'GBR', 'IBS', 'YRI', 'LWK', 'GWD', 'MSL', 'ESN', 'ASW', 'ACB', 'MXL', 'PUR', 'CLM', 'PEL', 'GIH', 'PJL', 'BEB', 'STU', 'ITU'] 46 | # POP_LEVEL : 'pop' 47 | 48 | # Mapping quality cutoff used to split reads into committed and deferred groups 49 | ALN_MAPQ_THRSD : '10' 50 | 51 | ### Second-pass genome parameters 52 | # POP_THRSD: 53 | # allele frequency threshold. 0: do not filter by frequency; 0.5: only use major alleles 54 | # POP_STOCHASTIC: 55 | # 1: stochastic update; 0: deterministic 56 | # POP_BLOCK_SIZE: 57 | # size of phase-preserving blocks.
Set to 1 when doing independent sampling 58 | # POP_USE_LD: 59 | # 1: phase-preserving; 0: independent-sampling 60 | ### 61 | ### Phase-preserving stochastic update (1kbp-blocks) 62 | POP_THRSD : 0 63 | POP_STOCHASTIC : 1 64 | POP_BLOCK_SIZE : 1000 65 | POP_USE_LD : 1 66 | ### Independent-sampling stochastic update 67 | # POP_THRSD : 0 68 | # POP_STOCHASTIC : 1 69 | # POP_BLOCK_SIZE : 1 70 | # POP_USE_LD : 0 71 | ### Deterministic major ### 72 | # POP_THRSD : 0.5 73 | # POP_STOCHASTIC : 0 74 | # POP_BLOCK_SIZE : 1 75 | # POP_USE_LD : 0 76 | 77 | # Files specifying 1KG individual-population and population-superpopulation mappings 78 | FAMILY : '../resources/20130606_g1k.ped' 79 | SPOP : '../resources/1kg.superpopulation' 80 | 81 | # Chromosome for GRCh38 82 | LENGTH_MAP : '../resources/GRCh38.length_map' 83 | CHROM_MAP : '../resources/GRCh38.chrom_map' 84 | 85 | # Paths of software 86 | BCFTOOLS : 'bcftools' 87 | SAMTOOLS : 'samtools' 88 | LEVIOSAM : 'leviosam' 89 | PYTHON : 'python' 90 | DIR_SCRIPTS : '../src' 91 | 92 | # Random seed used in the reference flow stochastic reference genome update process and for aligner 93 | RAND_SEED : 0 94 | -------------------------------------------------------------------------------- /snakemake/shared/alignment_paired_end.Snakefile: -------------------------------------------------------------------------------- 1 | ''' This snakefile includes rules that perform paired-end alignment.''' 2 | ''' Perform first-pass alignment.''' 3 | rule align_to_major: 4 | input: 5 | reads1 = READS1, 6 | reads2 = READS2, 7 | idx = expand( 8 | os.path.join(DIR, 'major/indexes/' + EXP_LABEL + '-maj.{idx}.bt2'), 9 | idx = IDX_ITEMS) 10 | params: 11 | index = os.path.join(DIR, 'major/indexes/' + EXP_LABEL + '-maj') 12 | output: 13 | sam = os.path.join(DIR_FIRST_PASS, EXP_LABEL + '-major.sam') 14 | threads: THREADS 15 | shell: 16 | 'bowtie2 --threads {THREADS} -x {params.index} -1 {input.reads1} -2 {input.reads2} -S {output.sam}' 17 | 18 | ''' Split 
first-pass alignment into high-quality and low-quality files. ''' 19 | rule refflow_split_aln_by_mapq: 20 | ''' 21 | This is under a paired-end configuration 22 | ''' 23 | input: 24 | sam = os.path.join(DIR_FIRST_PASS, EXP_LABEL + '-major.sam') 25 | output: 26 | highq = os.path.join(DIR_FIRST_PASS, 27 | EXP_LABEL + '-major-mapqgeq' + ALN_MAPQ_THRSD + '.sam'), 28 | lowq = os.path.join(DIR_FIRST_PASS, 29 | EXP_LABEL + '-major-mapqlt' + ALN_MAPQ_THRSD + '.sam'), 30 | lowq_reads1 = os.path.join(DIR_FIRST_PASS, 31 | EXP_LABEL + '-major-mapqlt' + ALN_MAPQ_THRSD + '_1.fq'), 32 | lowq_reads2 = os.path.join(DIR_FIRST_PASS, 33 | EXP_LABEL + '-major-mapqlt' + ALN_MAPQ_THRSD + '_2.fq') 34 | params: 35 | fastq = os.path.join(DIR_FIRST_PASS, 36 | EXP_LABEL + '-major-mapqlt' + ALN_MAPQ_THRSD), 37 | split_strategy = 'optimistic' 38 | shell: 39 | '{PYTHON} {DIR_SCRIPTS}/split_sam_by_mapq.py -s {input.sam} \ 40 | -oh {output.highq} -ol {output.lowq} -oq {params.fastq} \ 41 | -t {ALN_MAPQ_THRSD} --paired-end --split-strategy {params.split_strategy}' 42 | 43 | ''' Align low-quality reads using population genomes.''' 44 | rule refflow_align_secondpass_paired_end: 45 | input: 46 | reads1 = os.path.join(DIR_FIRST_PASS, 47 | EXP_LABEL + '-major-mapqlt' + ALN_MAPQ_THRSD + '_1.fq'), 48 | reads2 = os.path.join(DIR_FIRST_PASS, 49 | EXP_LABEL + '-major-mapqlt' + ALN_MAPQ_THRSD + '_2.fq'), 50 | idx1 = os.path.join(DIR_POP_GENOME_BLOCK_IDX, WG_POP_GENOME_SUFFIX + '.1.bt2'), 51 | idx2 = os.path.join(DIR_POP_GENOME_BLOCK_IDX, WG_POP_GENOME_SUFFIX + '.2.bt2'), 52 | idx3 = os.path.join(DIR_POP_GENOME_BLOCK_IDX, WG_POP_GENOME_SUFFIX + '.3.bt2'), 53 | idx4 = os.path.join(DIR_POP_GENOME_BLOCK_IDX, WG_POP_GENOME_SUFFIX + '.4.bt2'), 54 | idx5 = os.path.join(DIR_POP_GENOME_BLOCK_IDX, WG_POP_GENOME_SUFFIX + '.rev.1.bt2'), 55 | idx6 = os.path.join(DIR_POP_GENOME_BLOCK_IDX, WG_POP_GENOME_SUFFIX + '.rev.2.bt2') 56 | params: 57 | index = os.path.join(DIR_POP_GENOME_BLOCK_IDX, WG_POP_GENOME_SUFFIX) 58 | output: 
59 | sam = PREFIX_SECOND_PASS + '.sam' 60 | threads: THREADS 61 | shell: 62 | 'bowtie2 --reorder --threads {threads} -x {params.index} -1 {input.reads1} -2 {input.reads2} -S {output.sam};' 63 | 64 | rule refflow_merge_secondpass: 65 | input: 66 | sam = expand( 67 | PREFIX_SECOND_PASS + '.sam', 68 | INDIV = INDIV, GROUP = GROUP), 69 | maj = os.path.join(DIR_FIRST_PASS, EXP_LABEL + '-major-mapqlt{}.sam'.format(ALN_MAPQ_THRSD)) 70 | output: 71 | path = os.path.join( 72 | DIR_SECOND_PASS, 73 | EXP_LABEL + '-major-{}-{}.paths'.format(ALN_MAPQ_THRSD, POP_DIRNAME)), 74 | label = os.path.join( 75 | DIR_SECOND_PASS, 76 | EXP_LABEL + '-major-{}-{}.ids'.format(ALN_MAPQ_THRSD, POP_DIRNAME)), 77 | merge_paths = os.path.join( 78 | DIR_SECOND_PASS, 79 | EXP_LABEL + '-major-{}-{}.merge_paths'.format(ALN_MAPQ_THRSD, POP_DIRNAME)) 80 | params: 81 | prefix = os.path.join(DIR_SECOND_PASS, '2ndpass') 82 | run: 83 | dir_2p = os.path.join(DIR, 'experiments/' + wildcards.INDIV + '/' + POP_DIRNAME) 84 | shell('echo {input.maj} > {output.path};') 85 | shell('echo "maj" > {output.label};') 86 | for g in GROUP: 87 | fn = os.path.join( 88 | dir_2p, 89 | EXP_LABEL + '-major-{}-{}-{}.sam'.format( 90 | ALN_MAPQ_THRSD, g, POP_DIRNAME)) 91 | shell('ls {fn} >> {output.path};') 92 | shell('echo {g} >> {output.label};') 93 | shell('{PYTHON} {DIR_SCRIPTS}/merge_incremental.py -ns {output.path} \ 94 | -ids {output.label} -rs {RAND_SEED} -p {params.prefix} \ 95 | -l {output.merge_paths} --paired-end') 96 | 97 | rule check_alignment_refflow: 98 | input: 99 | merge_paths = expand( 100 | os.path.join(DIR_SECOND_PASS, 101 | EXP_LABEL + '-major-{}-{}.merge_paths'.format( 102 | ALN_MAPQ_THRSD, POP_DIRNAME)), 103 | INDIV = INDIV) 104 | output: 105 | touch(temp(os.path.join(DIR, 'alignment_refflow.done'))) 106 | -------------------------------------------------------------------------------- /snakemake/shared/alignment_single_end.Snakefile: 
-------------------------------------------------------------------------------- 1 | ''' This snakefile includes rules that perform single-end alignment.''' 2 | ''' Perform first-pass alignment.''' 3 | rule align_to_major: 4 | input: 5 | reads1 = READS1, 6 | idx = expand( 7 | os.path.join(DIR, 'major/indexes/' + EXP_LABEL + '-maj.{idx}.bt2'), 8 | idx = IDX_ITEMS) 9 | params: 10 | index = os.path.join(DIR, 'major/indexes/' + EXP_LABEL + '-maj') 11 | output: 12 | sam = os.path.join(DIR_FIRST_PASS, EXP_LABEL + '-major.sam') 13 | threads: THREADS 14 | shell: 15 | 'bowtie2 --threads {THREADS} -x {params.index} -U {input.reads1} -S {output.sam}' 16 | 17 | ''' Split first-pass alignment into high-quality and low-quality files. ''' 18 | rule refflow_split_aln_by_mapq: 19 | input: 20 | sam = os.path.join(DIR_FIRST_PASS, EXP_LABEL + '-major.sam') 21 | output: 22 | highq = os.path.join(DIR_FIRST_PASS, 23 | EXP_LABEL + '-major-mapqgeq' + ALN_MAPQ_THRSD + '.sam'), 24 | lowq = os.path.join(DIR_FIRST_PASS, 25 | EXP_LABEL + '-major-mapqlt' + ALN_MAPQ_THRSD + '.sam'), 26 | lowq_reads = os.path.join(DIR_FIRST_PASS, 27 | EXP_LABEL + '-major-mapqlt' + ALN_MAPQ_THRSD + '_1.fq'), 28 | shell: 29 | 'awk -v var="{ALN_MAPQ_THRSD}" \ 30 | \'{{ if ($5 >= var || $1 ~ /^@/) {{ print }} }}\' {input.sam} > \ 31 | {output.highq};' 32 | 'awk -v var={ALN_MAPQ_THRSD} \ 33 | \'{{ if ($5 < var || $1 ~ /^@/) {{ print }} }}\' {input.sam} > \ 34 | {output.lowq};' 35 | 'samtools fastq {output.lowq} > {output.lowq_reads}' 36 | 37 | ''' Align low-quality reads using population genomes.''' 38 | rule refflow_align_secondpass_single_end: 39 | input: 40 | reads1 = os.path.join(DIR_FIRST_PASS, 41 | EXP_LABEL + '-major-mapqlt' + ALN_MAPQ_THRSD + '_1.fq'), 42 | idx1 = os.path.join(DIR_POP_GENOME_BLOCK_IDX, WG_POP_GENOME_SUFFIX + '.1.bt2'), 43 | idx2 = os.path.join(DIR_POP_GENOME_BLOCK_IDX, WG_POP_GENOME_SUFFIX + '.2.bt2'), 44 | idx3 = os.path.join(DIR_POP_GENOME_BLOCK_IDX, WG_POP_GENOME_SUFFIX + '.3.bt2'), 45 | 
idx4 = os.path.join(DIR_POP_GENOME_BLOCK_IDX, WG_POP_GENOME_SUFFIX + '.4.bt2'), 46 | idx5 = os.path.join(DIR_POP_GENOME_BLOCK_IDX, WG_POP_GENOME_SUFFIX + '.rev.1.bt2'), 47 | idx6 = os.path.join(DIR_POP_GENOME_BLOCK_IDX, WG_POP_GENOME_SUFFIX + '.rev.2.bt2') 48 | params: 49 | index = os.path.join(DIR_POP_GENOME_BLOCK_IDX, WG_POP_GENOME_SUFFIX) 50 | output: 51 | sam = PREFIX_SECOND_PASS + '.sam' 52 | threads: THREADS 53 | shell: 54 | 'bowtie2 --reorder --threads {THREADS} -x {params.index} -U {input.reads1} -S {output.sam}' 55 | 56 | ''' Merge refflow results. ''' 57 | rule refflow_merge_secondpass: 58 | input: 59 | sam = expand( 60 | PREFIX_SECOND_PASS + '.sam', 61 | INDIV = INDIV, GROUP = GROUP), 62 | maj = os.path.join(DIR_FIRST_PASS, EXP_LABEL + '-major-mapqlt{}.sam'.format(ALN_MAPQ_THRSD)) 63 | output: 64 | path = os.path.join( 65 | DIR_SECOND_PASS, 66 | EXP_LABEL + '-major-{}-{}.paths'.format(ALN_MAPQ_THRSD, POP_DIRNAME)), 67 | label = os.path.join( 68 | DIR_SECOND_PASS, 69 | EXP_LABEL + '-major-{}-{}.ids'.format(ALN_MAPQ_THRSD, POP_DIRNAME)), 70 | merge_paths = os.path.join( 71 | DIR_SECOND_PASS, 72 | EXP_LABEL + '-major-{}-{}.merge_paths'.format(ALN_MAPQ_THRSD, POP_DIRNAME)) 73 | params: 74 | prefix = os.path.join(DIR_SECOND_PASS, '2ndpass') 75 | run: 76 | dir_2p = os.path.join(DIR, 'experiments/' + wildcards.INDIV + '/' + POP_DIRNAME) 77 | shell('echo {input.maj} > {output.path};') 78 | shell('echo "maj" > {output.label};') 79 | for g in GROUP: 80 | fn = os.path.join( 81 | dir_2p, 82 | EXP_LABEL + '-major-{}-{}-{}.sam'.format( 83 | ALN_MAPQ_THRSD, g, POP_DIRNAME)) 84 | shell('ls {fn} >> {output.path};') 85 | shell('echo {g} >> {output.label};') 86 | shell('{PYTHON} {DIR_SCRIPTS}/merge_incremental.py -ns {output.path} \ 87 | -ids {output.label} -rs {RAND_SEED} -p {params.prefix} \ 88 | -l {output.merge_paths}') 89 | 90 | rule check_alignment_refflow: 91 | input: 92 | merge_paths = expand( 93 | os.path.join(DIR_SECOND_PASS, 94 | EXP_LABEL + 
'-major-{}-{}.merge_paths'.format( 95 | ALN_MAPQ_THRSD, POP_DIRNAME)), 96 | INDIV = INDIV) 97 | output: 98 | touch(temp(os.path.join(DIR, 'alignment_refflow.done'))) 99 | -------------------------------------------------------------------------------- /snakemake/shared/functions.Snakefile: -------------------------------------------------------------------------------- 1 | def build_dict_indiv_to_pop(fn): 2 | dict_indiv_to_pop = {} 3 | df = pd.read_csv(fn, sep='\t') 4 | list_sample = df['Individual ID'] 5 | list_pop = df['Population'] 6 | assert len(list_sample) == len(list_pop) 7 | assert len(list_sample) == len(set(list_sample)) 8 | for i, s in enumerate(list_sample): 9 | dict_indiv_to_pop[s] = list_pop[i] 10 | return dict_indiv_to_pop 11 | 12 | def build_dict_pop_to_spop(fn): 13 | dict_pop_to_spop = {} 14 | df = pd.read_csv(fn, sep='\t', header=None) 15 | list_spop = df[2] 16 | list_pop = df[0] 17 | for i, p in enumerate(list_pop): 18 | dict_pop_to_spop[p] = list_spop[i] 19 | return dict_pop_to_spop 20 | -------------------------------------------------------------------------------- /snakemake/shared/lift_and_sort.Snakefile: -------------------------------------------------------------------------------- 1 | ''' 2 | The rules in this file perform coordinate system transformation for aligned reads 3 | and sort the reads.
4 | ''' 5 | rule lift_major_highq: 6 | input: 7 | sam = os.path.join(DIR_FIRST_PASS, EXP_LABEL + '-major-mapqgeq{}.sam'.format(ALN_MAPQ_THRSD)), 8 | lft = os.path.join(DIR_MAJOR, EXP_LABEL + '-major.lft') 9 | output: 10 | os.path.join(DIR_FIRST_PASS, EXP_LABEL + '-major-mapqgeq{}-liftover.sam'.format(ALN_MAPQ_THRSD)) 11 | params: 12 | os.path.join(DIR_FIRST_PASS, EXP_LABEL + '-major-mapqgeq{}-liftover'.format(ALN_MAPQ_THRSD)) 13 | threads: THREADS 14 | run: 15 | shell('{LEVIOSAM} lift -a {input.sam} -l {input.lft} -p {params} -t {threads}') 16 | 17 | #: Refflow -- second pass 18 | rule lift_refflow_secondpass_and_merge: 19 | input: 20 | maj_fp = os.path.join( 21 | DIR_FIRST_PASS, 22 | EXP_LABEL + '-major-mapqgeq{}-liftover.sam'.format(ALN_MAPQ_THRSD)), 23 | second_sam_path = os.path.join( 24 | DIR_SECOND_PASS, 25 | EXP_LABEL + '-major-{}-{}.merge_paths'.format(ALN_MAPQ_THRSD, POP_DIRNAME)), 26 | lft_pop = expand(os.path.join( 27 | DIR_POP_GENOME, POP_DIRNAME + '/' + 28 | EXP_LABEL + '-' + POP_LEVEL + '_{GROUP}_' + POP_DIRNAME + '.lft'), 29 | GROUP = GROUP), 30 | lft_maj = os.path.join(DIR_MAJOR, EXP_LABEL + '-major.lft'), 31 | output: 32 | lfted_refflow_sam = os.path.join(DIR_SECOND_PASS, 33 | EXP_LABEL + '-refflow-{}-{}-liftover.sam'.format(ALN_MAPQ_THRSD, POP_DIRNAME)), 34 | lfted_major_second_sam = os.path.join(DIR_SECOND_PASS, '2ndpass-maj-liftover.sam'), 35 | lfted_group_second_sam = [ 36 | os.path.join(DIR_SECOND_PASS, '2ndpass-') + 37 | g + '-liftover.sam' for g in GROUP] 38 | threads: THREADS 39 | run: 40 | list_sam = [] 41 | list_group = [] 42 | #: files should be 43 | #: DIR + '/experiments/{INDIV}/{POP_DIRNAME}/2ndpass-{}.sam' 44 | #: where g should be {GROUP} + 'maj' 45 | with open(input.second_sam_path, 'r') as f: 46 | for line in f: 47 | list_sam.append(line.rstrip()) 48 | bn = os.path.basename(line) 49 | split_bn = os.path.splitext(bn) 50 | list_group.append(split_bn[0].split('-')[-1]) 51 | for i, s in enumerate(list_sam): 52 | 
sys.stderr.write('sam={}, group = {}\n'.format(s, list_group[i])) 53 | 54 | # Copy lifted first pass sam. 55 | shell('cp {input.maj_fp} {output.lfted_refflow_sam};') 56 | # Run levioSAM and merge files. 57 | for i, sam in enumerate(list_sam): 58 | prefix = os.path.join(DIR, 59 | 'experiments/' + wildcards.INDIV + '/' + POP_DIRNAME + 60 | '/2ndpass-{}-liftover'.format(list_group[i])) 61 | if list_group[i] == 'maj': 62 | sys.stderr.write('sam={}, lft = {}\n'.format(sam, input.lft_maj)) 63 | shell('{LEVIOSAM} lift -a {sam} -l {input.lft_maj} -p {prefix} -t {threads};') 64 | # Append reads to all-in-one lifted SAM. 65 | shell('grep -hv "^@" {prefix}.sam >> {output.lfted_refflow_sam};') 66 | elif list_group[i] in GROUP: 67 | for lft in input.lft_pop: 68 | pop = os.path.basename(lft) 69 | if lft.count(list_group[i]) > 0: 70 | break 71 | sys.stderr.write('sam={}, lft = {}\n'.format(sam, lft)) 72 | shell('{LEVIOSAM} lift -a {sam} -l {lft} -p {prefix} -t {threads};') 73 | # Append reads to all-in-one lifted SAM. 
74 | shell('grep -hv "^@" {prefix}.sam >> {output.lfted_refflow_sam};') 75 | 76 | rule check_leviosam: 77 | input: 78 | expand(os.path.join(DIR_SECOND_PASS, 79 | EXP_LABEL + '-refflow-{}-{}-liftover.sam'.format(ALN_MAPQ_THRSD, POP_DIRNAME)), 80 | INDIV = INDIV) 81 | output: 82 | touch(temp(os.path.join(DIR, 'leviosam.done'))) 83 | 84 | ''' 85 | Sort SAM records 86 | ''' 87 | rule sort_refflow: 88 | input: 89 | os.path.join(DIR_SECOND_PASS, 90 | EXP_LABEL + '-refflow-{}-{}-liftover.sam'.format(ALN_MAPQ_THRSD, POP_DIRNAME)) 91 | output: 92 | os.path.join(DIR_SECOND_PASS, 93 | EXP_LABEL + '-refflow-{}-{}-liftover-sorted.bam'.format(ALN_MAPQ_THRSD, POP_DIRNAME)) 94 | threads: 4 95 | run: 96 | shell('{SAMTOOLS} sort -@ {threads} -o {output} -O BAM {input};') 97 | 98 | rule check_sort: 99 | input: 100 | expand(os.path.join(DIR_SECOND_PASS, 101 | EXP_LABEL + '-refflow-{}-{}-liftover-sorted.bam'.format( 102 | ALN_MAPQ_THRSD, POP_DIRNAME)), INDIV = INDIV), 103 | output: 104 | touch(temp(os.path.join(DIR, 'sort.done'))) 105 | 106 | -------------------------------------------------------------------------------- /snakemake/shared/prepare_pop_genome.Snakefile: -------------------------------------------------------------------------------- 1 | ''' 2 | Rules for building population genomes 3 | ''' 4 | rule prepare_pop_indiv: 5 | output: 6 | expand( 7 | os.path.join(DIR, '1KG_indivs/sample_' + POP_LEVEL + '_{GROUP}.txt'), GROUP = GROUP) 8 | params: 9 | prefix = os.path.join(DIR, '1KG_indivs/sample') 10 | shell: 11 | '{PYTHON} {DIR_SCRIPTS}/list_indiv_from_pop.py ' 12 | '-p {FAMILY} -sp {SPOP} -op {params.prefix}' 13 | 14 | rule build_pop_vcf: 15 | ''' 16 | Filter VCF by population groups. 17 | Each output VCF includes only indivs in the specified population group. 
18 | ''' 19 | input: 20 | vcf = os.path.join(DIR, EXP_LABEL + '_filtered.vcf.gz'), 21 | indiv_group = os.path.join(DIR, '1KG_indivs/sample_' + POP_LEVEL + '_{GROUP}.txt') 22 | output: 23 | vcf_gz = os.path.join(DIR_POP_GENOME, EXP_LABEL + '_' + POP_LEVEL + '_{GROUP}.vcf.gz') 24 | shell: 25 | '{BCFTOOLS} view -S {input.indiv_group} ' 26 | '--force-samples {input.vcf} -V mnps,other -m2 -M2 | bgzip > {output.vcf_gz}' 27 | 28 | rule get_pop_sample: 29 | input: 30 | vcf_gz = os.path.join(DIR_POP_GENOME, 31 | EXP_LABEL + '_' + POP_LEVEL + '_{GROUP}.vcf.gz') 32 | output: 33 | vcf_header = os.path.join(DIR_POP_GENOME, 34 | EXP_LABEL + '_' + POP_LEVEL + '_{GROUP}.samples') 35 | shell: 36 | '{BCFTOOLS} view -h {input.vcf_gz} | tail -1 ' 37 | '> {output.vcf_header}' 38 | 39 | rule filter_pop_vcf: 40 | input: 41 | vcf_gz = os.path.join(DIR_POP_GENOME, 42 | EXP_LABEL + '_' + POP_LEVEL + '_{GROUP}.vcf.gz'), 43 | vcf_header = os.path.join(DIR_POP_GENOME, 44 | EXP_LABEL + '_' + POP_LEVEL + '_{GROUP}.samples') 45 | output: 46 | vcf = os.path.join( 47 | DIR_POP_GENOME, 48 | EXP_LABEL + '_' + POP_LEVEL + '_{GROUP}_t' + str(POP_THRSD) + '.vcf' 49 | ) 50 | run: 51 | fn = list({input.vcf_header})[0] 52 | with open(fn, 'r') as f: 53 | for line in f: 54 | n = len(line.split()) - 9 55 | thrsd = int(n * 2 * float(POP_THRSD)) 56 | filt = 'AC > {}'.format(thrsd) 57 | break 58 | shell('{BCFTOOLS} view -i "{filt}" \ 59 | -v snps,indels {input.vcf_gz} > {output.vcf};') 60 | 61 | rule build_pop_genome: 62 | input: 63 | vcf = os.path.join( 64 | DIR_POP_GENOME, 65 | EXP_LABEL + '_' + POP_LEVEL + '_{GROUP}_t' + str(POP_THRSD) + '.vcf' 66 | ) 67 | output: 68 | os.path.join( 69 | DIR_POP_GENOME_BLOCK, 70 | WG_POP_GENOME_SUFFIX + '.fa' 71 | ), 72 | os.path.join( 73 | DIR_POP_GENOME_BLOCK, 74 | WG_POP_GENOME_SUFFIX + '.var' 75 | ), 76 | os.path.join( 77 | DIR_POP_GENOME_BLOCK, 78 | WG_POP_GENOME_SUFFIX + '.vcf' 79 | ) 80 | params: 81 | prefix = os.path.join( 82 | DIR_POP_GENOME_BLOCK, 83 | 
WG_POP_GENOME_SUFFIX 84 | ) 85 | run: 86 | if POP_STOCHASTIC == 1 and POP_USE_LD == 1: 87 | shell('{PYTHON} {DIR_SCRIPTS}/update_genome.py \ 88 | --ref {GENOME} --vcf {input.vcf} \ 89 | --out-prefix {params.prefix} \ 90 | --include-indels --stochastic -rs {RAND_SEED} \ 91 | --block-size {POP_BLOCK_SIZE} --ld') 92 | elif POP_STOCHASTIC == 1: 93 | shell('{PYTHON} {DIR_SCRIPTS}/update_genome.py \ 94 | --ref {GENOME} --vcf {input.vcf} \ 95 | --out-prefix {params.prefix} \ 96 | --include-indels --stochastic -rs {RAND_SEED} \ 97 | --block-size {POP_BLOCK_SIZE}') 98 | else: 99 | shell('{PYTHON} {DIR_SCRIPTS}/update_genome.py \ 100 | --ref {GENOME} --vcf {input.vcf} \ 101 | --out-prefix {params.prefix} \ 102 | --include-indels') 103 | 104 | rule build_pop_genome_index: 105 | input: 106 | genome = os.path.join(DIR_POP_GENOME_BLOCK, WG_POP_GENOME_SUFFIX + '.fa') 107 | output: 108 | os.path.join(DIR_POP_GENOME_BLOCK_IDX, WG_POP_GENOME_SUFFIX + '.1.bt2'), 109 | os.path.join(DIR_POP_GENOME_BLOCK_IDX, WG_POP_GENOME_SUFFIX + '.2.bt2'), 110 | os.path.join(DIR_POP_GENOME_BLOCK_IDX, WG_POP_GENOME_SUFFIX + '.3.bt2'), 111 | os.path.join(DIR_POP_GENOME_BLOCK_IDX, WG_POP_GENOME_SUFFIX + '.4.bt2'), 112 | os.path.join(DIR_POP_GENOME_BLOCK_IDX, WG_POP_GENOME_SUFFIX + '.rev.1.bt2'), 113 | os.path.join(DIR_POP_GENOME_BLOCK_IDX, WG_POP_GENOME_SUFFIX + '.rev.2.bt2') 114 | params: 115 | os.path.join(DIR_POP_GENOME_BLOCK_IDX, WG_POP_GENOME_SUFFIX) 116 | threads: THREADS 117 | shell: 118 | 'bowtie2-build --threads {threads} {input.genome} {params};' 119 | 120 | rule check_pop_genome: 121 | input: 122 | expand( 123 | DIR_POP_GENOME_BLOCK_IDX + WG_POP_GENOME_SUFFIX + '.{IDX_ITEMS}.bt2', 124 | GROUP=GROUP, IDX_ITEMS=IDX_ITEMS, POP_LEVEL=POP_LEVEL 125 | ) 126 | output: 127 | touch(temp(os.path.join(DIR, 'prepare_pop_genome.done'))) 128 | 129 | 130 | ''' 131 | Rules for building indexes for liftover. 
132 | ''' 133 | rule leviosam_serialize_major: 134 | input: 135 | vcf_major = os.path.join(DIR, 'major/' + EXP_LABEL + '-maj.vcf'), 136 | length_map = LENGTH_MAP 137 | output: 138 | lft = os.path.join(DIR_MAJOR, EXP_LABEL + '-major.lft') 139 | params: 140 | os.path.join(DIR_MAJOR, EXP_LABEL + '-major') 141 | shell: 142 | '{LEVIOSAM} serialize -v {input.vcf_major} -p {params} -k {input.length_map}' 143 | 144 | rule leviosam_serialize_pop_genome: 145 | input: 146 | vcf = os.path.join(DIR_POP_GENOME, POP_DIRNAME + '/' + 147 | EXP_LABEL + '-' + POP_LEVEL + '_{GROUP}_' + POP_DIRNAME + '.vcf'), 148 | length_map = LENGTH_MAP 149 | output: 150 | lft = os.path.join( 151 | DIR_POP_GENOME, POP_DIRNAME + '/' + 152 | EXP_LABEL + '-' + POP_LEVEL +'_{GROUP}_' + POP_DIRNAME + '.lft') 153 | params: 154 | os.path.join( 155 | DIR_POP_GENOME, POP_DIRNAME + '/' + 156 | EXP_LABEL + '-' + POP_LEVEL +'_{GROUP}_' + POP_DIRNAME) 157 | run: 158 | shell('{LEVIOSAM} serialize -v {input.vcf} -p {params} -k {input.length_map}') 159 | 160 | -------------------------------------------------------------------------------- /snakemake/shared/prepare_standard_genome.Snakefile: -------------------------------------------------------------------------------- 1 | rule build_major: 2 | input: 3 | vcf = os.path.join(DIR, EXP_LABEL + '_filtered.vcf.gz'), 4 | genome = GENOME 5 | output: 6 | vcf = os.path.join(DIR, 'major/' + EXP_LABEL + '-maj.vcf'), 7 | vcfgz = os.path.join(DIR, 'major/' + EXP_LABEL + '-maj.vcf.gz'), 8 | vcfgz_idx = os.path.join(DIR, 'major/' + EXP_LABEL + '-maj.vcf.gz.csi'), 9 | out_genome = os.path.join(DIR, 'major/' + EXP_LABEL + '-maj.fa'), 10 | shell: 11 | '{BCFTOOLS} view -O v -q 0.5000001 -G -o {output.vcf} -v snps,indels -m2 -M2 {input.vcf};' 12 | 'bgzip -c {output.vcf} > {output.vcfgz};' 13 | '{BCFTOOLS} index {output.vcfgz};' 14 | '{BCFTOOLS} consensus -f {input.genome} -o {output.out_genome} {output.vcfgz}' 15 | 16 | rule build_major_index: 17 | input: 18 | os.path.join(DIR,
'major/' + EXP_LABEL + '-maj.fa') 19 | output: 20 | expand( 21 | os.path.join(DIR, 'major/indexes/' + EXP_LABEL + '-maj.{idx}.bt2'), 22 | idx = IDX_ITEMS) 23 | params: 24 | os.path.join(DIR, 'major/indexes/' + EXP_LABEL + '-maj') 25 | threads: THREADS 26 | shell: 27 | 'bowtie2-build --threads {threads} {input} {params}' 28 | 29 | rule check_standard_genomes: 30 | input: 31 | expand( 32 | os.path.join(DIR, 'major/indexes/' + EXP_LABEL + '-maj.{idx}.bt2'), 33 | idx = IDX_ITEMS), 34 | output: 35 | touch(temp(os.path.join(DIR, 'prepare_standard_genome.done'))) 36 | -------------------------------------------------------------------------------- /src/Makefile: -------------------------------------------------------------------------------- 1 | PRGNAME=refflow_utils 2 | CXX=g++ 3 | CXX_FLAGS=--std=c++14 -lpthread 4 | LIB=-lhts 5 | 6 | INC = add_aux.hpp split_sam.hpp merge_sam.hpp refflow_utils.hpp 7 | OBJ = add_aux.o split_sam.o merge_sam.o refflow_utils.o 8 | 9 | all: $(PRGNAME) 10 | 11 | %.o: %.cpp $(INC) 12 | $(CXX) -c -o $@ $< $(LIB) $(CXX_FLAGS) 13 | 14 | refflow_utils: $(OBJ) 15 | $(CXX) -o $@ $^ $(LIB) $(CXX_FLAGS) 16 | 17 | clean: 18 | rm *.o $(PRGNAME) 19 | -------------------------------------------------------------------------------- /src/add_aux.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | 3 | #include 4 | #include 5 | 6 | #include "add_aux.hpp" 7 | #include "refflow_utils.hpp" 8 | 9 | 10 | /* Parse a raw AUX tag into right htslib format. 
11 | * 12 | * Adapted from Vincenzo Pii 13 | * https://stackoverflow.com/questions/14265581/parse-split-a-string-in-c-using-string-delimiter-standard-c 14 | */ 15 | void parse_aux_tag(const std::string &aux, char (&aux_tag)[2], char &aux_type, 16 | size_t &aux_len, std::string &aux_content) { 17 | std::string token, s = aux, delimiter = ":"; 18 | size_t pos = 0, tag_idx = 0; 19 | while ((pos = s.find(delimiter)) != std::string::npos) { 20 | token = s.substr(0, pos); 21 | if (tag_idx == 0) { 22 | if (token.length() != 2) { 23 | std::cerr << "[Error] Aux tag name should have exactly two characters.\n"; 24 | std::cerr << "Given tag " << token << " doesn't satisfy\n"; 25 | exit(1); 26 | } 27 | aux_tag[0] = token[0]; 28 | aux_tag[1] = token[1]; 29 | } else if (tag_idx == 1) { 30 | if (token.length() != 1) { 31 | std::cerr << "[Error] Aux tag type should have exactly one character.\n"; 32 | std::cerr << "Given aux type " << token << " doesn't satisfy\n"; 33 | exit(1); 34 | } 35 | aux_type = token[0]; 36 | } else { 37 | std::cerr << "[Error] Aux tag should have exactly three fields.\n"; 38 | std::cerr << "Given tag " << aux << " doesn't satisfy\n"; 39 | exit(1); 40 | } 41 | s.erase(0, pos + delimiter.length()); 42 | tag_idx ++; 43 | } 44 | aux_content = s; 45 | aux_len = aux_content.length() + 1; 46 | } 47 | 48 | void add_aux(add_aux_opts args) { 49 | // Read raw SAM file. 50 | samFile* sam_fp = sam_open(args.sam_fn.data(), "r"); 51 | bam_hdr_t* hdr = sam_hdr_read(sam_fp); 52 | 53 | char const *out_mode = (args.output_ext == "bam")? "wb" : "w"; 54 | std::string out_sam_fn = (args.output_prefix == "-")? 55 | "-" : args.output_prefix + "."
+ args.output_ext; 56 | samFile* out_sam_fp = sam_open(out_sam_fn.data(), out_mode); 57 | int write_hdr = sam_hdr_write(out_sam_fp, hdr); 58 | 59 | char aux_tag[2]; 60 | char aux_type; 61 | std::string aux_content; 62 | size_t aux_len; 63 | parse_aux_tag(args.added_tag, aux_tag, aux_type, aux_len, aux_content); 64 | 65 | bam1_t* aln = bam_init1(); 66 | while(true){ 67 | if (sam_read1(sam_fp, hdr, aln) < 0) break; 68 | 69 | uint8_t* aux_data = (uint8_t*) aux_content.data(); 70 | // If AUX type is 'i' (int) 71 | if (aux_type == 'i') { 72 | aux_len = 4; 73 | int32_t int_aux_content = std::stoi(aux_content); 74 | aux_data = (uint8_t*) &int_aux_content; 75 | } 76 | if (bam_aux_append(aln, aux_tag, aux_type, aux_len, aux_data) < 0) { 77 | std::cerr << "[Error] Failed to add aux tag for " << bam_get_qname(aln) << "\n"; 78 | exit(1); 79 | } 80 | 81 | if (sam_write1(out_sam_fp, hdr, aln) < 0) { 82 | std::cerr << "[Error] Failure when writing an alignment.\n"; 83 | std::cerr << "Read name: " << bam_get_qname(aln) << "\n"; 84 | exit(1); 85 | } 86 | } 87 | bam_destroy1(aln); 88 | sam_close(sam_fp); 89 | sam_close(out_sam_fp); 90 | } 91 | 92 | 93 | void add_aux_main(int argc, char** argv) { 94 | int c; 95 | add_aux_opts args; 96 | args.cmd = make_cmd(argc, argv); 97 | static struct option long_options[]{ 98 | {"sam_fn", required_argument, 0, 's'}, 99 | {"output_prefix", required_argument, 0, 'o'}, 100 | {"output_ext", required_argument, 0, 'O'}, 101 | {"added_tag", required_argument, 0, 'g'}, 102 | }; 103 | int long_idx = 0; 104 | while ((c = getopt_long(argc, argv, "hg:o:O:s:", long_options, &long_idx)) != -1){ 105 | switch (c){ 106 | case 'o': 107 | args.output_prefix = optarg; 108 | break; 109 | case 'O': 110 | args.output_ext = optarg; 111 | break; 112 | case 'g': 113 | args.added_tag = optarg; 114 | break; 115 | case 's': 116 | args.sam_fn = optarg; 117 | break; 118 | case 'h': 119 | add_aux_help(); 120 | exit(1); 121 | default: 122 | std::cerr << "Ignoring option " << c 
<< " \n"; 123 | add_aux_help(); 124 | exit(1); 125 | } 126 | } 127 | if (args.output_ext != "sam" && args.output_ext != "bam") { 128 | std::cerr << "[Error] Unsupported output extension " << args.output_ext << "\n"; 129 | add_aux_help(); 130 | exit(1); 131 | } 132 | if (args.added_tag == "") { 133 | std::cerr << "[Error] Empty added tag. Set it using the `-g` option\n"; 134 | add_aux_help(); 135 | exit(1); 136 | } 137 | add_aux(args); 138 | } 139 | 140 | -------------------------------------------------------------------------------- /src/add_aux.hpp: -------------------------------------------------------------------------------- 1 | #ifndef ADD_AUX_H__ 2 | #define ADD_AUX_H__ 3 | 4 | struct add_aux_opts{ 5 | std::string cmd = ""; 6 | std::string added_tag = ""; 7 | std::string sam_fn = "-"; 8 | std::string output_ext = "sam"; 9 | std::string output_prefix = ""; 10 | }; 11 | 12 | static void add_aux_help(){ 13 | std::cerr << "\n"; 14 | std::cerr << "Usage: refflow_utils add_aux [options] -s -o -g \n"; 15 | std::cerr << "\n"; 16 | std::cerr << " -s SAM or BAM file to add the AUX tag to\n"; 17 | std::cerr << " -o Prefix of the output files\n"; 18 | std::cerr << " -g AUX tag to be added ('i' and 'Z' formats are supported)\n"; 19 | std::cerr << "\n"; 20 | std::cerr << "Options (defaults in parentheses):\n"; 21 | std::cerr << " -O Output alignment format, can be sam or bam [sam]\n"; 22 | std::cerr << "\n"; 23 | } 24 | 25 | void parse_aux_tag(const std::string &aux, char (&aux_tag)[2], char &aux_type, 26 | size_t &aux_len, std::string &aux_content); 27 | 28 | void add_aux(add_aux_opts args); 29 | 30 | void add_aux_main(int argc, char** argv); 31 | 32 | 33 | #endif /* ADD_AUX_H__ */ 34 | -------------------------------------------------------------------------------- /src/download_1kg_pop_table.sh: -------------------------------------------------------------------------------- 1 | wget -P resources/
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20130606_sample_info/20130606_g1k.ped 2 | -------------------------------------------------------------------------------- /src/download_1kg_vcf.sh: -------------------------------------------------------------------------------- 1 | # wget -P resources/1kg_vcf/ -r -l1 --no-parent -A "vcf.gz" http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL/ 2 | wget -P resources/1kg_vcf/ http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL/ALL.chr{1..22}.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz 3 | wget -P resources/1kg_vcf/ http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL/ALL.chrX.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz 4 | # http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL/ALL.chr*.vcf.gz 5 | -------------------------------------------------------------------------------- /src/download_genome.sh: -------------------------------------------------------------------------------- 1 | # This script downloads the GRCh38 reference genome from NCBI 2 | 3 | wget -P resources/ ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz 4 | bgzip -d resources/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz 5 | -------------------------------------------------------------------------------- /src/download_prebuilt_indexes.sh: -------------------------------------------------------------------------------- 1 | METHOD="${1:-randflow_ld}" 2 | wget -P resources/ https://genome-idx.s3.amazonaws.com/bt/flow/${METHOD}.tar.gz 3 | tar -zxvf resources/${METHOD}.tar.gz --directory snakemake/ 4 |
-------------------------------------------------------------------------------- /src/list_indiv_from_pop.py: -------------------------------------------------------------------------------- 1 | ''' 2 | This script reads population/superpopulation information from the 3 | 1000 Genomes Project and lists individuals by population/ 4 | superpopulation. 5 | ''' 6 | 7 | import argparse 8 | import pandas as pd 9 | # from plot_from_exps import read_ped 10 | from utils import read_ped 11 | 12 | def parse_args(): 13 | parser = argparse.ArgumentParser() 14 | parser.add_argument( 15 | '-p', '--fn_ped', 16 | help='ped file recording population and family info' 17 | ) 18 | parser.add_argument( 19 | '-sp', '--fn_superpopulation', 20 | help='superpopulation table' 21 | ) 22 | parser.add_argument( 23 | '-op', '--out-prefix', 24 | help='prefix of output lists' 25 | ) 26 | # parser.add_argument( 27 | # '-c', '--cluster', 28 | # help='txt file specifying unsupervised clusters' 29 | # ) 30 | return parser.parse_args() 31 | 32 | def process_db( 33 | fn_ped, 34 | fn_superpopulation 35 | ): 36 | ''' 37 | Create dictionaries specifying the mapping between indiv ID and 38 | 1KG population/superpopulation labels. 
39 | ''' 40 | dict_indiv_pop = read_ped(fn_ped) 41 | 42 | df_superpop = pd.read_csv( 43 | fn_superpopulation, 44 | sep='\t', 45 | header=None, 46 | index_col=0 47 | ) 48 | superpop_groups = df_superpop.groupby(2) 49 | 50 | dict_pop_superpop = {} 51 | for n, _ in superpop_groups: 52 | for i in superpop_groups.groups[n]: 53 | dict_pop_superpop[i] = n 54 | 55 | dict_indiv_superpop = {} 56 | for indiv in dict_indiv_pop.keys(): 57 | dict_indiv_superpop[indiv] = dict_pop_superpop[dict_indiv_pop[indiv]] 58 | 59 | return dict_indiv_pop, dict_indiv_superpop 60 | 61 | def list_indiv_from_pop( 62 | dict_indiv_pop, 63 | dict_indiv_superpop 64 | ): 65 | ''' 66 | Reads dictionaries and returns dictionaries of lists 67 | where each list records indivs belong to a pop/superpop 68 | ''' 69 | list_pop = list(set(dict_indiv_pop.values())) 70 | dict_list_pop = {} 71 | for pop in list_pop: 72 | list_pop = [] 73 | for indiv in dict_indiv_pop.keys(): 74 | if dict_indiv_pop[indiv] == pop: 75 | list_pop.append(indiv) 76 | dict_list_pop[pop] = list_pop 77 | 78 | list_superpop = list(set(dict_indiv_superpop.values())) 79 | dict_list_superpop = {} 80 | for superpop in list_superpop: 81 | list_superpop = [] 82 | for indiv in dict_indiv_superpop.keys(): 83 | if dict_indiv_superpop[indiv] == superpop: 84 | list_superpop.append(indiv) 85 | dict_list_superpop[superpop] = list_superpop 86 | return dict_list_pop, dict_list_superpop 87 | 88 | def write_to_files( 89 | dict_list_pop, 90 | dict_list_superpop, 91 | out_prefix, 92 | num_indiv 93 | ): 94 | ''' 95 | Writes dict of lists to files using given prefix 96 | ''' 97 | count_pop_indiv = 0 98 | for pop in dict_list_pop.keys(): 99 | fn_out = out_prefix + '_pop_' + pop + '.txt' 100 | f_out = open(fn_out, 'w') 101 | list_pop = dict_list_pop[pop] 102 | for indiv in list_pop: 103 | count_pop_indiv += 1 104 | f_out.write(indiv+'\n') 105 | f_out.close() 106 | assert count_pop_indiv == num_indiv 107 | 108 | count_superpop_indiv = 0 109 | for superpop in 
dict_list_superpop.keys(): 110 | fn_out = out_prefix + '_superpop_' + superpop + '.txt' 111 | f_out = open(fn_out, 'w') 112 | list_superpop = dict_list_superpop[superpop] 113 | for indiv in list_superpop: 114 | count_superpop_indiv += 1 115 | f_out.write(indiv+'\n') 116 | f_out.close() 117 | assert count_superpop_indiv == num_indiv 118 | 119 | return 120 | 121 | if __name__ == '__main__': 122 | args = parse_args() 123 | fn_ped = args.fn_ped 124 | fn_superpopulation = args.fn_superpopulation 125 | out_prefix = args.out_prefix 126 | 127 | dict_indiv_pop, dict_indiv_superpop = process_db( 128 | fn_ped, 129 | fn_superpopulation 130 | ) 131 | assert len(dict_indiv_pop.keys()) == len(dict_indiv_superpop.keys()) 132 | 133 | dict_list_pop, dict_list_superpop = list_indiv_from_pop( 134 | dict_indiv_pop, 135 | dict_indiv_superpop 136 | ) 137 | 138 | write_to_files( 139 | dict_list_pop, 140 | dict_list_superpop, 141 | out_prefix, 142 | len(dict_indiv_pop.keys()) 143 | ) 144 | -------------------------------------------------------------------------------- /src/merge_incremental.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import random 3 | import argparse 4 | 5 | def parse_args(): 6 | parser = argparse.ArgumentParser() 7 | parser.add_argument( 8 | '-ns', '--sam-list', 9 | help='list of paths of SAM files' 10 | ) 11 | parser.add_argument( 12 | '-ids', '--id-list', 13 | help='list of ids of files' 14 | ) 15 | parser.add_argument( 16 | '-l', '--log', 17 | help='log file to write paths of merged files' 18 | ) 19 | parser.add_argument( 20 | '-rs', '--rand-seed', 21 | help='random seed for controlled randomness [None]' 22 | ) 23 | parser.add_argument( 24 | '-p', '--prefix', 25 | help='prefix of the merged files' 26 | ) 27 | parser.add_argument( 28 | '--paired-end', action='store_true', 29 | help="Set if reads are paired-end [Off]" 30 | ) 31 | args = parser.parse_args() 32 | return args 33 | 34 | def 
get_info_from_sam_line(line, flag = False): 35 | ''' 36 | Read one line from SAM and return some info. 37 | Default: QNAME, alignment score (AS:i) and MAPQ 38 | Optional: flag 39 | 40 | Args: 41 | line: a line from SAM 42 | flag: set True to return flag [False] 43 | 44 | Returns: 45 | QNAME (string) 46 | AS:i (int) 47 | MAPQ (int) 48 | flag (int) if `flag == True` 49 | ''' 50 | if line[0] == '@': 51 | return 'header', None 52 | line = line.split() 53 | name = line[0] 54 | sam_flag = int(line[1]) # keep the SAM FLAG separate from the boolean `flag` argument 55 | mapq = int(line[4]) 56 | if mapq == 255: 57 | # https://samtools.github.io/hts-specs/SAMv1.pdf 58 | # No alignments should be assigned mapping quality 255. 59 | mapq = 0 60 | print ('Warning: it is not recommended to have a MAPQ of 255. Replaced it with 0') 61 | 62 | score = 1 #: represents unmapped 63 | for i in line: 64 | if i.startswith('AS:i'): 65 | score = int(i.split(':')[-1]) 66 | 67 | if not flag: 68 | return name, score, mapq 69 | else: 70 | return name, score, mapq, sam_flag 71 | 72 | def compare_score_and_mapq(list_info): 73 | ''' 74 | Select the best record given a number of alignment results. 75 | Sorting criteria: 76 | 1. aligned > un-aligned 77 | 2. take higher alignment score 78 | 3. take higher mapping quality 79 | 4. 
if all the above are tied, pick one randomly 80 | 81 | Args: 82 | a list of alignment infos 83 | 84 | Returns: 85 | the best record from the provided list 86 | ''' 87 | order = list(range(len(list_info))) 88 | list_is_unaligned = [info[0] != 1 for info in list_info] 89 | list_as = [info[0] for info in list_info] 90 | list_mapq = [info[1] for info in list_info] 91 | 92 | #: sort order: if_aligned > AS > MAPQ 93 | #: so perform sorting in reversed order 94 | list_mapq, list_as, list_is_unaligned, order = \ 95 | zip( 96 | *sorted(zip(list_mapq, list_as, list_is_unaligned, order), reverse = True) 97 | ) 98 | list_as, list_mapq, list_is_unaligned, order = \ 99 | zip( 100 | *sorted(zip(list_as, list_mapq, list_is_unaligned, order), reverse = True) 101 | ) 102 | list_is_unaligned, list_as, list_mapq, order = \ 103 | zip( 104 | *sorted(zip(list_is_unaligned, list_as, list_mapq, order), reverse = True) 105 | ) 106 | 107 | for i in range(1, len(list_info)): 108 | #: if not a tie 109 | if (list_is_unaligned[i] != list_is_unaligned[0]) or \ 110 | (list_as[i] != list_as[0]) or \ 111 | (list_mapq[i] != list_mapq[0]): 112 | order = order[:i] 113 | break 114 | return random.sample(order, 1) 115 | 116 | def get_best_line(list_line): 117 | ''' 118 | Given a number of SAM lines, select the best one. 119 | 120 | Args: 121 | a list of SAM lines 122 | 123 | Returns: 124 | the index of the best line, and the line itself 125 | ''' 126 | list_info = [] 127 | list_name = [] 128 | for line in list_line: 129 | info = get_info_from_sam_line(line) 130 | list_name.append(info[0]) 131 | try: 132 | assert info[0] == list_name[0] 133 | except: 134 | print (info) 135 | print (list_name) 136 | exit (1) 137 | list_info.append(info[1:]) 138 | idx = compare_score_and_mapq(list_info)[0] 139 | return idx, list_line[idx] 140 | 141 | def get_best_pair(list_line): 142 | ''' 143 | Given a number of paired-end SAM records, select the best pair. 144 | 145 | Args: 146 | a list of SAM pairs. 
147 | `list_line[i]` and `list_line[len(list_line)/2 + i]` are in pair. 148 | 149 | Returns: 150 | the index of the best pair, and the pair itself 151 | ''' 152 | list_info = [] 153 | list_name = [] 154 | for i, line in enumerate(list_line[: int(len(list_line)/2)]): 155 | # info: QNAME, AS:i, MAPQ, flag 156 | info = get_info_from_sam_line(line, flag = True) 157 | info_mate = get_info_from_sam_line(list_line[i + int(len(list_line)/2)], flag = True) 158 | 159 | # check if QNAMEs of a pair match and flags are reasonable 160 | try: 161 | assert info[0] == info_mate[0] 162 | except: 163 | print ('Error: read names between a pair do not match') 164 | print (info) 165 | print (info_mate) 166 | try: 167 | # flag 64 : first segment 168 | # flag 128 : second segment 169 | # A pair must have a first-seg read and a second-seg read 170 | assert ((info[3] & 64) and (info_mate[3] & 128)) or ((info[3] & 128) and (info_mate[3] & 64)) 171 | except: 172 | print ('Error: segment information between a pair does not match') 173 | print (info) 174 | print (info_mate) 175 | 176 | list_name.append(info[0]) 177 | 178 | # QNAMEs between a set of matching SAM files must align 179 | try: 180 | assert info[0] == list_name[0] 181 | except: 182 | print ('Error: SAM records across input files are not aligned') 183 | print (info) 184 | print (list_name) 185 | exit (1) 186 | 187 | # compare sum of score and MAPQ 188 | pair_info = [] 189 | # `score == 1` means unmapped and we ignore unmapped segments 190 | if info[1] == 1 and info_mate[1] == 1: 191 | pair_info.append(1) 192 | elif info[1] == 1: 193 | pair_info.append(info_mate[1]) 194 | elif info_mate[1] == 1: 195 | pair_info.append(info[1]) 196 | else: 197 | pair_info.append(info[1] + info_mate[1]) 198 | # MAPQ 199 | pair_info.append(info[2] + info_mate[2]) 200 | list_info.append(pair_info) 201 | 202 | idx = compare_score_and_mapq(list_info)[0] 203 | return idx, list_line[idx], list_line[idx + int(len(list_line)/2)] 204 | 205 | def merge_core(list_f_sam, 
list_f_out, is_paired_end): 206 | ''' 207 | This is the core function to perform merging. 208 | 209 | Args: 210 | list_f_sam: a list of SAM files to be merged 211 | list_f_out: 212 | a list of output SAM files after merging. 213 | `merge_core()` write results directly to the files 214 | is_paired_end: True if data is paired-end; False if single-end 215 | ''' 216 | len_list = len(list_f_sam) 217 | list_read = [] 218 | for line in list_f_sam[0]: 219 | # read all header lines for the first input SAM file 220 | if line[0] == '@': 221 | list_f_out[0].write(line) 222 | continue 223 | list_read.append(line) 224 | 225 | for i, f in enumerate(list_f_sam[1:]): 226 | f_line = f.readline() 227 | # read all header lines for the rest input SAM files 228 | while f_line[0] == '@': 229 | list_f_out[i+1].write(f_line) 230 | f_line = f.readline() 231 | list_read.append(f_line) 232 | 233 | try: 234 | # each read should be included by all SAM files 235 | if not is_paired_end: 236 | assert len(list_read) == len_list 237 | else: 238 | assert (len(list_read) == len_list) or (len(list_read) == 2 * len_list) 239 | except: 240 | print ('Error: number of alignments does not match') 241 | print ('len_read = {}, len_list = {}'.format(len(list_read), len_list)) 242 | print (list_read) 243 | exit (1) 244 | 245 | if not is_paired_end: 246 | best_idx, best_line = get_best_line(list_read) 247 | list_f_out[best_idx].write(best_line) 248 | list_read = [] 249 | else: 250 | if len(list_read) == 2 * len_list: 251 | best_idx, best_line1, best_line2 = get_best_pair(list_read) 252 | list_f_out[best_idx].write(best_line1) 253 | list_f_out[best_idx].write(best_line2) 254 | list_read = [] 255 | return 256 | 257 | def merge_incremental(args): 258 | ''' 259 | Handles files I/O and processes arguments 260 | ''' 261 | fn_sam = args.sam_list # 'to_merge.path' 262 | fn_ids = args.id_list # 'to_merge.id' 263 | fn_log = args.log # 'merged.path' 264 | prefix = args.prefix 265 | 266 | #: set random seed 267 | seed = 
args.rand_seed 268 | sys.stderr.write('Set random seed: {}\n'.format(seed)) 269 | random.seed(seed) 270 | 271 | with open(fn_sam, 'r') as f: 272 | list_fn_sam = [] #: list of SAM names 273 | list_f_sam = [] #: list of opened SAM files 274 | for line in f: 275 | fn_sam = line.rstrip() 276 | list_fn_sam.append(fn_sam) 277 | list_f_sam.append(open(fn_sam, 'r')) 278 | with open(fn_ids, 'r') as f: 279 | list_ids = [] 280 | for line in f: 281 | list_ids.append(line.rstrip()) 282 | 283 | # number of files should match number of labels 284 | len_list = len(list_fn_sam) 285 | try: 286 | assert len_list == len(list_ids) 287 | except: 288 | print ('Error: numbers of files and labels do not match') 289 | exit (1) 290 | 291 | if fn_log != None: 292 | f_log = open(fn_log, 'w') 293 | list_f_out = [] 294 | sys.stderr.write('output files: \n') 295 | for i in range(len(list_ids)): 296 | fn_out = prefix + '-' + list_ids[i] + '.sam' 297 | if fn_log != None: 298 | f_log.write(fn_out + '\n') 299 | sys.stderr.write(fn_out + '\n') 300 | list_f_out.append(open(fn_out, 'w')) 301 | 302 | merge_core(list_f_sam, list_f_out, args.paired_end) 303 | 304 | return 305 | 306 | if __name__ == '__main__': 307 | args = parse_args() 308 | merge_incremental(args) 309 | -------------------------------------------------------------------------------- /src/merge_sam.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include 6 | 7 | #include 8 | #include 9 | #include 10 | 11 | #include "merge_sam.hpp" 12 | #include "refflow_utils.hpp" 13 | 14 | /* Fill the zipped vector with pairs consisting of the corresponding elements of 15 | * a, b, c and d. 
(This assumes that the vectors have equal length) 16 | * 17 | * Adapted from Marco13: 18 | * https://stackoverflow.com/questions/37368787/c-sort-one-vector-based-on-another-one/46370189 19 | */ 20 | template 21 | std::vector> zip(const std::vector &a, 22 | const std::vector &b, 23 | const std::vector &c, 24 | const std::vector &d) { 25 | std::vector> zipped; 26 | for (int i = 0; i < a.size(); ++i) { 27 | zipped.push_back(std::make_tuple(a[i], b[i], c[i], d[i])); 28 | } 29 | return zipped; 30 | } 31 | 32 | 33 | /* Write the first and second element of the pairs in the given zipped vector into 34 | * a, b, c and d. (This assumes that the vectors have equal length) 35 | * 36 | * Adapted from Marco13: 37 | * https://stackoverflow.com/questions/37368787/c-sort-one-vector-based-on-another-one/46370189 38 | */ 39 | template 40 | void unzip(const std::vector> &zipped, 41 | std::vector &a, 42 | std::vector &b, 43 | std::vector &c, 44 | std::vector &d) { 45 | for (int i = 0; i < a.size(); i++) { 46 | a[i] = std::get<0>(zipped[i]); 47 | b[i] = std::get<1>(zipped[i]); 48 | c[i] = std::get<2>(zipped[i]); 49 | d[i] = std::get<3>(zipped[i]); 50 | } 51 | } 52 | 53 | 54 | // std::vector read_file_as_vector(std::string list_fn) { 55 | // std::vector list; 56 | // std::ifstream read_list(list_fn.c_str()); 57 | // if (!read_list) { 58 | // std::cerr << "[Error] Cannot open file " << list_fn << "\n"; 59 | // exit(1); 60 | // } 61 | // std::string line; 62 | // while (std::getline(read_list, line)) { 63 | // if (line.size() > 0) { 64 | // list.push_back(line); 65 | // } 66 | // } 67 | // read_list.close(); 68 | // return list; 69 | // } 70 | 71 | 72 | /* Rank a set of alignments. 
73 | * Order by: is_proper_pair > score > MAPQ 74 | */ 75 | int select_best_aln(const std::vector& pair_indicators, 76 | const std::vector& scores, 77 | const std::vector& mapqs, 78 | int& num_tied_best) { 79 | int vec_size = pair_indicators.size(); 80 | std::vector ranks(vec_size); 81 | std::iota(ranks.begin(), ranks.end(), 0); 82 | 83 | std::vector> zipped = zip(mapqs, scores, pair_indicators, ranks); 84 | std::sort(zipped.begin(), zipped.end(), 85 | [&](const auto& a, const auto& b) 86 | { 87 | return (std::get<2>(a) != std::get<2>(b))? std::get<2>(a) : 88 | (std::get<1>(a) != std::get<1>(b))? (std::get<1>(a) > std::get<1>(b)) : 89 | (std::get<0>(a) > std::get<0>(b)); 90 | }); 91 | num_tied_best = vec_size; 92 | for (int i = 0; i < vec_size - 1; i++) { 93 | // Compare elements in tuples (ranks should be excluded so cannot simply compare tuples). 94 | if (std::get<2>(zipped[i]) != std::get<2>(zipped[i+1]) || 95 | std::get<1>(zipped[i]) != std::get<1>(zipped[i+1]) || 96 | std::get<0>(zipped[i]) != std::get<0>(zipped[i+1])) { 97 | num_tied_best = i + 1; 98 | break; 99 | } 100 | } 101 | ranks.clear(); 102 | 103 | for (int i = 0; i < vec_size; i++) { 104 | ranks.push_back(std::get<3>(zipped[i])); 105 | } 106 | std::random_shuffle(ranks.begin(), ranks.begin() + num_tied_best, myrandom); 107 | 108 | return ranks[0]; 109 | } 110 | 111 | 112 | int select_best_aln_single_end(const std::vector& aln1s) { 113 | std::vector mapqs, scores; 114 | // We don't actually need `pair_indicators` in single-end mode. 115 | // Create this to make it easier to re-use the core comparison function. 116 | std::vector pair_indicators; 117 | for (int i = 0; i < aln1s.size(); i++) { 118 | bam1_core_t c_aln1 = aln1s[i]->core; 119 | pair_indicators.push_back(true); 120 | mapqs.push_back(c_aln1.qual); 121 | int score = (c_aln1.flag & BAM_FUNMAP)? 
INT_MIN : 122 | -bam_aux2i(bam_aux_get(aln1s[i], "NM")); 123 | scores.push_back(score); 124 | } 125 | int num_tied_best; 126 | int best_idx = select_best_aln( 127 | pair_indicators=pair_indicators, scores=scores, 128 | mapqs=mapqs, num_tied_best=num_tied_best); 129 | 130 | return best_idx; 131 | } 132 | 133 | 134 | int select_best_aln_paired_end(const std::vector& aln1s, 135 | const std::vector& aln2s, 136 | const int merge_pe_mode = MERGE_PE_SUM) { 137 | std::vector mapqs, scores; 138 | // Indicator 139 | // 1 if a pair is properly paired (TLEN != 0). We apply a loose threshold here. 140 | // 0 otherwise 141 | std::vector pair_indicators; 142 | for (int i = 0; i < aln1s.size(); i++) { 143 | bam1_core_t c_aln1 = aln1s[i]->core, c_aln2 = aln2s[i]->core; 144 | pair_indicators.push_back(c_aln1.isize != 0); 145 | if (merge_pe_mode == MERGE_PE_SUM) { 146 | // MERGE_PE_SUM mode sums MAPQ and AS. AS is set to 0 for an unaligned read. 147 | mapqs.push_back(c_aln1.qual + c_aln2.qual); 148 | int score = 0; 149 | score += (c_aln1.flag & BAM_FUNMAP)? INT_MIN / 2 : 150 | -bam_aux2i(bam_aux_get(aln1s[i], "NM")); 151 | score += (c_aln2.flag & BAM_FUNMAP)? INT_MIN / 2 : 152 | -bam_aux2i(bam_aux_get(aln2s[i], "NM")); 153 | scores.push_back(score); 154 | } else if (merge_pe_mode == MERGE_PE_MAX) { 155 | // MERGE_PE_MAX mode takes max MAPQ and AS. 156 | if (c_aln1.qual > c_aln2.qual) 157 | mapqs.push_back(c_aln1.qual); 158 | else 159 | mapqs.push_back(c_aln2.qual); 160 | int score = INT_MIN; 161 | if (!(c_aln1.flag & BAM_FUNMAP)) 162 | score = -bam_aux2i(bam_aux_get(aln1s[i], "NM")); 163 | if (!(c_aln2.flag & BAM_FUNMAP)) 164 | score = (score > -bam_aux2i(bam_aux_get(aln2s[i], "NM")))? 
165 | score : -bam_aux2i(bam_aux_get(aln2s[i], "NM")); 166 | scores.push_back(score); 167 | } else{ 168 | std::cerr << "[Error] Invalid merging mode for paired-end alignments " 169 | << merge_pe_mode << "\n"; 170 | } 171 | } 172 | int num_tied_best; 173 | int best_idx = select_best_aln( 174 | pair_indicators=pair_indicators, scores=scores, mapqs=mapqs, num_tied_best=num_tied_best); 175 | 176 | return best_idx; 177 | } 178 | 179 | 180 | void merge_sam(merge_sam_opts args) { 181 | if (args.paired_end) 182 | std::cerr << "[Paired-end mode]\n"; 183 | else 184 | std::cerr << "[Single-end mode]\n"; 185 | 186 | std::srand(args.rand_seed); 187 | std::vector sam_fns; 188 | std::vector ids; 189 | // std::vector sam_fns = read_file_as_vector(args.sam_list); 190 | // std::vector ids = read_file_as_vector(args.id_list); 191 | for (auto& s: args.inputs) { 192 | std::regex regexz(":"); 193 | std::vector vec( 194 | std::sregex_token_iterator(s.begin(), s.end(), regexz, -1), 195 | std::sregex_token_iterator()); 196 | if (vec.size() != 2) { 197 | std::cerr << "[E::merge_sam] Invalid format: " << s << "\n"; 198 | exit(1); 199 | } 200 | ids.push_back(vec[0]); 201 | sam_fns.push_back(vec[1]); 202 | } 203 | for (int i = 0; i < sam_fns.size(); i++) { 204 | std::cerr << "File " << i << ": "; 205 | std::cerr << sam_fns[i] << " (" << ids[i] << ")\n"; 206 | } 207 | 208 | std::vector sam_fps; 209 | // std::vector out_sam_fps; 210 | std::vector hdrs; 211 | std::vector aln1s, aln2s; 212 | for (int i = 0; i < sam_fns.size(); i++) { 213 | // Read each SAM file listed in `--sam_list`. 214 | sam_fps.push_back(sam_open(sam_fns[i].data(), "r")); 215 | hdrs.push_back(sam_hdr_read(sam_fps[i])); 216 | if (sam_hdr_nref(hdrs[i]) != sam_hdr_nref(hdrs[0])) { 217 | std::cerr << "[W::merge_sam] Num REF in `" << sam_fns[i] << "` differs with `" 218 | << sam_fns[0] << "`. 
Please check.\n"; 219 | } 220 | aln1s.push_back(bam_init1()); 221 | if (args.paired_end) 222 | aln2s.push_back(bam_init1()); 223 | 224 | // std::string out_fn = args.output_prefix + "-" + ids[i] + ".bam"; 225 | // out_sam_fps.push_back(sam_open(out_fn.data(), "wb")); 226 | // if (sam_hdr_write(out_sam_fps[i], hdrs[i]) < 0) { 227 | // std::cerr << "[E::merge_sam] Failed to write SAM header to file " << out_fn << "\n"; 228 | // exit(-1); 229 | // } 230 | } 231 | std::string out_fn = args.output_prefix + ".bam"; 232 | samFile* out_fp = sam_open(out_fn.data(), "wb"); 233 | if (sam_hdr_write(out_fp, hdrs[0])) { 234 | std::cerr << "[E::merge_sam] Failed to write SAM header to file " << out_fn << "\n"; 235 | exit(1); 236 | } 237 | bool end = false; 238 | int num_records = 0; 239 | while (!end) { 240 | // If in paired-end mode: read two reads from each of the SAM files in each iteration. 241 | for (int i = 0; i < sam_fns.size(); i++) { 242 | if (args.paired_end) { 243 | while (1) { 244 | int read1 = sam_read1(sam_fps[i], hdrs[i], aln1s[i]); 245 | if (read1 < 0) { 246 | end = true; 247 | break; 248 | } 249 | if (!(aln1s[i]->core.flag & BAM_FSECONDARY) && 250 | !(aln1s[i]->core.flag & BAM_FSUPPLEMENTARY)) { 251 | break; 252 | } 253 | } 254 | while (1) { 255 | int read2 = sam_read1(sam_fps[i], hdrs[i], aln2s[i]); 256 | if (read2 < 0) { 257 | end = true; 258 | break; 259 | } 260 | if (!(aln2s[i]->core.flag & BAM_FSECONDARY) && 261 | !(aln2s[i]->core.flag & BAM_FSUPPLEMENTARY)) { 262 | break; 263 | } 264 | } 265 | // Check read names: they should be identical. 
266 | if (strcmp(bam_get_qname(aln1s[i]), bam_get_qname(aln2s[i])) != 0) { 267 | std::cerr << "[Error] SAM file should be sorted by read name.\n"; 268 | std::cerr << "This can be done using `samtools sort -n`\n"; 269 | std::cerr << "Mismatched reads: " << bam_get_qname(aln1s[i]) << 270 | " and " << bam_get_qname(aln2s[i]) << "\n"; 271 | exit(1); 272 | } 273 | if (end) { 274 | break; 275 | } 276 | } else { 277 | // Single-end mode 278 | if (sam_read1(sam_fps[i], hdrs[i], aln1s[i]) < 0) { 279 | end = true; 280 | break; 281 | } 282 | } 283 | num_records ++; 284 | 285 | // Check if read names from all files are identical. 286 | if (i > 0) 287 | if (strcmp(bam_get_qname(aln1s[0]), bam_get_qname(aln1s[i])) != 0) { 288 | std::cerr << "[Error] Reads mismatch across SAM files.\n"; 289 | std::cerr << "Mismatched reads: " << bam_get_qname(aln1s[0]) << 290 | " and " << bam_get_qname(aln1s[i]) << "\n"; 291 | exit(1); 292 | } 293 | } 294 | 295 | if (end) 296 | break; 297 | if (args.paired_end) { 298 | int best_idx = select_best_aln_paired_end(aln1s, aln2s, MERGE_PE_SUM); 299 | bam_aux_append( 300 | aln1s[best_idx], "RF", 'Z', ids[best_idx].length() + 1, 301 | reinterpret_cast(const_cast(ids[best_idx].c_str()))); 302 | bam_aux_append( 303 | aln2s[best_idx], "RF", 'Z', ids[best_idx].length() + 1, 304 | reinterpret_cast(const_cast(ids[best_idx].c_str()))); 305 | // if (sam_write1(out_sam_fps[best_idx], hdrs[best_idx], aln1s[best_idx]) < 0 || 306 | // sam_write1(out_sam_fps[best_idx], hdrs[best_idx], aln2s[best_idx]) < 0) { 307 | // std::cerr << "[Error] Failed to write to file.\n"; 308 | // std::cerr << bam_get_qname(aln1s[best_idx]); 309 | // exit(-1); 310 | // } 311 | if (sam_write1(out_fp, hdrs[0], aln1s[best_idx]) < 0 || 312 | sam_write1(out_fp, hdrs[0], aln2s[best_idx]) < 0) { 313 | std::cerr << "[E::merge_sam] Failed to write to file.\n"; 314 | std::cerr << bam_get_qname(aln1s[best_idx]); 315 | exit(-1); 316 | } 317 | } else { 318 | int best_idx = 
select_best_aln_single_end(aln1s); 319 | bam_aux_append( 320 | aln1s[best_idx], "RF", 'Z', ids[best_idx].length() + 1, 321 | reinterpret_cast(const_cast(ids[best_idx].c_str()))); 322 | // if (sam_write1(out_sam_fps[best_idx], hdrs[best_idx], aln1s[best_idx]) < 0) { 323 | // std::cerr << "[Error] Failed to write to file.\n"; 324 | // exit(-1); 325 | // } 326 | if (sam_write1(out_fp, hdrs[0], aln1s[best_idx]) < 0) { 327 | std::cerr << "[Error] Failed to write to file.\n"; 328 | exit(-1); 329 | } 330 | } 331 | } 332 | for (int i = 0; i < sam_fns.size(); i++) { 333 | bam_destroy1(aln1s[i]); 334 | if (args.paired_end) 335 | bam_destroy1(aln2s[i]); 336 | } 337 | for (auto& s: sam_fps) { 338 | sam_close(s); 339 | } 340 | // for (auto& s: out_sam_fps) { 341 | // sam_close(s); 342 | // } 343 | sam_close(out_fp); 344 | 345 | if (args.paired_end) 346 | std::cerr << "[Completed] Processed " << num_records << " pairs of reads\n"; 347 | else 348 | std::cerr << "[Completed] Processed " << num_records << " reads\n"; 349 | } 350 | 351 | 352 | void merge_sam_main(int argc, char** argv) { 353 | int c; 354 | merge_sam_opts args; 355 | args.cmd = make_cmd(argc, argv); 356 | static struct option long_options[]{ 357 | {"inputs", required_argument, 0, 's'}, 358 | // {"sam_list", required_argument, 0, 's'}, 359 | // {"id_list", required_argument, 0, 'i'}, 360 | {"output_prefix", required_argument, 0, 'o'}, 361 | {"rand_seed", required_argument, 0, 'r'}, 362 | {"decoy_list", required_argument, 0, 'd'}, 363 | {"decoy_threshold", required_argument, 0, 'p'}, 364 | {"paired_end", no_argument, 0, 'm'} 365 | }; 366 | int long_idx = 0; 367 | while ((c = getopt_long(argc, argv, "hmd:o:p:r:s:", long_options, &long_idx)) != -1) { 368 | switch (c) { 369 | case 'm': 370 | args.paired_end = true; 371 | break; 372 | case 'd': 373 | args.decoy_list = optarg; 374 | break; 375 | // case 'i': 376 | // args.id_list = optarg; 377 | // break; 378 | case 'o': 379 | args.output_prefix = optarg; 380 | break; 381 | 
case 'p': 382 | args.decoy_threshold = atoi(optarg); 383 | break; 384 | case 'r': 385 | args.rand_seed = atoi(optarg); 386 | break; 387 | case 's': 388 | args.inputs.push_back(optarg); 389 | // args.sam_list = optarg; 390 | break; 391 | case 'h': 392 | merge_sam_help(); 393 | exit(1); 394 | default: 395 | std::cerr << "Ignoring option " << c << " \n"; 396 | merge_sam_help(); 397 | exit(1); 398 | } 399 | } 400 | merge_sam(args); 401 | } 402 | 403 | -------------------------------------------------------------------------------- /src/merge_sam.hpp: -------------------------------------------------------------------------------- 1 | #ifndef MERGE_SAM_H__ 2 | #define MERGE_SAM_H__ 3 | #include 4 | #include 5 | 6 | #include 7 | 8 | 9 | const int MERGE_PE_SUM = 0; 10 | const int MERGE_PE_MAX = 1; 11 | // const int MERGE_PE_PAIR_TLEN_THRESHOLD = 2000; 12 | 13 | struct merge_sam_opts{ 14 | bool paired_end = false; 15 | std::string cmd = ""; 16 | int decoy_threshold = 0; 17 | int rand_seed = 0; 18 | // std::string sam_list = ""; 19 | // std::string id_list = ""; 20 | std::string decoy_list = ""; 21 | std::string output_prefix = ""; 22 | std::vector inputs; 23 | }; 24 | 25 | static void merge_sam_help(){ 26 | std::cerr << "\n"; 27 | std::cerr << "Usage: refflow_utils merge [options] -s -i -o \n"; 28 | std::cerr << "\n"; 29 | std::cerr << " Path to a list of SAM/BAM files to merge. 
One path in each line.\n"; 30 | std::cerr << " Path to a list of labels corresponding to lines in \n"; 31 | std::cerr << " Prefix of the output files\n"; 32 | std::cerr << "\n"; 33 | std::cerr << "Options (defaults in parentheses):\n"; 34 | std::cerr << " -m Set to perform merging in pairs [off]\n"; 35 | std::cerr << " -r Random seed used by the program [0]\n"; 36 | // TODO 37 | // std::cerr << " -O Output alignment format, can be sam or bam [sam]\n"; 38 | // decoy_list 39 | // decoy_threshold 40 | std::cerr << "\n"; 41 | } 42 | 43 | template 44 | std::vector> zip(const std::vector &a, 45 | const std::vector &b, 46 | const std::vector &c, 47 | const std::vector &d); 48 | 49 | template 50 | void unzip(const std::vector> &zipped, 51 | std::vector &a, 52 | std::vector &b, 53 | std::vector &c, 54 | std::vector &d); 55 | 56 | /* Random generator function 57 | * 58 | * From http://www.cplusplus.com/reference/algorithm/random_shuffle/ 59 | */ 60 | static int myrandom (int i) { return std::rand() % i;} 61 | 62 | std::vector read_file_as_vector(std::string list_fn); 63 | 64 | int select_best_aln(const std::vector& pair_indicators, 65 | const std::vector& scores, 66 | const std::vector& mapqs, 67 | int& num_tied_best); 68 | 69 | int select_best_aln_paired_end(const std::vector& aln1s, 70 | const std::vector& aln2s, 71 | const int merge_pe_mode); 72 | 73 | void merge_sam(merge_sam_opts args); 74 | 75 | void merge_sam_main(int argc, char** argv); 76 | 77 | #endif /* MERGE_SAM_H__ */ 78 | -------------------------------------------------------------------------------- /src/refflow_utils.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | #include 6 | #include 7 | 8 | #include "add_aux.hpp" 9 | #include "merge_sam.hpp" 10 | #include "split_sam.hpp" 11 | #include "refflow_utils.hpp" 12 | 13 | 14 | int main(int argc, char** argv) { 15 | /* Time measuring is borrowed from carlduke 16 | * 
https://stackoverflow.com/questions/17432502/how-can-i-measure-cpu-time-and-wall-clock-time-on-both-linux-windows 17 | */ 18 | double start_cputime = std::clock(); 19 | auto start_walltime = std::chrono::system_clock::now(); 20 | if (!strcmp(argv[optind], "add_aux")) { 21 | std::cerr << "[add_aux] Add an AUX tag for all alignments in a SAM/BAM file\n"; 22 | add_aux_main(argc, argv); 23 | std::cerr << "[add_aux] Completed.\n"; 24 | } else if (!strcmp(argv[optind], "split")) { 25 | std::cerr << 26 | "[split] Split a SAM/BAM file into high- and low-quality sub SAM/BAM files and " << 27 | "generate FASTQ files for low-quality reads.\n"; 28 | split_sam_main(argc, argv); 29 | std::cerr << "[split] Completed.\n"; 30 | } else if (!strcmp(argv[optind], "merge")) { 31 | std::cerr << "[merge] Merge a list of SAM/BAM files that contain the same set of reads " << 32 | "in the same order.\n"; 33 | std::cerr << " Best-ranked alignments will be selected in separate SAM/BAM " << 34 | "files, labelled according to a list of IDs.\n"; 35 | merge_sam_main(argc, argv); 36 | std::cerr << "[merge] Completed.\n"; 37 | } else { 38 | std::cerr << "Subcommand " << argv[optind] << " not found.\n"; 39 | std::cerr << "\n"; 40 | std::cerr << "Currently supported subcommands:\n"; 41 | std::cerr << " - add_aux: add AUX tag for alignments\n"; 42 | std::cerr << " - split: split alignments by MAPQ\n"; 43 | std::cerr << " - merge: select best alignments from a set of plural alignments\n"; 44 | std::cerr << "\n"; 45 | std::cerr << "Please try `refflow_utils -h` to see usages\n"; 46 | std::cerr << "\n"; 47 | exit(0); 48 | } 49 | double cpu_duration = (std::clock() - start_cputime) / (double)CLOCKS_PER_SEC; 50 | std::chrono::duration wall_duration = (std::chrono::system_clock::now() - start_walltime); 51 | std::cerr << "\n"; 52 | std::cerr << "Finished in " << cpu_duration << " CPU seconds, or " << 53 | wall_duration.count() << " wall clock seconds\n"; 54 | 55 | return 0; 56 | } 57 | 
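The ranking that the `merge` subcommand describes (properly paired first, then score, then MAPQ, with ties broken at random) can be sketched in a few lines of Python. This is a minimal illustration, not the code in `merge_sam.cpp`: the function name `select_best` and the `(is_proper_pair, score, mapq)` tuple layout are assumptions made for this example, and the score is taken to be the negated edit distance (`-NM`) as in the C++ source.

```python
import random


def select_best(records, rng=None):
    """Return the index of the best-ranked record (hypothetical helper).

    Each record is an (is_proper_pair, score, mapq) tuple where a larger
    value is better for every field. Tuples compare lexicographically, so
    max() applies the priority is_proper_pair > score > MAPQ.
    Records tied on all three fields are resolved by a random choice.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so repeated runs agree
    best = max(records)
    tied = [i for i, rec in enumerate(records) if rec == best]
    return rng.choice(tied)


# One candidate alignment per reference genome, for the same read.
candidates = [
    (True, -3, 40),   # properly paired, 3 edits
    (True, -1, 10),   # properly paired, 1 edit: wins on score despite lower MAPQ
    (False, 0, 60),   # not properly paired: ranked last
]
best_index = select_best(candidates)  # index 1
```

Passing a seeded `random.Random` instance plays the same role as the `-r` random-seed option: the tie-break stays random but reproducible.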
-------------------------------------------------------------------------------- /src/refflow_utils.hpp: -------------------------------------------------------------------------------- 1 | #ifndef REFFLOW_UTILS_H__ 2 | #define REFFLOW_UTILS_H__ 3 | #include 4 | #include 5 | 6 | static int8_t seq_comp_table[16] = { 0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15 }; 7 | 8 | 9 | static std::string make_cmd(int argc, char** argv) { 10 | std::string cmd(""); 11 | for (auto i = 0; i < argc; ++i) { 12 | cmd += std::string(argv[i]) + " "; 13 | } 14 | return cmd; 15 | } 16 | 17 | 18 | #endif /* REFFLOW_UTILS_H__ */ 19 | -------------------------------------------------------------------------------- /src/split_sam.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | 6 | #include 7 | #include 8 | #include 9 | 10 | #include "split_sam.hpp" 11 | #include "refflow_utils.hpp" 12 | 13 | 14 | /* Return the read, reverse complemented if necessary 15 | Adapted from: https://github.com/samtools/samtools/blob/develop/bam_fastq.c 16 | */ 17 | static std::string get_read(const bam1_t *rec){ 18 | int len = rec->core.l_qseq + 1; 19 | char *seq = (char *)bam_get_seq(rec); 20 | std::string read = ""; 21 | 22 | for (int n = 0; n < rec->core.l_qseq; n++) { 23 | if (rec->core.flag & BAM_FREVERSE) 24 | read.append(1, seq_nt16_str[seq_comp_table[bam_seqi(seq, n)]]); 25 | else 26 | read.append(1, seq_nt16_str[bam_seqi(seq, n)]); 27 | } 28 | if (rec->core.flag & BAM_FREVERSE) 29 | std::reverse(read.begin(), read.end()); 30 | return read; 31 | } 32 | 33 | 34 | /* Read an aln from a SAM file. Skip records carrying flags specified by an exclusion list. 35 | * This is a wrapper around sam_read1() from htslib. The flags to skip is passed in a vector. 
36 | */ 37 | int sam_read1_selective(samFile* sam_fp, bam_hdr_t* hdr, bam1_t* aln, 38 | const std::vector& exclude_flag){ 39 | // Return -1 if cannot read from sam_fp. 40 | if (sam_read1(sam_fp, hdr, aln) < 0) return -1; 41 | // Read until aln is not in the exclusion list. 42 | while(true){ 43 | bool keep = true; 44 | for (int i = 0; i < exclude_flag.size(); i++){ 45 | if (aln->core.flag & exclude_flag[i]){ 46 | keep = false; 47 | break; 48 | } 49 | } 50 | if (keep) return 0; 51 | if (sam_read1(sam_fp, hdr, aln) < 0) return -1; 52 | } 53 | } 54 | 55 | 56 | /* Write a bam1_t object to a FASTQ record. 57 | */ 58 | void write_fq_from_bam(bam1_t* aln, std::ofstream& out_fq){ 59 | out_fq << "@" << bam_get_qname(aln) << "\n"; 60 | out_fq << get_read(aln) << "\n+\n"; 61 | std::string qual_seq(""); 62 | uint8_t* qual = bam_get_qual(aln); 63 | if (qual[0] == 255) qual_seq = "*"; 64 | else { 65 | for (auto i = 0; i < aln->core.l_qseq; ++i) { 66 | qual_seq += (char) (qual[i] + 33); 67 | } 68 | } 69 | if (aln->core.flag & BAM_FREVERSE) 70 | std::reverse(qual_seq.begin(), qual_seq.end()); 71 | out_fq << qual_seq << "\n"; 72 | } 73 | 74 | 75 | /* Fetch alignments that are unapped or mapped with low MAPQ. */ 76 | void split_sam(split_sam_opts args){ 77 | // Read raw SAM file. 78 | samFile* sam_fp = (args.sam_fn == "")? 79 | sam_open("-", "r") : 80 | sam_open(args.sam_fn.data(), "r"); 81 | bam_hdr_t* hdr = sam_hdr_read(sam_fp); 82 | 83 | char const *out_mode = (args.output_ext == "bam")? "wb" : "w"; 84 | // High-quality alignments (SAM). 85 | std::string out_sam_hq_fn = args.output_prefix + 86 | "-high_qual." + args.output_ext; 87 | samFile* out_sam_hq_fp = (args.write_hq_to_stdout)? 88 | sam_open("-", out_mode) : 89 | sam_open(out_sam_hq_fn.data(), out_mode); 90 | int write_hdr = sam_hdr_write(out_sam_hq_fp, hdr); 91 | // Low-quality alignments (SAM). 92 | std::string out_sam_lq_fn = args.output_prefix + "-low_qual." 
+ args.output_ext; 93 | samFile* out_sam_lq_fp = sam_open(out_sam_lq_fn.data(), out_mode); 94 | write_hdr = sam_hdr_write(out_sam_lq_fp, hdr); 95 | // Low-quality reads (paired-end FQ). 96 | std::ofstream out_fq1, out_fq2; 97 | out_fq1.open(args.output_prefix + "-R1.fq"); 98 | out_fq2.open(args.output_prefix + "-R2.fq"); 99 | 100 | // Read two reads in one iteration to for the paired-end mode. 101 | bam1_t* aln1 = bam_init1(), * aln2 = bam_init1(); 102 | while(true){ 103 | std::vector exclude_flags = {BAM_FSUPPLEMENTARY}; 104 | if (sam_read1_selective(sam_fp, hdr, aln1, exclude_flags) < 0 || 105 | sam_read1_selective(sam_fp, hdr, aln2, exclude_flags) < 0) 106 | break; 107 | // Check read names: they should be identical. 108 | if (strcmp(bam_get_qname(aln1), bam_get_qname(aln2)) != 0){ 109 | std::cerr << "[Error] Input SAM file should be sorted by read name.\n"; 110 | std::cerr << "This can be done using `samtools sort -n`\n"; 111 | std::cerr << bam_get_qname(aln1) << " " << bam_get_qname(aln2) << "\n"; 112 | exit(1); 113 | } 114 | bam1_core_t c_aln1 = aln1->core, c_aln2 = aln2->core; 115 | 116 | bool is_low_quality = false; 117 | if (c_aln1.tid != c_aln1.mtid) 118 | is_low_quality = true; 119 | else if (args.split_strategy == SPLIT_OPT) 120 | is_low_quality = (c_aln1.qual < args.mapq_threshold && c_aln2.qual < args.mapq_threshold); 121 | else if (args.split_strategy == SPLIT_PES) 122 | is_low_quality = (c_aln1.qual < args.mapq_threshold || c_aln2.qual < args.mapq_threshold); 123 | 124 | if (is_low_quality){ 125 | if (aln1->core.flag & BAM_FREAD1){ 126 | write_fq_from_bam(aln1, out_fq1); 127 | write_fq_from_bam(aln2, out_fq2); 128 | } 129 | else{ 130 | write_fq_from_bam(aln1, out_fq2); 131 | write_fq_from_bam(aln2, out_fq1); 132 | } 133 | if (sam_write1(out_sam_lq_fp, hdr, aln1) < 0 || sam_write1(out_sam_lq_fp, hdr, aln2) < 0){ 134 | std::cerr << "[Error] Failure when writing low-qualiy alignments to SAM.\n"; 135 | exit(1); 136 | } 137 | } 138 | else{ 139 | if 
(sam_write1(out_sam_hq_fp, hdr, aln1) < 0 || sam_write1(out_sam_hq_fp, hdr, aln2) < 0){ 140 | std::cerr << "[Error] Failure when writing high-qualiy alignments to SAM.\n"; 141 | exit(1); 142 | } 143 | } 144 | } 145 | sam_close(sam_fp); 146 | sam_close(out_sam_hq_fp); 147 | sam_close(out_sam_lq_fp); 148 | out_fq1.close(); 149 | out_fq2.close(); 150 | } 151 | 152 | 153 | void split_sam_main(int argc, char** argv){ 154 | int c; 155 | split_sam_opts args; 156 | args.cmd = make_cmd(argc, argv); 157 | static struct option long_options[]{ 158 | {"write_hq_to_stdout", no_argument, 0, 'd'}, 159 | {"sam_fn", required_argument, 0, 's'}, 160 | {"output_prefix", required_argument, 0, 'o'}, 161 | {"output_ext", required_argument, 0, 'O'}, 162 | {"split_strategy", required_argument, 0, 'p'}, 163 | {"mapq_threshold", required_argument, 0, 'q'} 164 | }; 165 | int long_idx = 0; 166 | while((c = getopt_long(argc, argv, "dho:O:p:q:s:", long_options, &long_idx)) != -1){ 167 | switch (c){ 168 | case 'd': 169 | args.write_hq_to_stdout = true; 170 | break; 171 | case 'o': 172 | args.output_prefix = optarg; 173 | break; 174 | case 'O': 175 | args.output_ext = optarg; 176 | break; 177 | case 'p': 178 | args.split_strategy = atoi(optarg); 179 | break; 180 | case 'q': 181 | args.mapq_threshold = atoi(optarg); 182 | break; 183 | case 's': 184 | args.sam_fn = optarg; 185 | break; 186 | case 'h': 187 | split_sam_help(); 188 | exit(1); 189 | default: 190 | std::cerr << "Ignoring option " << c << " \n"; 191 | split_sam_help(); 192 | exit(1); 193 | } 194 | } 195 | if (args.output_ext != "sam" && args.output_ext != "bam"){ 196 | std::cerr << "[Error] Unsupported output extension " << args.output_ext << "\n"; 197 | split_sam_help(); 198 | exit(1); 199 | } 200 | split_sam(args); 201 | } 202 | 203 | -------------------------------------------------------------------------------- /src/split_sam.hpp: -------------------------------------------------------------------------------- 1 | #ifndef 
SPLIT_SAM_H__ 2 | #define SPLIT_SAM_H__ 3 | 4 | const int SPLIT_OPT = 0; 5 | const int SPLIT_PES = 1; 6 | 7 | 8 | struct split_sam_opts{ 9 | std::string cmd = ""; 10 | bool write_hq_to_stdout = false; 11 | int split_strategy = SPLIT_PES; 12 | int mapq_threshold = 10; 13 | std::string sam_fn = ""; 14 | std::string output_prefix = ""; 15 | std::string output_ext = "sam"; 16 | }; 17 | 18 | static void split_sam_help(){ 19 | std::cerr << "\n"; 20 | std::cerr << "Usage: refflow_utils split [options] -s -o \n"; 21 | std::cerr << "\n"; 22 | std::cerr << " SAM or BAM file to split\n"; 23 | std::cerr << " Prefix of the output files\n"; 24 | std::cerr << "\n"; 25 | std::cerr << "Options (defaults in parentheses):\n"; 26 | std::cerr << " -q MAPQ threshold of splitting [10]\n"; 27 | std::cerr << " -O Output alignment format, can be sam or bam [sam]\n"; 28 | std::cerr << " -p Split strategy: set to 0 for the optimistic mode\n"; 29 | std::cerr << " set to 1 for the pessimistic mode [1]\n"; 30 | std::cerr << " The optimistic mode treats a pair as high-quality if either alignment has MAPQ >= threshold\n"; 31 | std::cerr << " The pessimistic mode treats a pair as high-quality only if both alignments have MAPQ >= threshold\n"; 32 | std::cerr << " -d Set to write high-quality alignments to stdout\n"; 33 | std::cerr << "\n"; 34 | } 35 | 36 | void split_sam(split_sam_opts args); 37 | 38 | void split_sam_main(int argc, char** argv); 39 | 40 | int sam_read1_selective(samFile* sam_fp, bam_hdr_t* hdr, bam1_t* aln, const std::vector& exclude_flag); 41 | 42 | void write_fq_from_bam(bam1_t* aln, std::ofstream& out_fq); 43 | 44 | static std::string get_read(const bam1_t *rec); 45 | 46 | #endif /* SPLIT_SAM_H__ */ 47 | -------------------------------------------------------------------------------- /src/split_sam_by_mapq.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import threading 3 | 4 | class ReadWriteLock: 5 | ''' From 
https://www.oreilly.com/library/view/python-cookbook/0596001673/ch06s04.html ''' 6 | """ A lock object that allows many simultaneous "read locks", but 7 | only one "write lock." """ 8 | 9 | def __init__(self): 10 | self._read_ready = threading.Condition(threading.Lock()) 11 | self._readers = 0 12 | 13 | def acquire_read(self): 14 | """ Acquire a read lock. Blocks only if a thread has 15 | acquired the write lock. """ 16 | self._read_ready.acquire() 17 | try: 18 | self._readers += 1 19 | #print ('acquire_read()', self._readers) 20 | finally: 21 | self._read_ready.release() 22 | #print ('acquire_read(): _read_ready.release()') 23 | 24 | def release_read(self): 25 | """ Release a read lock. """ 26 | self._read_ready.acquire() 27 | try: 28 | self._readers -= 1 29 | if not self._readers: 30 | self._read_ready.notifyAll() 31 | finally: 32 | self._read_ready.release() 33 | 34 | def acquire_write(self): 35 | """ Acquire a write lock. Blocks until there are no 36 | acquired read or write locks. """ 37 | self._read_ready.acquire() 38 | while self._readers > 0: 39 | self._read_ready.wait() 40 | 41 | def release_write(self): 42 | """ Release a write lock. """ 43 | self._read_ready.release() 44 | 45 | 46 | def parse_args(): 47 | parser = argparse.ArgumentParser() 48 | parser.add_argument( 49 | '-s', '--sam', 50 | help='Original SAM file' 51 | ) 52 | parser.add_argument( 53 | '-oh', '--sam-highq', 54 | help='High quality alignments (in SAM format)' 55 | ) 56 | parser.add_argument( 57 | '-ol', '--sam-lowq', 58 | help='Low quality alignments (in SAM format)' 59 | ) 60 | parser.add_argument( 61 | '-oq', '--fastq-lowq-prefix', 62 | help='Predix of low quality reads (in FASTQ format). \ 63 | There will be two files for a SAM with paired-end data. 
\ 64 | Leave empty to not write to FASTQ [None]' 65 | ) 66 | parser.add_argument( 67 | '-t', '--mapq-threshold', type=int, default=10, 68 | help='Mapping quality threshold, reads with MAPQ greater or \ 69 | equal to `-t` are called high quality [10]' 70 | ) 71 | parser.add_argument( 72 | '-pe', '--paired-end', action='store_true', 73 | help="Set if reads are paired-end [Off]" 74 | ) 75 | parser.add_argument( 76 | '-ss', '--split-strategy', 77 | help='Split strategy (only needed when using paired-end reads): \ 78 | "optimistic/opt" takes the pair if any of it is high quality, \ 79 | "pessimistic/pes" takes the pair when both are high quality' 80 | ) 81 | # parser.add_argument( 82 | # '-rs', '--rand-seed', 83 | # help='random seed for controlled randomness [None]' 84 | # ) 85 | # parser.add_argument( 86 | # '-p', '--prefix', 87 | # help='prefix of the merged files' 88 | # ) 89 | args = parser.parse_args() 90 | return args 91 | 92 | def get_reverse_complement(seq): 93 | ''' 94 | Perform reverse and complement transformation 95 | ''' 96 | complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A', 'a': 't', 'c': 'g', 'g': 'c', 't': 'a'} 97 | return "".join(complement.get(base, base) for base in reversed(seq)) 98 | 99 | def write_line_to_fastq(line, f_fastq): 100 | ''' 101 | Write a SAM line to a FASTQ file 102 | ''' 103 | fields = line.split() 104 | is_reverse = int(fields[1]) & 16 105 | if is_reverse: 106 | # reverse and complement SEQ, reverse QUAL 107 | f_fastq.write('@' + fields[0] + '\n') 108 | f_fastq.write(get_reverse_complement(fields[9]) + '\n') 109 | f_fastq.write('+\n') 110 | f_fastq.write(fields[10][::-1] + '\n') 111 | else: 112 | f_fastq.write('@' + fields[0] + '\n') 113 | f_fastq.write(fields[9] + '\n') 114 | f_fastq.write('+\n') 115 | f_fastq.write(fields[10] + '\n') 116 | 117 | def process_paired_end_data_line( 118 | line, 119 | line_nxt, 120 | fhigh_out, 121 | flow_out, 122 | flow_fq1_out, 123 | flow_fq2_out, 124 | mapq_threshold, 125 | split_strategy 126 | 
): 127 | fields = line.split() 128 | fields_nxt = line_nxt.split() 129 | try: 130 | assert(fields[0] == fields_nxt[0]) 131 | except: 132 | print ('Warning: singleton read') 133 | print (line.rstrip()) 134 | print (line_nxt.rstrip()) 135 | exit(1) 136 | 137 | if split_strategy in ['opt', 'optimistic']: 138 | mapq = max(int(fields[4]), int(fields_nxt[4])) 139 | else: 140 | mapq = min(int(fields[4]), int(fields_nxt[4])) 141 | 142 | concordant = (fields[2] == fields_nxt[2]) 143 | 144 | if mapq >= mapq_threshold and concordant: 145 | fhigh_out.write(line) 146 | fhigh_out.write(line_nxt) 147 | else: 148 | flow_out.write(line) 149 | flow_out.write(line_nxt) 150 | if flow_fq1_out: 151 | flag = int(fields[1]) 152 | flag_nxt = int(fields_nxt[1]) 153 | # first/second segment: current/next line 154 | if (flag & 64) and (flag_nxt & 128): 155 | write_line_to_fastq(line, flow_fq1_out) 156 | write_line_to_fastq(line_nxt, flow_fq2_out) 157 | # first/second segment: next/current line 158 | elif (flag & 128) and (flag_nxt & 64): 159 | write_line_to_fastq(line, flow_fq2_out) 160 | write_line_to_fastq(line_nxt, flow_fq1_out) 161 | else: 162 | print ('Error: read is not paired-end') 163 | print (line) 164 | print (line_nxt) 165 | exit (1) 166 | 167 | def process_paired_end_data_parallel_core( 168 | f_in, fhigh_out, flow_out, flow_fq1_out, flow_fq2_out, mapq_threshold, split_strategy, rw_lock 169 | ): 170 | chunk_size = 10000 171 | lines = [] 172 | while 1: 173 | # spins until the lock is released 174 | rw_lock.acquire_write() 175 | 176 | # reads header. Headers are not counted in the chunk. 
177 | while 1: 178 | line = f_in.readline() 179 | if line and (line[0] == '@'): 180 | fhigh_out.write(line) 181 | flow_out.write(line) 182 | else: 183 | # non-header line or end-of-file 184 | break 185 | if line: 186 | lines.append(line) 187 | for i in range(chunk_size - 1): 188 | line_nxt = f_in.readline() 189 | if line_nxt: 190 | lines.append(line_nxt) 191 | rw_lock.release_write() 192 | 193 | if not lines: 194 | return 195 | 196 | for i in range(0, len(lines), 2): 197 | process_paired_end_data_line(lines[i], lines[i+1], fhigh_out, flow_out, flow_fq1_out, flow_fq2_out, mapq_threshold, split_strategy) 198 | lines = [] 199 | 200 | def process_paired_end_data(f_in, fhigh_out, flow_out, fastq_prefix, mapq_threshold, split_strategy): 201 | ''' 202 | Process paired-end data 203 | ''' 204 | if fastq_prefix: 205 | flow_fq1_out = open(fastq_prefix + '_1.fq', 'w') 206 | flow_fq2_out = open(fastq_prefix + '_2.fq', 'w') 207 | 208 | # Runs in parallel, but this will not be faster. 209 | # rw_lock = ReadWriteLock() 210 | # threads = [] 211 | # for i in range(4): 212 | # t = threading.Thread( 213 | # target = process_paired_end_data_parallel_core, 214 | # args=(f_in, fhigh_out, flow_out, flow_fq1_out, flow_fq2_out, mapq_threshold, split_strategy, rw_lock) 215 | # ) 216 | # t.start() 217 | # threads.append(t) 218 | 219 | # for t in threads: 220 | # print (t) 221 | # t.join() 222 | 223 | for line in f_in: 224 | if line[0] == '@': 225 | fhigh_out.write(line) 226 | flow_out.write(line) 227 | else: 228 | line_nxt = f_in.readline() 229 | process_paired_end_data_line(line, line_nxt, fhigh_out, flow_out, flow_fq1_out, flow_fq2_out, mapq_threshold, split_strategy) 230 | 231 | def split_sam_by_mapq(args): 232 | mapq_threshold = args.mapq_threshold 233 | is_paired_end = args.paired_end 234 | if not is_paired_end: 235 | split_strategy = '' 236 | else: 237 | split_strategy = args.split_strategy 238 | assert split_strategy in ['optimistic', 'opt', 'pessimistic', 'pes'] 239 | 240 | f_in = 
open(args.sam, 'r') 241 | fhigh_out = open(args.sam_highq, 'w') 242 | flow_out = open(args.sam_lowq, 'w') 243 | fastq_prefix = args.fastq_lowq_prefix 244 | 245 | if not is_paired_end: 246 | process_single_end_data(f_in, fhigh_out, flow_out, fastq_prefix, mapq_threshold) 247 | else: 248 | process_paired_end_data(f_in, fhigh_out, flow_out, fastq_prefix, mapq_threshold, split_strategy) 249 | 250 | if __name__ == '__main__': 251 | args = parse_args() 252 | split_sam_by_mapq(args) 253 | -------------------------------------------------------------------------------- /src/update_genome.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Updates a genome with a set of SNPs (and INDELS) and 3 | write to a new file. 4 | ''' 5 | 6 | import sys 7 | import argparse 8 | import random 9 | import copy 10 | from collections import OrderedDict 11 | 12 | def get_mutation_type(orig, alts): 13 | ''' 14 | Compare length of REF allele and ALT allele(s) and report variant type. 15 | 16 | Args: 17 | orig: ORIG genotype (string) 18 | alts: ALT genotype (string), if multi-allelic, split by ',' 19 | Return: 20 | a string that can be the following: 21 | SNP: equal in length, length == 1 22 | MNP: equal in length, length > 1 23 | INDEL: not equal in length 24 | MULT: multiple ALT alleles 25 | Assertions: 26 | there is only one REF allele 27 | ''' 28 | assert orig.count(',') == 0 29 | if alts.count(',') == 0: 30 | if len(orig) == len(alts) and len(orig) == 1: 31 | return 'SNP' 32 | elif len(orig) == len(alts) and len(orig) != 1: 33 | return 'MNP' 34 | elif len(orig) != len(alts): 35 | return 'INDEL' 36 | return 'MULT' 37 | 38 | def get_allele_freq(info, num_haps, data_source, gnomad_ac_field): 39 | ''' 40 | Returns allele frequency for a variant. 41 | Not using the "AF" attribute because it's calculated 42 | based on the entire 1KG population. 
43 | ''' 44 | attrs = info.split(';') 45 | #: for 1kg data, calculate allele frequency using phasing information 46 | if data_source == '1kg': 47 | for a in attrs: 48 | if a[:3] == 'AC=': 49 | try: 50 | count = int(a[3:]) 51 | #: when there are multiple alleles, 52 | #: use the highest frequency 53 | except: 54 | a = a[3:].split(',') 55 | inta = [int(i) for i in a] 56 | count = max(inta) 57 | return float(count) / num_haps 58 | #: for genomad data, use pre-calculated allele frequency 59 | elif data_source == 'gnomad': 60 | for a in attrs: 61 | field = a.split('=')[0] 62 | if field == gnomad_ac_field: 63 | return float(a.split('=')[1]) / num_haps 64 | return -1 65 | 66 | 67 | def process_vcf_header(line, indiv, f_vcf, data_source, is_ld): 68 | ''' 69 | Process the header line of a VCF file 70 | 71 | Args: 72 | line (string): header line from the VCF file 73 | indiv (string): targeted sample (can be None) 74 | f_vcf (file): file that we are writing to 75 | data_source (string): project that generates the call set 76 | is_ld (boolean): if maintain local LD 77 | 78 | Returns: 79 | col (int/None): column index for `indiv`, None if `indiv` is not provided 80 | num_haps (int/None): number of samples in the call set, None if the call set not including phasing 81 | labels (list): split fields of `line` 82 | ''' 83 | labels = line.rstrip().split('\t') 84 | # if `indiv` is set, select the corresponding column 85 | col = None 86 | num_haps = None 87 | if indiv != None: 88 | for i in range(9, len(labels)): 89 | if labels[i] == indiv: 90 | col = i 91 | if not col: 92 | print('Error! 
Couldn\'t find individual %s in VCF' % indiv) 93 | exit(1) 94 | f_vcf.write('\t'.join(labels[:9]) + f'\t{labels[col]}\n') 95 | else: 96 | if data_source == '1kg': 97 | if is_ld: 98 | f_vcf.write('\t'.join(labels[:9]) + '\tLD_SAMPLE\n') 99 | else: 100 | # skip sample genotype columns 101 | f_vcf.write('\t'.join(labels[:9]) + '\n') 102 | else: 103 | f_vcf.write(line) 104 | 105 | if data_source == '1kg': 106 | # calculate number of haplotypes (2 x number of samples) 107 | num_haps = 2 * (len(labels) - 9) 108 | 109 | return col, num_haps, labels 110 | 111 | 112 | def write_to_fasta(dict_genome, out_prefix, suffix, line_width = 60): 113 | ''' 114 | Write genome to a FASTA file 115 | ''' 116 | f_fasta = open(f'{out_prefix}{suffix}.fa', 'w') 117 | 118 | # uncomment to show output in lexical order 119 | # for key in sorted(dict_genome.keys()): 120 | 121 | # show output following input order 122 | for key in dict_genome.keys(): 123 | # write full contig name 124 | f_fasta.write(f'>{dict_genome[key][0]}\n') 125 | # write sequence, 60 chars per line 126 | for i in range(0, len(dict_genome[key][1]), line_width): 127 | f_fasta.write(''.join(dict_genome[key][1][i: i + line_width]) + '\n') 128 | f_fasta.close() 129 | 130 | 131 | def update_allele( 132 | orig, 133 | alts, 134 | allele, 135 | indels, 136 | head, 137 | loc, 138 | f_var, 139 | hap, 140 | hap_str, 141 | offset, 142 | offset_other, 143 | chrom 144 | ): 145 | ''' 146 | Update an allele 147 | ''' 148 | 149 | if len(orig) != len(alts[allele-1]): 150 | v_type = 'INDEL' 151 | else: 152 | v_type = 'SNP' 153 | flag_skip = False 154 | if indels: 155 | # ignores conflicts or overlapped variants 156 | # but accepts overlapped INS 157 | if loc == head - 1 and (len(orig) < len(alts[allele - 1])): 158 | print ('Warning: overlapped INS at {0}, {1} for hap{2}'.format(loc, chrom, hap_str)) 159 | new_offset = add_alt(hap, loc-1, orig, alts[allele-1], offset, True) 160 | elif loc >= head: 161 | new_offset = add_alt(hap, loc-1, orig, 
alts[allele-1], offset, False) 162 | else: 163 | flag_skip = True 164 | print ('Warning: conflict at {0}, {1} for hap{2}'.format(loc, chrom, hap_str)) 165 | else: 166 | new_offset = 0 167 | hap[loc+offset-1] = alts[allele-1] 168 | 169 | if not flag_skip: 170 | f_var.write( 171 | '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' % 172 | (hap_str, chrom, v_type, str(loc), str(loc+offset), orig, alts[allele-1], str(new_offset), str(offset_other) ) 173 | ) 174 | offset = new_offset 175 | head = loc + len(orig) 176 | 177 | return head, hap, offset 178 | 179 | 180 | def update_variant( 181 | row, 182 | col, 183 | ld_hap, 184 | indiv, 185 | indels, 186 | data_source, 187 | is_ld, 188 | f_var, 189 | f_vcf, 190 | dict_genome, 191 | dict_genome_B, 192 | offsetA, 193 | offsetB, 194 | headA, 195 | headB, 196 | ld_indiv 197 | ): 198 | ''' 199 | Update a variant, which may have one or two alleles 200 | ''' 201 | 202 | chrom = row[0] 203 | loc = int(row[1]) 204 | orig = row[3] 205 | alts = row[4].split(',') 206 | 207 | if is_ld: 208 | alleles = row[col].split('|') 209 | # always put the haplotype as alleleA for simplicity 210 | # the haplotype is not limited to gt[0] (it is randomly chose) 211 | alleleA = int(alleles[ld_hap]) 212 | alleleB = 0 213 | elif indiv != None: 214 | # no `indiv` selected, take the allele 215 | alleles = row[col].split('|') 216 | alleleA = int(alleles[0]) 217 | alleleB = int(alleles[1]) 218 | else: 219 | #: always uses allele "1" 220 | alleleA = 1 221 | alleleB = 0 222 | 223 | if alleleA > 0: 224 | headA, dict_genome[chrom][1], offsetA = update_allele( 225 | orig=orig, 226 | alts=alts, 227 | allele=alleleA, 228 | indels=indels, 229 | head=headA, 230 | loc=loc, 231 | f_var=f_var, 232 | hap=dict_genome[chrom][1], 233 | hap_str='A', 234 | offset=offsetA, 235 | offset_other=offsetB, 236 | chrom=chrom 237 | ) 238 | 239 | if alleleB > 0 and indiv != None: 240 | headB, dict_genome_B[chrom][1], offsetB = update_allele( 241 | orig=orig, 242 | alts=alts, 243 | allele=alleleB, 
244 | indels=indels, 245 | head=headB, 246 | loc=loc, 247 | f_var=f_var, 248 | hap=dict_genome_B[chrom][1], 249 | hap_str='B', 250 | offset=offsetB, 251 | offset_other=offsetA, 252 | chrom=chrom 253 | ) 254 | 255 | if (alleleA > 0) or \ 256 | (alleleB > 0 and indiv != None): 257 | if data_source == '1kg': 258 | if indiv != None: 259 | f_vcf.write('\t'.join(row[:9]) + f'\t{row[col]}\n') 260 | else: 261 | if is_ld: 262 | f_vcf.write('\t'.join(row[:9]) + f':SP\t{row[col]}:{ld_indiv}\n') 263 | else: 264 | f_vcf.write('\t'.join(row[:9]) + '\n') 265 | else: 266 | f_vcf.write('\t'.join(row) + '\n') 267 | return headA, dict_genome, offsetA, headB, dict_genome_B, offsetB 268 | 269 | def update_genome( 270 | indiv, 271 | dict_genome, 272 | vcf, 273 | out_prefix, 274 | indels, 275 | var_only, 276 | is_stochastic, 277 | block_size, 278 | is_ld, 279 | exclude_list, 280 | data_source, 281 | gnomad_ac_field, 282 | gnomad_pop_count, 283 | gnomad_af_th 284 | ): 285 | ''' 286 | Handles variant updating for the entire genome. 287 | This function mainly handles different updating settings and reads the VCF file. 
288 | 289 | Variant updating follows the heirarchy: 290 | update_genome() -> update_variant() -> update_allele() 291 | - genome-level - variant-level - allele-level 292 | ''' 293 | # assertions 294 | if is_ld: 295 | assert indiv == None 296 | # currently only supports 1000 Genomes ('1kg') and GnomAD ('gnomad') datasets 297 | assert data_source in ['1kg', 'gnomad'] 298 | if data_source == 'gnomad': 299 | # GnomAD has no phasing information 300 | assert indiv == None 301 | assert is_ld == False 302 | assert exclude_list == '' 303 | 304 | ''' 305 | ##fileformat=VCFv4.1 306 | ''' 307 | if indiv != None: 308 | dict_genome_B = copy.deepcopy(dict_genome) 309 | else: 310 | dict_genome_B = None 311 | f_var = open(out_prefix + '.var', 'w') 312 | f_vcf = open(out_prefix + '.vcf', 'w') 313 | 314 | ''' 315 | Format of a .var file: 316 | hap(A/B) chrm var_type ref_pos hap_pos offset 317 | ''' 318 | if vcf != None: 319 | f = open(vcf, 'r') 320 | else: 321 | f = sys.stdin 322 | 323 | labels = None 324 | offsetA = 0 325 | offsetB = 0 326 | headA = 0 327 | headB = 0 328 | chrom = '' 329 | ld_hap = None 330 | ld_indiv = None 331 | 332 | if data_source == 'gnomad': 333 | num_haps = gnomad_pop_count 334 | else: 335 | num_haps = 0 336 | 337 | current_block_pos = 0 338 | rr = 0 # initial number for random.random() 339 | exclude_list = exclude_list.split(',') 340 | 341 | if is_ld: 342 | header_to_add_format = True 343 | else: 344 | header_to_add_format = False 345 | 346 | for line in f: 347 | #: Skip header lines 348 | if line[0] == '#' and line[1] == '#': 349 | if header_to_add_format and line.startswith('##FORMAT'): 350 | f_vcf.write('##FORMAT=\n') 351 | f_vcf.write(line) 352 | continue 353 | if line[0] == '#': 354 | col, num_haps, labels = process_vcf_header(line, indiv, f_vcf, data_source, is_ld) 355 | continue 356 | 357 | row = line.rstrip().split('\t') 358 | v_type = get_mutation_type(row[3], row[4]) 359 | 360 | # switch to a new contig 361 | # we assume variants at different 
contigs are not interleaved 362 | if row[0] != chrom: 363 | headA = 0 364 | headB = 0 365 | offsetA = 0 366 | offsetB = 0 367 | chrom = row[0] 368 | current_block_pos = 0 369 | loc = int(row[1]) 370 | 371 | #: filter based on gnomad_af_th if it is set (gnomad only) 372 | if (not is_stochastic) and data_source == 'gnomad': 373 | freq = get_allele_freq(row[7], num_haps, data_source, gnomad_ac_field) 374 | if freq <= gnomad_af_th: 375 | continue 376 | 377 | #: no LD stochastic update for 1kg and gnomad 378 | if is_stochastic and is_ld == False: 379 | freq = get_allele_freq(row[7], num_haps, data_source, gnomad_ac_field) 380 | if freq < 0: 381 | continue 382 | #: only updates the random number when exceeding current block 383 | if loc >= current_block_pos + block_size: 384 | # print ('--update block--') 385 | # print ('prev rr = {0}, block_pos = {1}'.format(rr, current_block_pos)) 386 | rr = random.random() 387 | current_block_pos = int(loc / block_size) * block_size 388 | # print ('updt rr = {0}, block_pos = {1}'.format(rr, current_block_pos)) 389 | 390 | if rr > freq: 391 | # skip this allele 392 | continue 393 | # print ('selected, rr = {}'.format(rr), row[:2], freq) 394 | 395 | #: LD-preserving stochastic update for 1kg, this mode is not supported when using gnomad data 396 | if is_stochastic and is_ld and data_source == '1kg': 397 | if loc >= current_block_pos + block_size: 398 | while 1: 399 | # randomly pick an individual 400 | ld_indiv = random.choice(labels[9:]) 401 | # randomly pick a haplotype 402 | ld_hap = random.choice([0,1]) 403 | if ld_indiv in exclude_list: 404 | print ('exclude {0}: {1}-{2}'.format(current_block_pos, ld_indiv, ld_hap)) 405 | continue 406 | current_block_pos = int(loc / block_size) * block_size 407 | for i in range(9, len(labels)): 408 | if labels[i] == ld_indiv: 409 | col = i 410 | if not col: 411 | print('Error! 
Couldn\'t find individual %s in VCF' % ld_indiv) 412 | exit(1) 413 | break 414 | 415 | if v_type == 'SNP' or (indels and v_type in ['INDEL', 'MULT']): 416 | headA, dict_genome, offsetA, headB, dict_genome_B, offsetB = update_variant( 417 | row = row, 418 | col = col, 419 | ld_hap = ld_hap, 420 | indiv = indiv, 421 | indels = indels, 422 | data_source = data_source, 423 | is_ld = is_ld, 424 | f_var = f_var, 425 | f_vcf = f_vcf, 426 | dict_genome = dict_genome, 427 | dict_genome_B = dict_genome_B, 428 | offsetA = offsetA, 429 | offsetB = offsetB, 430 | headA = headA, 431 | headB = headB, 432 | ld_indiv = ld_indiv 433 | ) 434 | 435 | f_vcf.close() 436 | f_var.close() 437 | 438 | if not var_only: 439 | if indiv: 440 | # diploid output genome if `indiv` is set 441 | write_to_fasta(dict_genome, out_prefix, '_hapA') 442 | write_to_fasta(dict_genome_B, out_prefix, '_hapB') 443 | else: 444 | # haploid, no suffix 445 | write_to_fasta(dict_genome, out_prefix, '') 446 | 447 | # # write hapB sequence when `indiv` is set (diploid) 448 | # if indiv != None: 449 | # fB = open(f'{out_prefix}_hapB.fa', 'w') 450 | # # show output in lexical order 451 | # # for key in sorted(dict_genome_B.keys()): 452 | # for key in dict_genome_B.keys(): 453 | # # write full contig name 454 | # fB.write(f'>{dict_genome_B[key][0]}\n') 455 | # # write sequence, 60 chars per line 456 | # for i in range(0, len(dict_genome_B[key][1]), 60): 457 | # fB.write(''.join(dict_genome_B[key][1][i: i+60]) + '\n') 458 | # fB.close() 459 | f.close() 460 | 461 | def add_alt(genome, loc, orig, alt, offset, overlap_ins): 462 | ''' 463 | loc here is the index for str 464 | i.e., loc = vcf_loc -1 465 | ''' 466 | loc += offset 467 | 468 | if len(orig) == 1 and len(alt) == 1: 469 | # SNP 470 | # if genome[loc] != orig: 471 | # print (loc, genome[loc], alt) 472 | genome[loc] = alt 473 | elif len(orig) > len(alt): 474 | # Deletion 475 | for i in range(len(alt)): 476 | genome[loc+i] = alt[i] 477 | del 
genome[loc+len(alt):loc+len(orig)] 478 | offset -= (len(orig) - len(alt)) 479 | elif len(alt) > len(orig): 480 | #: Insertion 481 | 482 | # if overlap_ins: 483 | # for i in range(len(orig)): 484 | # if genome[loc+i] != alt[i]: 485 | # print ('Warning: genome ({0}) differs from ALT ({1}) at {2}'.format(genome[loc+i], alt[i], loc)) 486 | 487 | #: don't replace if overlap_ins is True 488 | if not overlap_ins: 489 | for i in range(len(orig)): 490 | genome[loc+i] = alt[i] 491 | genome[loc+len(orig):loc+len(orig)] = list(alt[len(orig):]) 492 | offset += len(alt) - len(orig) 493 | else: 494 | # len(orig)=len(alt)>1 : Weird combination of SNP/In/Del 495 | for i in range(len(alt)): 496 | genome[loc+i] = alt[i] 497 | 498 | return offset 499 | 500 | def read_genome(fn_genome): 501 | ''' 502 | Read a fasta file and returns a dictionary storing contig sequences 503 | 504 | Args: 505 | fn_genome: file name of the genome (in fasta format) 506 | 507 | Return: 508 | dict_genome (dict) 509 | key: short contig name (from '>' to the first space) 510 | value: [full contig name (string), sequence (list)] 511 | sequence is represented in lists for easier modification later 512 | ''' 513 | if not (sys.version_info.major == 3 and sys.version_info.minor < 6): 514 | # Use OrderedDict to maintain dictionary oder as input 515 | dict_genome = OrderedDict() 516 | else: 517 | # Python 3.6's dict is now ordered by insertion order 518 | dict_genome = {} 519 | 520 | f_genome = open(fn_genome, 'r') 521 | curr_contig = '' 522 | curr_seq = '' 523 | full_contig_name = '' 524 | for line in f_genome: 525 | line = line.rstrip() 526 | if line[0] == '>': 527 | # update previous contig 528 | if curr_contig != '': 529 | dict_genome[curr_contig] = [full_contig_name, list(curr_seq[:])] 530 | full_contig_name = line[1:] 531 | curr_contig = line.split()[0][1:] 532 | curr_seq = '' 533 | else: 534 | curr_seq += line 535 | if curr_contig != '': 536 | dict_genome[curr_contig] = [full_contig_name, list(curr_seq[:])] 
537 | 538 | return dict_genome 539 | 540 | if __name__ == '__main__': 541 | # Print file's docstring if -h is invoked 542 | parser = argparse.ArgumentParser(description=__doc__, 543 | formatter_class=argparse.RawDescriptionHelpFormatter) 544 | parser.add_argument( 545 | '-r', '--ref', type=str, required=True, help='Path to fasta file containing reference genome' 546 | ) 547 | parser.add_argument( 548 | '-v', '--vcf', type=str, help="Path to VCF file containing mutation information" 549 | ) 550 | parser.add_argument( 551 | '-op', '--out-prefix', type=str, required=True, help="Path to output prefix" 552 | ) 553 | parser.add_argument( 554 | '-s', '--name', type=str, help="Name of individual in VCF to process; leave blank to allow all variants [None]" 555 | ) 556 | parser.add_argument( 557 | '-i', '--include-indels', action='store_true', help="Set to extract both SNPs and INDELs [Off]" 558 | ) 559 | parser.add_argument( 560 | '-S', '--stochastic', action='store_true', help="Set to enable stochastic flipping [Off]" 561 | ) 562 | parser.add_argument( 563 | '-rs', '--rand-seed', help="Random seed for controlled randomness [None]" 564 | ) 565 | parser.add_argument( 566 | '--var-only', action='store_true', help="Set to report .var file only (no .fa output) [Off]" 567 | ) 568 | parser.add_argument( 569 | '-b', '--block-size', type=int, default=1, help="Size of block for stochastic update [1]" 570 | ) 571 | parser.add_argument( 572 | '-l', '--ld', action='store_true', help="Set to enable pseudo-LD blocking [Off]" 573 | ) 574 | parser.add_argument( 575 | '-ex', '--exclude-name', type=str, default='', help="Names of individuals in VCF to exclude, separated by commas ['']" 576 | ) 577 | parser.add_argument( 578 | '-d', '--data-source', type=str, default='1kg', help="Source of population genomic data; currently supports '1kg' and 'gnomad' ['1kg']" 579 | ) 580 | parser.add_argument( 581 | '--gnomad-ac-field', type=str, default='AC', 582 | help="GnomAD allele count field; activated 
only in stochastic mode; \ 583 | can be changed depending on population of interest ['AC']" 584 | ) 585 | parser.add_argument( 586 | '--gnomad-pop-count', type=int, help="Size of GnomAD population [INT]" 587 | ) 588 | parser.add_argument( 589 | '--gnomad-af-th', type=float, default=0, 590 | help="GnomAD allele frequency threshold. Variants with frequency lower than this value \ 591 | will not be updated [0]" 592 | ) 593 | 594 | args = parser.parse_args(sys.argv[1:]) 595 | 596 | if args.name is None: 597 | print ('Note: no individual specified, all variants are considered') 598 | # if args.stochastic == 1: 599 | if args.stochastic: 600 | print ('Note: stochastic update is enabled') 601 | if args.rand_seed: 602 | random.seed(args.rand_seed) 603 | print ('Set random seed: {}'.format(args.rand_seed)) 604 | 605 | dict_genome = read_genome(args.ref) 606 | update_genome( 607 | indiv = args.name, 608 | dict_genome = dict_genome, 609 | vcf = args.vcf, 610 | out_prefix = args.out_prefix, 611 | indels = args.include_indels, 612 | var_only = args.var_only, 613 | is_stochastic = args.stochastic, 614 | block_size = args.block_size, 615 | is_ld = args.ld, 616 | exclude_list = args.exclude_name, 617 | data_source = args.data_source, 618 | gnomad_ac_field = args.gnomad_ac_field, 619 | gnomad_pop_count = args.gnomad_pop_count, 620 | gnomad_af_th = args.gnomad_af_th 621 | ) 622 | -------------------------------------------------------------------------------- /src/utils.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | 3 | def read_ped(ped_fn): 4 | ''' 5 | Reads the ped file and returns a dict where keys are indivs and 6 | values are corresponding populations 7 | ''' 8 | ped_df = pd.read_csv(ped_fn, sep='\t') 9 | dd = ped_df[['Individual ID', 'Population']] 10 | 11 | popd = {} 12 | for i in range(dd.shape[0]): 13 | key = dd['Individual ID'][i] 14 | value = dd['Population'][i] 15 | popd[key] = value 16 | 17 | return popd 18 
| --------------------------------------------------------------------------------
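Note on the offset bookkeeping in `add_alt` (src/update_genome.py): the function edits a list of bases in place and returns a running `offset`, so that later VCF positions, which are given on the original reference, still land on the right index after earlier indels have changed the sequence length. The following is a minimal sketch of that idea, not the repository's implementation. The name `apply_variant` is hypothetical, the sketch takes a 1-based VCF position rather than a 0-based index, and it leaves out the `overlap_ins` handling.

```python
def apply_variant(genome, vcf_pos, ref, alt, offset):
    """Replace REF with ALT in `genome` (a list of bases), in place.

    `vcf_pos` is the 1-based position on the original reference;
    `offset` is the net length change from variants applied upstream.
    Returns the updated offset. (Hypothetical helper, for illustration.)
    """
    loc = vcf_pos - 1 + offset              # index in the edited sequence
    genome[loc:loc + len(ref)] = list(alt)  # covers SNP, deletion, insertion
    return offset + len(alt) - len(ref)


genome = list('ACGTACGT')
offset = 0
offset = apply_variant(genome, 2, 'C', 'T', offset)    # SNP, offset stays 0
offset = apply_variant(genome, 3, 'GT', 'G', offset)   # deletion, offset -1
offset = apply_variant(genome, 6, 'C', 'CAA', offset)  # insertion, offset +1
result = ''.join(genome)
print(result, offset)
```

Variants must be applied in ascending position order for the single running offset to stay valid, which matches how a sorted VCF is read.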