├── CHANGELOG.md
├── CONTRIBUTING.md
├── LICENSE
├── Volunteers.md
└── README.md

/CHANGELOG.md:
--------------------------------------------------------------------------------
1 | # (2024-01-30)
2 |
3 |
4 | ### Features
5 |
6 | * add docs ([d01b29a](https://github.com/Jwindler/Assembly_tools/commit/d01b29a19b46f939c3dc5a1f8b78e4d7342bc4ac))
7 |
8 |
9 |
10 |
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributing
2 |
3 | **Your contributions are always welcome!**
4 |
5 | - Open an [issue](https://github.com/Jwindler/Assembly_tools/issues) with any suggestion/correction
6 | - Send a [Pull Request](https://github.com/Jwindler/Assembly_tools/pulls)
7 | - **Please refer to our format for submitting your application.**
8 | - If you have any suggestions, please raise an issue or contact me by e-mail at jzjlab@163.com
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2024 Jwindler
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/Volunteers.md:
--------------------------------------------------------------------------------
1 | # Volunteer plan
2 |
3 | **Call for Volunteers: Open Source Genome Assembly Project**
4 |
5 | Dear bioinformatics enthusiasts, we sincerely invite you to join our open source genome assembly project!
6 | We aim to bring together volunteers from around the world who are interested in developing and applying genome assembly tools, and to work together to advance the field.
7 |
8 | By working together, we will create an open, flexible, and easy-to-use resource that lets researchers select and apply the tools that best suit their research.
9 |
10 |
11 |
12 | ## Description
13 |
14 | 1. Our project will collect and organize currently available genome assembly tools, including but not limited to assembly software based on different algorithms, auxiliary tools, and related documents and tutorials.
15 | 2. We will maintain a unified code base and documentation base on GitHub so that users can easily review and contribute.
16 |
17 |
18 |
19 | ## Purpose
20 |
21 | 1. Provide a comprehensive genome assembly tool resource that helps researchers conduct genome assembly and analysis more efficiently.
22 | 2. Promote open source collaboration and knowledge sharing in the field of genome assembly, and accelerate the innovation and development of tools.
23 |
24 |
25 |
26 | ## Volunteers
27 |
28 | We welcome volunteers who meet the following criteria:
29 | 1. A strong interest in bioinformatics and genome assembly, with relevant background knowledge.
30 | 2. Familiarity with commonly used programming languages (such as Python or R) and version control tools (such as Git).
31 | 3. Good teamwork and communication skills, and a willingness to take part in discussions and contribute ideas and suggestions.
32 | 4. Applicants with experience contributing to open source projects will be given priority.
33 |
34 |
35 |
36 | ## Benefits
37 |
38 | As a member of our project, you will gain the following benefits:
39 |
40 | 1. Participate in an open source project and contribute to the development of the genome assembly field.
41 | 2. Collaborate with peers from around the world to learn and share the latest technologies and ideas.
42 | 3. Improve your technical and teamwork skills, and gain valuable practical project experience.
43 | 4. Build a professional network, meet like-minded collaborators, and advance bioinformatics together.
44 |
45 |
46 |
47 | If you are interested in our project and would like to contribute, please send an email to jzjlab@163.com with a brief introduction of your background and how you would like to participate.
48 |
49 | We look forward to having you join us. Let us work together to build a more prosperous genome assembly community!
50 |
51 | Sincerely,
52 |
53 | Open source genome assembly project
54 |
55 | 2024.1.30
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Assembly analysis tools and papers
2 |
3 | ![GitHub Repo stars](https://img.shields.io/github/stars/Jwindler/Assembly_tools) ![GitHub pull requests](https://img.shields.io/github/issues-pr/Jwindler/Assembly_tools) ![GitHub License](https://img.shields.io/github/license/Jwindler/Assembly_tools)
4 |
5 | Genome assembly tools are listed by pipeline stage. **Contributions are welcome, so get in touch!**
6 |
7 | [Google Group](https://groups.google.com/g/assembly-tools) | [Volunteers](https://github.com/Jwindler/Assembly_tools/blob/main/Volunteers.md) | [CONTRIBUTING](https://github.com/Jwindler/Assembly_tools/blob/main/CONTRIBUTING.md)
8 |
9 |
10 |
11 | > **If a cited paper is wrong or a tool is missing from the list, please raise an** [ISSUE](https://github.com/Jwindler/Assembly_tools/issues).
12 |
13 |
14 |
15 | ## Table of contents
16 |
17 | - [Assembly analysis tools and papers](#assembly-analysis-tools-and-papers)
18 |   - [Table of contents](#table-of-contents)
19 |   - [Survey](#survey)
20 |   - [Contig](#contig)
21 |   - [Scaffold](#scaffold)
22 |   - [Polish](#polish)
23 |   - [Evaluation](#evaluation)
24 |
25 |
26 |
27 |
28 |
29 |
30 | ## Survey
31 |
32 |
33 |
34 | | Name | Introduction | Paper | URL | Note | Publication Date |
35 | | --- | --- | --- | --- | --- | --- |
36 | | GenomeScope | Fast genome analysis from unassembled short reads. | [*Bioinformatics*](https://doi.org/10.1093/bioinformatics/btx153) | [GitHub](https://github.com/schatzlab/genomescope) | | 2017.3 |
37 | | smudgeplot | Allows analysis of obscure genomes with duplications, various ploidy levels, etc. | [*Nature Communications*](https://doi.org/10.1038/s41467-020-14998-3) | [GitHub](https://github.com/KamilSJaron/smudgeplot) | GenomeScope 2.0 | 2020.3 |
38 | | Jellyfish | Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA. | [*Bioinformatics*](https://doi.org/10.1093/bioinformatics/btr011) | [GitHub](https://github.com/gmarcais/Jellyfish) | | 2011.1 |
39 | | nQuire | A statistical framework for ploidy estimation using NGS short-read data. | [*BMC Bioinformatics*](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2128-z) | [GitHub](https://github.com/clwgg/nQuire) | | 2018.4 |
40 | | KMC | Counting and manipulating k-mer statistics. | [*Bioinformatics*](https://academic.oup.com/bioinformatics/article/33/17/2759/3796399?login=false) | [GitHub](https://github.com/refresh-bio/KMC) | | 2017.5 |
41 | | KAT | A k-mer analysis toolkit for quality control of NGS datasets and genome assemblies. | [*Bioinformatics*](https://academic.oup.com/bioinformatics/article/33/4/574/2664339) | [GitHub](https://github.com/TGAC/KAT) | | 2016.11 |
42 |
43 |
44 |
45 |
46 |
47 | ## Contig
48 |
49 |
50 |
51 | | Name | Introduction | Paper | URL | Note | Publication Date |
52 | | --- | --- | --- | --- | --- | --- |
53 | | Hifiasm | Hifiasm is a fast haplotype-resolved de novo assembler initially designed for PacBio HiFi reads. | [*Nat Methods*](https://doi.org/10.1038/s41592-020-01056-5) | [GitHub](https://github.com/chhylp123/hifiasm) | | 2021.2 |
54 | | HiCanu | Designed for high-noise single-molecule sequencing (such as the [PacBio](http://www.pacb.com/) [RS II](http://www.pacb.com/products-and-services/pacbio-systems/rsii/)/[Sequel](http://www.pacb.com/products-and-services/pacbio-systems/sequel/) or [Oxford Nanopore](https://www.nanoporetech.com/) [MinION](https://nanoporetech.com/products)). | [*Genome Research*](https://genome.cshlp.org/content/early/2020/08/14/gr.263566.120) | [GitHub](https://github.com/marbl/canu) | | 2020.8 |
55 | | NextDenovo | NextDenovo is a string graph-based *de novo* assembler for long reads (CLR, ~~HiFi~~ and ONT). | [*bioRxiv*](https://www.biorxiv.org/content/10.1101/2023.03.09.531669v1) | [GitHub](https://github.com/Nextomics/NextDenovo) | | 2023.3 |
56 | | IPA | | | [GitHub](https://github.com/PacificBiosciences/pbipa) | | |
57 | | Flye | De novo assembler for single molecule sequencing reads using repeat graphs. | [*Nature Methods*](https://doi.org/10.1038/s41592-020-00971-x) | [GitHub](https://github.com/fenderglass/Flye) | | 2020.10 |
58 | | Peregrine | Peregrine is a fast genome assembler for accurate long reads (length > 10kb, accuracy > 99%). | [*bioRxiv*](https://www.biorxiv.org/content/10.1101/705616v1.full-text) | [GitHub](https://github.com/cschin/Peregrine) | | 2019.7 |
59 | | HGAP4 | HGAP4 is suitable for assembling a wide range of genome sizes and complexity. | [*Nature Methods*](https://www.nature.com/articles/nmeth.2474) | [PacBio](https://www.pacb.com/videos/tutorial-hgap4-de-novo-assembly-application/) | | 2013.5 |
60 | | Wtdbg2 | A fuzzy Bruijn graph approach to long noisy reads assembly. | [*Nature Methods*](https://www.nature.com/articles/s41592-019-0669-3) | [GitHub](https://github.com/ruanjue/wtdbg2) | | 2019.12 |
61 | | Falcon | A set of tools for fast alignment of long reads for consensus and assembly. | [*Nature Methods*](https://www.nature.com/articles/nmeth.4035) | [GitHub](https://github.com/PacificBiosciences/FALCON/) | | 2016.10 |
62 | | SMARTdenovo | Ultra-fast de novo assembler using long noisy reads. | [*Gigabyte*](https://gigabytejournal.com/articles/15) | [GitHub](https://github.com/ruanjue/smartdenovo) | | 2021.3 |
63 | | miniasm | Ultrafast de novo assembly for long noisy reads (though having no consensus step). | [*Bioinformatics*](https://doi.org/10.1093/bioinformatics/btw152) | [GitHub](https://github.com/lh3/miniasm) | | 2016.6 |
64 | | NECAT | Nanopore data assembler. | [*Nature Communications*](https://www.nature.com/articles/s41467-020-20236-7) | [GitHub](https://github.com/xiaochuanle/NECAT) | | 2021.1 |
65 | | Hypo-Assembler | A diploid genome polisher and assembler. | [*Nature Methods*](https://www.nature.com/articles/s41592-023-02142-0) | [GitHub](https://github.com/kensung-lab/hypo-assembler) | | 2024.3 |
66 | | Verkko | A hybrid genome assembly pipeline developed for T2T assembly of HiFi or ONT reads. | [*Nature Biotechnology*](https://doi.org/10.1038/s41587-023-01662-6) | [GitHub](https://github.com/marbl/verkko) | | 2023.2 |
67 | | NextPolish2 | Repeat-aware polishing of genomes assembled using HiFi long reads. | [*GPB*](https://doi.org/10.1093/gpbjnl/qzad009) | [GitHub](https://github.com/Nextomics/NextPolish2) | | 2024.1 |
68 | | Merfin | Evaluate variant calls and their combinations with k-mer multiplicity. | [*Nature Methods*](https://doi.org/10.1038/s41592-022-01445-y) | [GitHub](https://github.com/arangrhie/merfin) | | 2022.3 |
69 | | SOAPdenovo2 | Next generation sequencing reads de novo assembler. | [*Bioinformatics*](https://doi.org/10.1093/bioinformatics/btv033) | [GitHub](https://github.com/aquaskyline/SOAPdenovo2) | | 2015.1 |
70 | | Canu | A single molecule sequence assembler for genomes large and small. | [*Genome Research*](https://doi.org/10.1101/gr.215087.116) | [GitHub](https://github.com/marbl/canu) | | 2017.5 |
71 | | MECAT2 | | [*Nature Methods*](https://www.nature.com/articles/nmeth.4432) | [GitHub](https://github.com/xiaochuanle/MECAT2) | | 2017.9 |
72 | | | | | | | |
73 |
74 |
75 |
76 |
77 |
78 | ## Scaffold
79 |
80 |
81 |
82 | | Name | Introduction | Paper | URL | Note | Publication Date |
83 | | --- | --- | --- | --- | --- | --- |
84 | | 3D-DNA | Scaffold genome with Hi-C data. | [*Science*](https://www.science.org/doi/10.1126/science.aal3327) | [GitHub](https://github.com/aidenlab/3d-dna) | Uses Hi-C data | 2017.3 |
85 | | LACHESIS | Use Hi-C data for ultra-long-range scaffolding of *de novo* genome assemblies. | [*Nature Biotechnology*](https://www.nature.com/articles/nbt.2727) | [GitHub](https://github.com/shendurelab/LACHESIS) | LACHESIS is no longer being actively developed. | 2013.12 |
86 | | SALSA2 | A tool to scaffold long read assemblies with Hi-C data. | [*bioRxiv*](https://www.biorxiv.org/content/10.1101/261149v1) | [GitHub](https://github.com/marbl/SALSA) | | 2018.2 |
87 | | YaHS | Yet another Hi-C scaffolding tool. | [*Bioinformatics*](https://doi.org/10.1093/bioinformatics/btac808) | [GitHub](https://github.com/c-zhou/yahs) | Recommended | 2022.12 |
88 | | instaGRAAL | Large genome reassembly based on Hi-C data, continuation of GRAAL. | [*Nature Communications*](https://www.nature.com/articles/ncomms6695) | [GitHub](https://github.com/koszullab/instaGRAAL) | NVIDIA graphics card is required | 2014.12 |
89 | | EndHiC | A fast and easy-to-use Hi-C scaffolding tool. | [*arXiv*](https://doi.org/10.48550/arXiv.2111.15411) | [GitHub](https://github.com/fanagislab/EndHiC) | | 2021.11 |
90 | | Pin_hic | A Hi-C scaffolding method. | [*BMC Bioinformatics*](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-021-04453-5) | [GitHub](https://github.com/dfguan/pin_hic) | | 2021.11 |
91 | | AutoHiC | A novel genome assembly pipeline based on deep learning. | [*bioRxiv*](https://doi.org/10.1101/2023.08.27.555031) | [GitHub](https://github.com/Jwindler/AutoHiC) | Recommended (deep learning) | 2023.8 |
92 | | ALLHiC | Phasing and scaffolding polyploid genomes based on Hi-C data. | [*Nature Plants*](https://www.nature.com/articles/s41477-019-0487-8) | [GitHub](https://github.com/tangerzhang/ALLHiC) | Recommended (plants) | 2019.8 |
93 | | Juicebox | A point-and-click interface for using Hi-C heatmaps to identify and correct errors in a genome assembly. | [*bioRxiv*](https://www.biorxiv.org/content/10.1101/254797v1) | [GitHub](https://github.com/aidenlab/Juicebox) | | 2018.1 |
94 | | SLR | Scaffolding using long reads obtained by the third generation sequencing technologies. | [*BMC Bioinformatics*](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3114-9) | [GitHub](https://github.com/luojunwei/SLR) | | 2019.10 |
95 | | LongStitch | Correct and scaffold assemblies using long reads. | [*BMC Bioinformatics*](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-021-04451-7) | [GitHub](https://github.com/bcgsc/LongStitch) | | 2021.10 |
96 | | RagTag | A collection of software tools for scaffolding and improving modern genome assemblies. | [*Genome Biology*](https://doi.org/10.1186/s13059-022-02823-7) | [GitHub](https://github.com/malonge/RagTag) | | 2022.12 |
97 | | HapHiC | A fast, reference-independent, allele-aware scaffolding tool based on Hi-C data. | [*bioRxiv*](https://doi.org/10.1101/2023.11.18.567668) | [GitHub](https://github.com/zengxiaofei/HapHiC) | | 2023.11 |
98 | | scaffHiC | Pipeline for genome scaffolding by modelling distributions of HiC pairs. | | [GitHub](https://github.com/wtsi-hpag/scaffHiC) | | |
99 | | HiCAssembler | Software to assemble contigs/scaffolds into chromosomes using Hi-C data. | [*Genes & Dev*](https://doi.org/10.1101/gad.328971.119) | [GitHub](https://github.com/maxplanck-ie/HiCAssembler) | | 2019.10 |
100 | | HaploHiC | Comprehensive haplotype division of Hi-C PE-reads based on local contacts ratio. | | [GitHub](https://github.com/Nobel-Justin/HaploHiC) | | |
101 | | DipAsm | Efficient chromosome-scale haplotype-resolved assembly of human genomes. | [*bioRxiv*](https://www.biorxiv.org/content/10.1101/810341v2) | [GitHub](https://github.com/shilpagarg/DipAsm) | | 2020.7 |
102 |
103 |
104 |
105 |
106 |
107 | ## Polish
108 |
109 |
110 |
111 | | Name | Introduction | Paper | URL | Note | Publication Date |
112 | | --- | --- | --- | --- | --- | --- |
113 | | YAGcloser | Yet-Another-Gap-Closer based on spanning of long reads. | [*Journal of Heredity*](https://academic.oup.com/jhered/article/113/6/665/6585917) | [GitHub](https://github.com/merlyescalona/yagcloser) | | 2022.5 |
114 | | TGS-Gapcloser | A gap-closing software tool that uses long reads to enhance genome assembly. | [*GigaScience*](https://doi.org/10.1093/gigascience/giaa094) | [GitHub](https://github.com/BGI-Qingdao/TGS-GapCloser) | | 2020.9 |
115 | | DENTIST | Close assembly gaps using long-reads at high accuracy. | [*GigaScience*](https://doi.org/10.1093/gigascience/giab100) | [GitHub](https://github.com/a-ludi/dentist) | | 2022.1 |
116 | | Redundans | A pipeline that assists the assembly of heterozygous/polymorphic genomes. | [*Nucleic Acids Research*](https://doi.org/10.1093/nar/gkw294) | [GitHub](https://github.com/Gabaldonlab/redundans) | | 2016.4 |
117 | | Purge Haplotigs | An effective tool for the early stages of curating highly heterozygous genome assemblies produced from third-generation long read sequencing. | [*BMC Bioinformatics*](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2485-7) | [Bitbucket](https://bitbucket.org/mroachawri/purge_haplotigs) | | 2018.11 |
118 | | Purge_dups | Haplotypic duplication identification tool. | [*Bioinformatics*](https://doi.org/10.1093/bioinformatics/btaa025) | [GitHub](https://github.com/dfguan/purge_dups) | | 2020.1 |
119 | | Pilon | An automated genome assembly improvement and variant detection tool. | [*PLOS ONE*](https://doi.org/10.1371/journal.pone.0112963) | [GitHub](https://github.com/broadinstitute/pilon) | | 2014.11 |
120 | | Racon | Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads. | [*Genome Research*](https://genome.cshlp.org/content/early/2017/01/18/gr.214270.116) | [GitHub](https://github.com/isovic/racon) | | 2017.1 |
121 | | NextPolish | Fast and accurate polishing of genomes generated from long reads. | [*Bioinformatics*](https://doi.org/10.1093/bioinformatics/btz891) | [GitHub](https://github.com/Nextomics/NextPolish) | | 2020.4 |
122 | | HaploMerger2 | | [*Genome Research*](https://genome.cshlp.org/content/22/8/1581.long) | [GitHub](https://github.com/mapleforest/HaploMerger2) | | 2012.5 |
123 | | GapFiller | | [*Horticulture Research*](https://doi.org/10.1093/hr/uhad127) | [GitHub](https://github.com/aaranyue/quarTeT#GapFiller) | | 2023.10 |
124 | | RegCloser | | [*BMC Bioinformatics*](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-023-05367-0) | [GitHub](https://github.com/csh3/RegCloser.git) | | 2023.6 |
125 |
126 |
127 |
128 | ## Evaluation
129 |
130 | **Genome assembly evaluation tools.**
131 |
132 | | Name | Introduction | Paper | URL | Note | Publication Date |
133 | | --- | --- | --- | --- | --- | --- |
134 | | QUAST | A quality assessment tool for evaluating and comparing genome assemblies. | [*Bioinformatics*](https://doi.org/10.1093/bioinformatics/btt086) | [GitHub](https://github.com/ablab/quast) | | 2013.2 |
135 | | BioNanoAnalyst | A visualisation tool to assess genome assembly quality using BioNano data. | [*BMC Bioinformatics*](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1735-4) | [GitHub](https://github.com/AppliedBioinformatics/BioNanoAnalyst) | Uses BioNano data | 2017.6 |
136 | | CRAQ | Identification of errors in draft genome assemblies with single-base pair resolution for quality assessment and improvement. | [*Nature Communications*](https://www.nature.com/articles/s41467-023-42336-w) | [GitHub](https://github.com/JiaoLaboratory/CRAQ) | Single base scale | 2023.10 |
137 | | BUSCO | Assessing genome assembly and annotation completeness with single-copy orthologs. | [*Bioinformatics*](https://doi.org/10.1093/bioinformatics/btv351) | [BUSCO](https://busco.ezlab.org/) | | 2015.6 |
138 | | Merqury | K-mer based assembly evaluation. | [*Genome Biology*](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02134-9) | [GitHub](https://github.com/marbl/merqury) | | 2020.9 |
139 | | Klumpy | A Tool to Evaluate the Integrity of Long-Read Genome Assemblies and Illusive Sequence Motifs. | [*bioRxiv*](https://www.biorxiv.org/content/10.1101/2024.02.14.580330v1.full) | [Bitbucket](https://bitbucket.org/Gio12/klumpy/src/master/) | | |
140 | | GAEP | A comprehensive genome assembly evaluating pipeline. | [*JGG*](https://pubmed.ncbi.nlm.nih.gov/37245652/) | [GitHub](https://github.com/zy-optimistic/GAEP) | | 2023.5 |
141 | | Flagger | Evaluating genome assemblies. | [*Nature*](https://www.nature.com/articles/s41586-023-05896-x) | [GitHub](https://github.com/mobinasri/flagger) | | 2023.5 |
142 | | Asset | Assembly evaluation tool. | | [GitHub](https://github.com/dfguan/asset) | | |
143 | | Inspector | A tool for evaluating long-read de novo assembly results. | [*Genome Biology*](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02527-4) | [GitHub](https://github.com/Maggi-Chen/Inspector) | | 2021.11 |
144 |
145 |
146 |
147 |
148 |
149 |
150 |
151 |
152 |
153 |
--------------------------------------------------------------------------------