├── .gitattributes
├── Instruction or Manual.pdf
└── README.md


/.gitattributes:
--------------------------------------------------------------------------------
DeepBSA.zip filter=lfs diff=lfs merge=lfs -text


--------------------------------------------------------------------------------
/Instruction or Manual.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lizhao007/DeepBSA/HEAD/Instruction or Manual.pdf


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Background
DeepBSA is a bulked segregant analysis (BSA) software package for dissecting complex traits. It introduces two new algorithms, deep learning (DL) and k-value (K), which can be applied to any number of bulked pools (at least two). DeepBSA also integrates five widely used algorithms: ED4, delta SNP_index, G', Ridit, and SmoothLOD. In our simulations, DL outperformed all five in terms of absolute bias and signal-to-noise ratio. Overall, DeepBSA provides a user-friendly, cross-platform, all-in-one pipeline that requires no sophisticated bioinformatics skills.

# Installation
DeepBSA is available for both Windows and Linux and can be downloaded from http://zeasystemsbio.hzau.edu.cn/tools.html. An alternative cloud download link:

Link: https://pan.baidu.com/s/1PbqOu5fDXK2RU5Hi3G4p6A?pwd=c71e
Extraction code: c71e

# Update history
2024.5 version 1.6: The crash (unexpected exit) problem in the Windows version has been fixed, and performance on large data sets has been tested. For two bulked pools, pretreatment of a VCF file with 100,000 SNPs takes about 3 minutes, and calculation and plotting take about 10 seconds.
Pretreatment of a VCF file with 10 million SNPs takes about 80 minutes (the same as for CSV input), and calculation and plotting take about 25 minutes.

2023.9 version 1.5: Improved plotting and fixed minor bugs.

2022.11.15 version 1.4: Improved the Simulator and released the software for Linux.

2022.08.30 version 1.3: Added a PDF file of the mapping result and a CSV file of the algorithm values.

2022.08.16 version 1.2

2022.07.25 version 1.1

# Input
The input file for DeepBSA is a VCF file that contains the genomic variants for all bulked pools. For variant calling, we recommend GATK, following the pipeline below:

```
## Example with two bulked pools (high and low)
## building reference index
samtools faidx Referencegenome.fa
bwa index Referencegenome.fa
## GATK also requires a sequence dictionary (Referencegenome.dict)
java -jar ${EBROOTPICARD}/picard.jar CreateSequenceDictionary R=Referencegenome.fa O=Referencegenome.dict

## mapping
bwa mem -t 8 -M -P Referencegenome.fa High_Forward.fastq High_Reverse.fastq >bsa_H.sam
bwa mem -t 8 -M -P Referencegenome.fa Low_Forward.fastq Low_Reverse.fastq >bsa_L.sam

## pretreatment for GATK SNP calling, high pool
java -jar ${EBROOTPICARD}/picard.jar CleanSam INPUT=bsa_H.sam OUTPUT=bsa_H_cleaned.sam
java -jar ${EBROOTPICARD}/picard.jar FixMateInformation INPUT=bsa_H_cleaned.sam OUTPUT=bsa_H_cleaned_fixed.sam SO=coordinate
java -jar ${EBROOTPICARD}/picard.jar AddOrReplaceReadGroups INPUT=bsa_H_cleaned_fixed.sam OUTPUT=bsa_H_cleaned_fixed_group.bam LB=bsa_H SO=coordinate RGPL=illumina PU=barcode SM=bsa_H
samtools index bsa_H_cleaned_fixed_group.bam
java -jar ${EBROOTPICARD}/picard.jar MarkDuplicatesWithMateCigar INPUT=bsa_H_cleaned_fixed_group.bam OUTPUT=bsa_H_cleaned_fixed_group_DEDUP.bam M=bsa_H_cleaned_fixed_group_DEDUP.mx AS=true REMOVE_DUPLICATES=true MINIMUM_DISTANCE=500
samtools index bsa_H_cleaned_fixed_group_DEDUP.bam

## repeat the same pretreatment for the low pool (bsa_L)

## genomic variant calling
java -Xmx64g -jar $EBROOTGATK/GenomeAnalysisTK.jar -T HaplotypeCaller -R Referencegenome.fa -nct 8 -I bsa_H_cleaned_fixed_group_DEDUP.bam -I bsa_L_cleaned_fixed_group_DEDUP.bam -o bsa_H_L_snps_indels.vcf
```

# Usage
## For Windows

The “Instruction or Manual” file can be downloaded from GitHub and is also included in DeepBSA_windows.zip.
## For Linux

### Requirements
R and Python 3.7 (or later) must be installed. The other required Python packages can be installed by running "./requirment.txt" in the main directory, as follows.
```
#Install
wget http://zeasystemsbio.hzau.edu.cn/Tools/DeepBSA_linux_v1.4.tar.gz
tar -xvzf DeepBSA_linux_v1.4.tar.gz
cd DeepBSA_linux_v1.4/
./requirment.txt

#QTL mapping
cd bin/
python3 main.py -h

#usage: main.py [-h] --i I [--m M] [--p P] [--p1 P1] [--p2 P2] [--p3 P3] [--s S] [--w W] [--t T]
#optional arguments:
#  -h, --help  show this help message and exit
#  --i I       The input file path (vcf/csv).
#  --m M       The algorithm to use (DL/K/ED4/SNP/SmoothG/SmoothLOD/Ridit). The default is DL.
#  --p P       Whether to pretreat the data (1 [True] or 0 [False]). The default is 1 [True].
#  --p1 P1     Pretreatment step 1: read-number threshold; SNPs with fewer reads are filtered out. The default is 0.
#  --p2 P2     Pretreatment step 2: chi-square test (1 [True] or 0 [False]). The default is 1 [True].
#  --p3 P3     Pretreatment step 3: continuity test (1 [True] or 0 [False]). The default is 1 [True].
#  --s S       The function used to smooth the result (Tri-kernel-smooth/LOWESS/Moving Average). The default is LOWESS.
#  --w W       Window size for LOWESS, between 0 and 1. 0 selects the size that minimizes AICc. The default is 0 (auto).
#  --t T       The threshold used to find peaks (float). The default is 0 (auto).

#Data simulation (run from the directory where the archive was extracted)
cd DeepBSA_linux_v1.4/bin/
python3 simulate_progress.py -h

#usage: simulate_progress.py [-h] --i I --p P --r R --e E --s S
#optional arguments:
#  -h, --help  show this help message and exit
#  --i I       individual
#  --p P       pools
#  --r R       ratio
#  --e E       effective points
#  --s S       save path

```
More details on the parameters can be found in the “Instruction or Manual” file.

# Cite

Li Z., Chen X., Shi S., Zhang H., Wang X., Chen H., Li W., and Li L. (2022). DeepBSA: A deep-learning algorithm improves bulked segregant analysis for dissecting complex traits. Mol. Plant. doi: https://doi.org/10.1016/j.molp.2022.08.004.

--------------------------------------------------------------------------------
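
When several algorithms are compared on the same VCF file, it can help to build the `main.py` command line (see Usage, For Linux) from a script instead of typing it each time. The snippet below is a minimal sketch, not part of DeepBSA: the helper name `build_main_command` and its keyword names are hypothetical, and it assumes the flags accept values written exactly as in the help text (for example `1`/`0` for the True/False switches). It only assembles and prints the command; it does not run DeepBSA.

```python
import shlex

# Option values copied from the main.py help text; nothing here is imported from DeepBSA.
METHODS = {"DL", "K", "ED4", "SNP", "SmoothG", "SmoothLOD", "Ridit"}
SMOOTHERS = {"Tri-kernel-smooth", "LOWESS", "Moving Average"}


def build_main_command(input_path, method="DL", pretreat=True, min_reads=0,
                       chi_square=True, continuity=True, smooth="LOWESS",
                       window=0.0, threshold=0.0):
    """Return the argument list for one `python3 main.py` run (hypothetical helper)."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    if smooth not in SMOOTHERS:
        raise ValueError(f"unknown smoothing function: {smooth!r}")
    if not 0 <= window <= 1:
        raise ValueError("window must be between 0 and 1 (0 = auto)")
    return [
        "python3", "main.py",
        "--i", str(input_path),
        "--m", method,
        "--p", str(int(pretreat)),      # assumed: booleans are passed as 1 or 0
        "--p1", str(min_reads),
        "--p2", str(int(chi_square)),
        "--p3", str(int(continuity)),
        "--s", smooth,
        "--w", str(window),
        "--t", str(threshold),
    ]


if __name__ == "__main__":
    cmd = build_main_command("bsa_H_L_snps_indels.vcf", method="ED4")
    # Print a shell-quoted line that can be pasted into a terminal inside bin/.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

To run the command from Python, the returned list can be passed to `subprocess.run(cmd, cwd="DeepBSA_linux_v1.4/bin", check=True)`. Using a list rather than a single string keeps values that contain spaces, such as `Moving Average`, intact as one argument.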