├── Get_MR1.0 help.pdf ├── Get_MR2.0 help.pdf ├── 1.0 ├── Install_Reference_Genome.r ├── Get_MR1.0dependence.R └── Get_MR1.0.r ├── Get_MR2.0 help.md ├── Get_MR1.0 help.md ├── README.md ├── LICENSE └── 2.0 └── Get_MR2.0.r /Get_MR1.0 help.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HaobinZhou/Get_MR/HEAD/Get_MR1.0 help.pdf -------------------------------------------------------------------------------- /Get_MR2.0 help.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HaobinZhou/Get_MR/HEAD/Get_MR2.0 help.pdf -------------------------------------------------------------------------------- /1.0/Install_Reference_Genome.r: -------------------------------------------------------------------------------- 1 | BiocManager::install("SNPlocs.Hsapiens.dbSNP155.GRCh37") 2 | BiocManager::install("BSgenome.Hsapiens.1000genomes.hs37d5") 3 | BiocManager::install("SNPlocs.Hsapiens.dbSNP155.GRCh38") 4 | BiocManager::install("BSgenome.Hsapiens.NCBI.GRCh38") 5 | -------------------------------------------------------------------------------- /1.0/Get_MR1.0dependence.R: -------------------------------------------------------------------------------- 1 | ## 保证你的R版本是4.2及以上!! 2 | install.packages("devtools") 3 | if (!require("BiocManager", quietly = TRUE)) 4 | install.packages("BiocManager") 5 | devtools::install_github("rondolab/MR-PRESSO") 6 | devtools::install_github("MRCIEU/TwoSampleMR") 7 | devtools::install_github("explodecomputer/plinkbinr") 8 | devtools::install_github("mrcieu/ieugwasr") 9 | devtools::install_github("jean997/cause") 10 | BiocManager::install("MungeSumstats") 11 | devtools::install_github("n-mounier/MRlap") 12 | devtools::install_github("mglev1n/ldscr") 13 | install.packages(c('dplyr','tidyr','data.table','stringr','vroom','mr.raps','pbapply','doParallel')) 14 | -------------------------------------------------------------------------------- /Get_MR2.0 help.md: -------------------------------------------------------------------------------- 1 | ## Get_MR2.0 2 | 3 | ### 写在前面 4 | 5 | **欢迎来到向量化与并行化的世界!**本次更新就一个重大功能,就是并行化运行mr分析,以最优的效率批量跑大量的数据。家用电脑较新服务器基本可以实现2小时批量运行10000个因素,如果你有高性能服务器,恭喜你,30分钟以内就能跑完。 6 | 7 | 8 | 9 | 由于本次主要是思路与方法的分享,所以函数的帮助文档写的不多,主要还是看示例代码即可,应该还是很容易上手的。 10 | 11 | 12 | 13 | **公众号回复:“示例”即得示例代码** 14 | 15 | 16 | 17 | ## cyclemr 18 | 19 | ### 描述 20 | 21 | 这个函数是用于执行循环Mendelian Randomization (CycleMR) 分析的功能函数,可以在R语言中使用。该函数可以将数据分配到多个计算节点中运行,提高MR分析的效率。 22 | 23 | ### 用法 24 | 25 | 直接在R语言中调用这个函数,如下所示: 26 | 27 | ``` 28 | # 调用cyclemr函数 29 | mr_results <- cyclemr(dat = data, cl_num = 4, type = "list") 30 | ``` 31 | 32 | ### 参数 33 | 34 | - `dat`: harmonise_data后的数据,可以是数据框或列表类型。默认是list 35 | - `cl_num`: 批量化线程数。 36 | - `type`: MR分析数据类型,可以是"list"或"data",默认为"list"。主要使用情况也是list 37 | 38 | > 注: 如何判断自己的电脑能开启的最大线程数? 39 | > 40 | > ![image-20230501190452753](A:\OneDrive\GET\get_mr\assets\image-20230501190452753.png) 41 | > 42 | > 在任务管理器可看到 43 | > 44 | > * 内核: 计算机核心数 45 | > 46 | > * 逻辑处理器: 计算机总线程数 47 | > 48 | > 比如说很多处理器宣传是8核16线程,这个8就指的是内核,这个16指的是逻辑处理器。 49 | > 50 | > 本质上,16只是将每个核心一分为2,但是他们能干的活是一样的。所以一般设置为内核数即可满载CPU 51 | > 52 | > 当然这不能一概而论,因为每个CPU和厂家调度不一样,如果你发现使用内核数不能让CPU跑到100%,则尝试用逻辑处理器数 53 | 54 | ### 返回值 55 | 56 | - `cyclemr`函数返回一个包含MR分析结果的数据框。 57 | 58 | ### 使用举例 59 | 60 | 下面是使用这个函数的一个示例: 61 | 62 | ``` 63 | mr_results <- cyclemr(dat = data, cl_num = 16, type = "list") 64 | ``` 65 | 66 | 67 | 68 | ### 运行时间参考 69 | 70 | 在设置无误情况下,这是我手头有的所有电脑测试出的运行时间: 71 | 72 | 运行10000个数据。(ieu批量数据前10000个)除了服务器外,其他均使用Windows系统 73 | 74 | | CPU | 核数(运行时开的线程数) | 时间 | 75 | | --------------------------------------- | ------------------------ | --------- | 76 | | i9-12900H(拯救者2022 Y9000P 狂暴模式) | 14核20线程(14) | 1小时28分 | 77 | | r7-5700X(台式) | 8核16线程(16) | 1小时38分 | 78 | | r9-6800H (yoga2022 14S 性能模式) | 8核16线程(16) | 1小时54分 | 79 | | r5-3500X (台式) | 6核12线程 (12) | 3小时47分 | 80 | | r5-4600H (拯救者 2020 R7000 狂暴模式) | 6核12线程 (12) | 约4小时 | 81 | | 双路 EPYC 7T83 (服务器 Linux) | 128核256线程(128) | 11分钟 | 82 | 83 | **欢迎各位补充自己手头的机器的运行时间数据,尤其M系列的苹果处理器数据** 84 | 85 | 86 | 87 | ## 一些小工具 88 | 89 | ## get_rsid 90 | 91 | ### 描述 92 | 93 | 根据CHR和POS,从ensemble官网中获取rsID。 94 | 95 | ### 用法 96 | 97 | ``` 98 | get_rsid(chr, pos, version = 'hg38') 99 | ``` 100 | 101 | ### 参数 102 | 103 | - `chr`:染色体号。 104 | - `pos`:基因位置。 105 | - `version`:表示使用的基因组版本,默认为最新的版本(`'hg38'`)。也可选择hg19. 106 | - 注: GRCh37=hg19,GRCh38=hg38 107 | 108 | ### 详细说明 109 | 110 | 该函数基于生物信息学数据库Ensembl SNP Mart来查询给定位置的相关信息。如果未指定基因组版本,则默认使用最新的版本(hg38)。该函数会根据指定的基因组版本选择正确的URL。如果您想查询其他版本的数据,可以将`version`参数设置为相应版本的字符串。 111 | 112 | 参考:[How to find rsID with biomaRt in R (bioconductor.org)](https://support.bioconductor.org/p/9135301/) 113 | 114 | ### 使用举例 115 | 116 | ``` 117 | ds4 <- data.frame(CHR = c("8", "8", "8", "8", "8"),POS = c('101592213', '106973048', '108690829', '102569817', '108580746')) 118 | res<-get_rsid(chr=ds4$CHR, pos=ds4$POS, version = 'hg38') 119 | ``` 120 | 121 | 122 | 123 | ### 错误说明 124 | 125 | 如果出现: 126 | 127 | ``` 128 | Error in curl::curl_fetch_memory(url, handle = handle) : 129 | Timeout was reached: [grch37.ensembl.org:80] Operation timed out after 300013 milliseconds with 7909 bytes received 130 | ``` 131 | 132 | 这并不是代码问题,而是网络超时了,ensemble的API经常拥堵,多试几次即可。当然也有可能请求的数据量太大,也可能会出现这个问题。 133 | 134 | 135 | 136 | ## get_exposure 和 get_outcome 137 | 138 | ### 描述 139 | 140 | 这两个函数是用于进行双样本MR(Two-sample Mendelian Randomization)分析的数据处理和提取过程的功能函数。 141 | 142 | - `get_exposure`函数用于从给定的遗传仪器ID中提取出暴露(exposure)数据。 143 | - `get_outcome`函数用于从给定的遗传仪器ID和暴露数据中提取出结果(outcome)数据。 144 | 145 | ### 用法 146 | 147 | 使用这两个函数前,需要先安装并加载TwoSampleMR包。 148 | 149 | ``` 150 | library(TwoSampleMR) 151 | ``` 152 | 153 | 然后可以直接在R语言中调用这两个函数,如下所示: 154 | 155 | ``` 156 | # 调用get_exposure函数 157 | exposure_data <- get_exposure(id = "ieu-a-1", pval = 5e-8, r2 = 0.001, kb = 10000) 158 | 159 | # 调用get_outcome函数 160 | outcome_data <- get_outcome(id = "ieu-a-1", expo = exposure_data) 161 | ``` 162 | 163 | ### 参数 164 | 165 | - `get_exposure`函数的参数说明: 166 | - `id`: 遗传仪器ID。 167 | - `pval`: 提取暴露数据的P值阈值,默认为5e-08。 168 | - `r2`: 遗传仪器间的LD(linkage disequilibrium)值的阈值,默认为0.001。 169 | - `kb`: 遗传仪器的范围(以kb为单位),默认为10000。 170 | - `get_outcome`函数的参数说明: 171 | - `id`: 遗传仪器ID。 172 | - `expo`: 暴露数据,TwoSampleMR包格式 173 | 174 | 175 | 176 | ## get_ao 177 | 178 | ### 描述 179 | 180 | 获取OPEN GWAS数据库所有可用的ID。可以指定获取某个前缀的ID 181 | 182 | ### 用法 183 | 184 | 使用这个函数前,需要先安装并加载TwoSampleMR包。然后可以直接在R语言中调用这个函数,如下所示: 185 | 186 | ``` 187 | # 调用get_ao函数 188 | ao <- get_ao()## 不限定来源则返回所有id 189 | ao <- get_ao(a = "finn")## 这样就会返回finn的所有可用id 190 | ``` 191 | 192 | ### 参数 193 | 194 | - `a`: (可选)数据来源。 195 | 196 | ### 备注: 来源的名称 197 | 198 | 来自OPEN GWAS [Browse the IEU OpenGWAS project (mrcieu.ac.uk)](https://gwas.mrcieu.ac.uk/datasets/?trait__icontains=12) 5.1获取 199 | 200 | | Batch | Description | [Count](https://gwas.mrcieu.ac.uk/datasets/?trait__icontains=12#counts) | 201 | | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | 202 | | [bbj-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=bbj-a) | [Biobank Japan release of disease traits](http://jenger.riken.jp/en/) | 120 | 203 | | [ebi-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ebi-a) | [Datasets that satisfy minimum requirements imported from the EBI database of complete GWAS summary data](https://www.ebi.ac.uk/gwas/downloads/summary-statistics) | 2,585 | 204 | | [eqtl-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=eqtl-a) | [eQTLGen 2019 results, comprising all cis and some trans regions of gene expression in whole blood](https://www.eqtlgen.org/) | 19,942 | 205 | | [finn-b](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=finn-b) | [FinnGen biobank analysis round 5](https://www.finngen.fi/) | 2,803 | 206 | | [ieu-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ieu-a) | [GWAS summary datasets generated by many different consortia that have been manually collected and curated, initially developed for MR-Base](https://elifesciences.org/articles/34408) | 440 | 207 | | [ieu-b](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ieu-b) | [GWAS summary datasets generated by many different consortia that have been manually collected and curated, initially developed for MR-Base (round 2)](https://elifesciences.org/articles/34408) | 207 | 208 | | [met-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=met-a) | [Human blood metabolites analysed by Shin et al 2014](https://www.ncbi.nlm.nih.gov/pubmed/24816252) | 452 | 209 | | [met-b](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=met-b) | [Human immune system traits analysed by Roederer et al 2015](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393780/) | 150 | 210 | | [met-c](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=met-c) | [Circulating metabolites analysed by Kettunen et al 2016](https://www.ncbi.nlm.nih.gov/pubmed/27005778) | 123 | 211 | | [met-d](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=met-d) | [Metabolic biomarkers in the UK Biobank measured by Nightingale Health 2020](https://www.ukbiobank.ac.uk/learn-more-about-uk-biobank/news/nightingale-health-and-uk-biobank-announces-major-initiative-to-analyse-half-a-million-blood-samples-to-facilitate-global-medical-research) | 249 | 212 | | [prot-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=prot-a) | [Complete GWAS summary data on protein levels as described by Sun et al 2018](https://www.ncbi.nlm.nih.gov/pubmed/29875488) | 3,282 | 213 | | [prot-b](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=prot-b) | [Complete GWAS summary data on protein levels as described by Folkersen et al 2017](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5393901/) | 83 | 214 | | [prot-c](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=prot-c) | [Complete GWAS summary data on protein levels as described by Suhre et al 2017](https://pubmed.ncbi.nlm.nih.gov/28240269) | 1,124 | 215 | | [ubm-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ubm-a) | [Complete GWAS summary data on brain region volumes as described by Elliott et al 2018](https://www.ncbi.nlm.nih.gov/pubmed/30305740) | 3,143 | 216 | | [ukb-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ukb-a) | [Neale lab analysis of UK Biobank phenotypes, round 1](http://www.nealelab.is/blog/2017/7/19/rapid-gwas-of-thousands-of-phenotypes-for-337000-samples-in-the-uk-biobank) | 596 | 217 | | [ukb-b](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ukb-b) | [IEU analysis of UK Biobank phenotypes](https://data.bris.ac.uk/data/dataset/pnoat8cxo0u52p6ynfaekeigi) | 2,514 | 218 | | [ukb-d](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ukb-d) | [Neale lab analysis of UK Biobank phenotypes, round 2](http://www.nealelab.is/uk-biobank) | 904 | 219 | | [ukb-e](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ukb-e) | [Pan-ancestry genetic analysis of the UK Biobank performed at the Broad Institute](https://pan.ukbb.broadinstitute.org/) | 3,873 | 220 | 221 | 222 | 223 | ## clean_outcome_from_exposure 224 | 225 | ### 描述 226 | 227 | (主要用于向量化)用于清洗outcome。将exposure中(list形式,也就是批量化的形式存在的exposure)不存在的SNP从outcome中剔除,大幅精简outcome,并大幅提升harmonise_data的速度。 228 | 229 | **实测清洗与不清洗outcome对比,速度相差一百倍以上。** 230 | 231 | ### 用法 232 | 233 | 直接在R语言中调用这个函数,如下所示: 234 | 235 | ``` 236 | # 调用clean_outcome_from_exposure函数 237 | cleaned_outcome <- clean_outcome_from_exposure(expo = exposure_data, outcome = outcome_data) 238 | ``` 239 | 240 | ### 参数 241 | 242 | - `expo`: 暴露数据,格式为**list**,TwoSampleMR包格式。比如get_exposure批量化获取下来的数据 243 | - `outcome`: 结果数据,TwoSampleMR包格式。 244 | 245 | 246 | 247 | ## clean_GWAS 248 | 249 | ### 描述 250 | 251 | 这个函数是用于清洗遗传关联数据集,使其符合特定的数据集要求的功能函数。 252 | 253 | ### 用法 254 | 255 | 直接在R语言中调用这个函数,如下所示: 256 | 257 | ``` 258 | # 调用clean_GWAS函数 259 | cleaned_GWAS_list <- clean_GWAS(list = GWAS_list, clean = c("bbj", "eqtl")) 260 | ``` 261 | 262 | ### 参数 263 | 264 | - `list`:一个list。里面包含每个暴露的data.frame。 具体参考批量运行get_exposure后的结果 265 | - `clean`: 需要清洗的数据集名称,一个字符型向量类型。具体参考get_ao的附注。 266 | 267 | ### 返回值 268 | 269 | - `clean_GWAS`函数返回一个清洗后的遗传关联数据集列表,符合特定数据集要求。 270 | 271 | 272 | 273 | ## 作者信息 274 | 275 | - 代码作者:广州医科大学 第一临床学院 周浩彬 第二临床学院 谢治鑫 276 | - 帮助文档作者: 周浩彬 277 | - 时间:2023/5/1 278 | - 适配版本: Get_MR2.0 279 | - 开源许可证:GPL3.0 280 | - 公众号: GetScience 281 | - 致谢:感谢广州医科大学 第六临床学院 黄覃耀和 南山学院 林子凯在孟德尔随机化概念,代码思路等提供的重要的建设性建议。 -------------------------------------------------------------------------------- /Get_MR1.0 help.md: -------------------------------------------------------------------------------- 1 | # Get_MR1.0 2 | 3 | ## 1. 写在前面: 4 | 5 | ### 1.1 项目地址 6 | 7 | **github:**[HaobinZhou/Get_MR: A package for running MR In batches and in parallel quickly (github.com)](https://github.com/HaobinZhou/Get_MR) 8 | 9 | **如果觉得好用,可以点一下github项目上的小星星吗,这是我们继续开源的最大动力,谢谢!** 10 | 11 | 12 | 13 | 14 | 15 | ### 1.2 R包使用方法: 16 | 17 | **R包以R脚本的形式提供,打开R包,全选运行,即得到所有function** 18 | 19 | 1. 进入github[HaobinZhou/Get_MR: A package for running MR In batches and in parallel quickly (github.com)](https://github.com/HaobinZhou/Get_MR),下载代码zip 20 | 21 | 2. ```R 22 | source("./Get_MR1.0.r") ## 填文件所在地址 23 | 24 | ## 或者直接打开R文件,全选代码运行也可以! 25 | ``` 26 | 27 | 28 | 29 | 30 | 31 | ### 1.3 常见问题: 32 | 33 | 1. **本地clump,1000G处理好的MAF文件,MRlap依赖文件如何获取**: GetScience公众号可免费获取已处理好文件,回复"依赖文件"即得链接。源文件请查看本文相应function介绍处 34 | 35 | 2. **输入clump文件路径后总是报错**: 36 | 37 | ```R 38 | #尤其注意这个文件名的书写,因为他们是二进制文件,不需要写后缀!只需要选取对应的人种即可,比如欧洲人: 39 | LD_file="S:/GWAS数据/本地LD依赖文件/EUR" 40 | 41 | ## 这个问题我回答好多遍啦! 42 | ``` 43 | 44 | 45 | 46 | 3. **第一次使用如何安装关联R包:**[Get_MR/1.0 at main · HaobinZhou/Get_MR (github.com)](https://github.com/HaobinZhou/Get_MR/tree/main/1.0) 47 | 48 | 1. 如果不需要使用`MungeSumstats`包(相关函数包括:`format_Mun`,`get_chr_pos`,`format_getmr`中`source="ukb_nosnp"`) ,则只需要运行[Get_MR1.0dependence.R](https://github.com/HaobinZhou/Get_MR/blob/main/1.0/Get_MR1.0dependence.R) 49 | 2. 如果需要使用`MungeSumstats`包,则还需运行[Install_Reference_Genome.r](https://github.com/HaobinZhou/Get_MR/blob/main/1.0/Install_Reference_Genome.r) 这个包括了hg19和hg38的基因组参考文件,总大小达到了5G!**如果直接安装失败,在GetScience公众号回复"基因组参考"可得下载链接,并本地安装**(推荐) 50 | 51 | 4. **Bug反馈**:代码仅由两人编写,难免出现错误。欢迎提交bug到GetScience公众号后台! 52 | 53 | 5. **感谢所有Get_MR使用的R包作者**,是因为他们我们才得以轻松实现这么多复杂的功能。他们都是开源的,因此我们承诺Get_MR将**永久免费开源**。这意味着使用者可以随意地修改,分发代码,但前提是遵守: 54 | 55 | **1.本代码不得用于任何商业或盈利目的** 56 | 57 | **2.未经代码作者的同意,本代码不得用于任何形式的销售或商业交易** 58 | 59 | **3.本代码可以在非商业性的科研、学术研究和个人使用的情况下免费使用** 60 | 61 | **4.在使用本代码并重新打包并向公众发放时,请引用我们的公众号原文** 62 | 63 | 64 | 65 | ## 2. 帮助文档目录 66 | 67 | [TOC] 68 | 69 | #进阶MR分析 70 | 71 | ## LDSC_rg 72 | 73 | 用于计算两个数据框中SNP之间的遗传相关性(rg)。 74 | 75 | ### 用法 76 | 77 | ```R 78 | LDSC_rg(expo, outcome, an, sample_prev = NA, population_prev = NA, 79 | ld, wld, chr_filter = c(1:22), n_blocks = 200) 80 | ``` 81 | 82 | ### 参数 83 | 84 | - `expo`: 一个数据框,其中包含一个遗传暴露指标的多个SNP和它们与结果变量的rg。 85 | - `outcome`: 一个数据框,其中包含一个结果变量的多个SNP和它们与遗传暴露指标的rg。 86 | - `an`: 它是一个字符串,目前还没有作用(因为我们提供的依赖文件只有eur的,其他人种还没更新) 87 | - `sample_prev`: 遗传暴露指标的样本流行病学先验患病率。默认为 `NA`。 88 | - `population_prev`: 遗传暴露指标的人群流行病学先验患病率。默认为 `NA`。 89 | - `ld`: 本地LD依赖文件 90 | - `wld`: 本地weighted LD 依赖文件 91 | - `chr_filter`: 一个整数向量,用于指定要使用的染色体。默认为包含1-22的整数向量。 92 | - `n_blocks`: 用于计算加权LD矩阵的块数。默认为200。 93 | 94 | ### 返回值 95 | 96 | 一个具有以下元素的列表: 97 | 98 | - `rg`: 两个数据框中SNP之间的遗传相关性(rg)。 99 | - `pval`: `rg` 的双侧P值。 100 | - `N_snps`: 参与计算rg的SNP数量。 101 | 102 | ### 示例 103 | 104 | **具体用法参照:mr_lap和LDSC_rg示例.r** 可通过公众号GetScience回复示例获取文件 105 | 106 | 107 | 108 | ## mr_lap 109 | 110 | ### 描述 111 | 112 | mrlap是一种矫正样本重叠后的双样本MR方法。可用于怀疑有样本重叠的数据中。 113 | 114 | R包官网:[n-mounier/MRlap: R package to perform two-sample Mendelian Randomisation (MR) analyses using (potentially) overlapping samples (github.com)](https://github.com/n-mounier/MRlap) 115 | 116 | ### 语法 117 | 118 | ```R 119 | mr_lap(expo, outcome, ld, hm3, pval, r2, kb, MR_reverse = 1e-03, save_logfiles = F) 120 | ``` 121 | 122 | 123 | 124 | ### 参数 125 | 126 | - `expo`: 数据框,为TwoSampleMR包格式的数据 127 | - `outcome`: 数据框,为TwoSampleMR包格式的数据 128 | - `ld`: 数据框,本地LD文件路径 129 | - `hm3`: 数据框,本地HapMap3文件路径 130 | - `pval`: 数值,MR 工具变量阈值。 131 | - `r2`: 数值,clump阈值 132 | - `kb`: 数值,clump阈值 133 | - `MR_reverse`: 数值,MR 的方向翻转阈值。 134 | - `save_logfiles`: 逻辑值,是否保存日志文件。 135 | 136 | ### 值 137 | 138 | - res: mrlap 结果。 139 | 140 | 141 | 142 | ### 用法 143 | 144 | **具体用法参照:mr_lap和LDSC_rg示例.r** 可通过公众号GetScience回复示例获取文件 145 | 146 | 147 | 148 | ## cause_getmr函数 149 | 150 | ### 描述 151 | 152 | 一键式执行cause。可批量化执行多暴露对一结局或一暴露对多结局 153 | 154 | ### 用法 155 | 156 | ```R 157 | ## 不并行化运行 158 | cause_getmr(expo, outcome, LD_file, r2 = 0.001, kb = 10000, pval = 1e-05) 159 | 160 | ## 并行化运行 161 | cl<-makeCluster(2) ## 填你想要的并行化的核数,核数越多,需要的运行内存越大 162 | cause_getmr(expo, outcome, LD_file, r2 = 0.001, kb = 10000, pval = 1e-05,cl=cl) 163 | ``` 164 | 165 | ### 参数 166 | 167 | - `expo`: TwoSampleMR的暴露格式的数据。 168 | - `outcome`: TwoSampleMR的暴露格式的数据。 169 | - 注意!expo和outcome,可以是data.frame的形式,也可以是一个list(如list[[1:n]]里都包含数据的data.frame)。但不能outcome和expo同时都是list。当expo或outcome,其中一个为list的情况下,是批量运行一对一的cause。比如我读取了10个暴露和1个结局,将10个暴露lapply读取进来就会是一个list。这时候`cause_cyclemr` 自动运行每个暴露对结局的cause,也就是批量化执行. 170 | 171 | ```R 172 | ## 比如我这里读取3个暴露文件和1和结局文件 173 | id<-c('a.gz','b.gz','c.gz') 174 | expo<-lapply(id,FUN=fread) 175 | outcome<-fread("outcome.gz") 176 | cl=makeCluster(4)## 内存不够的也可以不并行化运行 177 | res<-cause_getmr(expo, outcome, LD_file, r2 = 0.001, kb = 10000, pval = 1e-05,cl=cl) 178 | stopCluster(cl) 179 | ## 这样返回的结果就是3个暴露分别对一个结局的cause结果。 180 | ``` 181 | 182 | - `LD_file`: 包含LD信息的PLINK文件名。因为需要大批量地clump,在线clump很容易报错,因此我们采用本地clump。需要本地参考文件。下载地址: http://fileserve.mrcieu.ac.uk/ld/1kg.v3.tgz 。 或关注公众号GetScience直接获取。 183 | 184 | ```R 185 | #尤其注意这个文件名的书写,因为他们是二进制文件,不需要写后缀!只需要选取对应的人种即可,比如欧洲人: 186 | LD_file="S:/GWAS数据/本地LD依赖文件/EUR" 187 | 188 | ## 这个问题我回答好多遍啦! 189 | ``` 190 | 191 | - `r2`: LD的R平方阈值。默认值为0.001。 192 | - `kb`: LD的距离阈值(以kb为单位)。默认值为10000。 193 | - `pval`: 用于LD clumping的p值阈值。默认值为1e-05。 194 | - `cl`: 并行计算的cluster对象。默认值为NULL。在外部使用cl=makeCluster(n),n为你想并行化的核数。注意核数太多不要爆内存了。 195 | 196 | ### 值 197 | 198 | `cause_cyclemr`函数返回cause结果 199 | 200 | 201 | 202 | ## RAPS_getmr 203 | 204 | ### 描述 205 | 206 | `RAPS_getmr`函数执行基于RAPS的MR并返回结果,并画图 207 | 208 | ### 用法 209 | 210 | ```R 211 | expo<-fread('a.gz') 212 | outcome<-fread('b.gz') 213 | expo<-format_data(...) 214 | outcome<-format_data(...) ## format_data是TwoSampleMR包的函数,格式化。 215 | expo<-pblapply(expo,pval=1,kb=10000,r2=0.001,LD_file=LD_file,FUN=clean_expo) ## 数据很大,建议本地clump,在线很容易报错 216 | dat<-harmonise(expo,outcome) 217 | res<-RAPS_getmr(dat, dir_figure) 218 | ``` 219 | 220 | ### 参数 221 | 222 | - `dat`: TwoSampleMR包 harmonise_data后输出的数据 223 | - `dir_figure`: 保存结果图形的目录。 224 | 225 | ### 值 226 | 227 | `RAPS_getmr`函数返回一个包含基于RAPS的MR结果的数据框。 228 | 229 | 230 | 231 | 232 | 233 | ## mr_Presso 234 | 235 | ### 描述 236 | 237 | 执行MR-PRESSO 238 | 239 | ### 语法 240 | 241 | ```R 242 | mr_Presso(dat, num = 10000) 243 | ``` 244 | 245 | 246 | 247 | ### 参数 248 | 249 | - `dat`: 数据框,包含基因表达和疾病风险关联分析的数据。 250 | - `num`: 整数,模拟数量。 251 | 252 | ### 值 253 | 254 | - `mr_presso_res`: MR-PRESSO 结果。 255 | 256 | ### 用法 257 | 258 | ```R 259 | dat<-harmonise_data(exposure,outcome) ## TwoSampleMR包的harmonise_data函数输出的结果 260 | mr_presso_res <- mr_Presso(dat, num = 10000) 261 | ``` 262 | 263 | 264 | 265 | ## mr_presso_pval函数 266 | 267 | ### 描述 268 | 269 | 提取 MR-PRESSO 结果中的主要结果 270 | 271 | ### 语法 272 | 273 | ```R 274 | mr_presso_pval(mr_presso_res) 275 | ``` 276 | 277 | 278 | 279 | ### 参数 280 | 281 | - `mr_presso_res`: MR-PRESSO 结果。 282 | 283 | ### 值 284 | 285 | - mr_presso_main: MR-PRESSO 主要结果。 286 | 287 | ### 用法 288 | 289 | ```R 290 | mr_presso_main <- mr_presso_pval(mr_presso_res) ##mr_Presso输出的结果 291 | ``` 292 | 293 | 294 | 295 | ## mr_presso_snp函数 296 | 297 | ### 描述 298 | 299 | 根据 MR-PRESSO 分析结果,将离群值剔除,返回剔除离群值后的dat(我一般称为dat_aj, 也就是 adjusted_data), 可用于后续的IVW等分析。 300 | 301 | ### 语法 302 | 303 | ```R 304 | mr_presso_snp(mr_presso_res, mr_presso_main, dat, type = "list") 305 | ``` 306 | 307 | 308 | 309 | ### 参数 310 | 311 | - `mr_presso_res`: MR-PRESSO 结果。 312 | - `mr_presso_main`: MR-PRESSO 主要结果。 313 | - `dat`: 数据框或数据框列表,包含基因表达和疾病风险关联分析的数据。 314 | - `type`: 字符串,输入数据类型。可选值为 "list" 或 "data"。如果是列表形式的(批量化运行后的结果),就是`list`,如果是普通数据框就是data 315 | 316 | ### 值 317 | 318 | 过滤后的数据框或数据框列表。 319 | 320 | ### 用法 321 | 322 | ```R 323 | dat<-harmonise_data(exposure,outcome) ## TwoSampleMR包的harmonise_data函数输出的结果 324 | mr_presso_res <- mr_Presso(dat, num = 10000) 325 | mr_presso_main <- mr_presso_pval(mr_presso_res) 326 | data_aj <- mr_presso_snp(mr_presso_res, mr_presso_main, dat, type = "data") 327 | 328 | ## 用矫正的data可以用于后续的分析,例如重新计算mr 329 | res_aj<-mr(data_aj) 330 | ``` 331 | 332 | 333 | 334 | 335 | 336 | 337 | 338 | 339 | 340 | # 快捷预处理及质控工具 341 | 342 | ## format_Mun 343 | 344 | ### 介绍 345 | 346 | 运用MungeSumstats包标准化GWAS 摘要统计数据(包括hg19和hg38转换)。该函数可以将来自Finngen R8和其他来源的 GWAS 摘要统计数据文件清洗为标准的GWAS文件,并可将基因组位置从 `ref_genome` 转换到 `convert_ref_genome`。 347 | 348 | ### 用法 349 | 350 | ```R 351 | format_Mun(file, source = "finn_r8", save_path = NULL, lift = F, ref_genome = "hg38", convert_ref_genome = "hg19") 352 | ``` 353 | 354 | ### 参数 355 | 356 | - `file`:字符向量或数据框,表示要格式化的 GWAS 摘要统计数据文件或数据框。如果输入的是字符向量,则表示文件的路径。如果输入的是数据框,则表示要格式化的数据框。 357 | - `source`:字符向量,表示输入文件的来源。默认为 `"finn_r8"`。 358 | - `save_path`:字符向量,表示格式化文件要保存的路径。默认为 `NULL`。 359 | - `lift`:逻辑值,表示是否将基因组位置从 `ref_genome` 转换到 `convert_ref_genome`。默认为 `F`。 360 | - `ref_genome`:字符向量,表示 GWAS 摘要统计数据文件使用的参考基因组。默认为 `"hg38"`。 361 | - `convert_ref_genome`:字符向量,表示要将基因组位置转换到的参考基因组。默认为 `"hg19"`。 362 | 363 | ### 例子 364 | 365 | ```R 366 | # 从文件中格式化数据 367 | format_Mun("my_sumstats.txt", save_path = "~/formatted_sumstats", lift = F, ref_genome = "hg38") 368 | 369 | # 从数据框中格式化数据 370 | my_sumstats_df <- read.csv("my_sumstats.csv") 371 | format_Mun(my_sumstats_df, save_path = "~/formatted_sumstats", lift = F, ref_genome = "hg38") 372 | 373 | #格式化数据并升降版本 374 | format_Mun(my_sumstats_df, save_path = "~/formatted_sumstats", lift = T, ref_genome = "hg38", convert_ref_genome = "hg19") ## 从hg38转为hg19 375 | ``` 376 | 377 | ### 返回值 378 | 379 | 该函数返回格式化的数据框并将其写入磁盘文件。`save_path`指定保存的位置 380 | 381 | 382 | 383 | 384 | 385 | ## format_getmr 386 | 387 | ### 介绍 388 | 389 | 预设的快捷格式化 GWAS 摘要统计数据,这个函数用于将来自多个数据来源的 GWAS 摘要统计数据转换为TwoSampleMR 包所需的格式。 390 | 391 | ### 用法 392 | 393 | ``` 394 | format_getmr(data, type = "exposure", source = "finn_r8") 395 | ``` 396 | 397 | ### 参数 398 | 399 | - `data`:数据框,表示要格式化的 GWAS 摘要统计数据。 400 | - `type`:字符向量,表示数据类型,可以是 "exposure" 或 "outcome"。默认为 "exposure"。 401 | - `source`:字符向量,表示数据来源。默认为 "finn_r8"。目前支持的来源有: 402 | - "finn_r8": [Data download - FinnGen Documentation (gitbook.io)](https://finngen.gitbook.io/documentation/data-download) 403 | - "ukb_nosnp": 尼尔数据库(UKB),因为没有rsid,因此需要匹配(已一键完成)。[www.nealelab.is/uk-biobank](http://www.nealelab.is/uk-biobank) 404 | - "Mun": 来自MungeSumstats包格式化后的数据 405 | - "covid": [COVID19-hg GWAS meta-analyses round 7 (covid19hg.org)](https://www.covid19hg.org/results/r7/) 406 | - "outcome" : 已经格式化为TwoSampleMR包的“outcome”格式 407 | - "exposure":已经格式化为TwoSampleMR包的“exposure”格式 408 | - "fast_ukb": [fastGWA | Yang Lab (westlake.edu.cn)](https://yanglab.westlake.edu.cn/data/ukb_fastgwa/imp_binary/) 409 | - "bac": 2021年肠菌原文数据 [Large-scale association analyses identify host factors influencing human gut microbiome composition - PMC (nih.gov)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8515199/) 410 | 411 | ### 例子 412 | 413 | ```R 414 | my_data <- fread("my_data.gz") 415 | format_getmr(my_data, type = "finn_r8", source = "Mun") 416 | ``` 417 | 418 | ### 返回值 419 | 420 | 该函数返回格式化的数据框。 421 | 422 | 423 | 424 | ## format_trait 425 | 426 | ### 介绍 427 | 428 | 这个函数用于格式化 GWAS 摘要统计数据中的表型信息,使其符合命名规范,易于保存为文件(例如批量保存计算R2和F值后的文件)。 429 | 430 | 主要是为了解决,在Windows系统下,保存文件的名称中不能包含特殊字符,例如`:`,`|`。 431 | 432 | ### 用法 433 | 434 | ```R 435 | format_trait(list, short = FALSE, short_num = "40") 436 | ``` 437 | 438 | ### 参数 439 | 440 | - `list`:列表,表示要格式化的 GWAS 摘要统计数据列表。 441 | - `short`:逻辑值,表示是否要将表型名称缩短。默认为 FALSE。 442 | - `short_num`:字符向量,表示缩短表型名称的长度。默认为 "40"。 443 | 444 | ### 例子 445 | 446 | ```R 447 | my_list <- list(data1, data2, data3) 448 | format_trait(my_list, short = TRUE, short_num = "20") 449 | ``` 450 | 451 | ### 返回值 452 | 453 | 该函数返回格式化后的 GWAS 摘要统计数据列表。 454 | 455 | 456 | 457 | 458 | 459 | ## read_vcf_getmr 460 | 461 | ### 介绍 462 | 463 | 这个函数用于从 VCF 文件中读取摘要统计数据。并保存为压缩文件。默认是.gz为后缀的压缩文件。方便下次读取以及节省空间。 464 | 465 | 这是因为读取VCF文件将消耗大量电脑资源。我们建议批量读取VCF文件后储存为易于读取的压缩包形式。下次读取方便快捷。因此本函数不会直接返回数据框,而是保存为文件 466 | 467 | ### 用法 468 | 469 | ``` 470 | read_vcf_getmr(file_name, nThread = 8, type = ".gz") 471 | ``` 472 | 473 | ### 参数 474 | 475 | - `file_name`:字符向量,表示要读取的 VCF 文件名。 476 | - `nThread`:整数,表示要使用的线程数。默认为 8。 477 | - `type`:字符向量,表示输出文件类型。默认为 ".gz"。 478 | 479 | ### 例子 480 | 481 | ```R 482 | my_file <- "my_file.vcf" 483 | read_vcf_getmr(my_file, nThread = 4, type = ".gz") 484 | ``` 485 | 486 | ### 返回值 487 | 488 | 该函数没有返回值,而是将读取的数据写入文件。 489 | 490 | 491 | 492 | ## read_easy 493 | 494 | ### 介绍 495 | 496 | 这个函数用于从文件中读取 GWAS 摘要统计数据。并返回经过P值筛选的文件。一般用于批量读取大量文件时。比如我要批量读取100个暴露数据,每个数据占用运行内存2G。如果100个,则200G,不是一般电脑可以承受。因此每次读取将直接筛选p值,压缩大小 497 | 498 | ### 用法 499 | 500 | ```R 501 | read_easy(file_name, pval = 5e-08) 502 | ``` 503 | 504 | ### 参数 505 | 506 | - `file_name`:字符向量,表示要读取的文件名。 507 | - `pval`:数字,表示筛选摘要统计数据的显著性水平。默认为 5e-08。 508 | 509 | ### 例子 510 | 511 | ```R 512 | my_file <- "my_file.csv" 513 | read_easy(my_file, pval = 1e-06) 514 | ``` 515 | 516 | ### 返回值 517 | 518 | 该函数返回摘要统计数据的数据框。 519 | 520 | 521 | 522 | 523 | 524 | ## get_eaf_from_1000G 525 | 526 | ### 介绍 527 | 528 | 从1000G的MAF文件中提取EAF并将其与输入数据匹配。 529 | 530 | ### 用法 531 | 532 | ``` 533 | get_eaf_from_1000G(dat, path, type = "exposure") 534 | ``` 535 | 536 | ### 参数 537 | 538 | - `dat`:一个数据框,为TwoSampleMR包格式的数据 539 | - `path`:一个字符串,表示包含1000G MAF文件`fileFrequency.frq`的目录路径。 540 | - `type`:一个字符串,表示数据是“exposure”(暴露因素)还是“outcome”(结果),默认为“exposure”。 541 | 542 | ### 值 543 | 544 | 一个数据框,其中包含输入数据的EAF和类型信息(原始、修正或错误)。 545 | 546 | ### 详细说明 547 | 548 | 该函数将读取1000G MAF文件,然后将其与输入数据进行匹配。对于每个SNP,该函数将检查输入数据中的效应等位基因与1000G中的效应等位基因是否匹配。 549 | 550 | 如果不匹配,则将EAF设置为NA并将其类型设置为“error”。对于匹配的SNP,EAF将设置为1000G MAF中的值(如果输入数据的效应等位基因是minor allele),或1-MAF(如果输入数据的效应等位基因是major allele),并将其类型设置为“raw”或“corrected”。 551 | 552 | 如果`type`参数设置为“outcome”,则函数将使用输入数据中的结果等位基因来查找EAF。 553 | 554 | 在处理完所有SNP后,该函数将输出一些有关匹配成功、失败以及NA的信息,以及类型信息的说明。 555 | 556 | ### 示例 557 | 558 | 以下是使用该函数的示例: 559 | 560 | ```R 561 | dat <- get_eaf_from_1000G(dat, "S:/GWAS数据/本地LD依赖文件", type = "exposure") 562 | 563 | # 检查输出 564 | head(dat) 565 | ``` 566 | 567 | `fileFrequency.frq`为PLINK1.9输出的,根据1000G数据提取的MAF数据 568 | 569 | 1000G参考文件下载地址:http://fileserve.mrcieu.ac.uk/ld/1kg.v3.tgz 570 | 571 | 可自行提取MAF数据。或从`GetScience`公众号中获取已经提取好的`fileFrequency.frq`文件 572 | 573 | 574 | 575 | ## get_chr_pos 576 | 577 | 该函数利用MungeSumstats包匹配rsid的染色体位置。 578 | 579 | ### 用法 580 | 581 | ```R 582 | get_chr_pos(dat, type = "exposure") 583 | ``` 584 | 585 | ### 参数 586 | 587 | - `dat`:一个数据框,TwoSampleMR格式 588 | - `type`:一个字符串,表示要获取SNP染色体位置和参考序列的SNP类型。可选值为"exposure"或"outcome"。 589 | 590 | ### 返回值 591 | 592 | 一个数据框,其中包含输入数据框的信息,以及新列`chr.exposure`或`chr.outcome`,表示每个SNP的染色体编号。新列`pos.exposure`或`pos.outcome`表示每个SNP在染色体上的位置。 593 | 594 | ### 函数说明 595 | 596 | 该函数使用`format_sumstats`和`format_data`函数从1000G项目中获取SNP的染色体位置和参考序列信息。 597 | 598 | ### 示例 599 | 600 | 以下示例演示如何使用`get_chr_pos`函数: 601 | 602 | ```R 603 | # 获取曝露变量SNP的染色体位置和参考序列 604 | exposure_chr_pos <- get_chr_pos(dat, type = "exposure") 605 | 606 | # 获取结果变量SNP的染色体位置和参考序列 607 | outcome_chr_pos <- get_chr_pos(dat, type = "outcome") 608 | ``` 609 | 610 | 611 | 612 | ## get_f函数 613 | 614 | ### 描述 615 | 616 | `get_f`函数计算样本的F统计量并返回F值大于指定阈值的样本数据。返回的数据包括每个SNP的R2和F值 617 | 618 | ### 用法 619 | 620 | ```R 621 | get_f(dat, F_value = 10) 622 | ``` 623 | 624 | ### 参数 625 | 626 | - `dat`: TwoSampleMR格式,一定要包含`eaf.exposure`, `beta.exposure`, `se.exposure`, 和`samplesize.exposure`的数据框。 627 | - `F_value`: 指定的F统计量的阈值,F值大于该阈值的样本将被返回。默认值为10。 628 | 629 | 630 | 631 | 注,本计算公式为 632 | 633 | F值:![img](A:\OneDrive\GET\assets\clip_image002.gif) R2值:![image-20230327151305186](A:\OneDrive\GET\assets\image-20230327151305186.png) 634 | 635 | 公式参考文献: 636 | 637 | [A Multivariate Genome-Wide Association Analysis of 10 LDL Subfractions, and Their Response to Statin Treatment, in 1868 Caucasians - PMC (nih.gov)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4405269/) 638 | 639 | [Large-scale association analyses identify host factors influencing human gut microbiome composition - PMC (nih.gov)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8515199/) 640 | 641 | 642 | 643 | 644 | 645 | 646 | 647 | ### 648 | 649 | ## mr_dircreate_base 650 | 651 | ### 描述 652 | 653 | `mr_dircreate_base`函数创建基本的目录结构以保存MR分析结果和图形。 654 | 655 | ### 用法 656 | 657 | ```R 658 | mr_dircreate_base(root_dir, project_name, date = NULL) 659 | ``` 660 | 661 | ### 参数 662 | 663 | - `root_dir`: 保存结果文件夹的根目录。 664 | - `project_name`: 项目名称,用于在根目录下创建一个以此命名的子目录。 665 | - `date`: 日期(可选),用于在子目录名称中添加日期以区分不同日期的结果文件夹。默认为`NULL`,表示不添加日期。 666 | 667 | ### 值 668 | 669 | `mr_dircreate_base`函数返回一个包含结果文件夹路径的列表。 670 | 671 | ### 示例 672 | 673 | ```R 674 | # 创建结果文件夹 675 | res_dir <- mr_dircreate_base("path/to/root/dir", "project_name", date = "20220327") 676 | 677 | # 打印结果文件夹路径 678 | print(res_dir) 679 | ``` 680 | 681 | ### 注意事项 682 | 683 | - `root_dir`参数指定保存结果文件夹的根目录。 684 | - `project_name`参数指定项目名称,用于在根目录下创建一个以此命名的子目录。 685 | - `date`参数(可选)用于在子目录名称中添加日期以区分不同日期的结果文件夹。默认为`NULL`,表示不添加日期。 686 | - `mr_dircreate_base`函数将在根目录下创建一个名为`project_name`的子目录,并在该子目录下创建4个子目录,分别命名为`1.figure`、`2.table`、`3.figure of sig res`和`4.snp with Fval`。函数返回一个包含结果文件夹路径的列表。 687 | 688 | 689 | 690 | 691 | 692 | 693 | 694 | ## clean_expo 695 | 696 | ### 描述 697 | 698 | 用于快捷筛选工具变量的函数,可执行P值筛选,EAF值筛选,clump。 699 | 700 | ### 用法 701 | 702 | ```R 703 | ## 完整可选参数 704 | clean_expo(expo, pval, low_af = 0.5, high_af = 0.5, clump = TRUE, kb = 10000, r2 = 0.001, LD_file = NULL, af_filter = FALSE) 705 | 706 | ##不提供LD_file则自动在线clump 707 | clean_expo(expo, pval, clump = TRUE, kb = 10000, r2 = 0.001) 708 | ##提供则本地clump 709 | clean_expo(expo, pval, clump = TRUE, kb = 10000, r2 = 0.001,LD_file=LD_file) 710 | ``` 711 | 712 | ### 参数 713 | 714 | - `expo`: 一个数据框,其中包含遗传暴露指标的SNP名称、beta值、标准误、p值和频率。 715 | - `pval`: 用于筛选遗传暴露指标的p值阈值。p值小于此阈值的SNP将被保留。 716 | - `low_af`: 频率过滤的下限值。默认为0.5。如果`af_filter`为TRUE,则只有遗传暴露指标的频率低于此值或高于`high_af`时,才会被保留。 717 | - `high_af`: 频率过滤的上限值。默认为0.5。 718 | - `clump`: 一个逻辑值,指示是否使用PLINK进行SNP聚类。默认为TRUE。 719 | - `kb`: 聚类的窗口大小(以kb为单位)。默认为10000。 720 | - `r2`: LD阈值。默认为0.001。 721 | - `LD_file`: PLINK二进制文件的路径。如果未提供,则默认在线clump。 722 | - `af_filter`: 一个逻辑值,指示是否启用频率过滤。默认为FALSE。 723 | 724 | 725 | 726 | ## clean_list 727 | 728 | ### 描述 729 | 730 | 用于清理列表中元素行数的R包。常用于批量化质量控制。 731 | 732 | ### 用法 733 | 734 | ```R 735 | clean_list(list, nrow = 10) 736 | ``` 737 | 738 | ### 参数 739 | 740 | - `list`: 一个列表,其中包含多个元素。 741 | - `nrow`: 用于筛选元素的行数阈值。如果元素的行数小于此阈值,则该元素将被删除。默认为10。 742 | 743 | ### 返回值 744 | 745 | 一个列表,其中仅包含行数大于 `nrow` 的元素。 746 | 747 | ### 例子 748 | 749 | ``` 750 | # 创建一个包含5个数据框的列表,每个数据框包含1-5行 751 | set.seed(123) 752 | lst <- list(data.frame(a = rnorm(1), b = rnorm(1)), 753 | data.frame(a = rnorm(2), b = rnorm(2)), 754 | data.frame(a = rnorm(3), b = rnorm(3)), 755 | data.frame(a = rnorm(4), b = rnorm(4)), 756 | data.frame(a = rnorm(5), b = rnorm(5))) 757 | 758 | # 运行 clean_list 函数,将阈值设置为2 759 | cleaned_lst <- clean_list(lst, nrow = 3) 760 | 761 | # 查看清理后的列表 762 | cleaned_lst 763 | ``` 764 | 765 | 766 | 767 | 768 | 769 | ## clean_IV_from_outsig 770 | 771 | 用于从一个数据框中清理具有显著的MR反向因果效应P值的IV。 772 | 773 | ### 用法 774 | 775 | ```R 776 | clean_IV_from_outsig(dat, MR_reverse = 1e-03) 777 | ``` 778 | 779 | ### 参数 780 | 781 | - `dat`: 一个数据框,其中包含每个IV和其与结果变量之间的MR反向因果效应的P值。 782 | - `MR_reverse`: 用于筛选IV的MR反向因果效应P值阈值。具有P值小于此阈值的IV将被保留。默认为1e-03。 783 | 784 | ### 返回值 785 | 786 | 一个数据框,其中包含P值大于 `MR_reverse` 值的IV(也就是反向不显著的IVs)。 787 | 788 | 789 | 790 | ## 作者信息 791 | 792 | * 代码作者:广州医科大学 第一临床学院 周浩彬 第二临床学院 谢治鑫 793 | 794 | * 帮助文档作者: 周浩彬 795 | * 时间:2023/3/27 796 | * 适配版本: Get_MR1.0 797 | * 开源许可证:GPL3.0 798 | * 公众号: GetScience 799 | 800 | * 致谢:感谢广州医科大学 第六临床学院 黄覃耀和 南山学院 林子凯在孟德尔随机化概念,代码思路等提供的重要的建设性建议。 801 | 802 | -------------------------------------------------------------------------------- /1.0/Get_MR1.0.r: -------------------------------------------------------------------------------- 1 | library(mr.raps) 2 | library(TwoSampleMR) 3 | #library(plyr) 4 | library(dplyr) 5 | #library(fs) 6 | #library(ggplot2) 7 | #library(lubridate) 8 | library(ieugwasr) 9 | library(plinkbinr) 10 | library(tidyr) 11 | #library(progress) 12 | library(data.table) 13 | library(MRPRESSO) 14 | library(parallel) 15 | #library(foreach) 16 | library(doParallel) 17 | library(pbapply) 18 | library(stringr) 19 | library(cause) 20 | library(vroom) 21 | library(MungeSumstats) 22 | #library(GenomicFiles) 23 | #library(meta) 24 | #library(readr) 25 | #library(readxl) 26 | #library(forestploter) 27 | library(ldscr) 28 | library(MRlap) 29 | #library(meta) 30 | #library(forestplot) 31 | #library(biomaRt) 32 | #library(gwasglue) 33 | #library(coloc) 34 | 35 | # format 36 | format_Mun<-function(file,source="finn_r8",save_path=NULL,lift=F,ref_genome = "hg38", 37 | convert_ref_genome = "hg19"){ 38 | library(data.table) 39 | library(MungeSumstats) 40 | library(dplyr) 41 | if(class(file)!="data.frame"){ 42 | dat<-fread(file) 43 | if(source=="finn_r8"){ 44 | dat<-dat%>%dplyr::select(SNP=rsids,CHR=`#chrom`,BP=pos,A1=ref, 45 | A2=alt,FRQ=af_alt,BETA=beta, 46 | SE=sebeta,P=pval) 47 | } 48 | } 49 | dat<-as.data.frame(dat) 50 | dat<-format_sumstats(dat,return_data = TRUE) 51 | if(is.null(save_path)==F){setwd(save_path)} 52 | dat<-fwrite(dat,paste0(file,'(Mun_format',ref_genome,').gz')) 53 | te<-fs::dir_info( tempdir()) 54 | te<-subset(te,size>1e+8) 55 | fs::file_delete(te$path) 56 | if(lift==T){ 57 | dat <- MungeSumstats::liftover(sumstats_dt = dat, 58 | ref_genome = ref_genome, 59 | convert_ref_genome = convert_ref_genome) 60 | dat<-fwrite(dat,paste0(file,'(Mun_format',convert_ref_genome,').gz')) 61 | } 62 | gc() 63 | } 64 | 65 | format_getmr<-function(data,type="exposure",source="finn_r8"){ 66 | library(TwoSampleMR) 67 | library(MungeSumstats) 68 | 69 | if(source=="finn_r8"){ 70 | data<-format_data(data,type=type, 71 | id_col = "id",chr_col ="#chrom",pos="pos", 72 | snp_col = "rsids",beta_col = "beta",se_col = "sebeta", 73 | effect_allele_col = "alt",other_allele_col = "ref", 74 | eaf_col = "af_alt",phenotype_col = "phenotype",pval_col = "pval") 75 | } 76 | 77 | 78 | 79 | if(source=="ukb_nosnp"){ 80 | data<-separate(data,variant,c("chr","pos","ref","alt"),sep = ":") 81 | 82 | data<-dplyr::select(data,chr,pos,ref,alt,minor_AF,beta,se,pval) 83 | name<-colnames(data) 84 | colnames(data) <-c('CHR','BP','A1','A2','FRQ','BETA', 'SE','P') 85 | 86 | expo_rs_done<-format_sumstats(data,ref_genome = "GRCh37",return_data = TRUE) 87 | 88 | expo_rs_done$BP<-as.character(expo_rs_done$BP) 89 | colnames(data)<-name 90 | data<-merge(data,expo_rs_done,by.x=c("chr","pos"),by.y=c("CHR","BP"),all.x=TRUE) 91 | 92 | data<-format_data(data,type=type, 93 | samplesize_col = "n_complete_samples", 94 | snp_col = "SNP",effect_allele_col = "alt", 95 | other_allele_col = "ref",eaf_col ="minor_AF", 96 | beta_col="beta",se_col = "se",pval_col = "pval", 97 | chr_col="chr",pos_col ="pos",id='id',phenotype_col = 'phenotype') 98 | } 99 | 100 | if(source=="Mun"){ 101 | data<-format_data(data,type=type, 102 | snp_col = 'SNP', 103 | chr_col = 'CHR', 104 | pos_col = 'BP', 105 | effect_allele_col = 'A2', 106 | other_allele_col = "A1", 107 | se_col = 'SE', 108 | beta_col= 'BETA', 109 | eaf_col = 'FRQ', 110 | id_col = 'id', 111 | phenotype_col = "phenotype", 112 | pval_col = "P" 113 | ) 114 | } 115 | 116 | if(source=="covid"){ 117 | data<-format_data(data,type=type, 118 | id_col = "id",chr_col ="#CHR",pos="POS", 119 | snp_col = "rsid",beta_col = "all_inv_var_meta_beta",se_col = "all_inv_var_meta_sebeta", 120 | effect_allele_col = "ALT",other_allele_col = "REF", 121 | eaf_col = "all_meta_AF",phenotype_col = "phenotype", 122 | ncase_col = "all_inv_var_meta_cases",ncontrol_col = "all_inv_var_meta_controls", 123 | pval_col = "all_inv_var_meta_p") 124 | } 125 | 126 | if(source=="outcome"){ 127 | data<-format_data(data,type=type, 128 | id_col = "id.outcome",chr_col ="chr.outcome",pos="pos.outcome", 129 | snp_col = "SNP",beta_col = "beta.outcome",se_col = "se.outcome", 130 | effect_allele_col = "effect_allele.outcome",other_allele_col = "other_allele.outcome", 131 | eaf_col = "eaf.outcome",phenotype_col = "outcome", 132 | samplesize_col = "samplesize.outcome", 133 | pval_col = "pval.outcome") 134 | 135 | } 136 | if(source=="exposure"){ 137 | data<-format_data(data,type=type, 138 | id_col = "id.exposure",chr_col ="chr.exposure",pos="pos.exposure", 139 | snp_col = "SNP",beta_col = "beta.exposure",se_col = "se.exposure", 140 | effect_allele_col = "effect_allele.exposure",other_allele_col = "other_allele.exposure", 141 | eaf_col = "eaf.exposure",phenotype_col = "exposure", 142 | samplesize_col = "samplesize.exposure", 143 | pval_col = "pval.exposure") 144 | 145 | } 146 | 147 | if(source=="fast_ukb"){ 148 | data<-format_data(data,type=type,id_col="id", 149 | phenotype_col = "phenotype",snp_col = "SNP", 150 | effect_allele_col = "A1",other_allele_col = "A2", 151 | eaf_col ="AF1",beta_col="BETA",se_col = "SE", 152 | pval_col = "P",chr_col = "CHR",pos_col = "POS") 153 | } 154 | 155 | 156 | 157 | if(source=="bac"){ 158 | data<-format_data(data,type=type,id_col="bac", 159 | phenotype_col = "phenotype",snp_col = "rsID", 160 | effect_allele_col = "eff.allele",other_allele_col = "ref.allele", 161 | beta_col="beta",se_col = "SE", 162 | pval_col = "P.weightedSumZ",chr_col = "chr",pos_col = "bp",samplesize_col = "N") 163 | } 164 | 165 | if(source=="finn_r7"){ 166 | data<-format_data(data,type=type,snp_col = "rsids", 167 | effect_allele_col = "alt",other_allele_col = "ref", 168 | beta_col="beta",se_col = "sebeta",eaf_col = "af_alt", 169 | pval_col = "pval",chr_col = "chrom",pos_col = "pos") 170 | } 171 | return(data) 172 | } 173 | 174 | 175 | format_trait<-function(list,short=FALSE,short_num="40"){ 176 | for(i in 1:length(list)){ 177 | expo<-data.frame(exposure=list[[i]] 178 | $exposure) 179 | try(expo<-separate(expo,exposure,c('a','b'),sep='\\|\\|')) 180 | if(short==TRUE){try(expo$a <- substr(expo$a, 1, short_num))} 181 | try(expo<-expo$a) 182 | try(expo<-gsub(":","_",expo)) 183 | try(expo<-gsub("-","_",expo)) 184 | try(expo<-gsub(" ","_",expo)) 185 | try(expo<-gsub(",","_",expo)) 186 | try(expo<-gsub("/","_",expo)) 187 | list[[i]]$exposure<-expo[1] 188 | 189 | } 190 | 191 | 192 | return(list) 193 | } 194 | 195 | # read 196 | read_vcf_getmr<-function(file_name,nThread = 8,type=".gz"){ 197 | name<-file_name 198 | 199 | for(i in 1:nrow(name)){ 200 | dat<-read_sumstats(paste0("./",name[i]),nThread = nThread,nrow=Inf,standardise_headers = FALSE,mapping_file = sumstatsColHeaders) 201 | 202 | vroom_write(dat,paste0(name[i],type)) 203 | 204 | gc() 205 | } 206 | print(i) 207 | 208 | } 209 | 210 | read_easy<-function(file_name,pval=5e-08){ 211 | library(data.table) 212 | dat<-fread(file_name) 213 | dat<-subset(dat,pval.exposure%select(SNP,effect_allele.exposure,other_allele.exposure,eaf.exposure,beta.exposure,se.exposure,pval.exposure) 360 | colnames(dat_d) <-c('SNP','A1','A2','FRQ','BETA', 'SE','P') 361 | dat_d<-format_sumstats(dat_d,ref_genome = "GRCh37",return_data = TRUE) 362 | dat_d<-format_data(dat_d,type=type, 363 | snp_col = 'SNP', 364 | chr_col = 'CHR', 365 | pos_col = 'BP', 366 | effect_allele_col = 'A1', 367 | other_allele_col = "A2", 368 | se_col = 'SE', 369 | beta_col= 'BETA', 370 | eaf_col = 'FRQ', 371 | pval_col = "P" 372 | ) 373 | dat_d<-dat_d%>%select(SNP,chr.exposure,pos.exposure) 374 | dat<-merge(dat,dat_d,by="SNP",all.x=T) 375 | 376 | return(dat) 377 | } 378 | } 379 | if(type=="outcome"){ 380 | if(is.na(dat$eaf.outcome[1])==T || is.null(dat$eaf.outcome)==T){ 381 | print("需要先运行get_eaf_from_1000G来匹配eaf,再匹配chr和pos") 382 | } 383 | else{ 384 | dat_d<-dat%>%select(SNP,effect_allele.outcome,other_allele.outcome,eaf.outcome,beta.outcome,se.outcome,pval.outcome) 385 | colnames(dat_d) <-c('SNP','A1','A2','FRQ','BETA', 'SE','P') 386 | dat_d<-format_sumstats(dat_d,ref_genome = "GRCh37",return_data = TRUE) 387 | dat_d<-format_data(dat_d,type=type, 388 | snp_col = 'SNP', 389 | chr_col = 'CHR', 390 | pos_col = 'BP', 391 | effect_allele_col = 'A1', 392 | other_allele_col = "A2", 393 | se_col = 'SE', 394 | beta_col= 'BETA', 395 | eaf_col = 'FRQ', 396 | pval_col = "P" 397 | ) 398 | dat_d<-dat_d%>%select(SNP,chr.outcome,pos.outcome) 399 | dat<-merge(dat,dat_d,by="SNP",all.x=T) 400 | 401 | return(dat) 402 | } 403 | 404 | } 405 | } 406 | 407 | get_f<-function(dat,F_value=10){ 408 | log<-is.na(dat$eaf.exposure) 409 | log<-unique(log) 410 | if(length(log)==1) 411 | {if(log==TRUE){ 412 | print("数据不包含eaf,无法计算F统计量") 413 | return(dat)} 414 | } 415 | if(is.null(dat$beta.exposure[1])==T || is.na(dat$beta.exposure[1])==T){print("数据不包含beta,无法计算F统计量") 416 | return(dat)} 417 | if(is.null(dat$se.exposure[1])==T || is.na(dat$se.exposure[1])==T){print("数据不包含se,无法计算F统计量") 418 | return(dat)} 419 | if(is.null(dat$samplesize.exposure[1])==T || is.na(dat$samplesize.exposure[1])==T){print("数据不包含samplesize(样本量),无法计算F统计量") 420 | return(dat)} 421 | 422 | 423 | if("FALSE"%in%log && is.null(dat$beta.exposure[1])==F && is.na(dat$beta.exposure[1])==F && is.null(dat$se.exposure[1])==F && is.na(dat$se.exposure[1])==F && is.null(dat$samplesize.exposure[1])==F && is.na(dat$samplesize.exposure[1])==F){ 424 | R2<-(2*(1-dat$eaf.exposure)*dat$eaf.exposure*(dat$beta.exposure^2))/((2*(1-dat$eaf.exposure)*dat$eaf.exposure*(dat$beta.exposure^2))+(2*(1-dat$eaf.exposure)*dat$eaf.exposure*(dat$se.exposure^2)*dat$samplesize.exposure)) 425 | F<- (dat$samplesize.exposure-2)*R2/(1-R2) 426 | dat$R2<-R2 427 | dat$F<-F 428 | dat<-subset(dat,F>F_value) 429 | return(dat) 430 | } 431 | } 432 | 433 | 434 | # MR 435 | cause_getmr<-function(expo,outcome,LD_file,r2=0.001, 436 | kb=10000,pval=1e-05,cl=NULL){ 437 | format_cause_expo<-function(dat){ 438 | library(cause) 439 | dat<-gwas_format(dat,snp='SNP',beta_hat ='beta.exposure', 440 | se='se.exposure',A1="effect_allele.exposure", 441 | A2="other_allele.exposure",chrom='chr.exposure', 442 | pos="pos.exposure",p_value = 'pval.exposure' 443 | ) 444 | return(dat) 445 | } 446 | format_cause_out<-function(dat){ 447 | library(cause) 448 | dat<-gwas_format(dat,snp='SNP',beta_hat ='beta.outcome', 449 | se='se.outcome',A1="effect_allele.outcome", 450 | A2="other_allele.outcome",chrom='chr.outcome', 451 | pos="pos.outcome",p_value = 'pval.outcome' 452 | ) 453 | return(dat) 454 | } 455 | sample_cause<-function(dat,num_snp){ 456 | set.seed(123) 457 | VAR<-with(dat,sample(snp,size=num_snp,replace=FALSE)) 458 | return(VAR) 459 | } 460 | datap_cause<-function(dat){ 461 | dat$p1<-pnorm(abs(dat$beta_hat_1/dat$seb1),lower.tail=F)*2 462 | return(dat) 463 | } 464 | dat_clump_cause<-function(dat){ 465 | for_clump<-data.frame(dat$snp,dat$p1) 466 | colnames(for_clump)<-c('rsid','pval') 467 | return(for_clump) 468 | } 469 | datap_cause<-function(dat){ 470 | dat$p1<-pnorm(abs(dat$beta_hat_1/dat$seb1),lower.tail=F)*2 471 | return(dat) 472 | } 473 | ld_local<-function(dat,r2,kb,p,LD_file){ 474 | library(plinkbinr) 475 | 476 | plink_pathway<-get_plink_exe() 477 | 478 | dat<-ld_clump(dat,clump_r2 = r2,clump_kb = kb, 479 | clump_p = p, 480 | plink_bin =plink_pathway , bfile =LD_file) 481 | 482 | return(dat) 483 | } 484 | if("list"%in%class(outcome)){ 485 | single_expo=T 486 | single_outcome=F} 487 | if("list"%in%class(expo)){ 488 | single_outcome=T 489 | single_expo=F} 490 | if(single_expo==T){ 491 | outcome<-pblapply(outcome,FUN=format_cause_out,cl=cl) 492 | expo<-format_cause_expo(expo) 493 | } 494 | if(single_outcome==T){ 495 | expo<-pblapply(expo,FUN=format_cause_expo,cl=cl) 496 | outcome<-format_cause_out(outcome) 497 | } 498 | 499 | 500 | if(single_outcome==T){dat<-pblapply(expo,outcome,FUN=gwas_merge,cl=cl)} 501 | if(single_expo==T){dat<-pblapply(outcome,expo, 502 | FUN=function(outcome,exposure)gwas_merge(exposure,outcome), 503 | cl=cl)} 504 | 505 | 506 | VAR<-pblapply(dat,1000000,FUN=sample_cause) 507 | 508 | est<-foreach(i=1:length(VAR)) %do%{ 509 | library(cause) 510 | data<-est_cause_params(dat[[i]],VAR[[i]]) 511 | } 512 | dat<-pblapply(dat,FUN=datap_cause) 513 | for_clump<-lapply(dat, dat_clump_cause) 514 | clumped<-pblapply(for_clump,0.001,10000,1e-05,LD_file,FUN=ld_local) 515 | top_vars<-pblapply(clumped,FUN=function(dat) return(dat$rsid)) 516 | cause_res<-list() 517 | for(i in 1:length(dat)){ 518 | library(cause) 519 | res<-cause(dat[[i]],est[[i]],top_vars[[i]]) 520 | cause_res[[i]]<-res 521 | print(i) 522 | } 523 | cause_table<-data.frame() 524 | for (i in 1:length(cause_res)){ 525 | res<-cause_res[[i]] 526 | elpd<-res$elpd 527 | elpd$p<-pnorm(-elpd$z,lower.tail = F) 528 | name<-paste0('file',i) 529 | elpd$file<-name 530 | cause_table<-rbind(cause_table,elpd) 531 | } 532 | return(cause_table) 533 | } 534 | 535 | # RAPS 536 | RAPS_getmr<-function(dat,dir_figure){ 537 | setwd(dir_figure) 538 | res<-try(mr.raps(dat,over.dispersion = TRUE)) 539 | if(class(res)%in%"try-error"){} 540 | else{ 541 | exposure<-dat$id.exposure 542 | dir_create(exposure) 543 | dir_in<-paste0(dir_figure,'/',exposure) 544 | setwd(dir_in) 545 | #plot_name<-paste0(exposure,'raps.pdf') 546 | ggsave(file='raps_plot.pdf',plot=plot(res),width=9,height=5) 547 | res<-data.frame(beta.raps=res$beta.hat,se.raps=res$beta.se, 548 | eov=res$tau2.hat,se.eov=res$tau2.se,OR.raps=NA, 549 | or_lci95.raps=NA,or_uci95.raps=NA) 550 | res$OR.raps<- exp(res$beta.raps) 551 | res$or_lci95.raps<-exp(res$beta.raps)-(res$beta.raps*1.96) 552 | res$or_uci95.raps<-exp(res$beta.raps)+(res$beta.raps*1.96) 553 | res$pval.raps<-2*pnorm(abs(res$beta.raps/res$se.raps),lower.tail=F) 554 | res$pval.eov<-2*pnorm(abs(res$eov/res$se.eov),lower.tail=F) 555 | if(is.na(res$pval.eov)==FALSE){ 556 | if(res$pval.eov >0.05 ){ 557 | res1<-mr.raps(dat,over.dispersion = F) 558 | setwd(dir_in) 559 | #plot_name<-paste0(exposure,'raps.pdf') 560 | ggsave(file='raps_plot.pdf',plot=plot(res1),width=9,height=5) 561 | res1<-data.frame(beta.raps=res1$beta.hat,se.raps=res1$beta.se, 562 | eov=NA,se.eov=NA) 563 | res1$eov<-res$eov 564 | res1$se.eov<-res$se.eov 565 | res1$OR.raps<- exp(res1$beta.raps) 566 | res1$or_lci95.raps<-exp(res1$beta.raps)-(res1$beta.raps*1.96) 567 | res1$or_uci95.raps<-exp(res1$beta.raps)+(res1$beta.raps*1.96) 568 | res1$pval.raps<-2*pnorm(abs(res1$beta.raps/res1$se.raps),lower.tail=F) 569 | res1$pval.eov<-2*pnorm(abs(res1$eov/res1$se.eov),lower.tail=F) 570 | 571 | res<-res1 572 | } 573 | } 574 | 575 | return(res) 576 | } 577 | } 578 | 579 | mr_dircreate_base<-function(root_dir,project_name,date=NULL){ 580 | library(fs) 581 | dir_name<-root_dir 582 | setwd(dir_name) 583 | 584 | if(is.null(date)==FALSE){data_u<-date}else{data_u<-Sys.Date()} 585 | 586 | dir_name2<-project_name 587 | dir_name3<-paste0(dir_name2,data_u) 588 | dir_create(dir_name3) 589 | setwd(paste0(dir_name,"/",dir_name3)) 590 | dir_name4<-"1.figure" 591 | dir_name5<-"2.table" 592 | dir_name6<-"3.figure of sig res" 593 | dir_name7<-"4.snp with Fval" 594 | 595 | paste<-paste0(dir_name,"/",dir_name3,"/") 596 | 597 | dir1<-paste0(paste,dir_name4) 598 | dir.create(dir1) 599 | 600 | dir2<-paste0(paste,dir_name5) 601 | dir.create(dir2) 602 | 603 | dir3<-paste0(paste,dir_name6) 604 | dir.create(dir3) 605 | 606 | dir4<-paste0(paste,dir_name7) 607 | dir.create(dir4) 608 | 609 | res<-list(paste=paste,dir1=dir1,dir2=dir2,dir3=dir3,dir4=dir4) 610 | 611 | return(res) 612 | } 613 | 614 | 615 | 616 | # PRESSO 617 | mr_Presso<-function(dat,num=10000){ 618 | library(TwoSampleMR) 619 | library(MRPRESSO) 620 | library(dplyr) 621 | set.seed(123) 622 | try (mr_presso_res<-mr_presso(BetaOutcome ="beta.outcome", BetaExposure = "beta.exposure", SdOutcome ="se.outcome", SdExposure = "se.exposure", 623 | OUTLIERtest = TRUE,DISTORTIONtest = TRUE, data = dat, 624 | SignifThreshold = 0.05,NbDistribution = num)) 625 | return(mr_presso_res) 626 | 627 | } 628 | mr_presso_pval<-function(mr_presso_res){ 629 | try ( mr_presso_main<-mr_presso_res$`Main MR results`) 630 | try ( mr_presso_main[3:5,]<-NA) 631 | return(mr_presso_main) 632 | } 633 | 634 | 635 | mr_presso_snp<-function(mr_presso_res,mr_presso_main,dat,type="list"){ 636 | data_re<-list() 637 | if(type=="list"){ 638 | for(i in 1:length(mr_presso_res)){ 639 | res<-mr_presso_res$`MR-PRESSO results`[[i]] 640 | main<-mr_presso_main[[i]] 641 | data<-dat[[i]] 642 | try(if(is.na(main[2,6])==FALSE){ 643 | outliers<-which(res$`Outlier Test`$Pvalue<0.05) 644 | data$mr_keep[outliers]<-FALSE 645 | }) 646 | data_re[[i]]<-data 647 | names(data_re)[[i]]<-names(dat)[[i]] 648 | } 649 | return(data_re) 650 | } 651 | 652 | if(type=="data"){ 653 | res<-mr_presso_res$`MR-PRESSO results` 654 | main<-mr_presso_main 655 | data<-dat 656 | try(if(is.na(main[2,6])==FALSE){ 657 | outliers<-which(res$`Outlier Test`$Pvalue<0.05) 658 | data$mr_keep[outliers]<-FALSE 659 | }) 660 | return(data) 661 | } 662 | } 663 | 664 | #MRlap 665 | 666 | mr_lap<-function(expo,outcome,ld,hm3,pval,r2,kb,MR_reverse=1e-03,save_logfiles=F){ 667 | expo<-expo%>%select(rsid=SNP,chr=chr.exposure,pos=pos.exposure,alt=effect_allele.exposure, 668 | ref=other_allele.exposure,N=samplesize.exposure,beta=beta.exposure, 669 | se=se.exposure) 670 | expo<-as.data.frame(expo) 671 | outcome<-outcome%>%select(rsid=SNP,chr=chr.outcome,pos=pos.outcome,alt=effect_allele.outcome, 672 | ref=other_allele.outcome,N=samplesize.outcome,beta=beta.outcome, 673 | se=se.outcome) 674 | outcome<-as.data.frame(outcome) 675 | n_expo<-expo$exposure[1] 676 | n_out<-outcome$outcome[1] 677 | 678 | try(res<-MRlap::MRlap(exposure = expo, 679 | exposure_name = n_expo, 680 | outcome = outcome, 681 | outcome_name = n_out, 682 | ld = ld, 683 | hm3 = hm3,MR_threshold=pval,MR_pruning_dist=kb, 684 | MR_pruning_LD=r2,MR_reverse=MR_reverse,save_logfiles=save_logfiles 685 | )) 686 | try(snp<-data.frame(SNP_MRlap=res$MRcorrection$IVs)) 687 | try(res_c<-as.data.frame(res$MRcorrection[-4])%>%select(nsnp=m_IVs, 688 | beta_MRlap=corrected_effect, 689 | se_MRlap=corrected_effect_se, 690 | p_MRlap=corrected_effect_p)) 691 | try(res<-c(res,res_c,snp)) 692 | 693 | try(return(res)) 694 | } 695 | 696 | 697 | # clean 698 | clean_expo<-function(expo,pval,low_af=0.5,high_af=0.5, 699 | clump=TRUE,kb=10000,r2=0.001,LD_file=NULL,af_filter=FALSE){ 700 | library(TwoSampleMR) 701 | dat<-subset(expo,pval.exposurehigh_af)} 703 | 704 | if(clump==TRUE){ 705 | 706 | if(is.null(LD_file)==TRUE){ 707 | dat<-clump_data(dat,clump_kb = kb,clump_r2 = r2)} 708 | else{ 709 | library(plinkbinr) 710 | plink_pathway<-get_plink_exe() 711 | snp<-data.frame(rsid=dat$SNP,pval=dat$pval.exposure) 712 | snp<-try(ld_clump(snp,clump_kb = kb,clump_r2 = r2,plink_bin =plink_pathway , bfile =LD_file)) 713 | if ("try-error"%in% class(snp)){return(dat)} 714 | else{ 715 | snp<-data.frame(SNP=snp$rsid) 716 | dat<-merge(snp,dat,by.x="SNP",by.y='SNP') 717 | } 718 | } 719 | 720 | } 721 | return(dat) 722 | } 723 | 724 | clean_list<-function(list,nrow=10){ 725 | l<-lapply(list,nrow) 726 | n<-data.frame() 727 | for(i in 1:length(l)){ 728 | if(is.null(l[[i]])==T || l[[i]]==0 )next 729 | 730 | n1<-data.frame(i,l[[i]]) 731 | n<-rbind(n,n1) 732 | } 733 | colnames(n)<-c('l','row') 734 | n<-subset(n,row>nrow) 735 | list<-list[n$l] 736 | return(list) 737 | } 738 | 739 | clean_IV_from_outsig<-function(dat,MR_reverse=1e-03){ 740 | dat<-subset(dat,pval.outcome>MR_reverse) 741 | return(dat) 742 | } 743 | 744 | 745 | LDSC_rg<-function(expo,outcome,an,sample_prev=NA, 746 | population_prev=NA,ld,wld,chr_filter=c(1:22),n_blocks=200){ 747 | id.o<-outcome$id.outcome[1] 748 | id.e<-expo$id.exposure[1] 749 | 750 | expo<-expo%>%mutate(Z=beta.exposure/se.exposure) 751 | expo<-expo%>%select(SNP=SNP,N=samplesize.exposure,Z=Z 752 | ,A1=effect_allele.exposure 753 | ,A2=other_allele.exposure) 754 | expo<-as_tibble(expo) 755 | 756 | outcome<-outcome%>%mutate(Z=beta.outcome/se.outcome) 757 | outcome<-outcome%>%select(SNP=SNP,N=samplesize.outcome,Z=Z 758 | ,A1=effect_allele.outcome 759 | ,A2=other_allele.outcome) 760 | outcome<-as_tibble(outcome) 761 | 762 | 763 | dat<-list(expo,outcome) 764 | names(dat)<-c(id.e,id.o) 765 | 766 | rm(expo,outcome) 767 | 768 | 769 | res<-try(ldscr::ldsc_rg(dat,ancestry = an,sample_prev=sample_prev, 770 | population_prev=population_prev,ld=ld,wld=wld, 771 | n_blocks=n_blocks,chr_filter=chr_filter)) 772 | 773 | return(res) 774 | 775 | } 776 | 777 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Get_MR 2 | 3 | 已失效的网盘链接已重新更新如下:https://pan.baidu.com/s/1yyYA4cVHKnyCce9YSPRA9A?pwd=ipvl 4 | 提取码:ipvl 5 | 6 | # Get_MR2.0 7 | 8 | ## 更新说明 9 | ### 5.7 : 10 | 1. 修复cyclemr函数中clean函数bug 11 | 2. 补充说明,get_outcome函数修改了TwoSampleMR包原函数的默认值,将maf阈值调整为0.4,并不获取proxy(默认值为0.3与获取proxy)。由于这个函数主要是为了方便批量预实验(添加了错误防停止运行机制),因此建议正式做数据采用官方函数。 12 | 13 | ## 写在前面 14 | 15 | **欢迎来到向量化与并行化的世界** 本次更新就一个重大功能,就是并行化运行mr分析,以最优的效率批量跑大量的数据。家用电脑较新服务器基本可以实现2小时批量运行10000个因素,如果你有高性能服务器,恭喜你,30分钟以内就能跑完。 16 | 17 | 18 | 19 | 由于本次主要是思路与方法的分享,所以函数的帮助文档写的不多,主要还是看示例代码即可,应该还是很容易上手的。 20 | 21 | 22 | 23 | **公众号回复:“示例”即得示例代码** (不会还没关注我们公众号吧#doge,GetScience, 等你来!) 24 | 25 | 26 | 27 | ## cyclemr 28 | 29 | ### 描述 30 | 31 | 这个函数是用于执行循mr 分析的功能函数,可以在R语言中使用。该函数可以将数据分配到多个计算节点中运行,提高MR分析的效率。 32 | 33 | ### 用法 34 | 35 | 直接在R语言中调用这个函数,如下所示: 36 | 37 | ``` 38 | # 调用cyclemr函数 39 | mr_results <- cyclemr(dat = data, cl_num = 4, type = "list") 40 | ``` 41 | 42 | ### 参数 43 | 44 | - `dat`: harmonise_data后的数据,可以是数据框或列表类型。默认是list 45 | - `cl_num`: 批量化线程数。 46 | - `type`: MR分析数据类型,可以是"list"或"data",默认为"list"。主要使用情况也是list 47 | 48 | > 注: 如何判断自己的电脑能开启的最大线程数? 49 | > 50 | > 在任务管理器可看到 51 | > 52 | > * 内核: 计算机核心数 53 | > 54 | > * 逻辑处理器: 计算机总线程数 55 | > 56 | > 比如说很多处理器宣传是8核16线程,这个8就指的是内核,这个16指的是逻辑处理器。 57 | > 58 | > 本质上,16只是将每个核心一分为2,但是他们能干的活是一样的。所以一般设置为内核数即可满载CPU 59 | > 60 | > 当然这不能一概而论,因为每个CPU和厂家调度不一样,如果你发现使用内核数不能让CPU跑到100%,则尝试用逻辑处理器数 61 | 62 | ### 返回值 63 | 64 | - `cyclemr`函数返回一个包含MR分析结果的数据框。 65 | 66 | ### 使用举例 67 | 68 | 下面是使用这个函数的一个示例: 69 | 70 | ``` 71 | mr_results <- cyclemr(dat = data, cl_num = 16, type = "list") 72 | ``` 73 | 74 | 75 | 76 | ### 运行时间参考 77 | 78 | 在设置无误情况下,这是我手头有的所有电脑测试出的运行时间: 79 | 80 | 运行10000个数据。(ieu批量数据前10000个)除了服务器外,其他均使用Windows系统 81 | 82 | | CPU | 核数(运行时开的线程数) | 时间 | 83 | | --------------------------------------- | ------------------------ | --------- | 84 | | i9-12900H(拯救者2022 Y9000P 狂暴模式) | 14核20线程(14) | 1小时28分 | 85 | | r7-5700X(台式) | 8核16线程(16) | 1小时38分 | 86 | | r9-6800H (yoga2022 14S 性能模式) | 8核16线程(16) | 1小时54分 | 87 | | r5-3500X (台式) | 6核12线程 (12) | 3小时47分 | 88 | | r5-4600H (拯救者 2020 R7000 狂暴模式) | 6核12线程 (12) | 约4小时 | 89 | | 双路 EPYC 7T83 (服务器 Linux) | 128核256线程(128) | 11分钟 | 90 | 91 | **欢迎各位补充自己手头的机器的运行时间数据,尤其M系列的苹果处理器数据** 92 | 93 | 94 | 95 | ## 一些小工具 96 | 97 | ## get_rsid 98 | 99 | ### 描述 100 | 101 | 根据CHR和POS,从ensemble官网中获取rsID。 102 | 103 | ### 用法 104 | 105 | ``` 106 | get_rsid(chr, pos, version = 'hg38') 107 | ``` 108 | 109 | ### 参数 110 | 111 | - `chr`:染色体号。 112 | - `pos`:基因位置。 113 | - `version`:表示使用的基因组版本,默认为最新的版本(`'hg38'`)。也可选择hg19. 114 | - 注: GRCh37=hg19,GRCh38=hg38 115 | 116 | ### 详细说明 117 | 118 | 该函数基于生物信息学数据库Ensembl SNP Mart来查询给定位置的相关信息。如果未指定基因组版本,则默认使用最新的版本(hg38)。该函数会根据指定的基因组版本选择正确的URL。如果您想查询其他版本的数据,可以将`version`参数设置为相应版本的字符串。 119 | 120 | 参考:[How to find rsID with biomaRt in R (bioconductor.org)](https://support.bioconductor.org/p/9135301/) 121 | 122 | ### 使用举例 123 | 124 | ``` 125 | ds4 <- data.frame(CHR = c("8", "8", "8", "8", "8"),POS = c('101592213', '106973048', '108690829', '102569817', '108580746')) 126 | res<-get_rsid(chr=ds4$CHR, pos=ds4$POS, version = 'hg38') 127 | ``` 128 | 129 | 130 | 131 | ### 错误说明 132 | 133 | 如果出现: 134 | 135 | ``` 136 | Error in curl::curl_fetch_memory(url, handle = handle) : 137 | Timeout was reached: [grch37.ensembl.org:80] Operation timed out after 300013 milliseconds with 7909 bytes received 138 | ``` 139 | 140 | 这并不是代码问题,而是网络超时了,ensemble的API经常拥堵,多试几次即可。当然也有可能请求的数据量太大,也可能会出现这个问题。 141 | 142 | 143 | 144 | ## get_exposure 和 get_outcome 145 | 146 | ### get_outcome有特殊说明,见**更新说明** 147 | 148 | ### 描述 149 | 150 | 这两个函数是用于进行双样本MR(Two-sample Mendelian Randomization)分析的数据处理和提取过程的功能函数。 151 | 152 | - `get_exposure`函数用于从给定的遗传仪器ID中提取出暴露(exposure)数据。 153 | - `get_outcome`函数用于从给定的遗传仪器ID和暴露数据中提取出结果(outcome)数据。 154 | 155 | ### 用法 156 | 157 | 使用这两个函数前,需要先安装并加载TwoSampleMR包。 158 | 159 | ``` 160 | library(TwoSampleMR) 161 | ``` 162 | 163 | 然后可以直接在R语言中调用这两个函数,如下所示: 164 | 165 | ``` 166 | # 调用get_exposure函数 167 | exposure_data <- get_exposure(id = "ieu-a-1", pval = 5e-8, r2 = 0.001, kb = 10000) 168 | 169 | # 调用get_outcome函数 170 | outcome_data <- get_outcome(id = "ieu-a-1", expo = exposure_data) 171 | ``` 172 | 173 | ### 参数 174 | 175 | - `get_exposure`函数的参数说明: 176 | - `id`: 遗传仪器ID。 177 | - `pval`: 提取暴露数据的P值阈值,默认为5e-08。 178 | - `r2`: 遗传仪器间的LD(linkage disequilibrium)值的阈值,默认为0.001。 179 | - `kb`: 遗传仪器的范围(以kb为单位),默认为10000。 180 | - `get_outcome`函数的参数说明: 181 | - `id`: 遗传仪器ID。 182 | - `expo`: 暴露数据,TwoSampleMR包格式 183 | 184 | 185 | 186 | ## get_ao 187 | 188 | ### 描述 189 | 190 | 获取OPEN GWAS数据库所有可用的ID。可以指定获取某个前缀的ID 191 | 192 | ### 用法 193 | 194 | 使用这个函数前,需要先安装并加载TwoSampleMR包。然后可以直接在R语言中调用这个函数,如下所示: 195 | 196 | ``` 197 | # 调用get_ao函数 198 | ao <- get_ao()## 不限定来源则返回所有id 199 | ao <- get_ao(a = "finn")## 这样就会返回finn的所有可用id 200 | ``` 201 | 202 | ### 参数 203 | 204 | - `a`: (可选)数据来源。 205 | 206 | ### 备注: 来源的名称 207 | 208 | 来自OPEN GWAS [Browse the IEU OpenGWAS project (mrcieu.ac.uk)](https://gwas.mrcieu.ac.uk/datasets/?trait__icontains=12) 5.1获取 209 | 210 | | Batch | Description | [Count](https://gwas.mrcieu.ac.uk/datasets/?trait__icontains=12#counts) | 211 | | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | 212 | | [bbj-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=bbj-a) | [Biobank Japan release of disease traits](http://jenger.riken.jp/en/) | 120 | 213 | | [ebi-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ebi-a) | [Datasets that satisfy minimum requirements imported from the EBI database of complete GWAS summary data](https://www.ebi.ac.uk/gwas/downloads/summary-statistics) | 2,585 | 214 | | [eqtl-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=eqtl-a) | [eQTLGen 2019 results, comprising all cis and some trans regions of gene expression in whole blood](https://www.eqtlgen.org/) | 19,942 | 215 | | [finn-b](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=finn-b) | [FinnGen biobank analysis round 5](https://www.finngen.fi/) | 2,803 | 216 | | [ieu-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ieu-a) | [GWAS summary datasets generated by many different consortia that have been manually collected and curated, initially developed for MR-Base](https://elifesciences.org/articles/34408) | 440 | 217 | | [ieu-b](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ieu-b) | [GWAS summary datasets generated by many different consortia that have been manually collected and curated, initially developed for MR-Base (round 2)](https://elifesciences.org/articles/34408) | 207 | 218 | | [met-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=met-a) | [Human blood metabolites analysed by Shin et al 2014](https://www.ncbi.nlm.nih.gov/pubmed/24816252) | 452 | 219 | | [met-b](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=met-b) | [Human immune system traits analysed by Roederer et al 2015](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393780/) | 150 | 220 | | [met-c](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=met-c) | [Circulating metabolites analysed by Kettunen et al 2016](https://www.ncbi.nlm.nih.gov/pubmed/27005778) | 123 | 221 | | [met-d](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=met-d) | [Metabolic biomarkers in the UK Biobank measured by Nightingale Health 2020](https://www.ukbiobank.ac.uk/learn-more-about-uk-biobank/news/nightingale-health-and-uk-biobank-announces-major-initiative-to-analyse-half-a-million-blood-samples-to-facilitate-global-medical-research) | 249 | 222 | | [prot-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=prot-a) | [Complete GWAS summary data on protein levels as described by Sun et al 2018](https://www.ncbi.nlm.nih.gov/pubmed/29875488) | 3,282 | 223 | | [prot-b](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=prot-b) | [Complete GWAS summary data on protein levels as described by Folkersen et al 2017](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5393901/) | 83 | 224 | | [prot-c](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=prot-c) | [Complete GWAS summary data on protein levels as described by Suhre et al 2017](https://pubmed.ncbi.nlm.nih.gov/28240269) | 1,124 | 225 | | [ubm-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ubm-a) | [Complete GWAS summary data on brain region volumes as described by Elliott et al 2018](https://www.ncbi.nlm.nih.gov/pubmed/30305740) | 3,143 | 226 | | [ukb-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ukb-a) | [Neale lab analysis of UK Biobank phenotypes, round 1](http://www.nealelab.is/blog/2017/7/19/rapid-gwas-of-thousands-of-phenotypes-for-337000-samples-in-the-uk-biobank) | 596 | 227 | | [ukb-b](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ukb-b) | [IEU analysis of UK Biobank phenotypes](https://data.bris.ac.uk/data/dataset/pnoat8cxo0u52p6ynfaekeigi) | 2,514 | 228 | | [ukb-d](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ukb-d) | [Neale lab analysis of UK Biobank phenotypes, round 2](http://www.nealelab.is/uk-biobank) | 904 | 229 | | [ukb-e](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ukb-e) | [Pan-ancestry genetic analysis of the UK Biobank performed at the Broad Institute](https://pan.ukbb.broadinstitute.org/) | 3,873 | 230 | 231 | 232 | 233 | ## clean_outcome_from_exposure 234 | 235 | ### 描述 236 | 237 | (主要用于向量化)用于清洗outcome。将exposure中(list形式,也就是批量化的形式存在的exposure)不存在的SNP从outcome中剔除,大幅精简outcome,并大幅提升harmonise_data的速度。 238 | 239 | **实测清洗与不清洗outcome对比,速度相差一百倍以上。** 240 | 241 | ### 用法 242 | 243 | 直接在R语言中调用这个函数,如下所示: 244 | 245 | ``` 246 | # 调用clean_outcome_from_exposure函数 247 | cleaned_outcome <- clean_outcome_from_exposure(expo = exposure_data, outcome = outcome_data) 248 | ``` 249 | 250 | ### 参数 251 | 252 | - `expo`: 暴露数据,格式为**list**,TwoSampleMR包格式。比如get_exposure批量化获取下来的数据 253 | - `outcome`: 结果数据,TwoSampleMR包格式。 254 | 255 | 256 | 257 | ## clean_GWAS 258 | 259 | ### 描述 260 | 261 | 这个函数是用于清洗遗传关联数据集,使其符合特定的数据集要求的功能函数。 262 | 263 | ### 用法 264 | 265 | 直接在R语言中调用这个函数,如下所示: 266 | 267 | ``` 268 | # 调用clean_GWAS函数 269 | cleaned_GWAS_list <- clean_GWAS(list = GWAS_list, clean = c("bbj", "eqtl")) 270 | ``` 271 | 272 | ### 参数 273 | 274 | - `list`:一个list。里面包含每个暴露的data.frame。 具体参考批量运行get_exposure后的结果 275 | - `clean`: 需要清洗的数据集名称,一个字符型向量类型。具体参考get_ao的附注。 276 | 277 | ### 返回值 278 | 279 | - `clean_GWAS`函数返回一个清洗后的遗传关联数据集列表,符合特定数据集要求。 280 | 281 | 282 | 283 | # Get_MR1.0 284 | 285 | ## 近期更改与提示: 286 | 287 | **4.22**:**(重要)** 修复PRESSO的bug,请使用PRESSO时务必更新代码!! 288 | 289 | **4.19**:注意TwoSampleMR包有一列是mr_keep。请在运行我们get_mr包的所有分析函数前,如果需要使用TwoSampleMR包格式的数据,请将mr_keep状态为false的删掉(如果是没有harmonise之前,列名为mr_keep.exposure/mr_keep.outcome)。因为带有未通过质控的SNP,可能会带来未知的错误!可以参考以下代码 290 | ```R 291 | 292 | dat<-harmonise_data(exposure,outcome) 293 | 294 | ## 在运行完harmonise_data后可能会产生mr_keep状态为false的,不能用于MR分析的SNP,我们把他删掉: 295 | 296 | dat<-subset(dat,mr_keep==T) 297 | 298 | 299 | ## 然后再继续使用各种功能 300 | ``` 301 | 302 | ## 1. 写在前面: 303 | 304 | ### 1.1 项目地址 305 | 306 | **github:**[HaobinZhou/Get_MR: A package for running MR In batches and in parallel quickly (github.com)](https://github.com/HaobinZhou/Get_MR) 307 | 308 | **如果觉得好用,可以点一下github项目上的小星星吗,这是我们继续开源的最大动力,谢谢!** 309 | 310 | 311 | 312 | 313 | 314 | ### 1.2 R包使用方法: 315 | 316 | **R包以R脚本的形式提供,打开R包,全选运行,即得到所有function** 317 | 318 | 1. 进入github[HaobinZhou/Get_MR: A package for running MR In batches and in parallel quickly (github.com)](https://github.com/HaobinZhou/Get_MR),下载代码zip 319 | 320 | 2. ```R 321 | source("./Get_MR1.0.r") ## 填文件所在地址 322 | 323 | ## 或者直接打开R文件,全选代码运行也可以! 324 | ``` 325 | 326 | 327 | 328 | 329 | 330 | ### 1.3 常见问题: 331 | 332 | 1. **本地clump,1000G处理好的MAF文件,MRlap依赖文件如何获取**: GetScience公众号可免费获取已处理好文件,回复"依赖文件"即得链接。源文件请查看本文相应function介绍处 333 | 334 | 2. **输入clump文件路径后总是报错**: 335 | 336 | ```R 337 | #尤其注意这个文件名的书写,因为他们是二进制文件,不需要写后缀!只需要选取对应的人种即可,比如欧洲人: 338 | LD_file="S:/GWAS数据/本地LD依赖文件/EUR" 339 | 340 | ## 这个问题我回答好多遍啦! 341 | ``` 342 | 343 | 344 | 345 | 3. **第一次使用如何安装关联R包:**[Get_MR/1.0 at main · HaobinZhou/Get_MR (github.com)](https://github.com/HaobinZhou/Get_MR/tree/main/1.0) 346 | 347 | 1. 如果不需要使用`MungeSumstats`包(相关函数包括:`format_Mun`,`get_chr_pos`,`format_getmr`中`source="ukb_nosnp"`) ,则只需要运行[Get_MR1.0dependence.R](https://github.com/HaobinZhou/Get_MR/blob/main/1.0/Get_MR1.0dependence.R) 348 | 2. 如果需要使用`MungeSumstats`包,则还需运行[Install_Reference_Genome.r](https://github.com/HaobinZhou/Get_MR/blob/main/1.0/Install_Reference_Genome.r) 这个包括了hg19和hg38的基因组参考文件,总大小达到了5G!**如果直接安装失败,在GetScience公众号回复"基因组参考"可得下载链接,并本地安装**(推荐) 349 | 350 | 4. **Bug反馈**:代码仅由两人编写,难免出现错误。欢迎提交bug到GetScience公众号后台! 351 | 352 | 5. **感谢所有Get_MR使用的R包作者**,是因为他们我们才得以轻松实现这么多复杂的功能。他们都是开源的,因此我们承诺Get_MR将**永久免费开源**。这意味着使用者可以随意地修改,分发代码,但前提是遵守: 353 | 354 | **1.本代码不得用于任何商业或盈利目的** 355 | 356 | **2.未经代码作者的同意,本代码不得用于任何形式的销售或商业交易** 357 | 358 | **3.本代码可以在非商业性的科研、学术研究和个人使用的情况下免费使用** 359 | 360 | **4.在使用本代码并重新打包并向公众发放时,请引用我们的公众号原文** 361 | 362 | 363 | 364 | ## 2. 帮助文档目录 365 | 366 | 367 | 368 | # 进阶MR分析 369 | 370 | ## LDSC_rg 371 | 372 | 用于计算两个数据框中SNP之间的遗传相关性(rg)。 373 | 374 | ### 用法 375 | 376 | ```R 377 | LDSC_rg(expo, outcome, an, sample_prev = NA, population_prev = NA, 378 | ld, wld, chr_filter = c(1:22), n_blocks = 200) 379 | ``` 380 | 381 | ### 参数 382 | 383 | - `expo`: 一个数据框,其中包含一个遗传暴露指标的多个SNP和它们与结果变量的rg。 384 | - `outcome`: 一个数据框,其中包含一个结果变量的多个SNP和它们与遗传暴露指标的rg。 385 | - `an`: 它是一个字符串,目前还没有作用(因为我们提供的依赖文件只有eur的,其他人种还没更新) 386 | - `sample_prev`: 遗传暴露指标的样本流行病学先验患病率。默认为 `NA`。 387 | - `population_prev`: 遗传暴露指标的人群流行病学先验患病率。默认为 `NA`。 388 | - `ld`: 本地LD依赖文件 389 | - `wld`: 本地weighted LD 依赖文件 390 | - `chr_filter`: 一个整数向量,用于指定要使用的染色体。默认为包含1-22的整数向量。 391 | - `n_blocks`: 用于计算加权LD矩阵的块数。默认为200。 392 | 393 | ### 返回值 394 | 395 | 一个具有以下元素的列表: 396 | 397 | - `rg`: 两个数据框中SNP之间的遗传相关性(rg)。 398 | - `pval`: `rg` 的双侧P值。 399 | - `N_snps`: 参与计算rg的SNP数量。 400 | 401 | ### 示例 402 | 403 | **具体用法参照:mr_lap和LDSC_rg示例.r** 可通过公众号GetScience回复示例获取文件 404 | 405 | 406 | 407 | ## mr_lap 408 | 409 | ### 描述 410 | 411 | mrlap是一种矫正样本重叠后的双样本MR方法。可用于怀疑有样本重叠的数据中。 412 | 413 | R包官网:[n-mounier/MRlap: R package to perform two-sample Mendelian Randomisation (MR) analyses using (potentially) overlapping samples (github.com)](https://github.com/n-mounier/MRlap) 414 | 415 | ### 语法 416 | 417 | ```R 418 | mr_lap(expo, outcome, ld, hm3, pval, r2, kb, MR_reverse = 1e-03, save_logfiles = F) 419 | ``` 420 | 421 | 422 | 423 | ### 参数 424 | 425 | - `expo`: 数据框,为TwoSampleMR包格式的数据 426 | - `outcome`: 数据框,为TwoSampleMR包格式的数据 427 | - `ld`: 数据框,本地LD文件路径 428 | - `hm3`: 数据框,本地HapMap3文件路径 429 | - `pval`: 数值,MR 工具变量阈值。 430 | - `r2`: 数值,clump阈值 431 | - `kb`: 数值,clump阈值 432 | - `MR_reverse`: 数值,MR 的方向翻转阈值。 433 | - `save_logfiles`: 逻辑值,是否保存日志文件。 434 | 435 | ### 值 436 | 437 | - res: mrlap 结果。 438 | 439 | 440 | 441 | ### 用法 442 | 443 | **具体用法参照:mr_lap和LDSC_rg示例.r** 可通过公众号GetScience回复示例获取文件 444 | 445 | 446 | 447 | ## cause_getmr函数 448 | 449 | ### 描述 450 | 451 | 一键式执行cause。可批量化执行多暴露对一结局或一暴露对多结局 452 | 453 | ### 用法 454 | 455 | ```R 456 | ## 不并行化运行 457 | cause_getmr(expo, outcome, LD_file, r2 = 0.001, kb = 10000, pval = 1e-05) 458 | 459 | ## 并行化运行 460 | cl<-makeCluster(2) ## 填你想要的并行化的核数,核数越多,需要的运行内存越大 461 | cause_getmr(expo, outcome, LD_file, r2 = 0.001, kb = 10000, pval = 1e-05,cl=cl) 462 | ``` 463 | 464 | ### 参数 465 | 466 | - `expo`: TwoSampleMR的暴露格式的数据。 467 | - `outcome`: TwoSampleMR的暴露格式的数据。 468 | - 注意!expo和outcome,可以是data.frame的形式,也可以是一个list(如list[[1:n]]里都包含数据的data.frame)。但不能outcome和expo同时都是list。当expo或outcome,其中一个为list的情况下,是批量运行一对一的cause。比如我读取了10个暴露和1个结局,将10个暴露lapply读取进来就会是一个list。这时候`cause_cyclemr` 自动运行每个暴露对结局的cause,也就是批量化执行. 469 | 470 | ```R 471 | ## 比如我这里读取3个暴露文件和1和结局文件 472 | id<-c('a.gz','b.gz','c.gz') 473 | expo<-lapply(id,FUN=fread) 474 | outcome<-fread("outcome.gz") 475 | cl=makeCluster(4)## 内存不够的也可以不并行化运行 476 | res<-cause_getmr(expo, outcome, LD_file, r2 = 0.001, kb = 10000, pval = 1e-05,cl=cl) 477 | stopCluster(cl) 478 | ## 这样返回的结果就是3个暴露分别对一个结局的cause结果。 479 | ``` 480 | 481 | - `LD_file`: 包含LD信息的PLINK文件名。因为需要大批量地clump,在线clump很容易报错,因此我们采用本地clump。需要本地参考文件。下载地址: http://fileserve.mrcieu.ac.uk/ld/1kg.v3.tgz 。 或关注公众号GetScience直接获取。 482 | 483 | ```R 484 | #尤其注意这个文件名的书写,因为他们是二进制文件,不需要写后缀!只需要选取对应的人种即可,比如欧洲人: 485 | LD_file="S:/GWAS数据/本地LD依赖文件/EUR" 486 | 487 | ## 这个问题我回答好多遍啦! 488 | ``` 489 | 490 | - `r2`: LD的R平方阈值。默认值为0.001。 491 | - `kb`: LD的距离阈值(以kb为单位)。默认值为10000。 492 | - `pval`: 用于LD clumping的p值阈值。默认值为1e-05。 493 | - `cl`: 并行计算的cluster对象。默认值为NULL。在外部使用cl=makeCluster(n),n为你想并行化的核数。注意核数太多不要爆内存了。 494 | 495 | ### 值 496 | 497 | `cause_cyclemr`函数返回cause结果 498 | 499 | 500 | 501 | ## RAPS_getmr 502 | 503 | ### 描述 504 | 505 | `RAPS_getmr`函数执行基于RAPS的MR并返回结果,并画图 506 | 507 | ### 用法 508 | 509 | ```R 510 | expo<-fread('a.gz') 511 | outcome<-fread('b.gz') 512 | expo<-format_data(...) 513 | outcome<-format_data(...) ## format_data是TwoSampleMR包的函数,格式化。 514 | expo<-pblapply(expo,pval=1,kb=10000,r2=0.001,LD_file=LD_file,FUN=clean_expo) ## 数据很大,建议本地clump,在线很容易报错 515 | dat<-harmonise(expo,outcome) 516 | res<-RAPS_getmr(dat, dir_figure) 517 | ``` 518 | 519 | ### 参数 520 | 521 | - `dat`: TwoSampleMR包 harmonise_data后输出的数据 522 | - `dir_figure`: 保存结果图形的目录。 523 | 524 | ### 值 525 | 526 | `RAPS_getmr`函数返回一个包含基于RAPS的MR结果的数据框。 527 | 528 | 529 | 530 | 531 | 532 | ## mr_Presso 533 | 534 | ### 描述 535 | 536 | 执行MR-PRESSO 537 | 538 | ### 语法 539 | 540 | ```R 541 | mr_Presso(dat, num = 10000) 542 | ``` 543 | 544 | 545 | 546 | ### 参数 547 | 548 | - `dat`: 数据框,包含基因表达和疾病风险关联分析的数据。 549 | - `num`: 整数,模拟数量。 550 | 551 | ### 值 552 | 553 | - `mr_presso_res`: MR-PRESSO 结果。 554 | 555 | ### 用法 556 | 557 | ```R 558 | dat<-harmonise_data(exposure,outcome) ## TwoSampleMR包的harmonise_data函数输出的结果 559 | mr_presso_res <- mr_Presso(dat, num = 10000) 560 | ``` 561 | 562 | 563 | 564 | ## mr_presso_pval函数 565 | 566 | ### 描述 567 | 568 | 提取 MR-PRESSO 结果中的主要结果 569 | 570 | ### 语法 571 | 572 | ```R 573 | mr_presso_pval(mr_presso_res) 574 | ``` 575 | 576 | 577 | 578 | ### 参数 579 | 580 | - `mr_presso_res`: MR-PRESSO 结果。 581 | 582 | ### 值 583 | 584 | - mr_presso_main: MR-PRESSO 主要结果。 585 | 586 | ### 用法 587 | 588 | ```R 589 | mr_presso_main <- mr_presso_pval(mr_presso_res) ##mr_Presso输出的结果 590 | ``` 591 | 592 | 593 | 594 | ## mr_presso_snp函数 595 | 596 | ### 描述 597 | 598 | 根据 MR-PRESSO 分析结果,将离群值剔除,返回剔除离群值后的dat(我一般称为dat_aj, 也就是 adjusted_data), 可用于后续的IVW等分析。 599 | 600 | ### 语法 601 | 602 | ```R 603 | mr_presso_snp(mr_presso_res, mr_presso_main, dat, type = "list") 604 | ``` 605 | 606 | 607 | 608 | ### 参数 609 | 610 | - `mr_presso_res`: MR-PRESSO 结果。 611 | - `mr_presso_main`: MR-PRESSO 主要结果。 612 | - `dat`: 数据框或数据框列表,包含基因表达和疾病风险关联分析的数据。 613 | - `type`: 字符串,输入数据类型。可选值为 "list" 或 "data"。如果是列表形式的(批量化运行后的结果),就是`list`,如果是普通数据框就是data 614 | 615 | ### 值 616 | 617 | 过滤后的数据框或数据框列表。 618 | 619 | ### 用法 620 | 621 | ```R 622 | dat<-harmonise_data(exposure,outcome) ## TwoSampleMR包的harmonise_data函数输出的结果 623 | mr_presso_res <- mr_Presso(dat, num = 10000) 624 | mr_presso_main <- mr_presso_pval(mr_presso_res) 625 | data_aj <- mr_presso_snp(mr_presso_res, mr_presso_main, dat, type = "data") 626 | 627 | ## 用矫正的data可以用于后续的分析,例如重新计算mr 628 | res_aj<-mr(data_aj) 629 | ``` 630 | 631 | 632 | 633 | 634 | 635 | 636 | 637 | 638 | 639 | # 快捷预处理及质控工具 640 | 641 | ## format_Mun 642 | 643 | ### 介绍 644 | 645 | 运用MungeSumstats包标准化GWAS 摘要统计数据(包括hg19和hg38转换)。该函数可以将来自Finngen R8和其他来源的 GWAS 摘要统计数据文件清洗为标准的GWAS文件,并可将基因组位置从 `ref_genome` 转换到 `convert_ref_genome`。 646 | 647 | ### 用法 648 | 649 | ```R 650 | format_Mun(file, source = "finn_r8", save_path = NULL, lift = F, ref_genome = "hg38", convert_ref_genome = "hg19") 651 | ``` 652 | 653 | ### 参数 654 | 655 | - `file`:字符向量或数据框,表示要格式化的 GWAS 摘要统计数据文件或数据框。如果输入的是字符向量,则表示文件的路径。如果输入的是数据框,则表示要格式化的数据框。 656 | - `source`:字符向量,表示输入文件的来源。默认为 `"finn_r8"`。 657 | - `save_path`:字符向量,表示格式化文件要保存的路径。默认为 `NULL`。 658 | - `lift`:逻辑值,表示是否将基因组位置从 `ref_genome` 转换到 `convert_ref_genome`。默认为 `F`。 659 | - `ref_genome`:字符向量,表示 GWAS 摘要统计数据文件使用的参考基因组。默认为 `"hg38"`。 660 | - `convert_ref_genome`:字符向量,表示要将基因组位置转换到的参考基因组。默认为 `"hg19"`。 661 | 662 | ### 例子 663 | 664 | ```R 665 | # 从文件中格式化数据 666 | format_Mun("my_sumstats.txt", save_path = "~/formatted_sumstats", lift = F, ref_genome = "hg38") 667 | 668 | # 从数据框中格式化数据 669 | my_sumstats_df <- read.csv("my_sumstats.csv") 670 | format_Mun(my_sumstats_df, save_path = "~/formatted_sumstats", lift = F, ref_genome = "hg38") 671 | 672 | #格式化数据并升降版本 673 | format_Mun(my_sumstats_df, save_path = "~/formatted_sumstats", lift = T, ref_genome = "hg38", convert_ref_genome = "hg19") ## 从hg38转为hg19 674 | ``` 675 | 676 | ### 返回值 677 | 678 | 该函数返回格式化的数据框并将其写入磁盘文件。`save_path`指定保存的位置 679 | 680 | 681 | 682 | 683 | 684 | ## format_getmr 685 | 686 | ### 介绍 687 | 688 | 预设的快捷格式化 GWAS 摘要统计数据,这个函数用于将来自多个数据来源的 GWAS 摘要统计数据转换为TwoSampleMR 包所需的格式。 689 | 690 | ### 用法 691 | 692 | ``` 693 | format_getmr(data, type = "exposure", source = "finn_r8") 694 | ``` 695 | 696 | ### 参数 697 | 698 | - `data`:数据框,表示要格式化的 GWAS 摘要统计数据。 699 | - `type`:字符向量,表示数据类型,可以是 "exposure" 或 "outcome"。默认为 "exposure"。 700 | - `source`:字符向量,表示数据来源。默认为 "finn_r8"。目前支持的来源有: 701 | - "finn_r8": [Data download - FinnGen Documentation (gitbook.io)](https://finngen.gitbook.io/documentation/data-download) 702 | - "ukb_nosnp": 尼尔数据库(UKB),因为没有rsid,因此需要匹配(已一键完成)。[www.nealelab.is/uk-biobank](http://www.nealelab.is/uk-biobank) 703 | - "Mun": 来自MungeSumstats包格式化后的数据 704 | - "covid": [COVID19-hg GWAS meta-analyses round 7 (covid19hg.org)](https://www.covid19hg.org/results/r7/) 705 | - "outcome" : 已经格式化为TwoSampleMR包的“outcome”格式 706 | - "exposure":已经格式化为TwoSampleMR包的“exposure”格式 707 | - "fast_ukb": [fastGWA | Yang Lab (westlake.edu.cn)](https://yanglab.westlake.edu.cn/data/ukb_fastgwa/imp_binary/) 708 | - "bac": 2021年肠菌原文数据 [Large-scale association analyses identify host factors influencing human gut microbiome composition - PMC (nih.gov)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8515199/) 709 | 710 | ### 例子 711 | 712 | ```R 713 | my_data <- fread("my_data.gz") 714 | format_getmr(my_data, type = "finn_r8", source = "Mun") 715 | ``` 716 | 717 | ### 返回值 718 | 719 | 该函数返回格式化的数据框。 720 | 721 | 722 | 723 | ## format_trait 724 | 725 | ### 介绍 726 | 727 | 这个函数用于格式化 GWAS 摘要统计数据中的表型信息,使其符合命名规范,易于保存为文件(例如批量保存计算R2和F值后的文件)。 728 | 729 | 主要是为了解决,在Windows系统下,保存文件的名称中不能包含特殊字符,例如`:`,`|`。 730 | 731 | ### 用法 732 | 733 | ```R 734 | format_trait(list, short = FALSE, short_num = "40") 735 | ``` 736 | 737 | ### 参数 738 | 739 | - `list`:列表,表示要格式化的 GWAS 摘要统计数据列表。 740 | - `short`:逻辑值,表示是否要将表型名称缩短。默认为 FALSE。 741 | - `short_num`:字符向量,表示缩短表型名称的长度。默认为 "40"。 742 | 743 | ### 例子 744 | 745 | ```R 746 | my_list <- list(data1, data2, data3) 747 | format_trait(my_list, short = TRUE, short_num = "20") 748 | ``` 749 | 750 | ### 返回值 751 | 752 | 该函数返回格式化后的 GWAS 摘要统计数据列表。 753 | 754 | 755 | 756 | 757 | 758 | ## read_vcf_getmr 759 | 760 | ### 介绍 761 | 762 | 这个函数用于从 VCF 文件中读取摘要统计数据。并保存为压缩文件。默认是.gz为后缀的压缩文件。方便下次读取以及节省空间。 763 | 764 | 这是因为读取VCF文件将消耗大量电脑资源。我们建议批量读取VCF文件后储存为易于读取的压缩包形式。下次读取方便快捷。因此本函数不会直接返回数据框,而是保存为文件 765 | 766 | ### 用法 767 | 768 | ``` 769 | read_vcf_getmr(file_name, nThread = 8, type = ".gz") 770 | ``` 771 | 772 | ### 参数 773 | 774 | - `file_name`:字符向量,表示要读取的 VCF 文件名。 775 | - `nThread`:整数,表示要使用的线程数。默认为 8。 776 | - `type`:字符向量,表示输出文件类型。默认为 ".gz"。 777 | 778 | ### 例子 779 | 780 | ```R 781 | my_file <- "my_file.vcf" 782 | read_vcf_getmr(my_file, nThread = 4, type = ".gz") 783 | ``` 784 | 785 | ### 返回值 786 | 787 | 该函数没有返回值,而是将读取的数据写入文件。 788 | 789 | 790 | 791 | ## read_easy 792 | 793 | ### 介绍 794 | 795 | 这个函数用于从文件中读取 GWAS 摘要统计数据。并返回经过P值筛选的文件。一般用于批量读取大量文件时。比如我要批量读取100个暴露数据,每个数据占用运行内存2G。如果100个,则200G,不是一般电脑可以承受。因此每次读取将直接筛选p值,压缩大小 796 | 797 | ### 用法 798 | 799 | ```R 800 | read_easy(file_name, pval = 5e-08) 801 | ``` 802 | 803 | ### 参数 804 | 805 | - `file_name`:字符向量,表示要读取的文件名。 806 | - `pval`:数字,表示筛选摘要统计数据的显著性水平。默认为 5e-08。 807 | 808 | ### 例子 809 | 810 | ```R 811 | my_file <- "my_file.csv" 812 | read_easy(my_file, pval = 1e-06) 813 | ``` 814 | 815 | ### 返回值 816 | 817 | 该函数返回摘要统计数据的数据框。 818 | 819 | 820 | 821 | 822 | 823 | ## get_eaf_from_1000G 824 | 825 | ### 介绍 826 | 827 | 从1000G的MAF文件中提取EAF并将其与输入数据匹配。 828 | 829 | ### 用法 830 | 831 | ``` 832 | get_eaf_from_1000G(dat, path, type = "exposure") 833 | ``` 834 | 835 | ### 参数 836 | 837 | - `dat`:一个数据框,为TwoSampleMR包格式的数据 838 | - `path`:一个字符串,表示包含1000G MAF文件`fileFrequency.frq`的目录路径。 839 | - `type`:一个字符串,表示数据是“exposure”(暴露因素)还是“outcome”(结果),默认为“exposure”。 840 | 841 | ### 值 842 | 843 | 一个数据框,其中包含输入数据的EAF和类型信息(原始、修正或错误)。 844 | 845 | ### 详细说明 846 | 847 | 该函数将读取1000G MAF文件,然后将其与输入数据进行匹配。对于每个SNP,该函数将检查输入数据中的效应等位基因与1000G中的效应等位基因是否匹配。 848 | 849 | 如果不匹配,则将EAF设置为NA并将其类型设置为“error”。对于匹配的SNP,EAF将设置为1000G MAF中的值(如果输入数据的效应等位基因是minor allele),或1-MAF(如果输入数据的效应等位基因是major allele),并将其类型设置为“raw”或“corrected”。 850 | 851 | 如果`type`参数设置为“outcome”,则函数将使用输入数据中的结果等位基因来查找EAF。 852 | 853 | 在处理完所有SNP后,该函数将输出一些有关匹配成功、失败以及NA的信息,以及类型信息的说明。 854 | 855 | ### 示例 856 | 857 | 以下是使用该函数的示例: 858 | 859 | ```R 860 | dat <- get_eaf_from_1000G(dat, "S:/GWAS数据/本地LD依赖文件", type = "exposure") 861 | 862 | # 检查输出 863 | head(dat) 864 | ``` 865 | 866 | `fileFrequency.frq`为PLINK1.9输出的,根据1000G数据提取的MAF数据 867 | 868 | 1000G参考文件下载地址:http://fileserve.mrcieu.ac.uk/ld/1kg.v3.tgz 869 | 870 | 可自行提取MAF数据。或从`GetScience`公众号中获取已经提取好的`fileFrequency.frq`文件 871 | 872 | 873 | 874 | ## get_chr_pos 875 | 876 | 该函数利用MungeSumstats包匹配rsid的染色体位置。 877 | 878 | ### 用法 879 | 880 | ```R 881 | get_chr_pos(dat, type = "exposure") 882 | ``` 883 | 884 | ### 参数 885 | 886 | - `dat`:一个数据框,TwoSampleMR格式 887 | - `type`:一个字符串,表示要获取SNP染色体位置和参考序列的SNP类型。可选值为"exposure"或"outcome"。 888 | 889 | ### 返回值 890 | 891 | 一个数据框,其中包含输入数据框的信息,以及新列`chr.exposure`或`chr.outcome`,表示每个SNP的染色体编号。新列`pos.exposure`或`pos.outcome`表示每个SNP在染色体上的位置。 892 | 893 | ### 函数说明 894 | 895 | 该函数使用`format_sumstats`和`format_data`函数从1000G项目中获取SNP的染色体位置和参考序列信息。 896 | 897 | ### 示例 898 | 899 | 以下示例演示如何使用`get_chr_pos`函数: 900 | 901 | ```R 902 | # 获取曝露变量SNP的染色体位置和参考序列 903 | exposure_chr_pos <- get_chr_pos(dat, type = "exposure") 904 | 905 | # 获取结果变量SNP的染色体位置和参考序列 906 | outcome_chr_pos <- get_chr_pos(dat, type = "outcome") 907 | ``` 908 | 909 | 910 | 911 | ## get_f函数 912 | 913 | ### 描述 914 | 915 | `get_f`函数计算样本的F统计量并返回F值大于指定阈值的样本数据。返回的数据包括每个SNP的R2和F值 916 | 917 | ### 用法 918 | 919 | ```R 920 | get_f(dat, F_value = 10) 921 | ``` 922 | 923 | ### 参数 924 | 925 | - `dat`: TwoSampleMR格式,一定要包含`eaf.exposure`, `beta.exposure`, `se.exposure`, 和`samplesize.exposure`的数据框。 926 | - `F_value`: 指定的F统计量的阈值,F值大于该阈值的样本将被返回。默认值为10。 927 | 928 | 929 | 930 | 注,本计算公式为 931 | 932 | F值:![img](A:\OneDrive\GET\assets\clip_image002.gif) R2值:![image-20230327151305186](A:\OneDrive\GET\assets\image-20230327151305186.png) 933 | 934 | 公式参考文献: 935 | 936 | [A Multivariate Genome-Wide Association Analysis of 10 LDL Subfractions, and Their Response to Statin Treatment, in 1868 Caucasians - PMC (nih.gov)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4405269/) 937 | 938 | [Large-scale association analyses identify host factors influencing human gut microbiome composition - PMC (nih.gov)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8515199/) 939 | 940 | 941 | 942 | 943 | 944 | 945 | 946 | ### 947 | 948 | ## mr_dircreate_base 949 | 950 | ### 描述 951 | 952 | `mr_dircreate_base`函数创建基本的目录结构以保存MR分析结果和图形。 953 | 954 | ### 用法 955 | 956 | ```R 957 | mr_dircreate_base(root_dir, project_name, date = NULL) 958 | ``` 959 | 960 | ### 参数 961 | 962 | - `root_dir`: 保存结果文件夹的根目录。 963 | - `project_name`: 项目名称,用于在根目录下创建一个以此命名的子目录。 964 | - `date`: 日期(可选),用于在子目录名称中添加日期以区分不同日期的结果文件夹。默认为`NULL`,表示不添加日期。 965 | 966 | ### 值 967 | 968 | `mr_dircreate_base`函数返回一个包含结果文件夹路径的列表。 969 | 970 | ### 示例 971 | 972 | ```R 973 | # 创建结果文件夹 974 | res_dir <- mr_dircreate_base("path/to/root/dir", "project_name", date = "20220327") 975 | 976 | # 打印结果文件夹路径 977 | print(res_dir) 978 | ``` 979 | 980 | ### 注意事项 981 | 982 | - `root_dir`参数指定保存结果文件夹的根目录。 983 | - `project_name`参数指定项目名称,用于在根目录下创建一个以此命名的子目录。 984 | - `date`参数(可选)用于在子目录名称中添加日期以区分不同日期的结果文件夹。默认为`NULL`,表示不添加日期。 985 | - `mr_dircreate_base`函数将在根目录下创建一个名为`project_name`的子目录,并在该子目录下创建4个子目录,分别命名为`1.figure`、`2.table`、`3.figure of sig res`和`4.snp with Fval`。函数返回一个包含结果文件夹路径的列表。 986 | 987 | 988 | 989 | 990 | 991 | 992 | 993 | ## clean_expo 994 | 995 | ### 描述 996 | 997 | 用于快捷筛选工具变量的函数,可执行P值筛选,EAF值筛选,clump。 998 | 999 | ### 用法 1000 | 1001 | ```R 1002 | ## 完整可选参数 1003 | clean_expo(expo, pval, low_af = 0.5, high_af = 0.5, clump = TRUE, kb = 10000, r2 = 0.001, LD_file = NULL, af_filter = FALSE) 1004 | 1005 | ##不提供LD_file则自动在线clump 1006 | clean_expo(expo, pval, clump = TRUE, kb = 10000, r2 = 0.001) 1007 | ##提供则本地clump 1008 | clean_expo(expo, pval, clump = TRUE, kb = 10000, r2 = 0.001,LD_file=LD_file) 1009 | ``` 1010 | 1011 | ### 参数 1012 | 1013 | - `expo`: 一个数据框,其中包含遗传暴露指标的SNP名称、beta值、标准误、p值和频率。 1014 | - `pval`: 用于筛选遗传暴露指标的p值阈值。p值小于此阈值的SNP将被保留。 1015 | - `low_af`: 频率过滤的下限值。默认为0.5。如果`af_filter`为TRUE,则只有遗传暴露指标的频率低于此值或高于`high_af`时,才会被保留。 1016 | - `high_af`: 频率过滤的上限值。默认为0.5。 1017 | - `clump`: 一个逻辑值,指示是否使用PLINK进行SNP聚类。默认为TRUE。 1018 | - `kb`: 聚类的窗口大小(以kb为单位)。默认为10000。 1019 | - `r2`: LD阈值。默认为0.001。 1020 | - `LD_file`: PLINK二进制文件的路径。如果未提供,则默认在线clump。 1021 | - `af_filter`: 一个逻辑值,指示是否启用频率过滤。默认为FALSE。 1022 | 1023 | 1024 | 1025 | ## clean_list 1026 | 1027 | ### 描述 1028 | 1029 | 用于清理列表中元素行数的R包。常用于批量化质量控制。 1030 | 1031 | ### 用法 1032 | 1033 | ```R 1034 | clean_list(list, nrow = 10) 1035 | ``` 1036 | 1037 | ### 参数 1038 | 1039 | - `list`: 一个列表,其中包含多个元素。 1040 | - `nrow`: 用于筛选元素的行数阈值。如果元素的行数小于此阈值,则该元素将被删除。默认为10。 1041 | 1042 | ### 返回值 1043 | 1044 | 一个列表,其中仅包含行数大于 `nrow` 的元素。 1045 | 1046 | ### 例子 1047 | 1048 | ``` 1049 | # 创建一个包含5个数据框的列表,每个数据框包含1-5行 1050 | set.seed(123) 1051 | lst <- list(data.frame(a = rnorm(1), b = rnorm(1)), 1052 | data.frame(a = rnorm(2), b = rnorm(2)), 1053 | data.frame(a = rnorm(3), b = rnorm(3)), 1054 | data.frame(a = rnorm(4), b = rnorm(4)), 1055 | data.frame(a = rnorm(5), b = rnorm(5))) 1056 | 1057 | # 运行 clean_list 函数,将阈值设置为2 1058 | cleaned_lst <- clean_list(lst, nrow = 3) 1059 | 1060 | # 查看清理后的列表 1061 | cleaned_lst 1062 | ``` 1063 | 1064 | 1065 | 1066 | 1067 | 1068 | ## clean_IV_from_outsig 1069 | 1070 | 用于从一个数据框中清理具有显著的MR反向因果效应P值的IV。 1071 | 1072 | ### 用法 1073 | 1074 | ```R 1075 | clean_IV_from_outsig(dat, MR_reverse = 1e-03) 1076 | ``` 1077 | 1078 | ### 参数 1079 | 1080 | - `dat`: 一个数据框,其中包含每个IV和其与结果变量之间的MR反向因果效应的P值。 1081 | - `MR_reverse`: 用于筛选IV的MR反向因果效应P值阈值。具有P值小于此阈值的IV将被保留。默认为1e-03。 1082 | 1083 | ### 返回值 1084 | 1085 | 一个数据框,其中包含P值大于 `MR_reverse` 值的IV(也就是反向不显著的IVs)。 1086 | 1087 | 1088 | 1089 | 1090 | ## 使用声明: 1091 | 1. 禁止一切倒卖我们代码的行为。我们承诺已开源代码永久性免费开源,公开可得 1092 | 2. 使用代码前,务必确认代码无误。我们没有利用这个R包发表文献,也没有进行倒卖,或授课。**因此,我们无法确保我们的代码每一环,完全的,彻底的,100%的,没有错误**。我们的代码本质上是分享经验,我们并没有因为这些代码获得任何利益,也没有利用它来当成文章发表,虽然我们会尽力,但我们无法保证彻底的完美。**我们对所有因代码可能存在的错误导致的纠纷不负有任何责任。** 1093 | 3. 我们不会参与任何商业行为和组织,我们没有与任何组织进行合作。任何倒卖行为都是侵权行为,非倒卖的利用行为属于第三方的行为,与我们无关,由第三方对本身内容负责。因代码错误导致的纠纷参考第2条声明。 1094 | 4. 我们分批开源不是利益因素导致的,我们没有将更新的版本的Get_MR进行售卖!我们本意是为了保护内部小伙伴的努力成果,优先使用我们最新开发的前沿功能,万望大家理解! 1095 | 1096 | 1097 | 1098 | 1099 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | GNU GENERAL PUBLIC LICENSE 2 | Version 3, 29 June 2007 3 | 4 | Copyright (C) 2007 Free Software Foundation, Inc. 5 | Everyone is permitted to copy and distribute verbatim copies 6 | of this license document, but changing it is not allowed. 7 | 8 | Preamble 9 | 10 | The GNU General Public License is a free, copyleft license for 11 | software and other kinds of works. 12 | 13 | The licenses for most software and other practical works are designed 14 | to take away your freedom to share and change the works. By contrast, 15 | the GNU General Public License is intended to guarantee your freedom to 16 | share and change all versions of a program--to make sure it remains free 17 | software for all its users. We, the Free Software Foundation, use the 18 | GNU General Public License for most of our software; it applies also to 19 | any other work released this way by its authors. You can apply it to 20 | your programs, too. 21 | 22 | When we speak of free software, we are referring to freedom, not 23 | price. Our General Public Licenses are designed to make sure that you 24 | have the freedom to distribute copies of free software (and charge for 25 | them if you wish), that you receive source code or can get it if you 26 | want it, that you can change the software or use pieces of it in new 27 | free programs, and that you know you can do these things. 28 | 29 | To protect your rights, we need to prevent others from denying you 30 | these rights or asking you to surrender the rights. Therefore, you have 31 | certain responsibilities if you distribute copies of the software, or if 32 | you modify it: responsibilities to respect the freedom of others. 33 | 34 | For example, if you distribute copies of such a program, whether 35 | gratis or for a fee, you must pass on to the recipients the same 36 | freedoms that you received. You must make sure that they, too, receive 37 | or can get the source code. And you must show them these terms so they 38 | know their rights. 39 | 40 | Developers that use the GNU GPL protect your rights with two steps: 41 | (1) assert copyright on the software, and (2) offer you this License 42 | giving you legal permission to copy, distribute and/or modify it. 43 | 44 | For the developers' and authors' protection, the GPL clearly explains 45 | that there is no warranty for this free software. For both users' and 46 | authors' sake, the GPL requires that modified versions be marked as 47 | changed, so that their problems will not be attributed erroneously to 48 | authors of previous versions. 49 | 50 | Some devices are designed to deny users access to install or run 51 | modified versions of the software inside them, although the manufacturer 52 | can do so. This is fundamentally incompatible with the aim of 53 | protecting users' freedom to change the software. The systematic 54 | pattern of such abuse occurs in the area of products for individuals to 55 | use, which is precisely where it is most unacceptable. Therefore, we 56 | have designed this version of the GPL to prohibit the practice for those 57 | products. If such problems arise substantially in other domains, we 58 | stand ready to extend this provision to those domains in future versions 59 | of the GPL, as needed to protect the freedom of users. 60 | 61 | Finally, every program is threatened constantly by software patents. 62 | States should not allow patents to restrict development and use of 63 | software on general-purpose computers, but in those that do, we wish to 64 | avoid the special danger that patents applied to a free program could 65 | make it effectively proprietary. To prevent this, the GPL assures that 66 | patents cannot be used to render the program non-free. 67 | 68 | The precise terms and conditions for copying, distribution and 69 | modification follow. 70 | 71 | TERMS AND CONDITIONS 72 | 73 | 0. Definitions. 74 | 75 | "This License" refers to version 3 of the GNU General Public License. 76 | 77 | "Copyright" also means copyright-like laws that apply to other kinds of 78 | works, such as semiconductor masks. 79 | 80 | "The Program" refers to any copyrightable work licensed under this 81 | License. Each licensee is addressed as "you". "Licensees" and 82 | "recipients" may be individuals or organizations. 83 | 84 | To "modify" a work means to copy from or adapt all or part of the work 85 | in a fashion requiring copyright permission, other than the making of an 86 | exact copy. The resulting work is called a "modified version" of the 87 | earlier work or a work "based on" the earlier work. 88 | 89 | A "covered work" means either the unmodified Program or a work based 90 | on the Program. 91 | 92 | To "propagate" a work means to do anything with it that, without 93 | permission, would make you directly or secondarily liable for 94 | infringement under applicable copyright law, except executing it on a 95 | computer or modifying a private copy. Propagation includes copying, 96 | distribution (with or without modification), making available to the 97 | public, and in some countries other activities as well. 98 | 99 | To "convey" a work means any kind of propagation that enables other 100 | parties to make or receive copies. Mere interaction with a user through 101 | a computer network, with no transfer of a copy, is not conveying. 102 | 103 | An interactive user interface displays "Appropriate Legal Notices" 104 | to the extent that it includes a convenient and prominently visible 105 | feature that (1) displays an appropriate copyright notice, and (2) 106 | tells the user that there is no warranty for the work (except to the 107 | extent that warranties are provided), that licensees may convey the 108 | work under this License, and how to view a copy of this License. If 109 | the interface presents a list of user commands or options, such as a 110 | menu, a prominent item in the list meets this criterion. 111 | 112 | 1. Source Code. 113 | 114 | The "source code" for a work means the preferred form of the work 115 | for making modifications to it. "Object code" means any non-source 116 | form of a work. 117 | 118 | A "Standard Interface" means an interface that either is an official 119 | standard defined by a recognized standards body, or, in the case of 120 | interfaces specified for a particular programming language, one that 121 | is widely used among developers working in that language. 122 | 123 | The "System Libraries" of an executable work include anything, other 124 | than the work as a whole, that (a) is included in the normal form of 125 | packaging a Major Component, but which is not part of that Major 126 | Component, and (b) serves only to enable use of the work with that 127 | Major Component, or to implement a Standard Interface for which an 128 | implementation is available to the public in source code form. A 129 | "Major Component", in this context, means a major essential component 130 | (kernel, window system, and so on) of the specific operating system 131 | (if any) on which the executable work runs, or a compiler used to 132 | produce the work, or an object code interpreter used to run it. 133 | 134 | The "Corresponding Source" for a work in object code form means all 135 | the source code needed to generate, install, and (for an executable 136 | work) run the object code and to modify the work, including scripts to 137 | control those activities. However, it does not include the work's 138 | System Libraries, or general-purpose tools or generally available free 139 | programs which are used unmodified in performing those activities but 140 | which are not part of the work. For example, Corresponding Source 141 | includes interface definition files associated with source files for 142 | the work, and the source code for shared libraries and dynamically 143 | linked subprograms that the work is specifically designed to require, 144 | such as by intimate data communication or control flow between those 145 | subprograms and other parts of the work. 146 | 147 | The Corresponding Source need not include anything that users 148 | can regenerate automatically from other parts of the Corresponding 149 | Source. 150 | 151 | The Corresponding Source for a work in source code form is that 152 | same work. 153 | 154 | 2. Basic Permissions. 155 | 156 | All rights granted under this License are granted for the term of 157 | copyright on the Program, and are irrevocable provided the stated 158 | conditions are met. This License explicitly affirms your unlimited 159 | permission to run the unmodified Program. The output from running a 160 | covered work is covered by this License only if the output, given its 161 | content, constitutes a covered work. This License acknowledges your 162 | rights of fair use or other equivalent, as provided by copyright law. 163 | 164 | You may make, run and propagate covered works that you do not 165 | convey, without conditions so long as your license otherwise remains 166 | in force. You may convey covered works to others for the sole purpose 167 | of having them make modifications exclusively for you, or provide you 168 | with facilities for running those works, provided that you comply with 169 | the terms of this License in conveying all material for which you do 170 | not control copyright. Those thus making or running the covered works 171 | for you must do so exclusively on your behalf, under your direction 172 | and control, on terms that prohibit them from making any copies of 173 | your copyrighted material outside their relationship with you. 174 | 175 | Conveying under any other circumstances is permitted solely under 176 | the conditions stated below. Sublicensing is not allowed; section 10 177 | makes it unnecessary. 178 | 179 | 3. Protecting Users' Legal Rights From Anti-Circumvention Law. 180 | 181 | No covered work shall be deemed part of an effective technological 182 | measure under any applicable law fulfilling obligations under article 183 | 11 of the WIPO copyright treaty adopted on 20 December 1996, or 184 | similar laws prohibiting or restricting circumvention of such 185 | measures. 186 | 187 | When you convey a covered work, you waive any legal power to forbid 188 | circumvention of technological measures to the extent such circumvention 189 | is effected by exercising rights under this License with respect to 190 | the covered work, and you disclaim any intention to limit operation or 191 | modification of the work as a means of enforcing, against the work's 192 | users, your or third parties' legal rights to forbid circumvention of 193 | technological measures. 194 | 195 | 4. Conveying Verbatim Copies. 196 | 197 | You may convey verbatim copies of the Program's source code as you 198 | receive it, in any medium, provided that you conspicuously and 199 | appropriately publish on each copy an appropriate copyright notice; 200 | keep intact all notices stating that this License and any 201 | non-permissive terms added in accord with section 7 apply to the code; 202 | keep intact all notices of the absence of any warranty; and give all 203 | recipients a copy of this License along with the Program. 204 | 205 | You may charge any price or no price for each copy that you convey, 206 | and you may offer support or warranty protection for a fee. 207 | 208 | 5. Conveying Modified Source Versions. 209 | 210 | You may convey a work based on the Program, or the modifications to 211 | produce it from the Program, in the form of source code under the 212 | terms of section 4, provided that you also meet all of these conditions: 213 | 214 | a) The work must carry prominent notices stating that you modified 215 | it, and giving a relevant date. 216 | 217 | b) The work must carry prominent notices stating that it is 218 | released under this License and any conditions added under section 219 | 7. This requirement modifies the requirement in section 4 to 220 | "keep intact all notices". 221 | 222 | c) You must license the entire work, as a whole, under this 223 | License to anyone who comes into possession of a copy. This 224 | License will therefore apply, along with any applicable section 7 225 | additional terms, to the whole of the work, and all its parts, 226 | regardless of how they are packaged. This License gives no 227 | permission to license the work in any other way, but it does not 228 | invalidate such permission if you have separately received it. 229 | 230 | d) If the work has interactive user interfaces, each must display 231 | Appropriate Legal Notices; however, if the Program has interactive 232 | interfaces that do not display Appropriate Legal Notices, your 233 | work need not make them do so. 234 | 235 | A compilation of a covered work with other separate and independent 236 | works, which are not by their nature extensions of the covered work, 237 | and which are not combined with it such as to form a larger program, 238 | in or on a volume of a storage or distribution medium, is called an 239 | "aggregate" if the compilation and its resulting copyright are not 240 | used to limit the access or legal rights of the compilation's users 241 | beyond what the individual works permit. Inclusion of a covered work 242 | in an aggregate does not cause this License to apply to the other 243 | parts of the aggregate. 244 | 245 | 6. Conveying Non-Source Forms. 246 | 247 | You may convey a covered work in object code form under the terms 248 | of sections 4 and 5, provided that you also convey the 249 | machine-readable Corresponding Source under the terms of this License, 250 | in one of these ways: 251 | 252 | a) Convey the object code in, or embodied in, a physical product 253 | (including a physical distribution medium), accompanied by the 254 | Corresponding Source fixed on a durable physical medium 255 | customarily used for software interchange. 256 | 257 | b) Convey the object code in, or embodied in, a physical product 258 | (including a physical distribution medium), accompanied by a 259 | written offer, valid for at least three years and valid for as 260 | long as you offer spare parts or customer support for that product 261 | model, to give anyone who possesses the object code either (1) a 262 | copy of the Corresponding Source for all the software in the 263 | product that is covered by this License, on a durable physical 264 | medium customarily used for software interchange, for a price no 265 | more than your reasonable cost of physically performing this 266 | conveying of source, or (2) access to copy the 267 | Corresponding Source from a network server at no charge. 268 | 269 | c) Convey individual copies of the object code with a copy of the 270 | written offer to provide the Corresponding Source. This 271 | alternative is allowed only occasionally and noncommercially, and 272 | only if you received the object code with such an offer, in accord 273 | with subsection 6b. 274 | 275 | d) Convey the object code by offering access from a designated 276 | place (gratis or for a charge), and offer equivalent access to the 277 | Corresponding Source in the same way through the same place at no 278 | further charge. You need not require recipients to copy the 279 | Corresponding Source along with the object code. If the place to 280 | copy the object code is a network server, the Corresponding Source 281 | may be on a different server (operated by you or a third party) 282 | that supports equivalent copying facilities, provided you maintain 283 | clear directions next to the object code saying where to find the 284 | Corresponding Source. Regardless of what server hosts the 285 | Corresponding Source, you remain obligated to ensure that it is 286 | available for as long as needed to satisfy these requirements. 287 | 288 | e) Convey the object code using peer-to-peer transmission, provided 289 | you inform other peers where the object code and Corresponding 290 | Source of the work are being offered to the general public at no 291 | charge under subsection 6d. 292 | 293 | A separable portion of the object code, whose source code is excluded 294 | from the Corresponding Source as a System Library, need not be 295 | included in conveying the object code work. 296 | 297 | A "User Product" is either (1) a "consumer product", which means any 298 | tangible personal property which is normally used for personal, family, 299 | or household purposes, or (2) anything designed or sold for incorporation 300 | into a dwelling. In determining whether a product is a consumer product, 301 | doubtful cases shall be resolved in favor of coverage. For a particular 302 | product received by a particular user, "normally used" refers to a 303 | typical or common use of that class of product, regardless of the status 304 | of the particular user or of the way in which the particular user 305 | actually uses, or expects or is expected to use, the product. A product 306 | is a consumer product regardless of whether the product has substantial 307 | commercial, industrial or non-consumer uses, unless such uses represent 308 | the only significant mode of use of the product. 309 | 310 | "Installation Information" for a User Product means any methods, 311 | procedures, authorization keys, or other information required to install 312 | and execute modified versions of a covered work in that User Product from 313 | a modified version of its Corresponding Source. The information must 314 | suffice to ensure that the continued functioning of the modified object 315 | code is in no case prevented or interfered with solely because 316 | modification has been made. 317 | 318 | If you convey an object code work under this section in, or with, or 319 | specifically for use in, a User Product, and the conveying occurs as 320 | part of a transaction in which the right of possession and use of the 321 | User Product is transferred to the recipient in perpetuity or for a 322 | fixed term (regardless of how the transaction is characterized), the 323 | Corresponding Source conveyed under this section must be accompanied 324 | by the Installation Information. But this requirement does not apply 325 | if neither you nor any third party retains the ability to install 326 | modified object code on the User Product (for example, the work has 327 | been installed in ROM). 328 | 329 | The requirement to provide Installation Information does not include a 330 | requirement to continue to provide support service, warranty, or updates 331 | for a work that has been modified or installed by the recipient, or for 332 | the User Product in which it has been modified or installed. Access to a 333 | network may be denied when the modification itself materially and 334 | adversely affects the operation of the network or violates the rules and 335 | protocols for communication across the network. 336 | 337 | Corresponding Source conveyed, and Installation Information provided, 338 | in accord with this section must be in a format that is publicly 339 | documented (and with an implementation available to the public in 340 | source code form), and must require no special password or key for 341 | unpacking, reading or copying. 342 | 343 | 7. Additional Terms. 344 | 345 | "Additional permissions" are terms that supplement the terms of this 346 | License by making exceptions from one or more of its conditions. 347 | Additional permissions that are applicable to the entire Program shall 348 | be treated as though they were included in this License, to the extent 349 | that they are valid under applicable law. If additional permissions 350 | apply only to part of the Program, that part may be used separately 351 | under those permissions, but the entire Program remains governed by 352 | this License without regard to the additional permissions. 353 | 354 | When you convey a copy of a covered work, you may at your option 355 | remove any additional permissions from that copy, or from any part of 356 | it. (Additional permissions may be written to require their own 357 | removal in certain cases when you modify the work.) You may place 358 | additional permissions on material, added by you to a covered work, 359 | for which you have or can give appropriate copyright permission. 360 | 361 | Notwithstanding any other provision of this License, for material you 362 | add to a covered work, you may (if authorized by the copyright holders of 363 | that material) supplement the terms of this License with terms: 364 | 365 | a) Disclaiming warranty or limiting liability differently from the 366 | terms of sections 15 and 16 of this License; or 367 | 368 | b) Requiring preservation of specified reasonable legal notices or 369 | author attributions in that material or in the Appropriate Legal 370 | Notices displayed by works containing it; or 371 | 372 | c) Prohibiting misrepresentation of the origin of that material, or 373 | requiring that modified versions of such material be marked in 374 | reasonable ways as different from the original version; or 375 | 376 | d) Limiting the use for publicity purposes of names of licensors or 377 | authors of the material; or 378 | 379 | e) Declining to grant rights under trademark law for use of some 380 | trade names, trademarks, or service marks; or 381 | 382 | f) Requiring indemnification of licensors and authors of that 383 | material by anyone who conveys the material (or modified versions of 384 | it) with contractual assumptions of liability to the recipient, for 385 | any liability that these contractual assumptions directly impose on 386 | those licensors and authors. 387 | 388 | All other non-permissive additional terms are considered "further 389 | restrictions" within the meaning of section 10. If the Program as you 390 | received it, or any part of it, contains a notice stating that it is 391 | governed by this License along with a term that is a further 392 | restriction, you may remove that term. If a license document contains 393 | a further restriction but permits relicensing or conveying under this 394 | License, you may add to a covered work material governed by the terms 395 | of that license document, provided that the further restriction does 396 | not survive such relicensing or conveying. 397 | 398 | If you add terms to a covered work in accord with this section, you 399 | must place, in the relevant source files, a statement of the 400 | additional terms that apply to those files, or a notice indicating 401 | where to find the applicable terms. 402 | 403 | Additional terms, permissive or non-permissive, may be stated in the 404 | form of a separately written license, or stated as exceptions; 405 | the above requirements apply either way. 406 | 407 | 8. Termination. 408 | 409 | You may not propagate or modify a covered work except as expressly 410 | provided under this License. Any attempt otherwise to propagate or 411 | modify it is void, and will automatically terminate your rights under 412 | this License (including any patent licenses granted under the third 413 | paragraph of section 11). 414 | 415 | However, if you cease all violation of this License, then your 416 | license from a particular copyright holder is reinstated (a) 417 | provisionally, unless and until the copyright holder explicitly and 418 | finally terminates your license, and (b) permanently, if the copyright 419 | holder fails to notify you of the violation by some reasonable means 420 | prior to 60 days after the cessation. 421 | 422 | Moreover, your license from a particular copyright holder is 423 | reinstated permanently if the copyright holder notifies you of the 424 | violation by some reasonable means, this is the first time you have 425 | received notice of violation of this License (for any work) from that 426 | copyright holder, and you cure the violation prior to 30 days after 427 | your receipt of the notice. 428 | 429 | Termination of your rights under this section does not terminate the 430 | licenses of parties who have received copies or rights from you under 431 | this License. If your rights have been terminated and not permanently 432 | reinstated, you do not qualify to receive new licenses for the same 433 | material under section 10. 434 | 435 | 9. Acceptance Not Required for Having Copies. 436 | 437 | You are not required to accept this License in order to receive or 438 | run a copy of the Program. Ancillary propagation of a covered work 439 | occurring solely as a consequence of using peer-to-peer transmission 440 | to receive a copy likewise does not require acceptance. However, 441 | nothing other than this License grants you permission to propagate or 442 | modify any covered work. These actions infringe copyright if you do 443 | not accept this License. Therefore, by modifying or propagating a 444 | covered work, you indicate your acceptance of this License to do so. 445 | 446 | 10. Automatic Licensing of Downstream Recipients. 447 | 448 | Each time you convey a covered work, the recipient automatically 449 | receives a license from the original licensors, to run, modify and 450 | propagate that work, subject to this License. You are not responsible 451 | for enforcing compliance by third parties with this License. 452 | 453 | An "entity transaction" is a transaction transferring control of an 454 | organization, or substantially all assets of one, or subdividing an 455 | organization, or merging organizations. If propagation of a covered 456 | work results from an entity transaction, each party to that 457 | transaction who receives a copy of the work also receives whatever 458 | licenses to the work the party's predecessor in interest had or could 459 | give under the previous paragraph, plus a right to possession of the 460 | Corresponding Source of the work from the predecessor in interest, if 461 | the predecessor has it or can get it with reasonable efforts. 462 | 463 | You may not impose any further restrictions on the exercise of the 464 | rights granted or affirmed under this License. For example, you may 465 | not impose a license fee, royalty, or other charge for exercise of 466 | rights granted under this License, and you may not initiate litigation 467 | (including a cross-claim or counterclaim in a lawsuit) alleging that 468 | any patent claim is infringed by making, using, selling, offering for 469 | sale, or importing the Program or any portion of it. 470 | 471 | 11. Patents. 472 | 473 | A "contributor" is a copyright holder who authorizes use under this 474 | License of the Program or a work on which the Program is based. The 475 | work thus licensed is called the contributor's "contributor version". 476 | 477 | A contributor's "essential patent claims" are all patent claims 478 | owned or controlled by the contributor, whether already acquired or 479 | hereafter acquired, that would be infringed by some manner, permitted 480 | by this License, of making, using, or selling its contributor version, 481 | but do not include claims that would be infringed only as a 482 | consequence of further modification of the contributor version. For 483 | purposes of this definition, "control" includes the right to grant 484 | patent sublicenses in a manner consistent with the requirements of 485 | this License. 486 | 487 | Each contributor grants you a non-exclusive, worldwide, royalty-free 488 | patent license under the contributor's essential patent claims, to 489 | make, use, sell, offer for sale, import and otherwise run, modify and 490 | propagate the contents of its contributor version. 491 | 492 | In the following three paragraphs, a "patent license" is any express 493 | agreement or commitment, however denominated, not to enforce a patent 494 | (such as an express permission to practice a patent or covenant not to 495 | sue for patent infringement). To "grant" such a patent license to a 496 | party means to make such an agreement or commitment not to enforce a 497 | patent against the party. 498 | 499 | If you convey a covered work, knowingly relying on a patent license, 500 | and the Corresponding Source of the work is not available for anyone 501 | to copy, free of charge and under the terms of this License, through a 502 | publicly available network server or other readily accessible means, 503 | then you must either (1) cause the Corresponding Source to be so 504 | available, or (2) arrange to deprive yourself of the benefit of the 505 | patent license for this particular work, or (3) arrange, in a manner 506 | consistent with the requirements of this License, to extend the patent 507 | license to downstream recipients. "Knowingly relying" means you have 508 | actual knowledge that, but for the patent license, your conveying the 509 | covered work in a country, or your recipient's use of the covered work 510 | in a country, would infringe one or more identifiable patents in that 511 | country that you have reason to believe are valid. 512 | 513 | If, pursuant to or in connection with a single transaction or 514 | arrangement, you convey, or propagate by procuring conveyance of, a 515 | covered work, and grant a patent license to some of the parties 516 | receiving the covered work authorizing them to use, propagate, modify 517 | or convey a specific copy of the covered work, then the patent license 518 | you grant is automatically extended to all recipients of the covered 519 | work and works based on it. 520 | 521 | A patent license is "discriminatory" if it does not include within 522 | the scope of its coverage, prohibits the exercise of, or is 523 | conditioned on the non-exercise of one or more of the rights that are 524 | specifically granted under this License. You may not convey a covered 525 | work if you are a party to an arrangement with a third party that is 526 | in the business of distributing software, under which you make payment 527 | to the third party based on the extent of your activity of conveying 528 | the work, and under which the third party grants, to any of the 529 | parties who would receive the covered work from you, a discriminatory 530 | patent license (a) in connection with copies of the covered work 531 | conveyed by you (or copies made from those copies), or (b) primarily 532 | for and in connection with specific products or compilations that 533 | contain the covered work, unless you entered into that arrangement, 534 | or that patent license was granted, prior to 28 March 2007. 535 | 536 | Nothing in this License shall be construed as excluding or limiting 537 | any implied license or other defenses to infringement that may 538 | otherwise be available to you under applicable patent law. 539 | 540 | 12. No Surrender of Others' Freedom. 541 | 542 | If conditions are imposed on you (whether by court order, agreement or 543 | otherwise) that contradict the conditions of this License, they do not 544 | excuse you from the conditions of this License. If you cannot convey a 545 | covered work so as to satisfy simultaneously your obligations under this 546 | License and any other pertinent obligations, then as a consequence you may 547 | not convey it at all. For example, if you agree to terms that obligate you 548 | to collect a royalty for further conveying from those to whom you convey 549 | the Program, the only way you could satisfy both those terms and this 550 | License would be to refrain entirely from conveying the Program. 551 | 552 | 13. Use with the GNU Affero General Public License. 553 | 554 | Notwithstanding any other provision of this License, you have 555 | permission to link or combine any covered work with a work licensed 556 | under version 3 of the GNU Affero General Public License into a single 557 | combined work, and to convey the resulting work. The terms of this 558 | License will continue to apply to the part which is the covered work, 559 | but the special requirements of the GNU Affero General Public License, 560 | section 13, concerning interaction through a network will apply to the 561 | combination as such. 562 | 563 | 14. Revised Versions of this License. 564 | 565 | The Free Software Foundation may publish revised and/or new versions of 566 | the GNU General Public License from time to time. Such new versions will 567 | be similar in spirit to the present version, but may differ in detail to 568 | address new problems or concerns. 569 | 570 | Each version is given a distinguishing version number. If the 571 | Program specifies that a certain numbered version of the GNU General 572 | Public License "or any later version" applies to it, you have the 573 | option of following the terms and conditions either of that numbered 574 | version or of any later version published by the Free Software 575 | Foundation. If the Program does not specify a version number of the 576 | GNU General Public License, you may choose any version ever published 577 | by the Free Software Foundation. 578 | 579 | If the Program specifies that a proxy can decide which future 580 | versions of the GNU General Public License can be used, that proxy's 581 | public statement of acceptance of a version permanently authorizes you 582 | to choose that version for the Program. 583 | 584 | Later license versions may give you additional or different 585 | permissions. However, no additional obligations are imposed on any 586 | author or copyright holder as a result of your choosing to follow a 587 | later version. 588 | 589 | 15. Disclaimer of Warranty. 590 | 591 | THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY 592 | APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT 593 | HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY 594 | OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, 595 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR 596 | PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM 597 | IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF 598 | ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 599 | 600 | 16. Limitation of Liability. 601 | 602 | IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING 603 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS 604 | THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY 605 | GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE 606 | USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF 607 | DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD 608 | PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), 609 | EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF 610 | SUCH DAMAGES. 611 | 612 | 17. Interpretation of Sections 15 and 16. 613 | 614 | If the disclaimer of warranty and limitation of liability provided 615 | above cannot be given local legal effect according to their terms, 616 | reviewing courts shall apply local law that most closely approximates 617 | an absolute waiver of all civil liability in connection with the 618 | Program, unless a warranty or assumption of liability accompanies a 619 | copy of the Program in return for a fee. 620 | 621 | END OF TERMS AND CONDITIONS 622 | 623 | How to Apply These Terms to Your New Programs 624 | 625 | If you develop a new program, and you want it to be of the greatest 626 | possible use to the public, the best way to achieve this is to make it 627 | free software which everyone can redistribute and change under these terms. 628 | 629 | To do so, attach the following notices to the program. It is safest 630 | to attach them to the start of each source file to most effectively 631 | state the exclusion of warranty; and each file should have at least 632 | the "copyright" line and a pointer to where the full notice is found. 633 | 634 | 635 | Copyright (C) 636 | 637 | This program is free software: you can redistribute it and/or modify 638 | it under the terms of the GNU General Public License as published by 639 | the Free Software Foundation, either version 3 of the License, or 640 | (at your option) any later version. 641 | 642 | This program is distributed in the hope that it will be useful, 643 | but WITHOUT ANY WARRANTY; without even the implied warranty of 644 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 645 | GNU General Public License for more details. 646 | 647 | You should have received a copy of the GNU General Public License 648 | along with this program. If not, see . 649 | 650 | Also add information on how to contact you by electronic and paper mail. 651 | 652 | If the program does terminal interaction, make it output a short 653 | notice like this when it starts in an interactive mode: 654 | 655 | Copyright (C) 656 | This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. 657 | This is free software, and you are welcome to redistribute it 658 | under certain conditions; type `show c' for details. 659 | 660 | The hypothetical commands `show w' and `show c' should show the appropriate 661 | parts of the General Public License. Of course, your program's commands 662 | might be different; for a GUI interface, you would use an "about box". 663 | 664 | You should also get your employer (if you work as a programmer) or school, 665 | if any, to sign a "copyright disclaimer" for the program, if necessary. 666 | For more information on this, and how to apply and follow the GNU GPL, see 667 | . 668 | 669 | The GNU General Public License does not permit incorporating your program 670 | into proprietary programs. If your program is a subroutine library, you 671 | may consider it more useful to permit linking proprietary applications with 672 | the library. If this is what you want to do, use the GNU Lesser General 673 | Public License instead of this License. But first, please read 674 | . 675 | -------------------------------------------------------------------------------- /2.0/Get_MR2.0.r: -------------------------------------------------------------------------------- 1 | library(mr.raps) 2 | library(TwoSampleMR) 3 | library(dplyr) 4 | library(fs) 5 | library(ieugwasr) 6 | library(plinkbinr) 7 | library(tidyr) 8 | library(data.table) 9 | library(MRPRESSO) 10 | library(parallel) 11 | library(doParallel) 12 | library(pbapply) 13 | library(stringr) 14 | library(cause) 15 | library(vroom) 16 | library(MungeSumstats) 17 | library(ldscr) 18 | library(MRlap) 19 | 20 | 21 | select<-dplyr::select 22 | mcl<-makeCluster 23 | 24 | 25 | ## 2.0更新内容 26 | 27 | clean<-function(list,clean_0=FALSE){ 28 | 29 | l<-pblapply(list,class) 30 | ll<-c() 31 | for(i in 1:length(l)){ 32 | if(l[[i]][1]%in% "try-error"){} 33 | else{ll<-c(ll,i)} 34 | } 35 | list<-list[ll] 36 | 37 | if(clean_0==TRUE){ 38 | list<-Filter(function(x) nrow(x)!=0, list) 39 | } 40 | return(list) 41 | } 42 | 43 | get_rsid<-function(chr,pos,version='hg38'){ 44 | library(biomaRt) 45 | dat<-data.frame(CHR=chr,start=pos,end=pos) 46 | if(version=="hg38"){ 47 | ver<-listEnsemblArchives()%>%subset(name!='Ensembl GRCh37') 48 | ur<-ver$url[1] 49 | snp_mart <- useEnsembl(biomart="ENSEMBL_MART_SNP", 50 | host=ur, 51 | dataset="hsapiens_snp") 52 | position <- apply(dat, 1, paste, collapse = ":") 53 | res<-getBM(attributes = c('refsnp_id', 'allele', 'chrom_start'), 54 | filters = 'chromosomal_region', 55 | values = position, 56 | mart = snp_mart) 57 | } 58 | if(version=="hg19"){ 59 | snp_mart <- useEnsembl(biomart="ENSEMBL_MART_SNP", 60 | host="https://grch37.ensembl.org", 61 | dataset="hsapiens_snp") 62 | position <- apply(dat, 1, paste, collapse = ":") 63 | res<-getBM(attributes = c('refsnp_id', 'allele', 'chrom_start'), 64 | filters = 'chromosomal_region', 65 | values = position, 66 | mart = snp_mart) 67 | } 68 | 69 | return(res) 70 | } 71 | 72 | get_exposure = function(id,pval=5e-08,r2=0.001,kb=10000) { 73 | library(TwoSampleMR) 74 | dat<-try(data.frame(extract_instruments(id,p1=pval,clump = T,p2=1,r2=r2,kb=kb))) 75 | if ('try-error'%in%class(dat)) { 76 | dat=NA 77 | } 78 | return(dat) 79 | } 80 | 81 | get_outcome=function(id,expo){ 82 | library(TwoSampleMR) 83 | dat<-try(data.frame(extract_outcome_data(expo$SNP,id,proxies = F,maf_threshold=0.4))) 84 | if ('try-error'%in%class(dat)) { 85 | dat=NA 86 | } 87 | 88 | try( return(dat)) 89 | } 90 | 91 | get_ao<-function(a=NULL){ 92 | ao<-available_outcomes() 93 | ao<-separate(ao,id,c("a","b","c"),sep="-",remove = FALSE) 94 | if(is.null(a)==F){ao<-subset(ao,a==a)} 95 | return(ao) 96 | } 97 | 98 | get_exposure_wrong_num<-function(list){ 99 | res<-is.na(list) 100 | num=0 101 | foreach (i=1:length(res))%do%{ 102 | if(res[i]==TRUE) num=num+1 103 | } 104 | return(num) 105 | } 106 | 107 | clean_outcome_from_exposure<-function(expo,outcome){ 108 | snp<-lapply(expo,FUN=function(x)data.frame(SNP=x$SNP)) 109 | snp<-bind_rows(snp) 110 | snp<-data.frame(SNP=unique(snp$SNP)) 111 | outcome_adj<-merge(outcome,snp) 112 | return(outcome_adj) 113 | } 114 | 115 | clean_GWAS<-function(list,clean=c("bbj","eqtl")){ 116 | clean_GWAS_logit<-function(dat,clean){ 117 | id<-dat$id.exposure[1] 118 | id<-as.data.frame(id) 119 | id<-separate(id,"id",c('a','b','c'),sep = "-") 120 | if(id$a==clean){ 121 | return(data.frame(lg=1)) 122 | } 123 | else(return(data.frame(lg=0))) 124 | } 125 | print("正在获取指定数据集信息") 126 | for(i in 1:length(clean)){ 127 | lg<-pblapply(list,clean[i],FUN=clean_GWAS_logit) 128 | 129 | lg<-bind_rows(lg) 130 | 131 | list<-list[which(lg$lg==0)] 132 | } 133 | return(list) 134 | 135 | } 136 | 137 | 138 | 139 | cyclemr<-function(dat,cl_num,type="list"){ 140 | clean<-function(list,clean_0=FALSE){ 141 | 142 | l<-pblapply(list,class) 143 | ll<-c() 144 | for(i in 1:length(l)){ 145 | if(l[[i]][1]%in% "try-error"){} 146 | else{ll<-c(ll,i)} 147 | } 148 | list<-list[ll] 149 | 150 | if(clean_0==TRUE){ 151 | list<-Filter(function(x) nrow(x)!=0, list) 152 | } 153 | return(list) 154 | } 155 | # base mr 156 | mr_base<-function(dat){ 157 | library(TwoSampleMR) 158 | library(dplyr) 159 | try(mr<-mr(dat) ) 160 | try(mr<-dplyr::select(mr,id.exposure,id.outcome,method,nsnp,b,se,pval)) 161 | try(mr_OR<-generate_odds_ratios(mr_res = mr(dat))) 162 | try(mr_OR<-dplyr::select(mr_OR,id.exposure,id.outcome,method,nsnp,b,se,pval,lo_ci,up_ci,or, or_lci95,or_uci95)) 163 | try(mr_OR<-dplyr::rename(mr_OR,b.OR="b",se.OR="se",pval.OR="pval")) 164 | try( mr_p_OR<-merge(mr,mr_OR)) 165 | try(return(mr_p_OR)) 166 | } 167 | 168 | mr_egger<-function(dat){ 169 | library(TwoSampleMR) 170 | library(dplyr) 171 | mr_egger<-mr_pleiotropy_test(dat) 172 | try( mr_egger<-dplyr::select(mr_egger,id.exposure,id.outcome,egger_intercept,se,pval)) 173 | try( mr_egger<-dplyr::rename(mr_egger,se.egger="se",pval.egger="pval")) 174 | try( mr_egger[2:5,]<-NA) 175 | try( return(mr_egger)) 176 | } 177 | 178 | mr_test<-function(dat){ 179 | library(TwoSampleMR) 180 | library(dplyr) 181 | mr_heterogeneity<-mr_heterogeneity(dat) 182 | try (mr_heterogeneity<-dplyr::select(mr_heterogeneity,id.exposure,id.outcome,method,Q, Q_df,Q_pval)) 183 | try( mr_heterogeneity<-dplyr::rename(mr_heterogeneity,method.he="method")) 184 | try (mr_heterogeneity[3:5,]<-NA ) 185 | try( return(mr_heterogeneity)) 186 | } 187 | 188 | # presso 189 | cycle_presso<-function(dat){ 190 | library(TwoSampleMR) 191 | library(MRPRESSO) 192 | library(dplyr) 193 | nsnp_filter=6 194 | try (mr_presso_res<-mr_presso(BetaOutcome ="beta.outcome", BetaExposure = "beta.exposure", SdOutcome ="se.outcome", SdExposure = "se.exposure", 195 | OUTLIERtest = TRUE,DISTORTIONtest = TRUE, data = dat, 196 | SignifThreshold = 0.05)) 197 | try ( mr_presso_main<-mr_presso_res$`Main MR results`) 198 | try(mr_presso_main) 199 | try ( mr_presso_main[3:5,]<-NA) 200 | try( return(mr_presso_main)) 201 | } 202 | 203 | mr_Presso<-function(dat,num=10000){ 204 | library(TwoSampleMR) 205 | library(MRPRESSO) 206 | library(dplyr) 207 | 208 | nsnp_filter=6 209 | set.seed(123) 210 | try (mr_presso_res<-mr_presso(BetaOutcome ="beta.outcome", BetaExposure = "beta.exposure", SdOutcome ="se.outcome", SdExposure = "se.exposure", 211 | OUTLIERtest = TRUE,DISTORTIONtest = TRUE, data = dat, 212 | SignifThreshold = 0.05,NbDistribution = num)) 213 | return(mr_presso_res) 214 | 215 | } 216 | mr_presso_pval<-function(mr_presso_res){ 217 | try ( mr_presso_main<-mr_presso_res$`Main MR results`) 218 | try ( mr_presso_main[3:5,]<-NA) 219 | return(mr_presso_main) 220 | } 221 | 222 | 223 | mr_presso_snp<-function(mr_presso_res,mr_presso_main,dat,type="list"){ 224 | data_re<-list() 225 | if(type=="list"){ 226 | for(i in 1:length(mr_presso_res)){ 227 | res<-mr_presso_res$`MR-PRESSO results`[[i]] 228 | main<-mr_presso_main[[i]] 229 | data<-dat[[i]] 230 | try(if(is.na(main[2,6])==FALSE){ 231 | outliers<-which(res$`Outlier Test`$Pvalue<0.05) 232 | data$mr_keep[outliers]<-FALSE 233 | }) 234 | data_re[[i]]<-data 235 | names(data_re)[[i]]<-names(dat)[[i]] 236 | } 237 | return(data_re) 238 | } 239 | 240 | if(type=="data"){ 241 | res<-mr_presso_res$`MR-PRESSO results` 242 | main<-mr_presso_main 243 | data<-dat 244 | try(if(is.na(main[2,6])==FALSE){ 245 | outliers<-which(res$`Outlier Test`$Pvalue<0.05) 246 | data$mr_keep[outliers]<-FALSE 247 | }) 248 | return(data) 249 | } 250 | } 251 | bind_basemr<-function(base.res,egger.res,test.res){ 252 | res_all<-data.frame() 253 | for(i in 1:nrow(base.res)){ 254 | if(i%%5==1){ 255 | id<-base.res$id.exposure[i] 256 | num.bg<-which(egger.res$id.exposure==id) 257 | num.end<-num.bg+4 258 | res<-cbind(base.res[i:(i+4),],egger.res[num.bg:num.end,3:5]) 259 | num.bg<-which(test.res$id.exposure==id)[1] 260 | num.end<-num.bg+4 261 | res<-cbind(res,test.res[num.bg:num.end,3:6]) 262 | res_all<-rbind(res_all,res) 263 | } 264 | } 265 | return(res_all) 266 | } 267 | 268 | if(type=="list"){ 269 | cl <- makeCluster(cl_num) 270 | 271 | base.res<-pblapply(dat, FUN=mr_base,cl=cl) 272 | 273 | egger.res<-pblapply(dat, FUN=mr_egger) 274 | 275 | test.res<-pblapply(dat,FUN=mr_test) 276 | 277 | stopCluster(cl) 278 | 279 | base.res<-clean(base.res) 280 | 281 | egger.res<-clean(egger.res) 282 | 283 | test.res<-clean(test.res) 284 | 285 | base.res<-bind_rows (base.res) 286 | 287 | base.res<-subset(base.res,nsnp>2) ## 防止出现wald ratio,导致错误 288 | 289 | egger.res<-bind_rows (egger.res) 290 | 291 | test.res<-bind_rows (test.res) 292 | 293 | res_all<-bind_basemr(base.res,egger.res,test.res) 294 | 295 | } 296 | 297 | if(type=="data"){ 298 | 299 | presso_res<-mr_Presso(dat,10000) 300 | 301 | presso_pval<-mr_presso_pval(presso_res) 302 | 303 | dat_aj<-mr_presso_snp(presso_res,presso_pval,dat,"data") 304 | 305 | base.res<-mr_base(dat) 306 | 307 | egger.res<-mr_egger(dat) 308 | 309 | test.res<-mr_test(dat) 310 | 311 | base.res<-bind_rows (base.res) 312 | 313 | res_all<-cbind(base.res,presso_pval,egger.res,test.res) 314 | } 315 | 316 | return(res_all) 317 | } 318 | 319 | 320 | bind_basemr<-function(base.res,egger.res,test.res){ 321 | res_all<-data.frame() 322 | for(i in 1:nrow(base.res)){ 323 | if(i%%5==1){ 324 | id<-base.res$id.exposure[i] 325 | num.bg<-which(egger.res$id.exposure==id) 326 | num.end<-num.bg+4 327 | res<-cbind(base.res[i:(i+4),],egger.res[num.bg:num.end,3:5]) 328 | num.bg<-which(test.res$id.exposure==id)[1] 329 | num.end<-num.bg+4 330 | res<-cbind(res,test.res[num.bg:num.end,3:6]) 331 | res_all<-rbind(res_all,res) 332 | } 333 | } 334 | return(res_all) 335 | } 336 | 337 | 338 | bind_pressomr<-function(base_res,presso_pval,egger_res,test_res){ 339 | 340 | presso_id<-names(presso_pval) 341 | presso_pval<-bind_rows(presso_pval) 342 | base_res<-bind_rows(base_res) 343 | egger_res<-bind_rows(egger_res) 344 | test_res<-bind_rows(test_res) 345 | 346 | res_all<-data.frame() 347 | for(i in 1:nrow(base_res)){ 348 | if(i%%5==1){ 349 | id<-base_res$id.exposure[i] 350 | 351 | num.bg<-5*(which(presso_id==id)[1]-1)+1 352 | num.end<-num.bg+4 353 | res<-cbind(base_res[i:(i+4),],presso_pval[num.bg:num.end,2:6]) 354 | 355 | num.bg<-which(egger_res$id.exposure==id) 356 | num.end<-num.bg+4 357 | res<-cbind(res,egger_res[num.bg:num.end,3:5]) 358 | 359 | num.bg<-which(test_res$id.exposure==id)[1] 360 | num.end<-num.bg+4 361 | res<-cbind(res,test_res[num.bg:num.end,3:6]) 362 | 363 | res_all<-rbind(res_all,res) 364 | } 365 | } 366 | return(res_all) 367 | } 368 | 369 | 370 | bind_trait<-function(res_all,ao){ 371 | res_all<-data.frame(trait=NA,res_all) 372 | for(i in 1:nrow(res_all)){ 373 | try( num<-which(ao$id==res_all$id.exposure[i])) 374 | try( res_all$trait[i]<-ao$trait[num]) 375 | } 376 | return(res_all) 377 | } 378 | 379 | bind_trait_outcome<-function(res_all,ao){ 380 | res_all<-data.frame(trait=NA,res_all) 381 | for(i in 1:nrow(res_all)){ 382 | num<-which(ao$id==res_all$id.outcome[i]) 383 | res_all$trait[i]<-ao$trait[num] 384 | } 385 | return(res_all) 386 | } 387 | 388 | clean_cl<-function(){ 389 | cl<<-makeCluster(1) 390 | stopCluster(cl) 391 | gc() 392 | } 393 | 394 | clean_GWAS_id<-function(dat,id){ 395 | lid<-lapply(dat,FUN = function(x)x$id.exposure[1]) 396 | lid<-unlist(lid) 397 | l<-c() 398 | for (i in 1:length(id)) { 399 | l1<-which(lid==id[i]) 400 | l<-c(l,l1) 401 | } 402 | dat<-dat[l] 403 | return(dat) 404 | } 405 | 406 | 407 | ## 1.0内容 408 | # format 409 | format_Mun<-function(file,source="finn_r8",save_path=NULL,lift=F,ref_genome = "hg38", 410 | convert_ref_genome = "hg19"){ 411 | library(data.table) 412 | library(MungeSumstats) 413 | library(dplyr) 414 | if(class(file)!="data.frame"){ 415 | dat<-fread(file) 416 | if(source=="finn_r8"){ 417 | dat<-dat%>%dplyr::select(SNP=rsids,CHR=`#chrom`,BP=pos,A1=ref, 418 | A2=alt,FRQ=af_alt,BETA=beta, 419 | SE=sebeta,P=pval) 420 | } 421 | } 422 | dat<-as.data.frame(dat) 423 | dat<-format_sumstats(dat,return_data = TRUE) 424 | if(is.null(save_path)==F){setwd(save_path)} 425 | dat<-fwrite(dat,paste0(file,'(Mun_format',ref_genome,').gz')) 426 | te<-fs::dir_info( tempdir()) 427 | te<-subset(te,size>1e+8) 428 | fs::file_delete(te$path) 429 | if(lift==T){ 430 | dat <- MungeSumstats::liftover(sumstats_dt = dat, 431 | ref_genome = ref_genome, 432 | convert_ref_genome = convert_ref_genome) 433 | dat<-fwrite(dat,paste0(file,'(Mun_format',convert_ref_genome,').gz')) 434 | } 435 | gc() 436 | } 437 | 438 | format_cyclemr<-function(data,type="exposure",source="finn_r8"){ 439 | library(TwoSampleMR) 440 | library(MungeSumstats) 441 | 442 | if(source=="finn_r8"){ 443 | data<-format_data(data,type=type, 444 | id_col = "id",chr_col ="#chrom",pos="pos", 445 | snp_col = "rsids",beta_col = "beta",se_col = "sebeta", 446 | effect_allele_col = "alt",other_allele_col = "ref", 447 | eaf_col = "af_alt",phenotype_col = "phenotype",pval_col = "pval") 448 | } 449 | 450 | 451 | 452 | if(source=="ukb_nosnp"){ 453 | data<-separate(data,variant,c("chr","pos","ref","alt"),sep = ":") 454 | 455 | data<-dplyr::select(data,chr,pos,ref,alt,minor_AF,beta,se,pval) 456 | name<-colnames(data) 457 | colnames(data) <-c('CHR','BP','A1','A2','FRQ','BETA', 'SE','P') 458 | 459 | expo_rs_done<-format_sumstats(data,ref_genome = "GRCh37",return_data = TRUE) 460 | 461 | expo_rs_done$BP<-as.character(expo_rs_done$BP) 462 | colnames(data)<-name 463 | data<-merge(data,expo_rs_done,by.x=c("chr","pos"),by.y=c("CHR","BP"),all.x=TRUE) 464 | 465 | data<-format_data(data,type=type, 466 | samplesize_col = "n_complete_samples", 467 | snp_col = "SNP",effect_allele_col = "alt", 468 | other_allele_col = "ref",eaf_col ="minor_AF", 469 | beta_col="beta",se_col = "se",pval_col = "pval", 470 | chr_col="chr",pos_col ="pos",id='id',phenotype_col = 'phenotype') 471 | } 472 | 473 | if(source=="Mun"){ 474 | data<-format_data(data,type=type, 475 | snp_col = 'SNP', 476 | chr_col = 'CHR', 477 | pos_col = 'BP', 478 | effect_allele_col = 'A2', 479 | other_allele_col = "A1", 480 | se_col = 'SE', 481 | beta_col= 'BETA', 482 | eaf_col = 'FRQ', 483 | id_col = 'id', 484 | phenotype_col = "phenotype", 485 | pval_col = "P" 486 | ) 487 | } 488 | 489 | if(source=="covid"){ 490 | data<-format_data(data,type=type, 491 | id_col = "id",chr_col ="#CHR",pos="POS", 492 | snp_col = "rsid",beta_col = "all_inv_var_meta_beta",se_col = "all_inv_var_meta_sebeta", 493 | effect_allele_col = "ALT",other_allele_col = "REF", 494 | eaf_col = "all_meta_AF",phenotype_col = "phenotype", 495 | ncase_col = "all_inv_var_meta_cases",ncontrol_col = "all_inv_var_meta_controls", 496 | pval_col = "all_inv_var_meta_p") 497 | } 498 | 499 | if(source=="outcome"){ 500 | data<-format_data(data,type=type, 501 | id_col = "id.outcome",chr_col ="chr.outcome",pos="pos.outcome", 502 | snp_col = "SNP",beta_col = "beta.outcome",se_col = "se.outcome", 503 | effect_allele_col = "effect_allele.outcome",other_allele_col = "other_allele.outcome", 504 | eaf_col = "eaf.outcome",phenotype_col = "outcome", 505 | samplesize_col = "samplesize.outcome", 506 | pval_col = "pval.outcome") 507 | 508 | } 509 | if(source=="exposure"){ 510 | data<-format_data(data,type=type, 511 | id_col = "id.exposure",chr_col ="chr.exposure",pos="pos.exposure", 512 | snp_col = "SNP",beta_col = "beta.exposure",se_col = "se.exposure", 513 | effect_allele_col = "effect_allele.exposure",other_allele_col = "other_allele.exposure", 514 | eaf_col = "eaf.exposure",phenotype_col = "exposure", 515 | samplesize_col = "samplesize.exposure", 516 | pval_col = "pval.exposure") 517 | 518 | } 519 | 520 | if(source=="fast_ukb"){ 521 | data<-format_data(data,type=type,id_col="id", 522 | phenotype_col = "phenotype",snp_col = "SNP", 523 | effect_allele_col = "A1",other_allele_col = "A2", 524 | eaf_col ="AF1",beta_col="BETA",se_col = "SE", 525 | pval_col = "P",chr_col = "CHR",pos_col = "POS") 526 | } 527 | 528 | 529 | 530 | if(source=="bac"){ 531 | data<-format_data(data,type=type,id_col="bac", 532 | phenotype_col = "phenotype",snp_col = "rsID", 533 | effect_allele_col = "eff.allele",other_allele_col = "ref.allele", 534 | beta_col="beta",se_col = "SE", 535 | pval_col = "P.weightedSumZ",chr_col = "chr",pos_col = "bp",samplesize_col = "N") 536 | } 537 | 538 | if(source=="finn_r7"){ 539 | data<-format_data(data,type=type,snp_col = "rsids", 540 | effect_allele_col = "alt",other_allele_col = "ref", 541 | beta_col="beta",se_col = "sebeta",eaf_col = "af_alt", 542 | pval_col = "pval",chr_col = "chrom",pos_col = "pos") 543 | } 544 | return(data) 545 | } 546 | 547 | 548 | format_trait<-function(list,short=FALSE,short_num="40"){ 549 | for(i in 1:length(list)){ 550 | expo<-data.frame(exposure=list[[i]] 551 | $exposure) 552 | try(expo<-separate(expo,exposure,c('a','b'),sep='\\|\\|')) 553 | if(short==TRUE){try(expo$a <- substr(expo$a, 1, short_num))} 554 | try(expo<-expo$a) 555 | try(expo<-gsub(":","_",expo)) 556 | try(expo<-gsub("-","_",expo)) 557 | try(expo<-gsub(" ","_",expo)) 558 | try(expo<-gsub(",","_",expo)) 559 | try(expo<-gsub("/","_",expo)) 560 | list[[i]]$exposure<-expo[1] 561 | 562 | } 563 | 564 | 565 | return(list) 566 | } 567 | 568 | # read 569 | read_vcf_getmr<-function(file_name,nThread = 8,type=".gz"){ 570 | name<-file_name 571 | 572 | for(i in 1:nrow(name)){ 573 | dat<-read_sumstats(paste0("./",name[i]),nThread = nThread,nrow=Inf,standardise_headers = FALSE,mapping_file = sumstatsColHeaders) 574 | 575 | vroom_write(dat,paste0(name[i],type)) 576 | 577 | gc() 578 | } 579 | print(i) 580 | 581 | } 582 | 583 | read_easy<-function(file_name,pval=5e-08){ 584 | library(data.table) 585 | dat<-fread(file_name) 586 | dat<-subset(dat,pval.exposure%select(SNP,effect_allele.exposure,other_allele.exposure,eaf.exposure,beta.exposure,se.exposure,pval.exposure) 733 | colnames(dat_d) <-c('SNP','A1','A2','FRQ','BETA', 'SE','P') 734 | dat_d<-format_sumstats(dat_d,ref_genome = "GRCh37",return_data = TRUE) 735 | dat_d<-format_data(dat_d,type=type, 736 | snp_col = 'SNP', 737 | chr_col = 'CHR', 738 | pos_col = 'BP', 739 | effect_allele_col = 'A1', 740 | other_allele_col = "A2", 741 | se_col = 'SE', 742 | beta_col= 'BETA', 743 | eaf_col = 'FRQ', 744 | pval_col = "P" 745 | ) 746 | dat_d<-dat_d%>%select(SNP,chr.exposure,pos.exposure) 747 | dat<-merge(dat,dat_d,by="SNP",all.x=T) 748 | 749 | return(dat) 750 | } 751 | } 752 | if(type=="outcome"){ 753 | if(is.na(dat$eaf.outcome[1])==T || is.null(dat$eaf.outcome)==T){ 754 | print("需要先运行get_eaf_from_1000G来匹配eaf,再匹配chr和pos") 755 | } 756 | else{ 757 | dat_d<-dat%>%select(SNP,effect_allele.outcome,other_allele.outcome,eaf.outcome,beta.outcome,se.outcome,pval.outcome) 758 | colnames(dat_d) <-c('SNP','A1','A2','FRQ','BETA', 'SE','P') 759 | dat_d<-format_sumstats(dat_d,ref_genome = "GRCh37",return_data = TRUE) 760 | dat_d<-format_data(dat_d,type=type, 761 | snp_col = 'SNP', 762 | chr_col = 'CHR', 763 | pos_col = 'BP', 764 | effect_allele_col = 'A1', 765 | other_allele_col = "A2", 766 | se_col = 'SE', 767 | beta_col= 'BETA', 768 | eaf_col = 'FRQ', 769 | pval_col = "P" 770 | ) 771 | dat_d<-dat_d%>%select(SNP,chr.outcome,pos.outcome) 772 | dat<-merge(dat,dat_d,by="SNP",all.x=T) 773 | 774 | return(dat) 775 | } 776 | 777 | } 778 | } 779 | 780 | get_f<-function(dat,F_value=10){ 781 | log<-is.na(dat$eaf.exposure) 782 | log<-unique(log) 783 | if(length(log)==1) 784 | {if(log==TRUE){ 785 | print("数据不包含eaf,无法计算F统计量") 786 | return(dat)} 787 | } 788 | if(is.null(dat$beta.exposure[1])==T || is.na(dat$beta.exposure[1])==T){print("数据不包含beta,无法计算F统计量") 789 | return(dat)} 790 | if(is.null(dat$se.exposure[1])==T || is.na(dat$se.exposure[1])==T){print("数据不包含se,无法计算F统计量") 791 | return(dat)} 792 | if(is.null(dat$samplesize.exposure[1])==T || is.na(dat$samplesize.exposure[1])==T){print("数据不包含samplesize(样本量),无法计算F统计量") 793 | return(dat)} 794 | 795 | 796 | if("FALSE"%in%log && is.null(dat$beta.exposure[1])==F && is.na(dat$beta.exposure[1])==F && is.null(dat$se.exposure[1])==F && is.na(dat$se.exposure[1])==F && is.null(dat$samplesize.exposure[1])==F && is.na(dat$samplesize.exposure[1])==F){ 797 | R2<-(2*(1-dat$eaf.exposure)*dat$eaf.exposure*(dat$beta.exposure^2))/((2*(1-dat$eaf.exposure)*dat$eaf.exposure*(dat$beta.exposure^2))+(2*(1-dat$eaf.exposure)*dat$eaf.exposure*(dat$se.exposure^2)*dat$samplesize.exposure)) 798 | F<- (dat$samplesize.exposure-2)*R2/(1-R2) 799 | dat$R2<-R2 800 | dat$F<-F 801 | dat<-subset(dat,F>F_value) 802 | return(dat) 803 | } 804 | } 805 | 806 | 807 | # MR 808 | cause_getmr<-function(expo,outcome,LD_file,r2=0.001, 809 | kb=10000,pval=1e-05,cl=NULL){ 810 | format_cause_expo<-function(dat){ 811 | library(cause) 812 | dat<-gwas_format(dat,snp='SNP',beta_hat ='beta.exposure', 813 | se='se.exposure',A1="effect_allele.exposure", 814 | A2="other_allele.exposure",chrom='chr.exposure', 815 | pos="pos.exposure",p_value = 'pval.exposure' 816 | ) 817 | return(dat) 818 | } 819 | format_cause_out<-function(dat){ 820 | library(cause) 821 | dat<-gwas_format(dat,snp='SNP',beta_hat ='beta.outcome', 822 | se='se.outcome',A1="effect_allele.outcome", 823 | A2="other_allele.outcome",chrom='chr.outcome', 824 | pos="pos.outcome",p_value = 'pval.outcome' 825 | ) 826 | return(dat) 827 | } 828 | sample_cause<-function(dat,num_snp){ 829 | set.seed(123) 830 | VAR<-with(dat,sample(snp,size=num_snp,replace=FALSE)) 831 | return(VAR) 832 | } 833 | datap_cause<-function(dat){ 834 | dat$p1<-pnorm(abs(dat$beta_hat_1/dat$seb1),lower.tail=F)*2 835 | return(dat) 836 | } 837 | dat_clump_cause<-function(dat){ 838 | for_clump<-data.frame(dat$snp,dat$p1) 839 | colnames(for_clump)<-c('rsid','pval') 840 | return(for_clump) 841 | } 842 | datap_cause<-function(dat){ 843 | dat$p1<-pnorm(abs(dat$beta_hat_1/dat$seb1),lower.tail=F)*2 844 | return(dat) 845 | } 846 | ld_local<-function(dat,r2,kb,p,LD_file){ 847 | library(plinkbinr) 848 | 849 | plink_pathway<-get_plink_exe() 850 | 851 | dat<-ld_clump(dat,clump_r2 = r2,clump_kb = kb, 852 | clump_p = p, 853 | plink_bin =plink_pathway , bfile =LD_file) 854 | 855 | return(dat) 856 | } 857 | if("list"%in%class(outcome)){ 858 | single_expo=T 859 | single_outcome=F} 860 | if("list"%in%class(expo)){ 861 | single_outcome=T 862 | single_expo=F} 863 | if(single_expo==T){ 864 | outcome<-pblapply(outcome,FUN=format_cause_out,cl=cl) 865 | expo<-format_cause_expo(expo) 866 | } 867 | if(single_outcome==T){ 868 | expo<-pblapply(expo,FUN=format_cause_expo,cl=cl) 869 | outcome<-format_cause_out(outcome) 870 | } 871 | 872 | 873 | if(single_outcome==T){dat<-pblapply(expo,outcome,FUN=gwas_merge,cl=cl)} 874 | if(single_expo==T){dat<-pblapply(outcome,expo, 875 | FUN=function(outcome,exposure)gwas_merge(exposure,outcome), 876 | cl=cl)} 877 | 878 | 879 | VAR<-pblapply(dat,1000000,FUN=sample_cause) 880 | 881 | est<-foreach(i=1:length(VAR)) %do%{ 882 | library(cause) 883 | data<-est_cause_params(dat[[i]],VAR[[i]]) 884 | } 885 | dat<-pblapply(dat,FUN=datap_cause) 886 | for_clump<-lapply(dat, dat_clump_cause) 887 | clumped<-pblapply(for_clump,0.001,10000,1e-05,LD_file,FUN=ld_local) 888 | top_vars<-pblapply(clumped,FUN=function(dat) return(dat$rsid)) 889 | cause_res<-list() 890 | for(i in 1:length(dat)){ 891 | library(cause) 892 | res<-cause(dat[[i]],est[[i]],top_vars[[i]]) 893 | cause_res[[i]]<-res 894 | print(i) 895 | } 896 | cause_table<-data.frame() 897 | for (i in 1:length(cause_res)){ 898 | res<-cause_res[[i]] 899 | elpd<-res$elpd 900 | elpd$p<-pnorm(-elpd$z,lower.tail = F) 901 | name<-paste0('file',i) 902 | elpd$file<-name 903 | cause_table<-rbind(cause_table,elpd) 904 | } 905 | return(cause_table) 906 | } 907 | 908 | # RAPS 909 | RAPS_getmr<-function(dat,dir_figure){ 910 | setwd(dir_figure) 911 | res<-try(mr.raps(dat,over.dispersion = TRUE)) 912 | if(class(res)%in%"try-error"){} 913 | else{ 914 | exposure<-dat$id.exposure 915 | dir_create(exposure) 916 | dir_in<-paste0(dir_figure,'/',exposure) 917 | setwd(dir_in) 918 | #plot_name<-paste0(exposure,'raps.pdf') 919 | ggsave(file='raps_plot.pdf',plot=plot(res),width=9,height=5) 920 | res<-data.frame(beta.raps=res$beta.hat,se.raps=res$beta.se, 921 | eov=res$tau2.hat,se.eov=res$tau2.se,OR.raps=NA, 922 | or_lci95.raps=NA,or_uci95.raps=NA) 923 | res$OR.raps<- exp(res$beta.raps) 924 | res$or_lci95.raps<-exp(res$beta.raps)-(res$beta.raps*1.96) 925 | res$or_uci95.raps<-exp(res$beta.raps)+(res$beta.raps*1.96) 926 | res$pval.raps<-2*pnorm(abs(res$beta.raps/res$se.raps),lower.tail=F) 927 | res$pval.eov<-2*pnorm(abs(res$eov/res$se.eov),lower.tail=F) 928 | if(is.na(res$pval.eov)==FALSE){ 929 | if(res$pval.eov >0.05 ){ 930 | res1<-mr.raps(dat,over.dispersion = F) 931 | setwd(dir_in) 932 | #plot_name<-paste0(exposure,'raps.pdf') 933 | ggsave(file='raps_plot.pdf',plot=plot(res1),width=9,height=5) 934 | res1<-data.frame(beta.raps=res1$beta.hat,se.raps=res1$beta.se, 935 | eov=NA,se.eov=NA) 936 | res1$eov<-res$eov 937 | res1$se.eov<-res$se.eov 938 | res1$OR.raps<- exp(res1$beta.raps) 939 | res1$or_lci95.raps<-exp(res1$beta.raps)-(res1$beta.raps*1.96) 940 | res1$or_uci95.raps<-exp(res1$beta.raps)+(res1$beta.raps*1.96) 941 | res1$pval.raps<-2*pnorm(abs(res1$beta.raps/res1$se.raps),lower.tail=F) 942 | res1$pval.eov<-2*pnorm(abs(res1$eov/res1$se.eov),lower.tail=F) 943 | 944 | res<-res1 945 | } 946 | } 947 | 948 | return(res) 949 | } 950 | } 951 | 952 | mr_dircreate_base<-function(root_dir,project_name,date=NULL){ 953 | library(fs) 954 | dir_name<-root_dir 955 | setwd(dir_name) 956 | 957 | if(is.null(date)==FALSE){data_u<-date}else{data_u<-Sys.Date()} 958 | 959 | dir_name2<-project_name 960 | dir_name3<-paste0(dir_name2,data_u) 961 | dir_create(dir_name3) 962 | setwd(paste0(dir_name,"/",dir_name3)) 963 | dir_name4<-"1.figure" 964 | dir_name5<-"2.table" 965 | dir_name6<-"3.figure of sig res" 966 | dir_name7<-"4.snp with Fval" 967 | 968 | paste<-paste0(dir_name,"/",dir_name3,"/") 969 | 970 | dir1<-paste0(paste,dir_name4) 971 | dir.create(dir1) 972 | 973 | dir2<-paste0(paste,dir_name5) 974 | dir.create(dir2) 975 | 976 | dir3<-paste0(paste,dir_name6) 977 | dir.create(dir3) 978 | 979 | dir4<-paste0(paste,dir_name7) 980 | dir.create(dir4) 981 | 982 | res<-list(paste=paste,dir1=dir1,dir2=dir2,dir3=dir3,dir4=dir4) 983 | 984 | return(res) 985 | } 986 | 987 | 988 | 989 | # PRESSO 990 | mr_Presso<-function(dat,num=10000){ 991 | library(TwoSampleMR) 992 | library(MRPRESSO) 993 | library(dplyr) 994 | 995 | nsnp_filter=6 996 | set.seed(123) 997 | try (mr_presso_res<-mr_presso(BetaOutcome ="beta.outcome", BetaExposure = "beta.exposure", SdOutcome ="se.outcome", SdExposure = "se.exposure", 998 | OUTLIERtest = TRUE,DISTORTIONtest = TRUE, data = dat, 999 | SignifThreshold = 0.05,NbDistribution = num)) 1000 | return(mr_presso_res) 1001 | 1002 | } 1003 | mr_presso_pval<-function(mr_presso_res){ 1004 | try ( mr_presso_main<-mr_presso_res$`Main MR results`) 1005 | try ( mr_presso_main[3:5,]<-NA) 1006 | return(mr_presso_main) 1007 | } 1008 | 1009 | 1010 | mr_presso_snp<-function(mr_presso_res,mr_presso_main,dat,type="list"){ 1011 | data_re<-list() 1012 | if(type=="list"){ 1013 | for(i in 1:length(mr_presso_res)){ 1014 | res<-mr_presso_res[[i]] 1015 | main<-mr_presso_main[[i]] 1016 | data<-dat[[i]] 1017 | try(if(is.na(main[2,6])==FALSE){ 1018 | outliers<-which(res$`MR-PRESSO results`$`Outlier Test`$Pvalue<0.05) 1019 | data$mr_keep[outliers]<-FALSE 1020 | }) 1021 | data_re[[i]]<-data 1022 | names(data_re)[[i]]<-names(dat)[[i]] 1023 | } 1024 | return(data_re) 1025 | } 1026 | 1027 | if(type=="data"){ 1028 | res<-mr_presso_res$`MR-PRESSO results` 1029 | main<-mr_presso_main 1030 | data<-dat 1031 | try(if(is.na(main[2,6])==FALSE){ 1032 | outliers<-which(res$`MR-PRESSO results`$`Outlier Test`$Pvalue<0.05) 1033 | data$mr_keep[outliers]<-FALSE 1034 | }) 1035 | return(data) 1036 | } 1037 | } 1038 | 1039 | #MRlap 1040 | 1041 | mr_lap<-function(expo,outcome,ld,hm3,pval,r2,kb,MR_reverse=1e-03,save_logfiles=F){ 1042 | expo<-expo%>%select(rsid=SNP,chr=chr.exposure,pos=pos.exposure,alt=effect_allele.exposure, 1043 | ref=other_allele.exposure,N=samplesize.exposure,beta=beta.exposure, 1044 | se=se.exposure) 1045 | expo<-as.data.frame(expo) 1046 | outcome<-outcome%>%select(rsid=SNP,chr=chr.outcome,pos=pos.outcome,alt=effect_allele.outcome, 1047 | ref=other_allele.outcome,N=samplesize.outcome,beta=beta.outcome, 1048 | se=se.outcome) 1049 | outcome<-as.data.frame(outcome) 1050 | n_expo<-expo$exposure[1] 1051 | n_out<-outcome$outcome[1] 1052 | 1053 | try(res<-MRlap::MRlap(exposure = expo, 1054 | exposure_name = n_expo, 1055 | outcome = outcome, 1056 | outcome_name = n_out, 1057 | ld = ld, 1058 | hm3 = hm3,MR_threshold=pval,MR_pruning_dist=kb, 1059 | MR_pruning_LD=r2,MR_reverse=MR_reverse,save_logfiles=save_logfiles 1060 | )) 1061 | try(snp<-data.frame(SNP_MRlap=res$MRcorrection$IVs)) 1062 | try(res_c<-as.data.frame(res$MRcorrection[-4])%>%select(nsnp=m_IVs, 1063 | beta_MRlap=corrected_effect, 1064 | se_MRlap=corrected_effect_se, 1065 | p_MRlap=corrected_effect_p)) 1066 | try(res<-c(res,res_c,snp)) 1067 | 1068 | try(return(res)) 1069 | } 1070 | 1071 | 1072 | # clean 1073 | clean_expo<-function(expo,pval,low_af=0.5,high_af=0.5, 1074 | clump=TRUE,kb=10000,r2=0.001,LD_file=NULL,af_filter=FALSE){ 1075 | library(TwoSampleMR) 1076 | dat<-subset(expo,pval.exposurehigh_af)} 1078 | 1079 | if(clump==TRUE){ 1080 | 1081 | if(is.null(LD_file)==TRUE){ 1082 | dat<-clump_data(dat,clump_kb = kb,clump_r2 = r2)} 1083 | else{ 1084 | library(plinkbinr) 1085 | plink_pathway<-get_plink_exe() 1086 | snp<-data.frame(rsid=dat$SNP,pval=dat$pval.exposure) 1087 | snp<-try(ld_clump(snp,clump_kb = kb,clump_r2 = r2,plink_bin =plink_pathway , bfile =LD_file)) 1088 | if ("try-error"%in% class(snp)){return(dat)} 1089 | else{ 1090 | snp<-data.frame(SNP=snp$rsid) 1091 | dat<-merge(snp,dat,by.x="SNP",by.y='SNP') 1092 | } 1093 | } 1094 | 1095 | } 1096 | return(dat) 1097 | } 1098 | 1099 | clean_list<-function(list,nrow=10){ 1100 | l<-lapply(list,nrow) 1101 | n<-data.frame() 1102 | for(i in 1:length(l)){ 1103 | if(is.null(l[[i]])==T || l[[i]]==0 )next 1104 | 1105 | n1<-data.frame(i,l[[i]]) 1106 | n<-rbind(n,n1) 1107 | } 1108 | colnames(n)<-c('l','row') 1109 | n<-subset(n,row>nrow) 1110 | list<-list[n$l] 1111 | return(list) 1112 | } 1113 | 1114 | clean_IV_from_outsig<-function(dat,MR_reverse=1e-03){ 1115 | dat<-subset(dat,pval.outcome>MR_reverse) 1116 | return(dat) 1117 | } 1118 | 1119 | 1120 | LDSC_rg<-function(expo,outcome,an,sample_prev=NA, 1121 | population_prev=NA,ld,wld,chr_filter=c(1:22),n_blocks=200){ 1122 | id.o<-outcome$id.outcome[1] 1123 | id.e<-expo$id.exposure[1] 1124 | 1125 | expo<-expo%>%mutate(Z=beta.exposure/se.exposure) 1126 | expo<-expo%>%select(SNP=SNP,N=samplesize.exposure,Z=Z 1127 | ,A1=effect_allele.exposure 1128 | ,A2=other_allele.exposure) 1129 | expo<-as_tibble(expo) 1130 | 1131 | outcome<-outcome%>%mutate(Z=beta.outcome/se.outcome) 1132 | outcome<-outcome%>%select(SNP=SNP,N=samplesize.outcome,Z=Z 1133 | ,A1=effect_allele.outcome 1134 | ,A2=other_allele.outcome) 1135 | outcome<-as_tibble(outcome) 1136 | 1137 | 1138 | dat<-list(expo,outcome) 1139 | names(dat)<-c(id.e,id.o) 1140 | 1141 | rm(expo,outcome) 1142 | 1143 | 1144 | res<-try(ldscr::ldsc_rg(dat,ancestry = an,sample_prev=sample_prev, 1145 | population_prev=population_prev,ld=ld,wld=wld, 1146 | n_blocks=n_blocks,chr_filter=chr_filter)) 1147 | 1148 | return(res) 1149 | 1150 | } 1151 | 1152 | --------------------------------------------------------------------------------