├── Get_MR1.0 help.pdf
├── Get_MR2.0 help.pdf
├── 1.0
    ├── Install_Reference_Genome.r
    ├── Get_MR1.0dependence.R
    └── Get_MR1.0.r
├── Get_MR2.0 help.md
├── Get_MR1.0 help.md
├── README.md
├── LICENSE
└── 2.0
    └── Get_MR2.0.r


/Get_MR1.0 help.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HaobinZhou/Get_MR/HEAD/Get_MR1.0 help.pdf


--------------------------------------------------------------------------------
/Get_MR2.0 help.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HaobinZhou/Get_MR/HEAD/Get_MR2.0 help.pdf


--------------------------------------------------------------------------------
/1.0/Install_Reference_Genome.r:
--------------------------------------------------------------------------------
1 | BiocManager::install("SNPlocs.Hsapiens.dbSNP155.GRCh37")
2 | BiocManager::install("BSgenome.Hsapiens.1000genomes.hs37d5")
3 | BiocManager::install("SNPlocs.Hsapiens.dbSNP155.GRCh38")
4 | BiocManager::install("BSgenome.Hsapiens.NCBI.GRCh38")
5 | 


--------------------------------------------------------------------------------
/1.0/Get_MR1.0dependence.R:
--------------------------------------------------------------------------------
 1 | ## 保证你的R版本是4.2及以上！！
 2 | install.packages("devtools")
 3 | if (!require("BiocManager", quietly = TRUE))
 4 |   install.packages("BiocManager")
 5 | devtools::install_github("rondolab/MR-PRESSO")
 6 | devtools::install_github("MRCIEU/TwoSampleMR")
 7 | devtools::install_github("explodecomputer/plinkbinr")
 8 | devtools::install_github("mrcieu/ieugwasr")
 9 | devtools::install_github("jean997/cause")
10 | BiocManager::install("MungeSumstats")
11 | devtools::install_github("n-mounier/MRlap")
12 | devtools::install_github("mglev1n/ldscr")
13 | install.packages(c('dplyr','tidyr','data.table','stringr','vroom','mr.raps','pbapply','doParallel'))
14 | 


--------------------------------------------------------------------------------
/Get_MR2.0 help.md:
--------------------------------------------------------------------------------
  1 | ## Get_MR2.0
  2 | 
  3 | ### 写在前面
  4 | 
  5 | **欢迎来到向量化与并行化的世界！**本次更新就一个重大功能，就是并行化运行mr分析，以最优的效率批量跑大量的数据。家用电脑较新服务器基本可以实现2小时批量运行10000个因素，如果你有高性能服务器，恭喜你，30分钟以内就能跑完。
  6 | 
  7 | 
  8 | 
  9 | 由于本次主要是思路与方法的分享，所以函数的帮助文档写的不多，主要还是看示例代码即可，应该还是很容易上手的。
 10 | 
 11 | 
 12 | 
 13 | **公众号回复：“示例”即得示例代码**
 14 | 
 15 | 
 16 | 
 17 | ## cyclemr
 18 | 
 19 | ###  描述
 20 | 
 21 | 这个函数是用于执行循环Mendelian Randomization (CycleMR) 分析的功能函数，可以在R语言中使用。该函数可以将数据分配到多个计算节点中运行，提高MR分析的效率。
 22 | 
 23 | ### 用法
 24 | 
 25 | 直接在R语言中调用这个函数，如下所示：
 26 | 
 27 | ```
 28 | # 调用cyclemr函数
 29 | mr_results <- cyclemr(dat = data, cl_num = 4, type = "list")
 30 | ```
 31 | 
 32 | ### 参数
 33 | 
 34 | - `dat`: harmonise_data后的数据，可以是数据框或列表类型。默认是list
 35 | - `cl_num`: 批量化线程数。
 36 | - `type`: MR分析数据类型，可以是"list"或"data"，默认为"list"。主要使用情况也是list
 37 | 
 38 | > 注： 如何判断自己的电脑能开启的最大线程数？
 39 | >
 40 | > ![image-20230501190452753](A:\OneDrive\GET\get_mr\assets\image-20230501190452753.png)
 41 | >
 42 | > 在任务管理器可看到
 43 | >
 44 | > * 内核： 计算机核心数
 45 | >
 46 | > * 逻辑处理器： 计算机总线程数
 47 | >
 48 | >   比如说很多处理器宣传是8核16线程，这个8就指的是内核，这个16指的是逻辑处理器。
 49 | >
 50 | >   本质上，16只是将每个核心一分为2，但是他们能干的活是一样的。所以一般设置为内核数即可满载CPU
 51 | >
 52 | >   当然这不能一概而论，因为每个CPU和厂家调度不一样，如果你发现使用内核数不能让CPU跑到100%，则尝试用逻辑处理器数
 53 | 
 54 | ### 返回值
 55 | 
 56 | - `cyclemr`函数返回一个包含MR分析结果的数据框。
 57 | 
 58 | ### 使用举例
 59 | 
 60 | 下面是使用这个函数的一个示例：
 61 | 
 62 | ```
 63 | mr_results <- cyclemr(dat = data, cl_num = 16, type = "list")
 64 | ```
 65 | 
 66 | 
 67 | 
 68 | ### 运行时间参考
 69 | 
 70 | 在设置无误情况下，这是我手头有的所有电脑测试出的运行时间：
 71 | 
 72 | 运行10000个数据。（ieu批量数据前10000个）除了服务器外，其他均使用Windows系统
 73 | 
 74 | | CPU                                     | 核数（运行时开的线程数） | 时间      |
 75 | | --------------------------------------- | ------------------------ | --------- |
 76 | | i9-12900H（拯救者2022 Y9000P 狂暴模式） | 14核20线程（14）         | 1小时28分 |
 77 | | r7-5700X（台式）                        | 8核16线程（16）          | 1小时38分 |
 78 | | r9-6800H  (yoga2022 14S 性能模式)       | 8核16线程（16）          | 1小时54分 |
 79 | | r5-3500X (台式)                         | 6核12线程 （12）         | 3小时47分 |
 80 | | r5-4600H （拯救者 2020 R7000 狂暴模式） | 6核12线程 （12）         | 约4小时   |
 81 | | 双路 EPYC 7T83 （服务器  Linux）        | 128核256线程（128）      | 11分钟    |
 82 | 
 83 | **欢迎各位补充自己手头的机器的运行时间数据，尤其M系列的苹果处理器数据**
 84 | 
 85 | 
 86 | 
 87 | ## 一些小工具
 88 | 
 89 | ## get_rsid
 90 | 
 91 | ### 描述
 92 | 
 93 | 根据CHR和POS，从ensemble官网中获取rsID。
 94 | 
 95 | ### 用法
 96 | 
 97 | ```
 98 | get_rsid(chr, pos, version = 'hg38')
 99 | ```
100 | 
101 | ### 参数
102 | 
103 | - `chr`：染色体号。
104 | - `pos`：基因位置。
105 | - `version`：表示使用的基因组版本，默认为最新的版本（`'hg38'`）。也可选择hg19.
106 | - 注： GRCh37=hg19，GRCh38=hg38
107 | 
108 | ### 详细说明
109 | 
110 | 该函数基于生物信息学数据库Ensembl SNP Mart来查询给定位置的相关信息。如果未指定基因组版本，则默认使用最新的版本(hg38)。该函数会根据指定的基因组版本选择正确的URL。如果您想查询其他版本的数据，可以将`version`参数设置为相应版本的字符串。
111 | 
112 | 参考：[How to find rsID with biomaRt in R (bioconductor.org)](https://support.bioconductor.org/p/9135301/)
113 | 
114 | ### 使用举例
115 | 
116 | ```
117 | ds4 <- data.frame(CHR = c("8", "8", "8", "8", "8"),POS = c('101592213', '106973048', '108690829', '102569817', '108580746'))
118 | res<-get_rsid(chr=ds4$CHR, pos=ds4$POS, version = 'hg38')
119 | ```
120 | 
121 | 
122 | 
123 | ### 错误说明
124 | 
125 | 如果出现：
126 | 
127 | ```
128 | Error in curl::curl_fetch_memory(url, handle = handle) : 
129 |   Timeout was reached: [grch37.ensembl.org:80] Operation timed out after 300013 milliseconds with 7909 bytes received
130 | ```
131 | 
132 | 这并不是代码问题，而是网络超时了，ensemble的API经常拥堵，多试几次即可。当然也有可能请求的数据量太大，也可能会出现这个问题。
133 | 
134 | 
135 | 
136 | ##  get_exposure 和 get_outcome
137 | 
138 | ###  描述
139 | 
140 | 这两个函数是用于进行双样本MR（Two-sample Mendelian Randomization）分析的数据处理和提取过程的功能函数。
141 | 
142 | - `get_exposure`函数用于从给定的遗传仪器ID中提取出暴露（exposure）数据。
143 | - `get_outcome`函数用于从给定的遗传仪器ID和暴露数据中提取出结果（outcome）数据。
144 | 
145 | ### 用法
146 | 
147 | 使用这两个函数前，需要先安装并加载TwoSampleMR包。
148 | 
149 | ```
150 | library(TwoSampleMR)
151 | ```
152 | 
153 | 然后可以直接在R语言中调用这两个函数，如下所示：
154 | 
155 | ```
156 | # 调用get_exposure函数
157 | exposure_data <- get_exposure(id = "ieu-a-1", pval = 5e-8, r2 = 0.001, kb = 10000)
158 | 
159 | # 调用get_outcome函数
160 | outcome_data <- get_outcome(id = "ieu-a-1", expo = exposure_data)
161 | ```
162 | 
163 | ### 参数
164 | 
165 | - `get_exposure`函数的参数说明：
166 |   - `id`: 遗传仪器ID。
167 |   - `pval`: 提取暴露数据的P值阈值，默认为5e-08。
168 |   - `r2`: 遗传仪器间的LD（linkage disequilibrium）值的阈值，默认为0.001。
169 |   - `kb`: 遗传仪器的范围（以kb为单位），默认为10000。
170 | - `get_outcome`函数的参数说明：
171 |   - `id`: 遗传仪器ID。
172 |   - `expo`: 暴露数据，TwoSampleMR包格式
173 | 
174 | 
175 | 
176 | ## get_ao
177 | 
178 | ### 描述
179 | 
180 | 获取OPEN GWAS数据库所有可用的ID。可以指定获取某个前缀的ID
181 | 
182 | ### 用法
183 | 
184 | 使用这个函数前，需要先安装并加载TwoSampleMR包。然后可以直接在R语言中调用这个函数，如下所示：
185 | 
186 | ```
187 | # 调用get_ao函数
188 | ao <- get_ao()## 不限定来源则返回所有id
189 | ao <- get_ao(a = "finn")## 这样就会返回finn的所有可用id
190 | ```
191 | 
192 | ### 参数
193 | 
194 | - `a`: （可选）数据来源。
195 | 
196 | ### 备注： 来源的名称
197 | 
198 | 来自OPEN GWAS [Browse the IEU OpenGWAS project (mrcieu.ac.uk)](https://gwas.mrcieu.ac.uk/datasets/?trait__icontains=12) 5.1获取
199 | 
200 | | Batch                                                        | Description                                                  | [Count](https://gwas.mrcieu.ac.uk/datasets/?trait__icontains=12#counts) |
201 | | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
202 | | [bbj-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=bbj-a) | [Biobank Japan release of disease traits](http://jenger.riken.jp/en/) | 120                                                          |
203 | | [ebi-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ebi-a) | [Datasets that satisfy minimum requirements imported from the EBI database of complete GWAS summary data](https://www.ebi.ac.uk/gwas/downloads/summary-statistics) | 2,585                                                        |
204 | | [eqtl-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=eqtl-a) | [eQTLGen 2019 results, comprising all cis and some trans regions of gene expression in whole blood](https://www.eqtlgen.org/) | 19,942                                                       |
205 | | [finn-b](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=finn-b) | [FinnGen biobank analysis round 5](https://www.finngen.fi/)  | 2,803                                                        |
206 | | [ieu-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ieu-a) | [GWAS summary datasets generated by many different consortia that have been manually collected and curated, initially developed for MR-Base](https://elifesciences.org/articles/34408) | 440                                                          |
207 | | [ieu-b](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ieu-b) | [GWAS summary datasets generated by many different consortia that have been manually collected and curated, initially developed for MR-Base (round 2)](https://elifesciences.org/articles/34408) | 207                                                          |
208 | | [met-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=met-a) | [Human blood metabolites analysed by Shin et al 2014](https://www.ncbi.nlm.nih.gov/pubmed/24816252) | 452                                                          |
209 | | [met-b](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=met-b) | [Human immune system traits analysed by Roederer et al 2015](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393780/) | 150                                                          |
210 | | [met-c](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=met-c) | [Circulating metabolites analysed by Kettunen et al 2016](https://www.ncbi.nlm.nih.gov/pubmed/27005778) | 123                                                          |
211 | | [met-d](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=met-d) | [Metabolic biomarkers in the UK Biobank measured by Nightingale Health 2020](https://www.ukbiobank.ac.uk/learn-more-about-uk-biobank/news/nightingale-health-and-uk-biobank-announces-major-initiative-to-analyse-half-a-million-blood-samples-to-facilitate-global-medical-research) | 249                                                          |
212 | | [prot-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=prot-a) | [Complete GWAS summary data on protein levels as described by Sun et al 2018](https://www.ncbi.nlm.nih.gov/pubmed/29875488) | 3,282                                                        |
213 | | [prot-b](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=prot-b) | [Complete GWAS summary data on protein levels as described by Folkersen et al 2017](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5393901/) | 83                                                           |
214 | | [prot-c](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=prot-c) | [Complete GWAS summary data on protein levels as described by Suhre et al 2017](https://pubmed.ncbi.nlm.nih.gov/28240269) | 1,124                                                        |
215 | | [ubm-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ubm-a) | [Complete GWAS summary data on brain region volumes as described by Elliott et al 2018](https://www.ncbi.nlm.nih.gov/pubmed/30305740) | 3,143                                                        |
216 | | [ukb-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ukb-a) | [Neale lab analysis of UK Biobank phenotypes, round 1](http://www.nealelab.is/blog/2017/7/19/rapid-gwas-of-thousands-of-phenotypes-for-337000-samples-in-the-uk-biobank) | 596                                                          |
217 | | [ukb-b](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ukb-b) | [IEU analysis of UK Biobank phenotypes](https://data.bris.ac.uk/data/dataset/pnoat8cxo0u52p6ynfaekeigi) | 2,514                                                        |
218 | | [ukb-d](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ukb-d) | [Neale lab analysis of UK Biobank phenotypes, round 2](http://www.nealelab.is/uk-biobank) | 904                                                          |
219 | | [ukb-e](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ukb-e) | [Pan-ancestry genetic analysis of the UK Biobank performed at the Broad Institute](https://pan.ukbb.broadinstitute.org/) | 3,873                                                        |
220 | 
221 | 
222 | 
223 | ## clean_outcome_from_exposure
224 | 
225 | ### 描述
226 | 
227 | （主要用于向量化）用于清洗outcome。将exposure中（list形式，也就是批量化的形式存在的exposure）不存在的SNP从outcome中剔除，大幅精简outcome，并大幅提升harmonise_data的速度。
228 | 
229 | **实测清洗与不清洗outcome对比，速度相差一百倍以上。**
230 | 
231 | ### 用法
232 | 
233 | 直接在R语言中调用这个函数，如下所示：
234 | 
235 | ```
236 | # 调用clean_outcome_from_exposure函数
237 | cleaned_outcome <- clean_outcome_from_exposure(expo = exposure_data, outcome = outcome_data)
238 | ```
239 | 
240 | ### 参数
241 | 
242 | - `expo`: 暴露数据，格式为**list**，TwoSampleMR包格式。比如get_exposure批量化获取下来的数据
243 | - `outcome`: 结果数据，TwoSampleMR包格式。
244 | 
245 | 
246 | 
247 | ## clean_GWAS
248 | 
249 | ### 描述
250 | 
251 | 这个函数是用于清洗遗传关联数据集，使其符合特定的数据集要求的功能函数。
252 | 
253 | ### 用法
254 | 
255 | 直接在R语言中调用这个函数，如下所示：
256 | 
257 | ```
258 | # 调用clean_GWAS函数
259 | cleaned_GWAS_list <- clean_GWAS(list = GWAS_list, clean = c("bbj", "eqtl"))
260 | ```
261 | 
262 | ### 参数
263 | 
264 | - `list`:一个list。里面包含每个暴露的data.frame。 具体参考批量运行get_exposure后的结果
265 | - `clean`: 需要清洗的数据集名称，一个字符型向量类型。具体参考get_ao的附注。
266 | 
267 | ### 返回值
268 | 
269 | - `clean_GWAS`函数返回一个清洗后的遗传关联数据集列表，符合特定数据集要求。
270 | 
271 | 
272 | 
273 | ## 作者信息
274 | 
275 | - 代码作者：广州医科大学 第一临床学院 周浩彬 第二临床学院 谢治鑫
276 | - 帮助文档作者： 周浩彬
277 | - 时间：2023/5/1
278 | - 适配版本: Get_MR2.0
279 | - 开源许可证：GPL3.0
280 | - 公众号： GetScience
281 | - 致谢：感谢广州医科大学 第六临床学院 黄覃耀和 南山学院 林子凯在孟德尔随机化概念，代码思路等提供的重要的建设性建议。


--------------------------------------------------------------------------------
/Get_MR1.0 help.md:
--------------------------------------------------------------------------------
  1 | # Get_MR1.0
  2 | 
  3 | ## 1. 写在前面：
  4 | 
  5 | ### 1.1 项目地址
  6 | 
  7 | **github：**[HaobinZhou/Get_MR: A package for running MR In batches and in parallel quickly (github.com)](https://github.com/HaobinZhou/Get_MR)
  8 | 
  9 | **如果觉得好用，可以点一下github项目上的小星星吗，这是我们继续开源的最大动力，谢谢！**
 10 | 
 11 | 
 12 | 
 13 | 
 14 | 
 15 | ### 1.2 R包使用方法： 
 16 | 
 17 | **R包以R脚本的形式提供，打开R包，全选运行，即得到所有function**
 18 | 
 19 | 1. 进入github[HaobinZhou/Get_MR: A package for running MR In batches and in parallel quickly (github.com)](https://github.com/HaobinZhou/Get_MR)，下载代码zip
 20 | 
 21 | 2. ```R
 22 |    source("./Get_MR1.0.r") ## 填文件所在地址
 23 |    
 24 |    ## 或者直接打开R文件，全选代码运行也可以！
 25 |    ```
 26 | 
 27 |    
 28 | 
 29 | 
 30 | 
 31 | ### 1.3 常见问题：
 32 | 
 33 | 1. **本地clump，1000G处理好的MAF文件，MRlap依赖文件如何获取**： GetScience公众号可免费获取已处理好文件，回复"依赖文件"即得链接。源文件请查看本文相应function介绍处
 34 | 
 35 | 2. **输入clump文件路径后总是报错**：
 36 | 
 37 |    ```R
 38 |    #尤其注意这个文件名的书写，因为他们是二进制文件，不需要写后缀！只需要选取对应的人种即可，比如欧洲人：
 39 |    LD_file="S:/GWAS数据/本地LD依赖文件/EUR"
 40 |    
 41 |    ## 这个问题我回答好多遍啦！
 42 |    ```
 43 | 
 44 |    
 45 | 
 46 | 3. **第一次使用如何安装关联R包：**[Get_MR/1.0 at main · HaobinZhou/Get_MR (github.com)](https://github.com/HaobinZhou/Get_MR/tree/main/1.0) 
 47 | 
 48 |    1. 如果不需要使用`MungeSumstats`包（相关函数包括：`format_Mun`，`get_chr_pos`，`format_getmr`中`source="ukb_nosnp"`) ，则只需要运行[Get_MR1.0dependence.R](https://github.com/HaobinZhou/Get_MR/blob/main/1.0/Get_MR1.0dependence.R)
 49 |    2. 如果需要使用`MungeSumstats`包，则还需运行[Install_Reference_Genome.r](https://github.com/HaobinZhou/Get_MR/blob/main/1.0/Install_Reference_Genome.r) 这个包括了hg19和hg38的基因组参考文件，总大小达到了5G！**如果直接安装失败，在GetScience公众号回复"基因组参考"可得下载链接，并本地安装**（推荐）
 50 | 
 51 | 4. **Bug反馈**：代码仅由两人编写，难免出现错误。欢迎提交bug到GetScience公众号后台！
 52 | 
 53 | 5. **感谢所有Get_MR使用的R包作者**，是因为他们我们才得以轻松实现这么多复杂的功能。他们都是开源的，因此我们承诺Get_MR将**永久免费开源**。这意味着使用者可以随意地修改，分发代码，但前提是遵守：
 54 | 
 55 |    **1.本代码不得用于任何商业或盈利目的**
 56 | 
 57 |    **2.未经代码作者的同意，本代码不得用于任何形式的销售或商业交易**
 58 | 
 59 |    **3.本代码可以在非商业性的科研、学术研究和个人使用的情况下免费使用**
 60 | 
 61 |    **4.在使用本代码并重新打包并向公众发放时，请引用我们的公众号原文**
 62 | 
 63 |    
 64 | 
 65 | ## 2. 帮助文档目录
 66 | 
 67 | [TOC]
 68 | 
 69 | #进阶MR分析
 70 | 
 71 | ## LDSC_rg
 72 | 
 73 | 用于计算两个数据框中SNP之间的遗传相关性（rg）。
 74 | 
 75 | ### 用法
 76 | 
 77 | ```R
 78 | LDSC_rg(expo, outcome, an, sample_prev = NA, population_prev = NA,
 79 |         ld, wld, chr_filter = c(1:22), n_blocks = 200)
 80 | ```
 81 | 
 82 | ### 参数
 83 | 
 84 | - `expo`: 一个数据框，其中包含一个遗传暴露指标的多个SNP和它们与结果变量的rg。
 85 | - `outcome`: 一个数据框，其中包含一个结果变量的多个SNP和它们与遗传暴露指标的rg。
 86 | - `an`: 它是一个字符串,目前还没有作用（因为我们提供的依赖文件只有eur的，其他人种还没更新）
 87 | - `sample_prev`: 遗传暴露指标的样本流行病学先验患病率。默认为 `NA`。
 88 | - `population_prev`: 遗传暴露指标的人群流行病学先验患病率。默认为 `NA`。
 89 | - `ld`: 本地LD依赖文件
 90 | - `wld`: 本地weighted LD 依赖文件
 91 | - `chr_filter`: 一个整数向量，用于指定要使用的染色体。默认为包含1-22的整数向量。
 92 | - `n_blocks`: 用于计算加权LD矩阵的块数。默认为200。
 93 | 
 94 | ### 返回值
 95 | 
 96 | 一个具有以下元素的列表：
 97 | 
 98 | - `rg`: 两个数据框中SNP之间的遗传相关性（rg）。
 99 | - `pval`: `rg` 的双侧P值。
100 | - `N_snps`: 参与计算rg的SNP数量。
101 | 
102 | ### 示例
103 | 
104 | **具体用法参照：mr_lap和LDSC_rg示例.r** 可通过公众号GetScience回复示例获取文件
105 | 
106 | 
107 | 
108 | ## mr_lap
109 | 
110 | ### 描述
111 | 
112 | mrlap是一种矫正样本重叠后的双样本MR方法。可用于怀疑有样本重叠的数据中。
113 | 
114 | R包官网：[n-mounier/MRlap: R package to perform two-sample Mendelian Randomisation (MR) analyses using (potentially) overlapping samples (github.com)](https://github.com/n-mounier/MRlap)
115 | 
116 | ### 语法
117 | 
118 | ```R
119 | mr_lap(expo, outcome, ld, hm3, pval, r2, kb, MR_reverse = 1e-03, save_logfiles = F)
120 | ```
121 | 
122 | 
123 | 
124 | ### 参数
125 | 
126 | - `expo`: 数据框，为TwoSampleMR包格式的数据
127 | - `outcome`: 数据框，为TwoSampleMR包格式的数据
128 | - `ld`: 数据框，本地LD文件路径
129 | - `hm3`: 数据框，本地HapMap3文件路径 
130 | - `pval`: 数值，MR 工具变量阈值。
131 | - `r2`: 数值，clump阈值
132 | - `kb`: 数值，clump阈值
133 | - `MR_reverse`: 数值，MR 的方向翻转阈值。
134 | - `save_logfiles`: 逻辑值，是否保存日志文件。
135 | 
136 | ### 值
137 | 
138 | - res: mrlap 结果。
139 | 
140 | 
141 | 
142 | ### 用法
143 | 
144 | **具体用法参照：mr_lap和LDSC_rg示例.r** 可通过公众号GetScience回复示例获取文件
145 | 
146 | 
147 | 
148 | ## cause_getmr函数
149 | 
150 | ### 描述
151 | 
152 | 一键式执行cause。可批量化执行多暴露对一结局或一暴露对多结局
153 | 
154 | ### 用法
155 | 
156 | ```R
157 | ## 不并行化运行
158 | cause_getmr(expo, outcome, LD_file, r2 = 0.001, kb = 10000, pval = 1e-05)
159 | 
160 | ## 并行化运行
161 | cl<-makeCluster(2) ## 填你想要的并行化的核数，核数越多，需要的运行内存越大
162 | cause_getmr(expo, outcome, LD_file, r2 = 0.001, kb = 10000, pval = 1e-05,cl=cl)
163 | ```
164 | 
165 | ### 参数
166 | 
167 | - `expo`: TwoSampleMR的暴露格式的数据。
168 | - `outcome`: TwoSampleMR的暴露格式的数据。
169 | - 注意！expo和outcome，可以是data.frame的形式，也可以是一个list（如list[[1:n]]里都包含数据的data.frame)。但不能outcome和expo同时都是list。当expo或outcome，其中一个为list的情况下，是批量运行一对一的cause。比如我读取了10个暴露和1个结局，将10个暴露lapply读取进来就会是一个list。这时候`cause_cyclemr` 自动运行每个暴露对结局的cause，也就是批量化执行.
170 | 
171 | ```R
172 | ## 比如我这里读取3个暴露文件和1和结局文件
173 | id<-c('a.gz','b.gz','c.gz')
174 | expo<-lapply(id,FUN=fread)
175 | outcome<-fread("outcome.gz")
176 | cl=makeCluster(4)## 内存不够的也可以不并行化运行
177 | res<-cause_getmr(expo, outcome, LD_file, r2 = 0.001, kb = 10000, pval = 1e-05,cl=cl)
178 | stopCluster(cl)
179 | ## 这样返回的结果就是3个暴露分别对一个结局的cause结果。
180 | ```
181 | 
182 | - `LD_file`: 包含LD信息的PLINK文件名。因为需要大批量地clump，在线clump很容易报错，因此我们采用本地clump。需要本地参考文件。下载地址： http://fileserve.mrcieu.ac.uk/ld/1kg.v3.tgz 。 或关注公众号GetScience直接获取。
183 | 
184 | ```R
185 | #尤其注意这个文件名的书写，因为他们是二进制文件，不需要写后缀！只需要选取对应的人种即可，比如欧洲人：
186 | LD_file="S:/GWAS数据/本地LD依赖文件/EUR"
187 | 
188 | ## 这个问题我回答好多遍啦！
189 | ```
190 | 
191 | - `r2`: LD的R平方阈值。默认值为0.001。
192 | - `kb`: LD的距离阈值（以kb为单位）。默认值为10000。
193 | - `pval`: 用于LD clumping的p值阈值。默认值为1e-05。
194 | - `cl`: 并行计算的cluster对象。默认值为NULL。在外部使用cl=makeCluster(n),n为你想并行化的核数。注意核数太多不要爆内存了。
195 | 
196 | ### 值
197 | 
198 | `cause_cyclemr`函数返回cause结果
199 | 
200 | 
201 | 
202 | ## RAPS_getmr
203 | 
204 | ### 描述
205 | 
206 | `RAPS_getmr`函数执行基于RAPS的MR并返回结果,并画图
207 | 
208 | ### 用法
209 | 
210 | ```R
211 | expo<-fread('a.gz')
212 | outcome<-fread('b.gz')
213 | expo<-format_data(...)
214 | outcome<-format_data(...)  ## format_data是TwoSampleMR包的函数，格式化。
215 | expo<-pblapply(expo,pval=1,kb=10000,r2=0.001,LD_file=LD_file,FUN=clean_expo) ## 数据很大，建议本地clump，在线很容易报错
216 | dat<-harmonise(expo,outcome)
217 | res<-RAPS_getmr(dat, dir_figure)
218 | ```
219 | 
220 | ### 参数
221 | 
222 | - `dat`: TwoSampleMR包 harmonise_data后输出的数据
223 | - `dir_figure`: 保存结果图形的目录。
224 | 
225 | ### 值
226 | 
227 | `RAPS_getmr`函数返回一个包含基于RAPS的MR结果的数据框。
228 | 
229 | 
230 | 
231 | 
232 | 
233 | ## mr_Presso
234 | 
235 | ### 描述
236 | 
237 | 执行MR-PRESSO
238 | 
239 | ### 语法
240 | 
241 | ```R
242 | mr_Presso(dat, num = 10000)
243 | ```
244 | 
245 | 
246 | 
247 | ### 参数
248 | 
249 | - `dat`: 数据框，包含基因表达和疾病风险关联分析的数据。
250 | - `num`: 整数，模拟数量。
251 | 
252 | ### 值
253 | 
254 | - `mr_presso_res`: MR-PRESSO 结果。
255 | 
256 | ### 用法
257 | 
258 | ```R
259 | dat<-harmonise_data(exposure,outcome) ## TwoSampleMR包的harmonise_data函数输出的结果
260 | mr_presso_res <- mr_Presso(dat, num = 10000)
261 | ```
262 | 
263 | 
264 | 
265 | ## mr_presso_pval函数
266 | 
267 | ### 描述
268 | 
269 | 提取 MR-PRESSO 结果中的主要结果
270 | 
271 | ### 语法
272 | 
273 | ```R
274 | mr_presso_pval(mr_presso_res)
275 | ```
276 | 
277 | 
278 | 
279 | ### 参数
280 | 
281 | - `mr_presso_res`: MR-PRESSO 结果。
282 | 
283 | ### 值
284 | 
285 | - mr_presso_main: MR-PRESSO 主要结果。
286 | 
287 | ### 用法
288 | 
289 | ```R
290 | mr_presso_main <- mr_presso_pval(mr_presso_res) ##mr_Presso输出的结果
291 | ```
292 | 
293 | 
294 | 
295 | ## mr_presso_snp函数
296 | 
297 | ### 描述
298 | 
299 | 根据 MR-PRESSO 分析结果，将离群值剔除，返回剔除离群值后的dat(我一般称为dat_aj， 也就是 adjusted_data)， 可用于后续的IVW等分析。
300 | 
301 | ### 语法
302 | 
303 | ```R
304 | mr_presso_snp(mr_presso_res, mr_presso_main, dat, type = "list")
305 | ```
306 | 
307 | 
308 | 
309 | ### 参数
310 | 
311 | - `mr_presso_res`: MR-PRESSO 结果。
312 | - `mr_presso_main`: MR-PRESSO 主要结果。
313 | - `dat`: 数据框或数据框列表，包含基因表达和疾病风险关联分析的数据。
314 | - `type`: 字符串，输入数据类型。可选值为 "list" 或 "data"。如果是列表形式的（批量化运行后的结果），就是`list`，如果是普通数据框就是data
315 | 
316 | ### 值
317 | 
318 | 过滤后的数据框或数据框列表。
319 | 
320 | ### 用法
321 | 
322 | ```R
323 | dat<-harmonise_data(exposure,outcome) ## TwoSampleMR包的harmonise_data函数输出的结果
324 | mr_presso_res <- mr_Presso(dat, num = 10000)
325 | mr_presso_main <- mr_presso_pval(mr_presso_res)
326 | data_aj <- mr_presso_snp(mr_presso_res, mr_presso_main, dat, type = "data")
327 | 
328 | ## 用矫正的data可以用于后续的分析，例如重新计算mr
329 | res_aj<-mr(data_aj)
330 | ```
331 | 
332 | 
333 | 
334 | 
335 | 
336 | 
337 | 
338 | 
339 | 
340 | # 快捷预处理及质控工具
341 | 
342 | ## format_Mun
343 | 
344 | ### 介绍
345 | 
346 | 运用MungeSumstats包标准化GWAS 摘要统计数据（包括hg19和hg38转换）。该函数可以将来自Finngen R8和其他来源的 GWAS 摘要统计数据文件清洗为标准的GWAS文件，并可将基因组位置从 `ref_genome` 转换到 `convert_ref_genome`。
347 | 
348 | ### 用法
349 | 
350 | ```R
351 | format_Mun(file, source = "finn_r8", save_path = NULL, lift = F, ref_genome = "hg38", convert_ref_genome = "hg19")
352 | ```
353 | 
354 | ### 参数
355 | 
356 | - `file`：字符向量或数据框，表示要格式化的 GWAS 摘要统计数据文件或数据框。如果输入的是字符向量，则表示文件的路径。如果输入的是数据框，则表示要格式化的数据框。
357 | - `source`：字符向量，表示输入文件的来源。默认为 `"finn_r8"`。
358 | - `save_path`：字符向量，表示格式化文件要保存的路径。默认为 `NULL`。
359 | - `lift`：逻辑值，表示是否将基因组位置从 `ref_genome` 转换到 `convert_ref_genome`。默认为 `F`。
360 | - `ref_genome`：字符向量，表示 GWAS 摘要统计数据文件使用的参考基因组。默认为 `"hg38"`。
361 | - `convert_ref_genome`：字符向量，表示要将基因组位置转换到的参考基因组。默认为 `"hg19"`。
362 | 
363 | ### 例子
364 | 
365 | ```R
366 | # 从文件中格式化数据
367 | format_Mun("my_sumstats.txt", save_path = "~/formatted_sumstats", lift = F, ref_genome = "hg38")
368 | 
369 | # 从数据框中格式化数据
370 | my_sumstats_df <- read.csv("my_sumstats.csv")
371 | format_Mun(my_sumstats_df, save_path = "~/formatted_sumstats", lift = F, ref_genome = "hg38")
372 | 
373 | #格式化数据并升降版本
374 | format_Mun(my_sumstats_df, save_path = "~/formatted_sumstats", lift = T, ref_genome = "hg38", convert_ref_genome = "hg19") ## 从hg38转为hg19
375 | ```
376 | 
377 | ### 返回值
378 | 
379 | 该函数返回格式化的数据框并将其写入磁盘文件。`save_path`指定保存的位置
380 | 
381 | 
382 | 
383 | 
384 | 
385 | ## format_getmr
386 | 
387 | ### 介绍
388 | 
389 | 预设的快捷格式化 GWAS 摘要统计数据，这个函数用于将来自多个数据来源的 GWAS 摘要统计数据转换为TwoSampleMR 包所需的格式。
390 | 
391 | ### 用法
392 | 
393 | ```
394 | format_getmr(data, type = "exposure", source = "finn_r8")
395 | ```
396 | 
397 | ### 参数
398 | 
399 | - `data`：数据框，表示要格式化的 GWAS 摘要统计数据。
400 | - `type`：字符向量，表示数据类型，可以是 "exposure" 或 "outcome"。默认为 "exposure"。
401 | - `source`：字符向量，表示数据来源。默认为 "finn_r8"。目前支持的来源有：
402 |   - "finn_r8": [Data download - FinnGen Documentation (gitbook.io)](https://finngen.gitbook.io/documentation/data-download)
403 |   - "ukb_nosnp": 尼尔数据库（UKB），因为没有rsid，因此需要匹配（已一键完成）。[www.nealelab.is/uk-biobank](http://www.nealelab.is/uk-biobank)
404 |   - "Mun"： 来自MungeSumstats包格式化后的数据
405 |   - "covid"： [COVID19-hg GWAS meta-analyses round 7 (covid19hg.org)](https://www.covid19hg.org/results/r7/)
406 |   - "outcome" ： 已经格式化为TwoSampleMR包的“outcome”格式
407 |   - "exposure"：已经格式化为TwoSampleMR包的“exposure”格式
408 |   - "fast_ukb": [fastGWA | Yang Lab (westlake.edu.cn)](https://yanglab.westlake.edu.cn/data/ukb_fastgwa/imp_binary/)
409 |   - "bac": 2021年肠菌原文数据 [Large-scale association analyses identify host factors influencing human gut microbiome composition - PMC (nih.gov)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8515199/)
410 | 
411 | ### 例子
412 | 
413 | ```R
414 | my_data <- fread("my_data.gz")
415 | format_getmr(my_data, type = "finn_r8", source = "Mun")
416 | ```
417 | 
418 | ### 返回值
419 | 
420 | 该函数返回格式化的数据框。
421 | 
422 | 
423 | 
424 | ## format_trait
425 | 
426 | ### 介绍
427 | 
428 | 这个函数用于格式化 GWAS 摘要统计数据中的表型信息，使其符合命名规范，易于保存为文件（例如批量保存计算R2和F值后的文件）。
429 | 
430 | 主要是为了解决，在Windows系统下，保存文件的名称中不能包含特殊字符，例如`:`,`|`。
431 | 
432 | ### 用法
433 | 
434 | ```R
435 | format_trait(list, short = FALSE, short_num = "40")
436 | ```
437 | 
438 | ### 参数
439 | 
440 | - `list`：列表，表示要格式化的 GWAS 摘要统计数据列表。
441 | - `short`：逻辑值，表示是否要将表型名称缩短。默认为 FALSE。
442 | - `short_num`：字符向量，表示缩短表型名称的长度。默认为 "40"。
443 | 
444 | ### 例子
445 | 
446 | ```R
447 | my_list <- list(data1, data2, data3)
448 | format_trait(my_list, short = TRUE, short_num = "20")
449 | ```
450 | 
451 | ### 返回值
452 | 
453 | 该函数返回格式化后的 GWAS 摘要统计数据列表。
454 | 
455 | 
456 | 
457 | 
458 | 
459 | ## read_vcf_getmr
460 | 
461 | ### 介绍
462 | 
463 | 这个函数用于从 VCF 文件中读取摘要统计数据。并保存为压缩文件。默认是.gz为后缀的压缩文件。方便下次读取以及节省空间。
464 | 
465 | 这是因为读取VCF文件将消耗大量电脑资源。我们建议批量读取VCF文件后储存为易于读取的压缩包形式。下次读取方便快捷。因此本函数不会直接返回数据框，而是保存为文件
466 | 
467 | ### 用法
468 | 
469 | ```
470 | read_vcf_getmr(file_name, nThread = 8, type = ".gz")
471 | ```
472 | 
473 | ### 参数
474 | 
475 | - `file_name`：字符向量，表示要读取的 VCF 文件名。
476 | - `nThread`：整数，表示要使用的线程数。默认为 8。
477 | - `type`：字符向量，表示输出文件类型。默认为 ".gz"。
478 | 
479 | ### 例子
480 | 
481 | ```R
482 | my_file <- "my_file.vcf"
483 | read_vcf_getmr(my_file, nThread = 4, type = ".gz")
484 | ```
485 | 
486 | ### 返回值
487 | 
488 | 该函数没有返回值，而是将读取的数据写入文件。
489 | 
490 | 
491 | 
492 | ## read_easy
493 | 
494 | ### 介绍
495 | 
496 | 这个函数用于从文件中读取 GWAS 摘要统计数据。并返回经过P值筛选的文件。一般用于批量读取大量文件时。比如我要批量读取100个暴露数据，每个数据占用运行内存2G。如果100个，则200G，不是一般电脑可以承受。因此每次读取将直接筛选p值，压缩大小
497 | 
498 | ### 用法
499 | 
500 | ```R
501 | read_easy(file_name, pval = 5e-08)
502 | ```
503 | 
504 | ### 参数
505 | 
506 | - `file_name`：字符向量，表示要读取的文件名。
507 | - `pval`：数字，表示筛选摘要统计数据的显著性水平。默认为 5e-08。
508 | 
509 | ### 例子
510 | 
511 | ```R
512 | my_file <- "my_file.csv"
513 | read_easy(my_file, pval = 1e-06)
514 | ```
515 | 
516 | ### 返回值
517 | 
518 | 该函数返回摘要统计数据的数据框。
519 | 
520 | 
521 | 
522 | 
523 | 
524 | ## get_eaf_from_1000G
525 | 
526 | ### 介绍
527 | 
528 | 从1000G的MAF文件中提取EAF并将其与输入数据匹配。
529 | 
530 | ### 用法
531 | 
532 | ```
533 | get_eaf_from_1000G(dat, path, type = "exposure")
534 | ```
535 | 
536 | ### 参数
537 | 
538 | - `dat`：一个数据框，为TwoSampleMR包格式的数据
539 | - `path`：一个字符串，表示包含1000G MAF文件`fileFrequency.frq`的目录路径。
540 | - `type`：一个字符串，表示数据是“exposure”（暴露因素）还是“outcome”（结果），默认为“exposure”。
541 | 
542 | ### 值
543 | 
544 | 一个数据框，其中包含输入数据的EAF和类型信息（原始、修正或错误）。
545 | 
546 | ### 详细说明
547 | 
548 | 该函数将读取1000G MAF文件，然后将其与输入数据进行匹配。对于每个SNP，该函数将检查输入数据中的效应等位基因与1000G中的效应等位基因是否匹配。
549 | 
550 | 如果不匹配，则将EAF设置为NA并将其类型设置为“error”。对于匹配的SNP，EAF将设置为1000G MAF中的值（如果输入数据的效应等位基因是minor allele），或1-MAF（如果输入数据的效应等位基因是major allele），并将其类型设置为“raw”或“corrected”。
551 | 
552 | 如果`type`参数设置为“outcome”，则函数将使用输入数据中的结果等位基因来查找EAF。
553 | 
554 | 在处理完所有SNP后，该函数将输出一些有关匹配成功、失败以及NA的信息，以及类型信息的说明。
555 | 
556 | ### 示例
557 | 
558 | 以下是使用该函数的示例：
559 | 
560 | ```R
561 | dat <- get_eaf_from_1000G(dat, "S:/GWAS数据/本地LD依赖文件", type = "exposure")
562 | 
563 | # 检查输出
564 | head(dat)
565 | ```
566 | 
567 | `fileFrequency.frq`为PLINK1.9输出的，根据1000G数据提取的MAF数据
568 | 
569 | 1000G参考文件下载地址：http://fileserve.mrcieu.ac.uk/ld/1kg.v3.tgz
570 | 
571 | 可自行提取MAF数据。或从`GetScience`公众号中获取已经提取好的`fileFrequency.frq`文件
572 | 
573 | 
574 | 
575 | ## get_chr_pos
576 | 
577 | 该函数利用MungeSumstats包匹配rsid的染色体位置。
578 | 
579 | ### 用法
580 | 
581 | ```R
582 | get_chr_pos(dat, type = "exposure")
583 | ```
584 | 
585 | ### 参数
586 | 
587 | - `dat`：一个数据框，TwoSampleMR格式
588 | - `type`：一个字符串，表示要获取SNP染色体位置和参考序列的SNP类型。可选值为"exposure"或"outcome"。
589 | 
590 | ### 返回值
591 | 
592 | 一个数据框，其中包含输入数据框的信息，以及新列`chr.exposure`或`chr.outcome`，表示每个SNP的染色体编号。新列`pos.exposure`或`pos.outcome`表示每个SNP在染色体上的位置。
593 | 
594 | ### 函数说明
595 | 
596 | 该函数使用`format_sumstats`和`format_data`函数从1000G项目中获取SNP的染色体位置和参考序列信息。
597 | 
598 | ### 示例
599 | 
600 | 以下示例演示如何使用`get_chr_pos`函数：
601 | 
602 | ```R
603 | # 获取曝露变量SNP的染色体位置和参考序列
604 | exposure_chr_pos <- get_chr_pos(dat, type = "exposure")
605 | 
606 | # 获取结果变量SNP的染色体位置和参考序列
607 | outcome_chr_pos <- get_chr_pos(dat, type = "outcome")
608 | ```
609 | 
610 | 
611 | 
612 | ## get_f函数
613 | 
614 | ### 描述
615 | 
616 | `get_f`函数计算样本的F统计量并返回F值大于指定阈值的样本数据。返回的数据包括每个SNP的R2和F值
617 | 
618 | ### 用法
619 | 
620 | ```R
621 | get_f(dat, F_value = 10)
622 | ```
623 | 
624 | ### 参数
625 | 
626 | - `dat`: TwoSampleMR格式，一定要包含`eaf.exposure`, `beta.exposure`, `se.exposure`, 和`samplesize.exposure`的数据框。
627 | - `F_value`: 指定的F统计量的阈值，F值大于该阈值的样本将被返回。默认值为10。
628 | 
629 | 
630 | 
631 | 注，本计算公式为
632 | 
633 | F值：![img](A:\OneDrive\GET\assets\clip_image002.gif) R2值：![image-20230327151305186](A:\OneDrive\GET\assets\image-20230327151305186.png)
634 | 
635 | 公式参考文献： 
636 | 
637 | [A Multivariate Genome-Wide Association Analysis of 10 LDL Subfractions, and Their Response to Statin Treatment, in 1868 Caucasians - PMC (nih.gov)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4405269/)
638 | 
639 | [Large-scale association analyses identify host factors influencing human gut microbiome composition - PMC (nih.gov)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8515199/)
640 | 
641 | 
642 | 
643 | 
644 | 
645 | 
646 | 
647 | ### 
648 | 
649 | ## mr_dircreate_base
650 | 
651 | ### 描述
652 | 
653 | `mr_dircreate_base`函数创建基本的目录结构以保存MR分析结果和图形。
654 | 
655 | ### 用法
656 | 
657 | ```R
658 | mr_dircreate_base(root_dir, project_name, date = NULL)
659 | ```
660 | 
661 | ### 参数
662 | 
663 | - `root_dir`: 保存结果文件夹的根目录。
664 | - `project_name`: 项目名称，用于在根目录下创建一个以此命名的子目录。
665 | - `date`: 日期（可选），用于在子目录名称中添加日期以区分不同日期的结果文件夹。默认为`NULL`，表示不添加日期。
666 | 
667 | ### 值
668 | 
669 | `mr_dircreate_base`函数返回一个包含结果文件夹路径的列表。
670 | 
671 | ### 示例
672 | 
673 | ```R
674 | # 创建结果文件夹
675 | res_dir <- mr_dircreate_base("path/to/root/dir", "project_name", date = "20220327")
676 | 
677 | # 打印结果文件夹路径
678 | print(res_dir)
679 | ```
680 | 
681 | ### 注意事项
682 | 
683 | - `root_dir`参数指定保存结果文件夹的根目录。
684 | - `project_name`参数指定项目名称，用于在根目录下创建一个以此命名的子目录。
685 | - `date`参数（可选）用于在子目录名称中添加日期以区分不同日期的结果文件夹。默认为`NULL`，表示不添加日期。
686 | - `mr_dircreate_base`函数将在根目录下创建一个名为`project_name`的子目录，并在该子目录下创建4个子目录，分别命名为`1.figure`、`2.table`、`3.figure of sig res`和`4.snp with Fval`。函数返回一个包含结果文件夹路径的列表。
687 | 
688 | 
689 | 
690 | 
691 | 
692 | 
693 | 
694 | ## clean_expo
695 | 
696 | ### 描述
697 | 
698 | 用于快捷筛选工具变量的函数，可执行P值筛选，EAF值筛选，clump。
699 | 
700 | ### 用法
701 | 
702 | ```R
703 | ## 完整可选参数
704 | clean_expo(expo, pval, low_af = 0.5, high_af = 0.5, clump = TRUE, kb = 10000, r2 = 0.001, LD_file = NULL, af_filter = FALSE)
705 | 
706 | ##不提供LD_file则自动在线clump
707 | clean_expo(expo, pval, clump = TRUE, kb = 10000, r2 = 0.001)
708 | ##提供则本地clump
709 | clean_expo(expo, pval, clump = TRUE, kb = 10000, r2 = 0.001，LD_file=LD_file)
710 | ```
711 | 
712 | ### 参数
713 | 
714 | - `expo`: 一个数据框，其中包含遗传暴露指标的SNP名称、beta值、标准误、p值和频率。
715 | - `pval`: 用于筛选遗传暴露指标的p值阈值。p值小于此阈值的SNP将被保留。
716 | - `low_af`: 频率过滤的下限值。默认为0.5。如果`af_filter`为TRUE，则只有遗传暴露指标的频率低于此值或高于`high_af`时，才会被保留。
717 | - `high_af`: 频率过滤的上限值。默认为0.5。
718 | - `clump`: 一个逻辑值，指示是否使用PLINK进行SNP聚类。默认为TRUE。
719 | - `kb`: 聚类的窗口大小（以kb为单位）。默认为10000。
720 | - `r2`: LD阈值。默认为0.001。
721 | - `LD_file`: PLINK二进制文件的路径。如果未提供，则默认在线clump。
722 | - `af_filter`: 一个逻辑值，指示是否启用频率过滤。默认为FALSE。
723 | 
724 | 
725 | 
726 | ## clean_list
727 | 
728 | ### 描述
729 | 
730 | 用于清理列表中元素行数的R包。常用于批量化质量控制。
731 | 
732 | ### 用法
733 | 
734 | ```R
735 | clean_list(list, nrow = 10)
736 | ```
737 | 
738 | ### 参数
739 | 
740 | - `list`: 一个列表，其中包含多个元素。
741 | - `nrow`: 用于筛选元素的行数阈值。如果元素的行数小于此阈值，则该元素将被删除。默认为10。
742 | 
743 | ### 返回值
744 | 
745 | 一个列表，其中仅包含行数大于 `nrow` 的元素。
746 | 
747 | ### 例子
748 | 
749 | ```
750 | # 创建一个包含5个数据框的列表，每个数据框包含1-5行
751 | set.seed(123)
752 | lst <- list(data.frame(a = rnorm(1), b = rnorm(1)),
753 |             data.frame(a = rnorm(2), b = rnorm(2)),
754 |             data.frame(a = rnorm(3), b = rnorm(3)),
755 |             data.frame(a = rnorm(4), b = rnorm(4)),
756 |             data.frame(a = rnorm(5), b = rnorm(5)))
757 | 
758 | # 运行 clean_list 函数，将阈值设置为2
759 | cleaned_lst <- clean_list(lst, nrow = 3)
760 | 
761 | # 查看清理后的列表
762 | cleaned_lst
763 | ```
764 | 
765 | 
766 | 
767 | 
768 | 
769 | ## clean_IV_from_outsig
770 | 
771 | 用于从一个数据框中清理具有显著的MR反向因果效应P值的IV。
772 | 
773 | ### 用法
774 | 
775 | ```R
776 | clean_IV_from_outsig(dat, MR_reverse = 1e-03)
777 | ```
778 | 
779 | ### 参数
780 | 
781 | - `dat`: 一个数据框，其中包含每个IV和其与结果变量之间的MR反向因果效应的P值。
782 | - `MR_reverse`: 用于筛选IV的MR反向因果效应P值阈值。具有P值小于此阈值的IV将被保留。默认为1e-03。
783 | 
784 | ### 返回值
785 | 
786 | 一个数据框，其中包含P值大于 `MR_reverse` 值的IV（也就是反向不显著的IVs）。
787 | 
788 | 
789 | 
790 | ## 作者信息
791 | 
792 | * 代码作者：广州医科大学 第一临床学院 周浩彬     第二临床学院 谢治鑫
793 | 
794 | * 帮助文档作者： 周浩彬
795 | * 时间：2023/3/27
796 | * 适配版本: Get_MR1.0
797 | * 开源许可证：GPL3.0
798 | * 公众号： GetScience
799 | 
800 | * 致谢：感谢广州医科大学 第六临床学院 黄覃耀和 南山学院 林子凯在孟德尔随机化概念，代码思路等提供的重要的建设性建议。
801 | 
802 | 


--------------------------------------------------------------------------------
/1.0/Get_MR1.0.r:
--------------------------------------------------------------------------------
  1 | library(mr.raps)
  2 | library(TwoSampleMR)
  3 | #library(plyr)
  4 | library(dplyr)
  5 | #library(fs) 
  6 | #library(ggplot2) 
  7 | #library(lubridate) 
  8 | library(ieugwasr)
  9 | library(plinkbinr) 
 10 | library(tidyr)
 11 | #library(progress)
 12 | library(data.table)
 13 | library(MRPRESSO)
 14 | library(parallel)
 15 | #library(foreach)
 16 | library(doParallel)
 17 | library(pbapply)
 18 | library(stringr)
 19 | library(cause)
 20 | library(vroom)
 21 | library(MungeSumstats)
 22 | #library(GenomicFiles)
 23 | #library(meta)
 24 | #library(readr)
 25 | #library(readxl)
 26 | #library(forestploter)
 27 | library(ldscr)
 28 | library(MRlap)
 29 | #library(meta)
 30 | #library(forestplot)
 31 | #library(biomaRt)
 32 | #library(gwasglue)
 33 | #library(coloc)
 34 | 
 35 | # format
 36 | format_Mun<-function(file,source="finn_r8",save_path=NULL,lift=F,ref_genome = "hg38",
 37 |                      convert_ref_genome = "hg19"){
 38 |   library(data.table)
 39 |   library(MungeSumstats)
 40 |   library(dplyr)
 41 |   if(class(file)!="data.frame"){
 42 |   dat<-fread(file)
 43 |   if(source=="finn_r8"){
 44 |     dat<-dat%>%dplyr::select(SNP=rsids,CHR=`#chrom`,BP=pos,A1=ref,
 45 |                              A2=alt,FRQ=af_alt,BETA=beta,
 46 |                              SE=sebeta,P=pval)
 47 |   }
 48 |   }
 49 |   dat<-as.data.frame(dat)
 50 |   dat<-format_sumstats(dat,return_data = TRUE)
 51 |   if(is.null(save_path)==F){setwd(save_path)}
 52 |   dat<-fwrite(dat,paste0(file,'(Mun_format',ref_genome,').gz'))
 53 |   te<-fs::dir_info( tempdir())
 54 |   te<-subset(te,size>1e+8)
 55 |   fs::file_delete(te$path)
 56 |   if(lift==T){
 57 |     dat <- MungeSumstats::liftover(sumstats_dt = dat, 
 58 |                                    ref_genome = ref_genome,
 59 |                                    convert_ref_genome = convert_ref_genome)
 60 |     dat<-fwrite(dat,paste0(file,'(Mun_format',convert_ref_genome,').gz'))
 61 |   }
 62 |   gc()
 63 | }
 64 | 
 65 | format_getmr<-function(data,type="exposure",source="finn_r8"){
 66 |   library(TwoSampleMR)
 67 |   library(MungeSumstats)
 68 |   
 69 |   if(source=="finn_r8"){
 70 |     data<-format_data(data,type=type,
 71 |                       id_col = "id",chr_col ="#chrom",pos="pos",
 72 |                       snp_col = "rsids",beta_col = "beta",se_col = "sebeta",
 73 |                       effect_allele_col = "alt",other_allele_col = "ref",
 74 |                       eaf_col = "af_alt",phenotype_col = "phenotype",pval_col = "pval")
 75 |   }
 76 |   
 77 |   
 78 |   
 79 |   if(source=="ukb_nosnp"){
 80 |     data<-separate(data,variant,c("chr","pos","ref","alt"),sep = ":")
 81 |     
 82 |     data<-dplyr::select(data,chr,pos,ref,alt,minor_AF,beta,se,pval)
 83 |     name<-colnames(data)
 84 |     colnames(data) <-c('CHR','BP','A1','A2','FRQ','BETA', 'SE','P')
 85 |     
 86 |     expo_rs_done<-format_sumstats(data,ref_genome = "GRCh37",return_data = TRUE)
 87 |     
 88 |     expo_rs_done$BP<-as.character(expo_rs_done$BP)
 89 |     colnames(data)<-name
 90 |     data<-merge(data,expo_rs_done,by.x=c("chr","pos"),by.y=c("CHR","BP"),all.x=TRUE)
 91 |     
 92 |     data<-format_data(data,type=type,
 93 |                       samplesize_col = "n_complete_samples",
 94 |                       snp_col = "SNP",effect_allele_col = "alt",
 95 |                       other_allele_col = "ref",eaf_col ="minor_AF",
 96 |                       beta_col="beta",se_col = "se",pval_col = "pval",
 97 |                       chr_col="chr",pos_col ="pos",id='id',phenotype_col = 'phenotype')
 98 |   }
 99 |   
100 |   if(source=="Mun"){
101 |     data<-format_data(data,type=type,
102 |                       snp_col = 'SNP',
103 |                       chr_col = 'CHR',
104 |                       pos_col = 'BP',
105 |                       effect_allele_col = 'A2',
106 |                       other_allele_col = "A1",
107 |                       se_col = 'SE',
108 |                       beta_col= 'BETA',
109 |                       eaf_col = 'FRQ',
110 |                       id_col = 'id',
111 |                       phenotype_col = "phenotype",
112 |                       pval_col = "P"
113 |     )
114 |   }
115 |   
116 |   if(source=="covid"){
117 |     data<-format_data(data,type=type,
118 |                       id_col = "id",chr_col ="#CHR",pos="POS",
119 |                       snp_col = "rsid",beta_col = "all_inv_var_meta_beta",se_col = "all_inv_var_meta_sebeta",
120 |                       effect_allele_col = "ALT",other_allele_col = "REF",
121 |                       eaf_col = "all_meta_AF",phenotype_col = "phenotype",
122 |                       ncase_col = "all_inv_var_meta_cases",ncontrol_col = "all_inv_var_meta_controls",
123 |                       pval_col = "all_inv_var_meta_p")
124 |   }
125 |   
126 |   if(source=="outcome"){
127 |     data<-format_data(data,type=type,
128 |                       id_col = "id.outcome",chr_col ="chr.outcome",pos="pos.outcome",
129 |                       snp_col = "SNP",beta_col = "beta.outcome",se_col = "se.outcome",
130 |                       effect_allele_col = "effect_allele.outcome",other_allele_col = "other_allele.outcome",
131 |                       eaf_col = "eaf.outcome",phenotype_col = "outcome",
132 |                       samplesize_col = "samplesize.outcome",
133 |                       pval_col = "pval.outcome")
134 |     
135 |   }
136 |   if(source=="exposure"){
137 |     data<-format_data(data,type=type,
138 |                       id_col = "id.exposure",chr_col ="chr.exposure",pos="pos.exposure",
139 |                       snp_col = "SNP",beta_col = "beta.exposure",se_col = "se.exposure",
140 |                       effect_allele_col = "effect_allele.exposure",other_allele_col = "other_allele.exposure",
141 |                       eaf_col = "eaf.exposure",phenotype_col = "exposure",
142 |                       samplesize_col = "samplesize.exposure",
143 |                       pval_col = "pval.exposure")
144 |     
145 |   }
146 |   
147 |   if(source=="fast_ukb"){
148 |     data<-format_data(data,type=type,id_col="id",
149 |                       phenotype_col = "phenotype",snp_col = "SNP",
150 |                       effect_allele_col = "A1",other_allele_col = "A2",
151 |                       eaf_col ="AF1",beta_col="BETA",se_col = "SE",
152 |                       pval_col = "P",chr_col = "CHR",pos_col = "POS")
153 |   }
154 |   
155 |   
156 |   
157 |   if(source=="bac"){
158 |     data<-format_data(data,type=type,id_col="bac",
159 |                       phenotype_col = "phenotype",snp_col = "rsID",
160 |                       effect_allele_col = "eff.allele",other_allele_col = "ref.allele",
161 |                       beta_col="beta",se_col = "SE",
162 |                       pval_col = "P.weightedSumZ",chr_col = "chr",pos_col = "bp",samplesize_col = "N")
163 |   }
164 |   
165 |   if(source=="finn_r7"){
166 |     data<-format_data(data,type=type,snp_col = "rsids",
167 |                       effect_allele_col = "alt",other_allele_col = "ref",
168 |                       beta_col="beta",se_col = "sebeta",eaf_col = "af_alt",
169 |                       pval_col = "pval",chr_col = "chrom",pos_col = "pos")
170 |   }
171 |   return(data)
172 | }
173 | 
174 | 
175 | format_trait<-function(list,short=FALSE,short_num="40"){
176 |   for(i in 1:length(list)){
177 |     expo<-data.frame(exposure=list[[i]]
178 |                      $exposure)
179 |     try(expo<-separate(expo,exposure,c('a','b'),sep='\\|\\|'))
180 |     if(short==TRUE){try(expo$a <- substr(expo$a, 1, short_num))}
181 |     try(expo<-expo$a)
182 |     try(expo<-gsub(":","_",expo))
183 |     try(expo<-gsub("-","_",expo))
184 |     try(expo<-gsub(" ","_",expo))
185 |     try(expo<-gsub(",","_",expo))
186 |     try(expo<-gsub("/","_",expo))
187 |     list[[i]]$exposure<-expo[1]
188 |     
189 |   }
190 |   
191 |   
192 |   return(list)
193 | }
194 | 
195 | # read
196 | read_vcf_getmr<-function(file_name,nThread = 8,type=".gz"){
197 |   name<-file_name
198 |   
199 |   for(i in 1:nrow(name)){
200 |     dat<-read_sumstats(paste0("./",name[i]),nThread = nThread,nrow=Inf,standardise_headers = FALSE,mapping_file = sumstatsColHeaders)
201 |     
202 |     vroom_write(dat,paste0(name[i],type))
203 |     
204 |     gc()
205 |   }
206 |   print(i)
207 |   
208 | }
209 | 
210 | read_easy<-function(file_name,pval=5e-08){
211 |   library(data.table)
212 |   dat<-fread(file_name)
213 |   dat<-subset(dat,pval.exposure<pval)
214 |   return(dat)
215 | }
216 | 
217 | 
218 | # get
219 | get_eaf_from_1000G<-function(dat,path,type="exposure"){
220 |   
221 |   corrected_eaf_expo<-function(data_MAF){
222 |     effect=data_MAF$effect_allele.exposure
223 |     other=data_MAF$other_allele.exposure
224 |     A1=data_MAF$A1
225 |     A2=data_MAF$A2
226 |     MAF_num=data_MAF$MAF
227 |     EAF_num=1-MAF_num
228 |     
229 |     harna<-is.na(data_MAF$A1)
230 |     harna<-data_MAF$SNP[which(harna==T)]
231 |     
232 |     cor1<-which(data_MAF$effect_allele.exposure !=data_MAF$A1)
233 |     
234 |     
235 |     data_MAF$eaf.exposure=data_MAF$MAF
236 |     data_MAF$type="raw"
237 |     data_MAF$eaf.exposure[cor1]=EAF_num[cor1]
238 |     data_MAF$type[cor1]="corrected"
239 |     cor2<-which(data_MAF$other_allele.exposure ==data_MAF$A1)
240 |     cor21<-setdiff(cor2,cor1)
241 |     cor12<-setdiff(cor1,cor2)
242 |     error<-c(cor12,cor21)
243 |     data_MAF$eaf.exposure[error]=NA
244 |     data_MAF$type[error]="error"
245 |     
246 |     data_MAF<-list(data_MAF=data_MAF,cor1=cor1,harna=harna,error=error)
247 |     
248 |     return(data_MAF)
249 |     
250 |   }
251 |   
252 |   corrected_eaf_out<-function(data_MAF){
253 |     effect=data_MAF$effect_allele.outcome
254 |     other=data_MAF$other_allele.outcome
255 |     A1=data_MAF$A1
256 |     A2=data_MAF$A2
257 |     MAF_num=data_MAF$MAF
258 |     EAF_num=1-MAF_num
259 |     
260 |     harna<-is.na(data_MAF$A1)
261 |     harna<-data_MAF$SNP[which(harna==T)]
262 |     
263 |     cor1<-which(data_MAF$effect_allele.outcome !=data_MAF$A1)
264 |     
265 |     
266 |     data_MAF$eaf.outcome=data_MAF$MAF
267 |     data_MAF$type="raw"
268 |     data_MAF$eaf.outcome[cor1]=EAF_num[cor1]
269 |     data_MAF$type[cor1]="corrected"
270 |     cor2<-which(data_MAF$other_allele.outcome ==data_MAF$A1)
271 |     cor21<-setdiff(cor2,cor1)
272 |     cor12<-setdiff(cor1,cor2)
273 |     error<-c(cor12,cor21)
274 |     data_MAF$eaf.outcome[error]=NA
275 |     data_MAF$type[error]="error"
276 |     
277 |     data_MAF<-list(data_MAF=data_MAF,cor1=cor1,harna=harna,error=error)
278 |     
279 |     return(data_MAF)
280 |     
281 |   }
282 |   
283 |   if(type=="exposure" && (is.na(dat$eaf.exposure[1])==T || is.null(dat$eaf.exposure)==T)){
284 |     r<-nrow(dat)
285 |     
286 |     setwd(path)
287 |     MAF<-fread("fileFrequency.frq",header = T)
288 |     
289 |     dat<-merge(dat,MAF,by.x = "SNP",by.y = "SNP",all.x = T)
290 |     
291 |     dat<-corrected_eaf_expo(dat)
292 |     
293 |     cor1<-dat$cor1
294 |     
295 |     harna<-dat$harna
296 |     
297 |     error<-dat$error
298 |     
299 |     dat<-dat$data_MAF
300 |     
301 |     print(paste0("一共有",(r-length(harna)-length(error)),"个SNP成功匹配EAF,占比",(r-length(harna)-length(error))/r*100,"%"))
302 |     
303 |     print(paste0("一共有",length(cor1),"个SNP是major allele，EAF被计算为1-MAF,在成功匹配数目中占比",length(cor1)/(r-length(harna)-length(error))*100,"%"))
304 |     
305 |     print(paste0("一共有",length(harna),"个SNP在1000G中找不到，占比",length(harna)/r*100,"%"))
306 |     
307 |     print(paste0("一共有",length(error),"个SNP在输入数据与1000G中效应列与参照列，将剔除eaf，占比",length(error)/r*100,"%"))
308 |     
309 |     print("输出数据中的type列说明：")
310 |     print("raw：EAF直接等于1000G里的MAF数值，因为效应列是minor allele")
311 |     print('corrected：EAF等于1000G中1-MAF，因为效应列是major allele')
312 |     print("error：输入数据与1000G里面提供的数据完全不一致，比如这个SNP输入的效应列是C，参照列是G，但是1000G提供的是A-T，这种情况下，EAF会被清空（NA），当成匹配失败")
313 |     
314 |     return(dat)
315 |   }
316 |   
317 |   if(type=="outcome" && (is.na(dat$eaf.outcome[1])==T || is.null(dat$eaf.outcome)==T)){
318 |     r<-nrow(dat)
319 |     
320 |     setwd(path)
321 |     MAF<-fread("fileFrequency.frq",header = T)
322 |     
323 |     dat<-merge(dat,MAF,by.x = "SNP",by.y = "SNP",all.x = T)
324 |     
325 |     dat<-corrected_eaf_out(dat)
326 |     
327 |     cor1<-dat$cor1
328 |     
329 |     harna<-dat$harna
330 |     
331 |     error<-dat$error
332 |     
333 |     dat<-dat$data_MAF
334 |     
335 |     print(paste0("一共有",(r-length(harna)-length(error)),"个SNP成功匹配EAF,占比",(r-length(harna)-length(error))/r*100,"%"))
336 |     
337 |     print(paste0("一共有",length(cor1),"个SNP是major allele，EAF被计算为1-MAF,在成功匹配数目中占比",length(cor1)/(r-length(harna)-length(error))*100,"%"))
338 |     
339 |     print(paste0("一共有",length(harna),"个SNP在1000G找不到，占比",length(harna)/r*100,"%"))
340 |     
341 |     print(paste0("一共有",length(error),"个SNP在输入数据与1000G中效应列与参照列，将剔除eaf，占比",length(error)/r*100,"%"))
342 |     
343 |     print("输出数据中的type列说明：")
344 |     print("raw：EAF直接等于1000G里的MAF数值，因为效应列是minor allele")
345 |     print('corrected：EAF等于1000G中1-MAF，因为效应列是major allele')
346 |     print("error：输入数据与1000G里面提供的数据完全不一致，比如这个SNP输入的效应列是C，参照列是G，但是1000G提供的是A-T，这种情况下，EAF会被清空（NA），当成匹配失败")
347 |     
348 |     return(dat)
349 |   }
350 |   else{return(dat)}
351 | }
352 | 
353 | get_chr_pos<-function(dat,type="exposure"){
354 |   if(type=="exposure"){
355 |     if(is.na(dat$eaf.exposure[1])==T || is.null(dat$eaf.exposure)==T){
356 |       print("需要先运行get_eaf_from_1000G来匹配eaf，再匹配chr和pos")
357 |     }
358 |     else{
359 |       dat_d<-dat%>%select(SNP,effect_allele.exposure,other_allele.exposure,eaf.exposure,beta.exposure,se.exposure,pval.exposure)
360 |       colnames(dat_d) <-c('SNP','A1','A2','FRQ','BETA', 'SE','P')
361 |       dat_d<-format_sumstats(dat_d,ref_genome = "GRCh37",return_data = TRUE)
362 |       dat_d<-format_data(dat_d,type=type,
363 |                          snp_col = 'SNP',
364 |                          chr_col = 'CHR',
365 |                          pos_col = 'BP',
366 |                          effect_allele_col = 'A1',
367 |                          other_allele_col = "A2",
368 |                          se_col = 'SE',
369 |                          beta_col= 'BETA',
370 |                          eaf_col = 'FRQ',
371 |                          pval_col = "P"
372 |       )
373 |       dat_d<-dat_d%>%select(SNP,chr.exposure,pos.exposure)
374 |       dat<-merge(dat,dat_d,by="SNP",all.x=T)
375 |       
376 |       return(dat)
377 |     }
378 |   }
379 |   if(type=="outcome"){
380 |     if(is.na(dat$eaf.outcome[1])==T || is.null(dat$eaf.outcome)==T){
381 |       print("需要先运行get_eaf_from_1000G来匹配eaf，再匹配chr和pos")
382 |     }
383 |     else{
384 |       dat_d<-dat%>%select(SNP,effect_allele.outcome,other_allele.outcome,eaf.outcome,beta.outcome,se.outcome,pval.outcome)
385 |       colnames(dat_d) <-c('SNP','A1','A2','FRQ','BETA', 'SE','P')
386 |       dat_d<-format_sumstats(dat_d,ref_genome = "GRCh37",return_data = TRUE)
387 |       dat_d<-format_data(dat_d,type=type,
388 |                          snp_col = 'SNP',
389 |                          chr_col = 'CHR',
390 |                          pos_col = 'BP',
391 |                          effect_allele_col = 'A1',
392 |                          other_allele_col = "A2",
393 |                          se_col = 'SE',
394 |                          beta_col= 'BETA',
395 |                          eaf_col = 'FRQ',
396 |                          pval_col = "P"
397 |       )
398 |       dat_d<-dat_d%>%select(SNP,chr.outcome,pos.outcome)
399 |       dat<-merge(dat,dat_d,by="SNP",all.x=T)
400 |       
401 |       return(dat)
402 |     }
403 |     
404 |   }
405 | }
406 | 
407 | get_f<-function(dat,F_value=10){
408 |   log<-is.na(dat$eaf.exposure)
409 |   log<-unique(log)
410 |   if(length(log)==1)
411 |   {if(log==TRUE){
412 |     print("数据不包含eaf，无法计算F统计量")
413 |     return(dat)}
414 |   }
415 |   if(is.null(dat$beta.exposure[1])==T || is.na(dat$beta.exposure[1])==T){print("数据不包含beta，无法计算F统计量")
416 |     return(dat)}
417 |   if(is.null(dat$se.exposure[1])==T || is.na(dat$se.exposure[1])==T){print("数据不包含se，无法计算F统计量")
418 |     return(dat)}
419 |   if(is.null(dat$samplesize.exposure[1])==T || is.na(dat$samplesize.exposure[1])==T){print("数据不包含samplesize(样本量)，无法计算F统计量")
420 |     return(dat)}
421 |   
422 |   
423 |   if("FALSE"%in%log && is.null(dat$beta.exposure[1])==F && is.na(dat$beta.exposure[1])==F && is.null(dat$se.exposure[1])==F && is.na(dat$se.exposure[1])==F && is.null(dat$samplesize.exposure[1])==F && is.na(dat$samplesize.exposure[1])==F){
424 |     R2<-(2*(1-dat$eaf.exposure)*dat$eaf.exposure*(dat$beta.exposure^2))/((2*(1-dat$eaf.exposure)*dat$eaf.exposure*(dat$beta.exposure^2))+(2*(1-dat$eaf.exposure)*dat$eaf.exposure*(dat$se.exposure^2)*dat$samplesize.exposure))
425 |     F<- (dat$samplesize.exposure-2)*R2/(1-R2)
426 |     dat$R2<-R2
427 |     dat$F<-F
428 |     dat<-subset(dat,F>F_value)
429 |     return(dat)
430 |   }
431 | }
432 | 
433 | 
434 | # MR
435 | cause_getmr<-function(expo,outcome,LD_file,r2=0.001,
436 |                         kb=10000,pval=1e-05,cl=NULL){
437 |   format_cause_expo<-function(dat){
438 |     library(cause)
439 |     dat<-gwas_format(dat,snp='SNP',beta_hat ='beta.exposure',
440 |                      se='se.exposure',A1="effect_allele.exposure",
441 |                      A2="other_allele.exposure",chrom='chr.exposure',
442 |                      pos="pos.exposure",p_value = 'pval.exposure'
443 |     )
444 |     return(dat)
445 |   }
446 |   format_cause_out<-function(dat){
447 |     library(cause)
448 |     dat<-gwas_format(dat,snp='SNP',beta_hat ='beta.outcome',
449 |                      se='se.outcome',A1="effect_allele.outcome",
450 |                      A2="other_allele.outcome",chrom='chr.outcome',
451 |                      pos="pos.outcome",p_value = 'pval.outcome'
452 |     )
453 |     return(dat)
454 |   }
455 |   sample_cause<-function(dat,num_snp){
456 |     set.seed(123)
457 |     VAR<-with(dat,sample(snp,size=num_snp,replace=FALSE))
458 |     return(VAR)
459 |   }
460 |   datap_cause<-function(dat){
461 |     dat$p1<-pnorm(abs(dat$beta_hat_1/dat$seb1),lower.tail=F)*2
462 |     return(dat)
463 |   }
464 |   dat_clump_cause<-function(dat){
465 |     for_clump<-data.frame(dat$snp,dat$p1)
466 |     colnames(for_clump)<-c('rsid','pval')
467 |     return(for_clump)
468 |   }
469 |   datap_cause<-function(dat){
470 |     dat$p1<-pnorm(abs(dat$beta_hat_1/dat$seb1),lower.tail=F)*2
471 |     return(dat)
472 |   }
473 |   ld_local<-function(dat,r2,kb,p,LD_file){
474 |     library(plinkbinr)
475 |     
476 |     plink_pathway<-get_plink_exe()
477 |     
478 |     dat<-ld_clump(dat,clump_r2 = r2,clump_kb = kb,
479 |                   clump_p = p,
480 |                   plink_bin =plink_pathway , bfile =LD_file)
481 |     
482 |     return(dat)
483 |   }
484 |   if("list"%in%class(outcome)){
485 |     single_expo=T
486 |     single_outcome=F}
487 |   if("list"%in%class(expo)){
488 |     single_outcome=T
489 |     single_expo=F}
490 |   if(single_expo==T){
491 |     outcome<-pblapply(outcome,FUN=format_cause_out,cl=cl)
492 |     expo<-format_cause_expo(expo)
493 |   }
494 |   if(single_outcome==T){
495 |     expo<-pblapply(expo,FUN=format_cause_expo,cl=cl)
496 |     outcome<-format_cause_out(outcome)
497 |   }
498 |   
499 |   
500 |   if(single_outcome==T){dat<-pblapply(expo,outcome,FUN=gwas_merge,cl=cl)}
501 |   if(single_expo==T){dat<-pblapply(outcome,expo,
502 |                                    FUN=function(outcome,exposure)gwas_merge(exposure,outcome),
503 |                                    cl=cl)}
504 |   
505 |   
506 |   VAR<-pblapply(dat,1000000,FUN=sample_cause)
507 |   
508 |   est<-foreach(i=1:length(VAR)) %do%{
509 |     library(cause)
510 |     data<-est_cause_params(dat[[i]],VAR[[i]])
511 |   }
512 |   dat<-pblapply(dat,FUN=datap_cause)
513 |   for_clump<-lapply(dat, dat_clump_cause)
514 |   clumped<-pblapply(for_clump,0.001,10000,1e-05,LD_file,FUN=ld_local)
515 |   top_vars<-pblapply(clumped,FUN=function(dat) return(dat$rsid))
516 |   cause_res<-list()
517 |   for(i in 1:length(dat)){
518 |     library(cause)
519 |     res<-cause(dat[[i]],est[[i]],top_vars[[i]])
520 |     cause_res[[i]]<-res
521 |     print(i)
522 |   }
523 |   cause_table<-data.frame()
524 |   for (i in 1:length(cause_res)){
525 |     res<-cause_res[[i]]
526 |     elpd<-res$elpd
527 |     elpd$p<-pnorm(-elpd$z,lower.tail = F)
528 |     name<-paste0('file',i)
529 |     elpd$file<-name
530 |     cause_table<-rbind(cause_table,elpd)
531 |   }
532 |   return(cause_table)
533 | }
534 | 
535 | # RAPS
536 | RAPS_getmr<-function(dat,dir_figure){
537 |   setwd(dir_figure)
538 |   res<-try(mr.raps(dat,over.dispersion = TRUE))
539 |   if(class(res)%in%"try-error"){}
540 |   else{
541 |     exposure<-dat$id.exposure
542 |     dir_create(exposure)
543 |     dir_in<-paste0(dir_figure,'/',exposure)
544 |     setwd(dir_in)
545 |     #plot_name<-paste0(exposure,'raps.pdf')
546 |     ggsave(file='raps_plot.pdf',plot=plot(res),width=9,height=5)
547 |     res<-data.frame(beta.raps=res$beta.hat,se.raps=res$beta.se,
548 |                     eov=res$tau2.hat,se.eov=res$tau2.se,OR.raps=NA,
549 |                     or_lci95.raps=NA,or_uci95.raps=NA)
550 |     res$OR.raps<- exp(res$beta.raps)
551 |     res$or_lci95.raps<-exp(res$beta.raps)-(res$beta.raps*1.96)
552 |     res$or_uci95.raps<-exp(res$beta.raps)+(res$beta.raps*1.96)
553 |     res$pval.raps<-2*pnorm(abs(res$beta.raps/res$se.raps),lower.tail=F) 
554 |     res$pval.eov<-2*pnorm(abs(res$eov/res$se.eov),lower.tail=F)
555 |     if(is.na(res$pval.eov)==FALSE){
556 |       if(res$pval.eov >0.05 ){
557 |         res1<-mr.raps(dat,over.dispersion = F)
558 |         setwd(dir_in)
559 |         #plot_name<-paste0(exposure,'raps.pdf')
560 |         ggsave(file='raps_plot.pdf',plot=plot(res1),width=9,height=5)
561 |         res1<-data.frame(beta.raps=res1$beta.hat,se.raps=res1$beta.se,
562 |                          eov=NA,se.eov=NA)
563 |         res1$eov<-res$eov
564 |         res1$se.eov<-res$se.eov
565 |         res1$OR.raps<- exp(res1$beta.raps)
566 |         res1$or_lci95.raps<-exp(res1$beta.raps)-(res1$beta.raps*1.96)
567 |         res1$or_uci95.raps<-exp(res1$beta.raps)+(res1$beta.raps*1.96)
568 |         res1$pval.raps<-2*pnorm(abs(res1$beta.raps/res1$se.raps),lower.tail=F) 
569 |         res1$pval.eov<-2*pnorm(abs(res1$eov/res1$se.eov),lower.tail=F)
570 |         
571 |         res<-res1
572 |       }
573 |     }
574 |     
575 |     return(res)
576 |   }
577 | }
578 | 
579 | mr_dircreate_base<-function(root_dir,project_name,date=NULL){  
580 |   library(fs)
581 |   dir_name<-root_dir
582 |   setwd(dir_name)
583 |   
584 |   if(is.null(date)==FALSE){data_u<-date}else{data_u<-Sys.Date()}
585 |   
586 |   dir_name2<-project_name
587 |   dir_name3<-paste0(dir_name2,data_u)
588 |   dir_create(dir_name3)
589 |   setwd(paste0(dir_name,"/",dir_name3))
590 |   dir_name4<-"1.figure"
591 |   dir_name5<-"2.table"
592 |   dir_name6<-"3.figure of sig res"
593 |   dir_name7<-"4.snp with Fval"
594 |   
595 |   paste<-paste0(dir_name,"/",dir_name3,"/")
596 |   
597 |   dir1<-paste0(paste,dir_name4)
598 |   dir.create(dir1)
599 |   
600 |   dir2<-paste0(paste,dir_name5)
601 |   dir.create(dir2)
602 |   
603 |   dir3<-paste0(paste,dir_name6)
604 |   dir.create(dir3)
605 |   
606 |   dir4<-paste0(paste,dir_name7)
607 |   dir.create(dir4)
608 |   
609 |   res<-list(paste=paste,dir1=dir1,dir2=dir2,dir3=dir3,dir4=dir4)
610 |   
611 |   return(res)
612 | }
613 | 
614 | 
615 | 
616 | # PRESSO
617 | mr_Presso<-function(dat,num=10000){
618 |   library(TwoSampleMR)
619 |   library(MRPRESSO)
620 |   library(dplyr)
621 |   set.seed(123)
622 |   try (mr_presso_res<-mr_presso(BetaOutcome ="beta.outcome", BetaExposure = "beta.exposure", SdOutcome ="se.outcome", SdExposure = "se.exposure", 
623 |                                 OUTLIERtest = TRUE,DISTORTIONtest = TRUE, data = dat,  
624 |                                 SignifThreshold = 0.05,NbDistribution = num))
625 |   return(mr_presso_res)
626 |   
627 | }
628 | mr_presso_pval<-function(mr_presso_res){ 
629 |   try ( mr_presso_main<-mr_presso_res$`Main MR results`)
630 |   try ( mr_presso_main[3:5,]<-NA) 
631 |   return(mr_presso_main)
632 | }
633 | 
634 | 
635 | mr_presso_snp<-function(mr_presso_res,mr_presso_main,dat,type="list"){
636 |   data_re<-list()
637 |   if(type=="list"){
638 |     for(i in 1:length(mr_presso_res)){
639 |       res<-mr_presso_res$`MR-PRESSO results`[[i]]
640 |       main<-mr_presso_main[[i]]
641 |       data<-dat[[i]]
642 |       try(if(is.na(main[2,6])==FALSE){
643 |         outliers<-which(res$`Outlier Test`$Pvalue<0.05)
644 |         data$mr_keep[outliers]<-FALSE
645 |       })
646 |       data_re[[i]]<-data
647 |       names(data_re)[[i]]<-names(dat)[[i]]
648 |     }
649 |     return(data_re)
650 |   }
651 |   
652 |   if(type=="data"){
653 |     res<-mr_presso_res$`MR-PRESSO results`
654 |     main<-mr_presso_main
655 |     data<-dat
656 |     try(if(is.na(main[2,6])==FALSE){
657 |       outliers<-which(res$`Outlier Test`$Pvalue<0.05)
658 |       data$mr_keep[outliers]<-FALSE
659 |     })
660 |     return(data)
661 |   }
662 | }
663 | 
664 | #MRlap
665 | 
666 | mr_lap<-function(expo,outcome,ld,hm3,pval,r2,kb,MR_reverse=1e-03,save_logfiles=F){
667 |   expo<-expo%>%select(rsid=SNP,chr=chr.exposure,pos=pos.exposure,alt=effect_allele.exposure,
668 |                       ref=other_allele.exposure,N=samplesize.exposure,beta=beta.exposure,
669 |                       se=se.exposure)
670 |   expo<-as.data.frame(expo)
671 |   outcome<-outcome%>%select(rsid=SNP,chr=chr.outcome,pos=pos.outcome,alt=effect_allele.outcome,
672 |                             ref=other_allele.outcome,N=samplesize.outcome,beta=beta.outcome,
673 |                             se=se.outcome)
674 |   outcome<-as.data.frame(outcome)
675 |   n_expo<-expo$exposure[1]
676 |   n_out<-outcome$outcome[1]
677 |   
678 |   try(res<-MRlap::MRlap(exposure = expo,
679 |                         exposure_name = n_expo,
680 |                         outcome = outcome,
681 |                         outcome_name = n_out,
682 |                         ld = ld,
683 |                         hm3 = hm3,MR_threshold=pval,MR_pruning_dist=kb,
684 |                         MR_pruning_LD=r2,MR_reverse=MR_reverse,save_logfiles=save_logfiles
685 |   ))
686 |   try(snp<-data.frame(SNP_MRlap=res$MRcorrection$IVs))
687 |   try(res_c<-as.data.frame(res$MRcorrection[-4])%>%select(nsnp=m_IVs,
688 |                                                           beta_MRlap=corrected_effect,
689 |                                                           se_MRlap=corrected_effect_se,
690 |                                                           p_MRlap=corrected_effect_p))
691 |   try(res<-c(res,res_c,snp))
692 |   
693 |   try(return(res))
694 | }
695 | 
696 | 
697 | # clean
698 | clean_expo<-function(expo,pval,low_af=0.5,high_af=0.5,
699 |                      clump=TRUE,kb=10000,r2=0.001,LD_file=NULL,af_filter=FALSE){
700 |   library(TwoSampleMR)
701 |   dat<-subset(expo,pval.exposure<pval)
702 |   if(af_filter==TRUE){dat<-subset(dat,eaf.exposure<low_af|eaf.exposure>high_af)}
703 |   
704 |   if(clump==TRUE){
705 |     
706 |     if(is.null(LD_file)==TRUE){
707 |       dat<-clump_data(dat,clump_kb = kb,clump_r2 = r2)}
708 |     else{
709 |       library(plinkbinr)
710 |       plink_pathway<-get_plink_exe()
711 |       snp<-data.frame(rsid=dat$SNP,pval=dat$pval.exposure)
712 |       snp<-try(ld_clump(snp,clump_kb = kb,clump_r2 = r2,plink_bin =plink_pathway , bfile =LD_file))
713 |       if ("try-error"%in% class(snp)){return(dat)}
714 |       else{
715 |         snp<-data.frame(SNP=snp$rsid)
716 |         dat<-merge(snp,dat,by.x="SNP",by.y='SNP')
717 |       }
718 |     }
719 |     
720 |   }
721 |   return(dat)
722 | }
723 | 
724 | clean_list<-function(list,nrow=10){
725 |   l<-lapply(list,nrow)
726 |   n<-data.frame()
727 |   for(i in 1:length(l)){
728 |     if(is.null(l[[i]])==T || l[[i]]==0 )next
729 |     
730 |     n1<-data.frame(i,l[[i]])
731 |     n<-rbind(n,n1)
732 |   }
733 |   colnames(n)<-c('l','row')
734 |   n<-subset(n,row>nrow)
735 |   list<-list[n$l]
736 |   return(list)
737 | }
738 | 
739 | clean_IV_from_outsig<-function(dat,MR_reverse=1e-03){
740 |   dat<-subset(dat,pval.outcome>MR_reverse)
741 |   return(dat)
742 | }
743 | 
744 | 
745 | LDSC_rg<-function(expo,outcome,an,sample_prev=NA,
746 |                   population_prev=NA,ld,wld,chr_filter=c(1:22),n_blocks=200){
747 |   id.o<-outcome$id.outcome[1]
748 |   id.e<-expo$id.exposure[1]
749 |   
750 |   expo<-expo%>%mutate(Z=beta.exposure/se.exposure)
751 |   expo<-expo%>%select(SNP=SNP,N=samplesize.exposure,Z=Z
752 |                       ,A1=effect_allele.exposure
753 |                       ,A2=other_allele.exposure)
754 |   expo<-as_tibble(expo)
755 |   
756 |   outcome<-outcome%>%mutate(Z=beta.outcome/se.outcome)
757 |   outcome<-outcome%>%select(SNP=SNP,N=samplesize.outcome,Z=Z
758 |                             ,A1=effect_allele.outcome
759 |                             ,A2=other_allele.outcome)
760 |   outcome<-as_tibble(outcome)
761 |   
762 |   
763 |   dat<-list(expo,outcome)
764 |   names(dat)<-c(id.e,id.o)
765 |   
766 |   rm(expo,outcome)
767 |   
768 |   
769 |   res<-try(ldscr::ldsc_rg(dat,ancestry = an,sample_prev=sample_prev,
770 |                           population_prev=population_prev,ld=ld,wld=wld,
771 |                           n_blocks=n_blocks,chr_filter=chr_filter))
772 |   
773 |   return(res)
774 |   
775 | }
776 | 
777 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
   1 | # Get_MR
   2 | 
   3 | 已失效的网盘链接已重新更新如下：https://pan.baidu.com/s/1yyYA4cVHKnyCce9YSPRA9A?pwd=ipvl 
   4 | 提取码：ipvl
   5 | 
   6 | # Get_MR2.0
   7 | 
   8 | ## 更新说明
   9 | ### 5.7 ： 
  10 | 1. 修复cyclemr函数中clean函数bug
  11 | 2. 补充说明，get_outcome函数修改了TwoSampleMR包原函数的默认值，将maf阈值调整为0.4，并不获取proxy（默认值为0.3与获取proxy）。由于这个函数主要是为了方便批量预实验（添加了错误防停止运行机制），因此建议正式做数据采用官方函数。
  12 | 
  13 | ## 写在前面
  14 | 
  15 | **欢迎来到向量化与并行化的世界** 本次更新就一个重大功能，就是并行化运行mr分析，以最优的效率批量跑大量的数据。家用电脑较新服务器基本可以实现2小时批量运行10000个因素，如果你有高性能服务器，恭喜你，30分钟以内就能跑完。
  16 | 
  17 | 
  18 | 
  19 | 由于本次主要是思路与方法的分享，所以函数的帮助文档写的不多，主要还是看示例代码即可，应该还是很容易上手的。
  20 | 
  21 | 
  22 | 
  23 | **公众号回复：“示例”即得示例代码** (不会还没关注我们公众号吧#doge，GetScience, 等你来！)
  24 | 
  25 | 
  26 | 
  27 | ## cyclemr
  28 | 
  29 | ###  描述
  30 | 
  31 | 这个函数是用于执行循mr 分析的功能函数，可以在R语言中使用。该函数可以将数据分配到多个计算节点中运行，提高MR分析的效率。
  32 | 
  33 | ### 用法
  34 | 
  35 | 直接在R语言中调用这个函数，如下所示：
  36 | 
  37 | ```
  38 | # 调用cyclemr函数
  39 | mr_results <- cyclemr(dat = data, cl_num = 4, type = "list")
  40 | ```
  41 | 
  42 | ### 参数
  43 | 
  44 | - `dat`: harmonise_data后的数据，可以是数据框或列表类型。默认是list
  45 | - `cl_num`: 批量化线程数。
  46 | - `type`: MR分析数据类型，可以是"list"或"data"，默认为"list"。主要使用情况也是list
  47 | 
  48 | > 注： 如何判断自己的电脑能开启的最大线程数？
  49 | >
  50 | > 在任务管理器可看到
  51 | >
  52 | > * 内核： 计算机核心数
  53 | >
  54 | > * 逻辑处理器： 计算机总线程数
  55 | >
  56 | >   比如说很多处理器宣传是8核16线程，这个8就指的是内核，这个16指的是逻辑处理器。
  57 | >
  58 | >   本质上，16只是将每个核心一分为2，但是他们能干的活是一样的。所以一般设置为内核数即可满载CPU
  59 | >
  60 | >   当然这不能一概而论，因为每个CPU和厂家调度不一样，如果你发现使用内核数不能让CPU跑到100%，则尝试用逻辑处理器数
  61 | 
  62 | ### 返回值
  63 | 
  64 | - `cyclemr`函数返回一个包含MR分析结果的数据框。
  65 | 
  66 | ### 使用举例
  67 | 
  68 | 下面是使用这个函数的一个示例：
  69 | 
  70 | ```
  71 | mr_results <- cyclemr(dat = data, cl_num = 16, type = "list")
  72 | ```
  73 | 
  74 | 
  75 | 
  76 | ### 运行时间参考
  77 | 
  78 | 在设置无误情况下，这是我手头有的所有电脑测试出的运行时间：
  79 | 
  80 | 运行10000个数据。（ieu批量数据前10000个）除了服务器外，其他均使用Windows系统
  81 | 
  82 | | CPU                                     | 核数（运行时开的线程数） | 时间      |
  83 | | --------------------------------------- | ------------------------ | --------- |
  84 | | i9-12900H（拯救者2022 Y9000P 狂暴模式） | 14核20线程（14）         | 1小时28分 |
  85 | | r7-5700X（台式）                        | 8核16线程（16）          | 1小时38分 |
  86 | | r9-6800H  (yoga2022 14S 性能模式)       | 8核16线程（16）          | 1小时54分 |
  87 | | r5-3500X (台式)                         | 6核12线程 （12）         | 3小时47分 |
  88 | | r5-4600H （拯救者 2020 R7000 狂暴模式） | 6核12线程 （12）         | 约4小时   |
  89 | | 双路 EPYC 7T83 （服务器  Linux）        | 128核256线程（128）      | 11分钟    |
  90 | 
  91 | **欢迎各位补充自己手头的机器的运行时间数据，尤其M系列的苹果处理器数据**
  92 | 
  93 | 
  94 | 
  95 | ## 一些小工具
  96 | 
  97 | ## get_rsid
  98 | 
  99 | ### 描述
 100 | 
 101 | 根据CHR和POS，从ensemble官网中获取rsID。
 102 | 
 103 | ### 用法
 104 | 
 105 | ```
 106 | get_rsid(chr, pos, version = 'hg38')
 107 | ```
 108 | 
 109 | ### 参数
 110 | 
 111 | - `chr`：染色体号。
 112 | - `pos`：基因位置。
 113 | - `version`：表示使用的基因组版本，默认为最新的版本（`'hg38'`）。也可选择hg19.
 114 | - 注： GRCh37=hg19，GRCh38=hg38
 115 | 
 116 | ### 详细说明
 117 | 
 118 | 该函数基于生物信息学数据库Ensembl SNP Mart来查询给定位置的相关信息。如果未指定基因组版本，则默认使用最新的版本(hg38)。该函数会根据指定的基因组版本选择正确的URL。如果您想查询其他版本的数据，可以将`version`参数设置为相应版本的字符串。
 119 | 
 120 | 参考：[How to find rsID with biomaRt in R (bioconductor.org)](https://support.bioconductor.org/p/9135301/)
 121 | 
 122 | ### 使用举例
 123 | 
 124 | ```
 125 | ds4 <- data.frame(CHR = c("8", "8", "8", "8", "8"),POS = c('101592213', '106973048', '108690829', '102569817', '108580746'))
 126 | res<-get_rsid(chr=ds4$CHR, pos=ds4$POS, version = 'hg38')
 127 | ```
 128 | 
 129 | 
 130 | 
 131 | ### 错误说明
 132 | 
 133 | 如果出现：
 134 | 
 135 | ```
 136 | Error in curl::curl_fetch_memory(url, handle = handle) : 
 137 |   Timeout was reached: [grch37.ensembl.org:80] Operation timed out after 300013 milliseconds with 7909 bytes received
 138 | ```
 139 | 
 140 | 这并不是代码问题，而是网络超时了，ensemble的API经常拥堵，多试几次即可。当然也有可能请求的数据量太大，也可能会出现这个问题。
 141 | 
 142 | 
 143 | 
 144 | ##  get_exposure 和 get_outcome
 145 | 
 146 | ### get_outcome有特殊说明，见**更新说明**
 147 | 
 148 | ###  描述
 149 | 
 150 | 这两个函数是用于进行双样本MR（Two-sample Mendelian Randomization）分析的数据处理和提取过程的功能函数。
 151 | 
 152 | - `get_exposure`函数用于从给定的遗传仪器ID中提取出暴露（exposure）数据。
 153 | - `get_outcome`函数用于从给定的遗传仪器ID和暴露数据中提取出结果（outcome）数据。
 154 | 
 155 | ### 用法
 156 | 
 157 | 使用这两个函数前，需要先安装并加载TwoSampleMR包。
 158 | 
 159 | ```
 160 | library(TwoSampleMR)
 161 | ```
 162 | 
 163 | 然后可以直接在R语言中调用这两个函数，如下所示：
 164 | 
 165 | ```
 166 | # 调用get_exposure函数
 167 | exposure_data <- get_exposure(id = "ieu-a-1", pval = 5e-8, r2 = 0.001, kb = 10000)
 168 | 
 169 | # 调用get_outcome函数
 170 | outcome_data <- get_outcome(id = "ieu-a-1", expo = exposure_data)
 171 | ```
 172 | 
 173 | ### 参数
 174 | 
 175 | - `get_exposure`函数的参数说明：
 176 |   - `id`: 遗传仪器ID。
 177 |   - `pval`: 提取暴露数据的P值阈值，默认为5e-08。
 178 |   - `r2`: 遗传仪器间的LD（linkage disequilibrium）值的阈值，默认为0.001。
 179 |   - `kb`: 遗传仪器的范围（以kb为单位），默认为10000。
 180 | - `get_outcome`函数的参数说明：
 181 |   - `id`: 遗传仪器ID。
 182 |   - `expo`: 暴露数据，TwoSampleMR包格式
 183 | 
 184 | 
 185 | 
 186 | ## get_ao
 187 | 
 188 | ### 描述
 189 | 
 190 | 获取OPEN GWAS数据库所有可用的ID。可以指定获取某个前缀的ID
 191 | 
 192 | ### 用法
 193 | 
 194 | 使用这个函数前，需要先安装并加载TwoSampleMR包。然后可以直接在R语言中调用这个函数，如下所示：
 195 | 
 196 | ```
 197 | # 调用get_ao函数
 198 | ao <- get_ao()## 不限定来源则返回所有id
 199 | ao <- get_ao(a = "finn")## 这样就会返回finn的所有可用id
 200 | ```
 201 | 
 202 | ### 参数
 203 | 
 204 | - `a`: （可选）数据来源。
 205 | 
 206 | ### 备注： 来源的名称
 207 | 
 208 | 来自OPEN GWAS [Browse the IEU OpenGWAS project (mrcieu.ac.uk)](https://gwas.mrcieu.ac.uk/datasets/?trait__icontains=12) 5.1获取
 209 | 
 210 | | Batch                                                        | Description                                                  | [Count](https://gwas.mrcieu.ac.uk/datasets/?trait__icontains=12#counts) |
 211 | | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
 212 | | [bbj-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=bbj-a) | [Biobank Japan release of disease traits](http://jenger.riken.jp/en/) | 120                                                          |
 213 | | [ebi-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ebi-a) | [Datasets that satisfy minimum requirements imported from the EBI database of complete GWAS summary data](https://www.ebi.ac.uk/gwas/downloads/summary-statistics) | 2,585                                                        |
 214 | | [eqtl-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=eqtl-a) | [eQTLGen 2019 results, comprising all cis and some trans regions of gene expression in whole blood](https://www.eqtlgen.org/) | 19,942                                                       |
 215 | | [finn-b](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=finn-b) | [FinnGen biobank analysis round 5](https://www.finngen.fi/)  | 2,803                                                        |
 216 | | [ieu-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ieu-a) | [GWAS summary datasets generated by many different consortia that have been manually collected and curated, initially developed for MR-Base](https://elifesciences.org/articles/34408) | 440                                                          |
 217 | | [ieu-b](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ieu-b) | [GWAS summary datasets generated by many different consortia that have been manually collected and curated, initially developed for MR-Base (round 2)](https://elifesciences.org/articles/34408) | 207                                                          |
 218 | | [met-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=met-a) | [Human blood metabolites analysed by Shin et al 2014](https://www.ncbi.nlm.nih.gov/pubmed/24816252) | 452                                                          |
 219 | | [met-b](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=met-b) | [Human immune system traits analysed by Roederer et al 2015](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393780/) | 150                                                          |
 220 | | [met-c](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=met-c) | [Circulating metabolites analysed by Kettunen et al 2016](https://www.ncbi.nlm.nih.gov/pubmed/27005778) | 123                                                          |
 221 | | [met-d](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=met-d) | [Metabolic biomarkers in the UK Biobank measured by Nightingale Health 2020](https://www.ukbiobank.ac.uk/learn-more-about-uk-biobank/news/nightingale-health-and-uk-biobank-announces-major-initiative-to-analyse-half-a-million-blood-samples-to-facilitate-global-medical-research) | 249                                                          |
 222 | | [prot-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=prot-a) | [Complete GWAS summary data on protein levels as described by Sun et al 2018](https://www.ncbi.nlm.nih.gov/pubmed/29875488) | 3,282                                                        |
 223 | | [prot-b](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=prot-b) | [Complete GWAS summary data on protein levels as described by Folkersen et al 2017](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5393901/) | 83                                                           |
 224 | | [prot-c](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=prot-c) | [Complete GWAS summary data on protein levels as described by Suhre et al 2017](https://pubmed.ncbi.nlm.nih.gov/28240269) | 1,124                                                        |
 225 | | [ubm-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ubm-a) | [Complete GWAS summary data on brain region volumes as described by Elliott et al 2018](https://www.ncbi.nlm.nih.gov/pubmed/30305740) | 3,143                                                        |
 226 | | [ukb-a](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ukb-a) | [Neale lab analysis of UK Biobank phenotypes, round 1](http://www.nealelab.is/blog/2017/7/19/rapid-gwas-of-thousands-of-phenotypes-for-337000-samples-in-the-uk-biobank) | 596                                                          |
 227 | | [ukb-b](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ukb-b) | [IEU analysis of UK Biobank phenotypes](https://data.bris.ac.uk/data/dataset/pnoat8cxo0u52p6ynfaekeigi) | 2,514                                                        |
 228 | | [ukb-d](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ukb-d) | [Neale lab analysis of UK Biobank phenotypes, round 2](http://www.nealelab.is/uk-biobank) | 904                                                          |
 229 | | [ukb-e](https://gwas.mrcieu.ac.uk/datasets/?gwas_id__icontains=ukb-e) | [Pan-ancestry genetic analysis of the UK Biobank performed at the Broad Institute](https://pan.ukbb.broadinstitute.org/) | 3,873                                                        |
 230 | 
 231 | 
 232 | 
 233 | ## clean_outcome_from_exposure
 234 | 
 235 | ### 描述
 236 | 
 237 | （主要用于向量化）用于清洗outcome。将exposure中（list形式，也就是批量化的形式存在的exposure）不存在的SNP从outcome中剔除，大幅精简outcome，并大幅提升harmonise_data的速度。
 238 | 
 239 | **实测清洗与不清洗outcome对比，速度相差一百倍以上。**
 240 | 
 241 | ### 用法
 242 | 
 243 | 直接在R语言中调用这个函数，如下所示：
 244 | 
 245 | ```
 246 | # 调用clean_outcome_from_exposure函数
 247 | cleaned_outcome <- clean_outcome_from_exposure(expo = exposure_data, outcome = outcome_data)
 248 | ```
 249 | 
 250 | ### 参数
 251 | 
 252 | - `expo`: 暴露数据，格式为**list**，TwoSampleMR包格式。比如get_exposure批量化获取下来的数据
 253 | - `outcome`: 结果数据，TwoSampleMR包格式。
 254 | 
 255 | 
 256 | 
 257 | ## clean_GWAS
 258 | 
 259 | ### 描述
 260 | 
 261 | 这个函数是用于清洗遗传关联数据集，使其符合特定的数据集要求的功能函数。
 262 | 
 263 | ### 用法
 264 | 
 265 | 直接在R语言中调用这个函数，如下所示：
 266 | 
 267 | ```
 268 | # 调用clean_GWAS函数
 269 | cleaned_GWAS_list <- clean_GWAS(list = GWAS_list, clean = c("bbj", "eqtl"))
 270 | ```
 271 | 
 272 | ### 参数
 273 | 
 274 | - `list`:一个list。里面包含每个暴露的data.frame。 具体参考批量运行get_exposure后的结果
 275 | - `clean`: 需要清洗的数据集名称，一个字符型向量类型。具体参考get_ao的附注。
 276 | 
 277 | ### 返回值
 278 | 
 279 | - `clean_GWAS`函数返回一个清洗后的遗传关联数据集列表，符合特定数据集要求。
 280 | 
 281 | 
 282 | 
 283 | # Get_MR1.0
 284 | 
 285 | ## 近期更改与提示：
 286 | 
 287 | **4.22**：**（重要）** 修复PRESSO的bug，请使用PRESSO时务必更新代码！！
 288 | 
 289 | **4.19**：注意TwoSampleMR包有一列是mr_keep。请在运行我们get_mr包的所有分析函数前，如果需要使用TwoSampleMR包格式的数据，请将mr_keep状态为false的删掉（如果是没有harmonise之前，列名为mr_keep.exposure/mr_keep.outcome）。因为带有未通过质控的SNP，可能会带来未知的错误！可以参考以下代码
 290 |  ```R
 291 |    
 292 |    dat<-harmonise_data(exposure,outcome)  
 293 |    
 294 |    ## 在运行完harmonise_data后可能会产生mr_keep状态为false的，不能用于MR分析的SNP，我们把他删掉：
 295 |    
 296 |    dat<-subset(dat,mr_keep==T)
 297 |    
 298 |    
 299 |    ## 然后再继续使用各种功能
 300 |    ```
 301 | 
 302 | ## 1. 写在前面：
 303 | 
 304 | ### 1.1 项目地址
 305 | 
 306 | **github：**[HaobinZhou/Get_MR: A package for running MR In batches and in parallel quickly (github.com)](https://github.com/HaobinZhou/Get_MR)
 307 | 
 308 | **如果觉得好用，可以点一下github项目上的小星星吗，这是我们继续开源的最大动力，谢谢！**
 309 | 
 310 | 
 311 | 
 312 | 
 313 | 
 314 | ### 1.2 R包使用方法： 
 315 | 
 316 | **R包以R脚本的形式提供，打开R包，全选运行，即得到所有function**
 317 | 
 318 | 1. 进入github[HaobinZhou/Get_MR: A package for running MR In batches and in parallel quickly (github.com)](https://github.com/HaobinZhou/Get_MR)，下载代码zip
 319 | 
 320 | 2. ```R
 321 |    source("./Get_MR1.0.r") ## 填文件所在地址
 322 |    
 323 |    ## 或者直接打开R文件，全选代码运行也可以！
 324 |    ```
 325 | 
 326 |    
 327 | 
 328 | 
 329 | 
 330 | ### 1.3 常见问题：
 331 | 
 332 | 1. **本地clump，1000G处理好的MAF文件，MRlap依赖文件如何获取**： GetScience公众号可免费获取已处理好文件，回复"依赖文件"即得链接。源文件请查看本文相应function介绍处
 333 | 
 334 | 2. **输入clump文件路径后总是报错**：
 335 | 
 336 |    ```R
 337 |    #尤其注意这个文件名的书写，因为他们是二进制文件，不需要写后缀！只需要选取对应的人种即可，比如欧洲人：
 338 |    LD_file="S:/GWAS数据/本地LD依赖文件/EUR"
 339 |    
 340 |    ## 这个问题我回答好多遍啦！
 341 |    ```
 342 | 
 343 |    
 344 | 
 345 | 3. **第一次使用如何安装关联R包：**[Get_MR/1.0 at main · HaobinZhou/Get_MR (github.com)](https://github.com/HaobinZhou/Get_MR/tree/main/1.0) 
 346 | 
 347 |    1. 如果不需要使用`MungeSumstats`包（相关函数包括：`format_Mun`，`get_chr_pos`，`format_getmr`中`source="ukb_nosnp"`) ，则只需要运行[Get_MR1.0dependence.R](https://github.com/HaobinZhou/Get_MR/blob/main/1.0/Get_MR1.0dependence.R)
 348 |    2. 如果需要使用`MungeSumstats`包，则还需运行[Install_Reference_Genome.r](https://github.com/HaobinZhou/Get_MR/blob/main/1.0/Install_Reference_Genome.r) 这个包括了hg19和hg38的基因组参考文件，总大小达到了5G！**如果直接安装失败，在GetScience公众号回复"基因组参考"可得下载链接，并本地安装**（推荐）
 349 | 
 350 | 4. **Bug反馈**：代码仅由两人编写，难免出现错误。欢迎提交bug到GetScience公众号后台！
 351 | 
 352 | 5. **感谢所有Get_MR使用的R包作者**，是因为他们我们才得以轻松实现这么多复杂的功能。他们都是开源的，因此我们承诺Get_MR将**永久免费开源**。这意味着使用者可以随意地修改，分发代码，但前提是遵守：
 353 | 
 354 |    **1.本代码不得用于任何商业或盈利目的**
 355 | 
 356 |    **2.未经代码作者的同意，本代码不得用于任何形式的销售或商业交易**
 357 | 
 358 |    **3.本代码可以在非商业性的科研、学术研究和个人使用的情况下免费使用**
 359 | 
 360 |    **4.在使用本代码并重新打包并向公众发放时，请引用我们的公众号原文**
 361 | 
 362 |    
 363 | 
 364 | ## 2. 帮助文档目录
 365 | 
 366 | 
 367 | 
 368 | # 进阶MR分析
 369 | 
 370 | ## LDSC_rg
 371 | 
 372 | 用于计算两个数据框中SNP之间的遗传相关性（rg）。
 373 | 
 374 | ### 用法
 375 | 
 376 | ```R
 377 | LDSC_rg(expo, outcome, an, sample_prev = NA, population_prev = NA,
 378 |         ld, wld, chr_filter = c(1:22), n_blocks = 200)
 379 | ```
 380 | 
 381 | ### 参数
 382 | 
 383 | - `expo`: 一个数据框，其中包含一个遗传暴露指标的多个SNP和它们与结果变量的rg。
 384 | - `outcome`: 一个数据框，其中包含一个结果变量的多个SNP和它们与遗传暴露指标的rg。
 385 | - `an`: 它是一个字符串,目前还没有作用（因为我们提供的依赖文件只有eur的，其他人种还没更新）
 386 | - `sample_prev`: 遗传暴露指标的样本流行病学先验患病率。默认为 `NA`。
 387 | - `population_prev`: 遗传暴露指标的人群流行病学先验患病率。默认为 `NA`。
 388 | - `ld`: 本地LD依赖文件
 389 | - `wld`: 本地weighted LD 依赖文件
 390 | - `chr_filter`: 一个整数向量，用于指定要使用的染色体。默认为包含1-22的整数向量。
 391 | - `n_blocks`: 用于计算加权LD矩阵的块数。默认为200。
 392 | 
 393 | ### 返回值
 394 | 
 395 | 一个具有以下元素的列表：
 396 | 
 397 | - `rg`: 两个数据框中SNP之间的遗传相关性（rg）。
 398 | - `pval`: `rg` 的双侧P值。
 399 | - `N_snps`: 参与计算rg的SNP数量。
 400 | 
 401 | ### 示例
 402 | 
 403 | **具体用法参照：mr_lap和LDSC_rg示例.r** 可通过公众号GetScience回复示例获取文件
 404 | 
 405 | 
 406 | 
 407 | ## mr_lap
 408 | 
 409 | ### 描述
 410 | 
 411 | mrlap是一种矫正样本重叠后的双样本MR方法。可用于怀疑有样本重叠的数据中。
 412 | 
 413 | R包官网：[n-mounier/MRlap: R package to perform two-sample Mendelian Randomisation (MR) analyses using (potentially) overlapping samples (github.com)](https://github.com/n-mounier/MRlap)
 414 | 
 415 | ### 语法
 416 | 
 417 | ```R
 418 | mr_lap(expo, outcome, ld, hm3, pval, r2, kb, MR_reverse = 1e-03, save_logfiles = F)
 419 | ```
 420 | 
 421 | 
 422 | 
 423 | ### 参数
 424 | 
 425 | - `expo`: 数据框，为TwoSampleMR包格式的数据
 426 | - `outcome`: 数据框，为TwoSampleMR包格式的数据
 427 | - `ld`: 数据框，本地LD文件路径
 428 | - `hm3`: 数据框，本地HapMap3文件路径 
 429 | - `pval`: 数值，MR 工具变量阈值。
 430 | - `r2`: 数值，clump阈值
 431 | - `kb`: 数值，clump阈值
 432 | - `MR_reverse`: 数值，MR 的方向翻转阈值。
 433 | - `save_logfiles`: 逻辑值，是否保存日志文件。
 434 | 
 435 | ### 值
 436 | 
 437 | - res: mrlap 结果。
 438 | 
 439 | 
 440 | 
 441 | ### 用法
 442 | 
 443 | **具体用法参照：mr_lap和LDSC_rg示例.r** 可通过公众号GetScience回复示例获取文件
 444 | 
 445 | 
 446 | 
 447 | ## cause_getmr函数
 448 | 
 449 | ### 描述
 450 | 
 451 | 一键式执行cause。可批量化执行多暴露对一结局或一暴露对多结局
 452 | 
 453 | ### 用法
 454 | 
 455 | ```R
 456 | ## 不并行化运行
 457 | cause_getmr(expo, outcome, LD_file, r2 = 0.001, kb = 10000, pval = 1e-05)
 458 | 
 459 | ## 并行化运行
 460 | cl<-makeCluster(2) ## 填你想要的并行化的核数，核数越多，需要的运行内存越大
 461 | cause_getmr(expo, outcome, LD_file, r2 = 0.001, kb = 10000, pval = 1e-05,cl=cl)
 462 | ```
 463 | 
 464 | ### 参数
 465 | 
 466 | - `expo`: TwoSampleMR的暴露格式的数据。
 467 | - `outcome`: TwoSampleMR的暴露格式的数据。
 468 | - 注意！expo和outcome，可以是data.frame的形式，也可以是一个list（如list[[1:n]]里都包含数据的data.frame)。但不能outcome和expo同时都是list。当expo或outcome，其中一个为list的情况下，是批量运行一对一的cause。比如我读取了10个暴露和1个结局，将10个暴露lapply读取进来就会是一个list。这时候`cause_cyclemr` 自动运行每个暴露对结局的cause，也就是批量化执行.
 469 | 
 470 | ```R
 471 | ## 比如我这里读取3个暴露文件和1和结局文件
 472 | id<-c('a.gz','b.gz','c.gz')
 473 | expo<-lapply(id,FUN=fread)
 474 | outcome<-fread("outcome.gz")
 475 | cl=makeCluster(4)## 内存不够的也可以不并行化运行
 476 | res<-cause_getmr(expo, outcome, LD_file, r2 = 0.001, kb = 10000, pval = 1e-05,cl=cl)
 477 | stopCluster(cl)
 478 | ## 这样返回的结果就是3个暴露分别对一个结局的cause结果。
 479 | ```
 480 | 
 481 | - `LD_file`: 包含LD信息的PLINK文件名。因为需要大批量地clump，在线clump很容易报错，因此我们采用本地clump。需要本地参考文件。下载地址： http://fileserve.mrcieu.ac.uk/ld/1kg.v3.tgz 。 或关注公众号GetScience直接获取。
 482 | 
 483 | ```R
 484 | #尤其注意这个文件名的书写，因为他们是二进制文件，不需要写后缀！只需要选取对应的人种即可，比如欧洲人：
 485 | LD_file="S:/GWAS数据/本地LD依赖文件/EUR"
 486 | 
 487 | ## 这个问题我回答好多遍啦！
 488 | ```
 489 | 
 490 | - `r2`: LD的R平方阈值。默认值为0.001。
 491 | - `kb`: LD的距离阈值（以kb为单位）。默认值为10000。
 492 | - `pval`: 用于LD clumping的p值阈值。默认值为1e-05。
 493 | - `cl`: 并行计算的cluster对象。默认值为NULL。在外部使用cl=makeCluster(n),n为你想并行化的核数。注意核数太多不要爆内存了。
 494 | 
 495 | ### 值
 496 | 
 497 | `cause_cyclemr`函数返回cause结果
 498 | 
 499 | 
 500 | 
 501 | ## RAPS_getmr
 502 | 
 503 | ### 描述
 504 | 
 505 | `RAPS_getmr`函数执行基于RAPS的MR并返回结果,并画图
 506 | 
 507 | ### 用法
 508 | 
 509 | ```R
 510 | expo<-fread('a.gz')
 511 | outcome<-fread('b.gz')
 512 | expo<-format_data(...)
 513 | outcome<-format_data(...)  ## format_data是TwoSampleMR包的函数，格式化。
 514 | expo<-pblapply(expo,pval=1,kb=10000,r2=0.001,LD_file=LD_file,FUN=clean_expo) ## 数据很大，建议本地clump，在线很容易报错
 515 | dat<-harmonise(expo,outcome)
 516 | res<-RAPS_getmr(dat, dir_figure)
 517 | ```
 518 | 
 519 | ### 参数
 520 | 
 521 | - `dat`: TwoSampleMR包 harmonise_data后输出的数据
 522 | - `dir_figure`: 保存结果图形的目录。
 523 | 
 524 | ### 值
 525 | 
 526 | `RAPS_getmr`函数返回一个包含基于RAPS的MR结果的数据框。
 527 | 
 528 | 
 529 | 
 530 | 
 531 | 
 532 | ## mr_Presso
 533 | 
 534 | ### 描述
 535 | 
 536 | 执行MR-PRESSO
 537 | 
 538 | ### 语法
 539 | 
 540 | ```R
 541 | mr_Presso(dat, num = 10000)
 542 | ```
 543 | 
 544 | 
 545 | 
 546 | ### 参数
 547 | 
 548 | - `dat`: 数据框，包含基因表达和疾病风险关联分析的数据。
 549 | - `num`: 整数，模拟数量。
 550 | 
 551 | ### 值
 552 | 
 553 | - `mr_presso_res`: MR-PRESSO 结果。
 554 | 
 555 | ### 用法
 556 | 
 557 | ```R
 558 | dat<-harmonise_data(exposure,outcome) ## TwoSampleMR包的harmonise_data函数输出的结果
 559 | mr_presso_res <- mr_Presso(dat, num = 10000)
 560 | ```
 561 | 
 562 | 
 563 | 
 564 | ## mr_presso_pval函数
 565 | 
 566 | ### 描述
 567 | 
 568 | 提取 MR-PRESSO 结果中的主要结果
 569 | 
 570 | ### 语法
 571 | 
 572 | ```R
 573 | mr_presso_pval(mr_presso_res)
 574 | ```
 575 | 
 576 | 
 577 | 
 578 | ### 参数
 579 | 
 580 | - `mr_presso_res`: MR-PRESSO 结果。
 581 | 
 582 | ### 值
 583 | 
 584 | - mr_presso_main: MR-PRESSO 主要结果。
 585 | 
 586 | ### 用法
 587 | 
 588 | ```R
 589 | mr_presso_main <- mr_presso_pval(mr_presso_res) ##mr_Presso输出的结果
 590 | ```
 591 | 
 592 | 
 593 | 
 594 | ## mr_presso_snp函数
 595 | 
 596 | ### 描述
 597 | 
 598 | 根据 MR-PRESSO 分析结果，将离群值剔除，返回剔除离群值后的dat(我一般称为dat_aj， 也就是 adjusted_data)， 可用于后续的IVW等分析。
 599 | 
 600 | ### 语法
 601 | 
 602 | ```R
 603 | mr_presso_snp(mr_presso_res, mr_presso_main, dat, type = "list")
 604 | ```
 605 | 
 606 | 
 607 | 
 608 | ### 参数
 609 | 
 610 | - `mr_presso_res`: MR-PRESSO 结果。
 611 | - `mr_presso_main`: MR-PRESSO 主要结果。
 612 | - `dat`: 数据框或数据框列表，包含基因表达和疾病风险关联分析的数据。
 613 | - `type`: 字符串，输入数据类型。可选值为 "list" 或 "data"。如果是列表形式的（批量化运行后的结果），就是`list`，如果是普通数据框就是data
 614 | 
 615 | ### 值
 616 | 
 617 | 过滤后的数据框或数据框列表。
 618 | 
 619 | ### 用法
 620 | 
 621 | ```R
 622 | dat<-harmonise_data(exposure,outcome) ## TwoSampleMR包的harmonise_data函数输出的结果
 623 | mr_presso_res <- mr_Presso(dat, num = 10000)
 624 | mr_presso_main <- mr_presso_pval(mr_presso_res)
 625 | data_aj <- mr_presso_snp(mr_presso_res, mr_presso_main, dat, type = "data")
 626 | 
 627 | ## 用矫正的data可以用于后续的分析，例如重新计算mr
 628 | res_aj<-mr(data_aj)
 629 | ```
 630 | 
 631 | 
 632 | 
 633 | 
 634 | 
 635 | 
 636 | 
 637 | 
 638 | 
 639 | # 快捷预处理及质控工具
 640 | 
 641 | ## format_Mun
 642 | 
 643 | ### 介绍
 644 | 
 645 | 运用MungeSumstats包标准化GWAS 摘要统计数据（包括hg19和hg38转换）。该函数可以将来自Finngen R8和其他来源的 GWAS 摘要统计数据文件清洗为标准的GWAS文件，并可将基因组位置从 `ref_genome` 转换到 `convert_ref_genome`。
 646 | 
 647 | ### 用法
 648 | 
 649 | ```R
 650 | format_Mun(file, source = "finn_r8", save_path = NULL, lift = F, ref_genome = "hg38", convert_ref_genome = "hg19")
 651 | ```
 652 | 
 653 | ### 参数
 654 | 
 655 | - `file`：字符向量或数据框，表示要格式化的 GWAS 摘要统计数据文件或数据框。如果输入的是字符向量，则表示文件的路径。如果输入的是数据框，则表示要格式化的数据框。
 656 | - `source`：字符向量，表示输入文件的来源。默认为 `"finn_r8"`。
 657 | - `save_path`：字符向量，表示格式化文件要保存的路径。默认为 `NULL`。
 658 | - `lift`：逻辑值，表示是否将基因组位置从 `ref_genome` 转换到 `convert_ref_genome`。默认为 `F`。
 659 | - `ref_genome`：字符向量，表示 GWAS 摘要统计数据文件使用的参考基因组。默认为 `"hg38"`。
 660 | - `convert_ref_genome`：字符向量，表示要将基因组位置转换到的参考基因组。默认为 `"hg19"`。
 661 | 
 662 | ### 例子
 663 | 
 664 | ```R
 665 | # 从文件中格式化数据
 666 | format_Mun("my_sumstats.txt", save_path = "~/formatted_sumstats", lift = F, ref_genome = "hg38")
 667 | 
 668 | # 从数据框中格式化数据
 669 | my_sumstats_df <- read.csv("my_sumstats.csv")
 670 | format_Mun(my_sumstats_df, save_path = "~/formatted_sumstats", lift = F, ref_genome = "hg38")
 671 | 
 672 | #格式化数据并升降版本
 673 | format_Mun(my_sumstats_df, save_path = "~/formatted_sumstats", lift = T, ref_genome = "hg38", convert_ref_genome = "hg19") ## 从hg38转为hg19
 674 | ```
 675 | 
 676 | ### 返回值
 677 | 
 678 | 该函数返回格式化的数据框并将其写入磁盘文件。`save_path`指定保存的位置
 679 | 
 680 | 
 681 | 
 682 | 
 683 | 
 684 | ## format_getmr
 685 | 
 686 | ### 介绍
 687 | 
 688 | 预设的快捷格式化 GWAS 摘要统计数据，这个函数用于将来自多个数据来源的 GWAS 摘要统计数据转换为TwoSampleMR 包所需的格式。
 689 | 
 690 | ### 用法
 691 | 
 692 | ```
 693 | format_getmr(data, type = "exposure", source = "finn_r8")
 694 | ```
 695 | 
 696 | ### 参数
 697 | 
 698 | - `data`：数据框，表示要格式化的 GWAS 摘要统计数据。
 699 | - `type`：字符向量，表示数据类型，可以是 "exposure" 或 "outcome"。默认为 "exposure"。
 700 | - `source`：字符向量，表示数据来源。默认为 "finn_r8"。目前支持的来源有：
 701 |   - "finn_r8": [Data download - FinnGen Documentation (gitbook.io)](https://finngen.gitbook.io/documentation/data-download)
 702 |   - "ukb_nosnp": 尼尔数据库（UKB），因为没有rsid，因此需要匹配（已一键完成）。[www.nealelab.is/uk-biobank](http://www.nealelab.is/uk-biobank)
 703 |   - "Mun"： 来自MungeSumstats包格式化后的数据
 704 |   - "covid"： [COVID19-hg GWAS meta-analyses round 7 (covid19hg.org)](https://www.covid19hg.org/results/r7/)
 705 |   - "outcome" ： 已经格式化为TwoSampleMR包的“outcome”格式
 706 |   - "exposure"：已经格式化为TwoSampleMR包的“exposure”格式
 707 |   - "fast_ukb": [fastGWA | Yang Lab (westlake.edu.cn)](https://yanglab.westlake.edu.cn/data/ukb_fastgwa/imp_binary/)
 708 |   - "bac": 2021年肠菌原文数据 [Large-scale association analyses identify host factors influencing human gut microbiome composition - PMC (nih.gov)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8515199/)
 709 | 
 710 | ### 例子
 711 | 
 712 | ```R
 713 | my_data <- fread("my_data.gz")
 714 | format_getmr(my_data, type = "finn_r8", source = "Mun")
 715 | ```
 716 | 
 717 | ### 返回值
 718 | 
 719 | 该函数返回格式化的数据框。
 720 | 
 721 | 
 722 | 
 723 | ## format_trait
 724 | 
 725 | ### 介绍
 726 | 
 727 | 这个函数用于格式化 GWAS 摘要统计数据中的表型信息，使其符合命名规范，易于保存为文件（例如批量保存计算R2和F值后的文件）。
 728 | 
 729 | 主要是为了解决，在Windows系统下，保存文件的名称中不能包含特殊字符，例如`:`,`|`。
 730 | 
 731 | ### 用法
 732 | 
 733 | ```R
 734 | format_trait(list, short = FALSE, short_num = "40")
 735 | ```
 736 | 
 737 | ### 参数
 738 | 
 739 | - `list`：列表，表示要格式化的 GWAS 摘要统计数据列表。
 740 | - `short`：逻辑值，表示是否要将表型名称缩短。默认为 FALSE。
 741 | - `short_num`：字符向量，表示缩短表型名称的长度。默认为 "40"。
 742 | 
 743 | ### 例子
 744 | 
 745 | ```R
 746 | my_list <- list(data1, data2, data3)
 747 | format_trait(my_list, short = TRUE, short_num = "20")
 748 | ```
 749 | 
 750 | ### 返回值
 751 | 
 752 | 该函数返回格式化后的 GWAS 摘要统计数据列表。
 753 | 
 754 | 
 755 | 
 756 | 
 757 | 
 758 | ## read_vcf_getmr
 759 | 
 760 | ### 介绍
 761 | 
 762 | 这个函数用于从 VCF 文件中读取摘要统计数据。并保存为压缩文件。默认是.gz为后缀的压缩文件。方便下次读取以及节省空间。
 763 | 
 764 | 这是因为读取VCF文件将消耗大量电脑资源。我们建议批量读取VCF文件后储存为易于读取的压缩包形式。下次读取方便快捷。因此本函数不会直接返回数据框，而是保存为文件
 765 | 
 766 | ### 用法
 767 | 
 768 | ```
 769 | read_vcf_getmr(file_name, nThread = 8, type = ".gz")
 770 | ```
 771 | 
 772 | ### 参数
 773 | 
 774 | - `file_name`：字符向量，表示要读取的 VCF 文件名。
 775 | - `nThread`：整数，表示要使用的线程数。默认为 8。
 776 | - `type`：字符向量，表示输出文件类型。默认为 ".gz"。
 777 | 
 778 | ### 例子
 779 | 
 780 | ```R
 781 | my_file <- "my_file.vcf"
 782 | read_vcf_getmr(my_file, nThread = 4, type = ".gz")
 783 | ```
 784 | 
 785 | ### 返回值
 786 | 
 787 | 该函数没有返回值，而是将读取的数据写入文件。
 788 | 
 789 | 
 790 | 
 791 | ## read_easy
 792 | 
 793 | ### 介绍
 794 | 
 795 | 这个函数用于从文件中读取 GWAS 摘要统计数据。并返回经过P值筛选的文件。一般用于批量读取大量文件时。比如我要批量读取100个暴露数据，每个数据占用运行内存2G。如果100个，则200G，不是一般电脑可以承受。因此每次读取将直接筛选p值，压缩大小
 796 | 
 797 | ### 用法
 798 | 
 799 | ```R
 800 | read_easy(file_name, pval = 5e-08)
 801 | ```
 802 | 
 803 | ### 参数
 804 | 
 805 | - `file_name`：字符向量，表示要读取的文件名。
 806 | - `pval`：数字，表示筛选摘要统计数据的显著性水平。默认为 5e-08。
 807 | 
 808 | ### 例子
 809 | 
 810 | ```R
 811 | my_file <- "my_file.csv"
 812 | read_easy(my_file, pval = 1e-06)
 813 | ```
 814 | 
 815 | ### 返回值
 816 | 
 817 | 该函数返回摘要统计数据的数据框。
 818 | 
 819 | 
 820 | 
 821 | 
 822 | 
 823 | ## get_eaf_from_1000G
 824 | 
 825 | ### 介绍
 826 | 
 827 | 从1000G的MAF文件中提取EAF并将其与输入数据匹配。
 828 | 
 829 | ### 用法
 830 | 
 831 | ```
 832 | get_eaf_from_1000G(dat, path, type = "exposure")
 833 | ```
 834 | 
 835 | ### 参数
 836 | 
 837 | - `dat`：一个数据框，为TwoSampleMR包格式的数据
 838 | - `path`：一个字符串，表示包含1000G MAF文件`fileFrequency.frq`的目录路径。
 839 | - `type`：一个字符串，表示数据是“exposure”（暴露因素）还是“outcome”（结果），默认为“exposure”。
 840 | 
 841 | ### 值
 842 | 
 843 | 一个数据框，其中包含输入数据的EAF和类型信息（原始、修正或错误）。
 844 | 
 845 | ### 详细说明
 846 | 
 847 | 该函数将读取1000G MAF文件，然后将其与输入数据进行匹配。对于每个SNP，该函数将检查输入数据中的效应等位基因与1000G中的效应等位基因是否匹配。
 848 | 
 849 | 如果不匹配，则将EAF设置为NA并将其类型设置为“error”。对于匹配的SNP，EAF将设置为1000G MAF中的值（如果输入数据的效应等位基因是minor allele），或1-MAF（如果输入数据的效应等位基因是major allele），并将其类型设置为“raw”或“corrected”。
 850 | 
 851 | 如果`type`参数设置为“outcome”，则函数将使用输入数据中的结果等位基因来查找EAF。
 852 | 
 853 | 在处理完所有SNP后，该函数将输出一些有关匹配成功、失败以及NA的信息，以及类型信息的说明。
 854 | 
 855 | ### 示例
 856 | 
 857 | 以下是使用该函数的示例：
 858 | 
 859 | ```R
 860 | dat <- get_eaf_from_1000G(dat, "S:/GWAS数据/本地LD依赖文件", type = "exposure")
 861 | 
 862 | # 检查输出
 863 | head(dat)
 864 | ```
 865 | 
 866 | `fileFrequency.frq`为PLINK1.9输出的，根据1000G数据提取的MAF数据
 867 | 
 868 | 1000G参考文件下载地址：http://fileserve.mrcieu.ac.uk/ld/1kg.v3.tgz
 869 | 
 870 | 可自行提取MAF数据。或从`GetScience`公众号中获取已经提取好的`fileFrequency.frq`文件
 871 | 
 872 | 
 873 | 
 874 | ## get_chr_pos
 875 | 
 876 | 该函数利用MungeSumstats包匹配rsid的染色体位置。
 877 | 
 878 | ### 用法
 879 | 
 880 | ```R
 881 | get_chr_pos(dat, type = "exposure")
 882 | ```
 883 | 
 884 | ### 参数
 885 | 
 886 | - `dat`：一个数据框，TwoSampleMR格式
 887 | - `type`：一个字符串，表示要获取SNP染色体位置和参考序列的SNP类型。可选值为"exposure"或"outcome"。
 888 | 
 889 | ### 返回值
 890 | 
 891 | 一个数据框，其中包含输入数据框的信息，以及新列`chr.exposure`或`chr.outcome`，表示每个SNP的染色体编号。新列`pos.exposure`或`pos.outcome`表示每个SNP在染色体上的位置。
 892 | 
 893 | ### 函数说明
 894 | 
 895 | 该函数使用`format_sumstats`和`format_data`函数从1000G项目中获取SNP的染色体位置和参考序列信息。
 896 | 
 897 | ### 示例
 898 | 
 899 | 以下示例演示如何使用`get_chr_pos`函数：
 900 | 
 901 | ```R
 902 | # 获取曝露变量SNP的染色体位置和参考序列
 903 | exposure_chr_pos <- get_chr_pos(dat, type = "exposure")
 904 | 
 905 | # 获取结果变量SNP的染色体位置和参考序列
 906 | outcome_chr_pos <- get_chr_pos(dat, type = "outcome")
 907 | ```
 908 | 
 909 | 
 910 | 
 911 | ## get_f函数
 912 | 
 913 | ### 描述
 914 | 
 915 | `get_f`函数计算样本的F统计量并返回F值大于指定阈值的样本数据。返回的数据包括每个SNP的R2和F值
 916 | 
 917 | ### 用法
 918 | 
 919 | ```R
 920 | get_f(dat, F_value = 10)
 921 | ```
 922 | 
 923 | ### 参数
 924 | 
 925 | - `dat`: TwoSampleMR格式，一定要包含`eaf.exposure`, `beta.exposure`, `se.exposure`, 和`samplesize.exposure`的数据框。
 926 | - `F_value`: 指定的F统计量的阈值，F值大于该阈值的样本将被返回。默认值为10。
 927 | 
 928 | 
 929 | 
 930 | 注，本计算公式为
 931 | 
 932 | F值：![img](A:\OneDrive\GET\assets\clip_image002.gif) R2值：![image-20230327151305186](A:\OneDrive\GET\assets\image-20230327151305186.png)
 933 | 
 934 | 公式参考文献： 
 935 | 
 936 | [A Multivariate Genome-Wide Association Analysis of 10 LDL Subfractions, and Their Response to Statin Treatment, in 1868 Caucasians - PMC (nih.gov)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4405269/)
 937 | 
 938 | [Large-scale association analyses identify host factors influencing human gut microbiome composition - PMC (nih.gov)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8515199/)
 939 | 
 940 | 
 941 | 
 942 | 
 943 | 
 944 | 
 945 | 
 946 | ### 
 947 | 
 948 | ## mr_dircreate_base
 949 | 
 950 | ### 描述
 951 | 
 952 | `mr_dircreate_base`函数创建基本的目录结构以保存MR分析结果和图形。
 953 | 
 954 | ### 用法
 955 | 
 956 | ```R
 957 | mr_dircreate_base(root_dir, project_name, date = NULL)
 958 | ```
 959 | 
 960 | ### 参数
 961 | 
 962 | - `root_dir`: 保存结果文件夹的根目录。
 963 | - `project_name`: 项目名称，用于在根目录下创建一个以此命名的子目录。
 964 | - `date`: 日期（可选），用于在子目录名称中添加日期以区分不同日期的结果文件夹。默认为`NULL`，表示不添加日期。
 965 | 
 966 | ### 值
 967 | 
 968 | `mr_dircreate_base`函数返回一个包含结果文件夹路径的列表。
 969 | 
 970 | ### 示例
 971 | 
 972 | ```R
 973 | # 创建结果文件夹
 974 | res_dir <- mr_dircreate_base("path/to/root/dir", "project_name", date = "20220327")
 975 | 
 976 | # 打印结果文件夹路径
 977 | print(res_dir)
 978 | ```
 979 | 
 980 | ### 注意事项
 981 | 
 982 | - `root_dir`参数指定保存结果文件夹的根目录。
 983 | - `project_name`参数指定项目名称，用于在根目录下创建一个以此命名的子目录。
 984 | - `date`参数（可选）用于在子目录名称中添加日期以区分不同日期的结果文件夹。默认为`NULL`，表示不添加日期。
 985 | - `mr_dircreate_base`函数将在根目录下创建一个名为`project_name`的子目录，并在该子目录下创建4个子目录，分别命名为`1.figure`、`2.table`、`3.figure of sig res`和`4.snp with Fval`。函数返回一个包含结果文件夹路径的列表。
 986 | 
 987 | 
 988 | 
 989 | 
 990 | 
 991 | 
 992 | 
 993 | ## clean_expo
 994 | 
 995 | ### 描述
 996 | 
 997 | 用于快捷筛选工具变量的函数，可执行P值筛选，EAF值筛选，clump。
 998 | 
 999 | ### 用法
1000 | 
1001 | ```R
1002 | ## 完整可选参数
1003 | clean_expo(expo, pval, low_af = 0.5, high_af = 0.5, clump = TRUE, kb = 10000, r2 = 0.001, LD_file = NULL, af_filter = FALSE)
1004 | 
1005 | ##不提供LD_file则自动在线clump
1006 | clean_expo(expo, pval, clump = TRUE, kb = 10000, r2 = 0.001)
1007 | ##提供则本地clump
1008 | clean_expo(expo, pval, clump = TRUE, kb = 10000, r2 = 0.001，LD_file=LD_file)
1009 | ```
1010 | 
1011 | ### 参数
1012 | 
1013 | - `expo`: 一个数据框，其中包含遗传暴露指标的SNP名称、beta值、标准误、p值和频率。
1014 | - `pval`: 用于筛选遗传暴露指标的p值阈值。p值小于此阈值的SNP将被保留。
1015 | - `low_af`: 频率过滤的下限值。默认为0.5。如果`af_filter`为TRUE，则只有遗传暴露指标的频率低于此值或高于`high_af`时，才会被保留。
1016 | - `high_af`: 频率过滤的上限值。默认为0.5。
1017 | - `clump`: 一个逻辑值，指示是否使用PLINK进行SNP聚类。默认为TRUE。
1018 | - `kb`: 聚类的窗口大小（以kb为单位）。默认为10000。
1019 | - `r2`: LD阈值。默认为0.001。
1020 | - `LD_file`: PLINK二进制文件的路径。如果未提供，则默认在线clump。
1021 | - `af_filter`: 一个逻辑值，指示是否启用频率过滤。默认为FALSE。
1022 | 
1023 | 
1024 | 
1025 | ## clean_list
1026 | 
1027 | ### 描述
1028 | 
1029 | 用于清理列表中元素行数的R包。常用于批量化质量控制。
1030 | 
1031 | ### 用法
1032 | 
1033 | ```R
1034 | clean_list(list, nrow = 10)
1035 | ```
1036 | 
1037 | ### 参数
1038 | 
1039 | - `list`: 一个列表，其中包含多个元素。
1040 | - `nrow`: 用于筛选元素的行数阈值。如果元素的行数小于此阈值，则该元素将被删除。默认为10。
1041 | 
1042 | ### 返回值
1043 | 
1044 | 一个列表，其中仅包含行数大于 `nrow` 的元素。
1045 | 
1046 | ### 例子
1047 | 
1048 | ```
1049 | # 创建一个包含5个数据框的列表，每个数据框包含1-5行
1050 | set.seed(123)
1051 | lst <- list(data.frame(a = rnorm(1), b = rnorm(1)),
1052 |             data.frame(a = rnorm(2), b = rnorm(2)),
1053 |             data.frame(a = rnorm(3), b = rnorm(3)),
1054 |             data.frame(a = rnorm(4), b = rnorm(4)),
1055 |             data.frame(a = rnorm(5), b = rnorm(5)))
1056 | 
1057 | # 运行 clean_list 函数，将阈值设置为2
1058 | cleaned_lst <- clean_list(lst, nrow = 3)
1059 | 
1060 | # 查看清理后的列表
1061 | cleaned_lst
1062 | ```
1063 | 
1064 | 
1065 | 
1066 | 
1067 | 
1068 | ## clean_IV_from_outsig
1069 | 
1070 | 用于从一个数据框中清理具有显著的MR反向因果效应P值的IV。
1071 | 
1072 | ### 用法
1073 | 
1074 | ```R
1075 | clean_IV_from_outsig(dat, MR_reverse = 1e-03)
1076 | ```
1077 | 
1078 | ### 参数
1079 | 
1080 | - `dat`: 一个数据框，其中包含每个IV和其与结果变量之间的MR反向因果效应的P值。
1081 | - `MR_reverse`: 用于筛选IV的MR反向因果效应P值阈值。具有P值小于此阈值的IV将被保留。默认为1e-03。
1082 | 
1083 | ### 返回值
1084 | 
1085 | 一个数据框，其中包含P值大于 `MR_reverse` 值的IV（也就是反向不显著的IVs）。
1086 | 
1087 | 
1088 | 
1089 | 
1090 | ## 使用声明：
1091 | 1. 禁止一切倒卖我们代码的行为。我们承诺已开源代码永久性免费开源，公开可得
1092 | 2. 使用代码前，务必确认代码无误。我们没有利用这个R包发表文献，也没有进行倒卖，或授课。**因此，我们无法确保我们的代码每一环，完全的，彻底的，100%的，没有错误**。我们的代码本质上是分享经验，我们并没有因为这些代码获得任何利益，也没有利用它来当成文章发表，虽然我们会尽力，但我们无法保证彻底的完美。**我们对所有因代码可能存在的错误导致的纠纷不负有任何责任。**
1093 | 3. 我们不会参与任何商业行为和组织，我们没有与任何组织进行合作。任何倒卖行为都是侵权行为，非倒卖的利用行为属于第三方的行为，与我们无关，由第三方对本身内容负责。因代码错误导致的纠纷参考第2条声明。
1094 | 4. 我们分批开源不是利益因素导致的，我们没有将更新的版本的Get_MR进行售卖！我们本意是为了保护内部小伙伴的努力成果，优先使用我们最新开发的前沿功能，万望大家理解！
1095 | 
1096 | 
1097 | 
1098 | 
1099 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
  1 |                     GNU GENERAL PUBLIC LICENSE
  2 |                        Version 3, 29 June 2007
  3 | 
  4 |  Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
  5 |  Everyone is permitted to copy and distribute verbatim copies
  6 |  of this license document, but changing it is not allowed.
  7 | 
  8 |                             Preamble
  9 | 
 10 |   The GNU General Public License is a free, copyleft license for
 11 | software and other kinds of works.
 12 | 
 13 |   The licenses for most software and other practical works are designed
 14 | to take away your freedom to share and change the works.  By contrast,
 15 | the GNU General Public License is intended to guarantee your freedom to
 16 | share and change all versions of a program--to make sure it remains free
 17 | software for all its users.  We, the Free Software Foundation, use the
 18 | GNU General Public License for most of our software; it applies also to
 19 | any other work released this way by its authors.  You can apply it to
 20 | your programs, too.
 21 | 
 22 |   When we speak of free software, we are referring to freedom, not
 23 | price.  Our General Public Licenses are designed to make sure that you
 24 | have the freedom to distribute copies of free software (and charge for
 25 | them if you wish), that you receive source code or can get it if you
 26 | want it, that you can change the software or use pieces of it in new
 27 | free programs, and that you know you can do these things.
 28 | 
 29 |   To protect your rights, we need to prevent others from denying you
 30 | these rights or asking you to surrender the rights.  Therefore, you have
 31 | certain responsibilities if you distribute copies of the software, or if
 32 | you modify it: responsibilities to respect the freedom of others.
 33 | 
 34 |   For example, if you distribute copies of such a program, whether
 35 | gratis or for a fee, you must pass on to the recipients the same
 36 | freedoms that you received.  You must make sure that they, too, receive
 37 | or can get the source code.  And you must show them these terms so they
 38 | know their rights.
 39 | 
 40 |   Developers that use the GNU GPL protect your rights with two steps:
 41 | (1) assert copyright on the software, and (2) offer you this License
 42 | giving you legal permission to copy, distribute and/or modify it.
 43 | 
 44 |   For the developers' and authors' protection, the GPL clearly explains
 45 | that there is no warranty for this free software.  For both users' and
 46 | authors' sake, the GPL requires that modified versions be marked as
 47 | changed, so that their problems will not be attributed erroneously to
 48 | authors of previous versions.
 49 | 
 50 |   Some devices are designed to deny users access to install or run
 51 | modified versions of the software inside them, although the manufacturer
 52 | can do so.  This is fundamentally incompatible with the aim of
 53 | protecting users' freedom to change the software.  The systematic
 54 | pattern of such abuse occurs in the area of products for individuals to
 55 | use, which is precisely where it is most unacceptable.  Therefore, we
 56 | have designed this version of the GPL to prohibit the practice for those
 57 | products.  If such problems arise substantially in other domains, we
 58 | stand ready to extend this provision to those domains in future versions
 59 | of the GPL, as needed to protect the freedom of users.
 60 | 
 61 |   Finally, every program is threatened constantly by software patents.
 62 | States should not allow patents to restrict development and use of
 63 | software on general-purpose computers, but in those that do, we wish to
 64 | avoid the special danger that patents applied to a free program could
 65 | make it effectively proprietary.  To prevent this, the GPL assures that
 66 | patents cannot be used to render the program non-free.
 67 | 
 68 |   The precise terms and conditions for copying, distribution and
 69 | modification follow.
 70 | 
 71 |                        TERMS AND CONDITIONS
 72 | 
 73 |   0. Definitions.
 74 | 
 75 |   "This License" refers to version 3 of the GNU General Public License.
 76 | 
 77 |   "Copyright" also means copyright-like laws that apply to other kinds of
 78 | works, such as semiconductor masks.
 79 | 
 80 |   "The Program" refers to any copyrightable work licensed under this
 81 | License.  Each licensee is addressed as "you".  "Licensees" and
 82 | "recipients" may be individuals or organizations.
 83 | 
 84 |   To "modify" a work means to copy from or adapt all or part of the work
 85 | in a fashion requiring copyright permission, other than the making of an
 86 | exact copy.  The resulting work is called a "modified version" of the
 87 | earlier work or a work "based on" the earlier work.
 88 | 
 89 |   A "covered work" means either the unmodified Program or a work based
 90 | on the Program.
 91 | 
 92 |   To "propagate" a work means to do anything with it that, without
 93 | permission, would make you directly or secondarily liable for
 94 | infringement under applicable copyright law, except executing it on a
 95 | computer or modifying a private copy.  Propagation includes copying,
 96 | distribution (with or without modification), making available to the
 97 | public, and in some countries other activities as well.
 98 | 
 99 |   To "convey" a work means any kind of propagation that enables other
100 | parties to make or receive copies.  Mere interaction with a user through
101 | a computer network, with no transfer of a copy, is not conveying.
102 | 
103 |   An interactive user interface displays "Appropriate Legal Notices"
104 | to the extent that it includes a convenient and prominently visible
105 | feature that (1) displays an appropriate copyright notice, and (2)
106 | tells the user that there is no warranty for the work (except to the
107 | extent that warranties are provided), that licensees may convey the
108 | work under this License, and how to view a copy of this License.  If
109 | the interface presents a list of user commands or options, such as a
110 | menu, a prominent item in the list meets this criterion.
111 | 
112 |   1. Source Code.
113 | 
114 |   The "source code" for a work means the preferred form of the work
115 | for making modifications to it.  "Object code" means any non-source
116 | form of a work.
117 | 
118 |   A "Standard Interface" means an interface that either is an official
119 | standard defined by a recognized standards body, or, in the case of
120 | interfaces specified for a particular programming language, one that
121 | is widely used among developers working in that language.
122 | 
123 |   The "System Libraries" of an executable work include anything, other
124 | than the work as a whole, that (a) is included in the normal form of
125 | packaging a Major Component, but which is not part of that Major
126 | Component, and (b) serves only to enable use of the work with that
127 | Major Component, or to implement a Standard Interface for which an
128 | implementation is available to the public in source code form.  A
129 | "Major Component", in this context, means a major essential component
130 | (kernel, window system, and so on) of the specific operating system
131 | (if any) on which the executable work runs, or a compiler used to
132 | produce the work, or an object code interpreter used to run it.
133 | 
134 |   The "Corresponding Source" for a work in object code form means all
135 | the source code needed to generate, install, and (for an executable
136 | work) run the object code and to modify the work, including scripts to
137 | control those activities.  However, it does not include the work's
138 | System Libraries, or general-purpose tools or generally available free
139 | programs which are used unmodified in performing those activities but
140 | which are not part of the work.  For example, Corresponding Source
141 | includes interface definition files associated with source files for
142 | the work, and the source code for shared libraries and dynamically
143 | linked subprograms that the work is specifically designed to require,
144 | such as by intimate data communication or control flow between those
145 | subprograms and other parts of the work.
146 | 
147 |   The Corresponding Source need not include anything that users
148 | can regenerate automatically from other parts of the Corresponding
149 | Source.
150 | 
151 |   The Corresponding Source for a work in source code form is that
152 | same work.
153 | 
154 |   2. Basic Permissions.
155 | 
156 |   All rights granted under this License are granted for the term of
157 | copyright on the Program, and are irrevocable provided the stated
158 | conditions are met.  This License explicitly affirms your unlimited
159 | permission to run the unmodified Program.  The output from running a
160 | covered work is covered by this License only if the output, given its
161 | content, constitutes a covered work.  This License acknowledges your
162 | rights of fair use or other equivalent, as provided by copyright law.
163 | 
164 |   You may make, run and propagate covered works that you do not
165 | convey, without conditions so long as your license otherwise remains
166 | in force.  You may convey covered works to others for the sole purpose
167 | of having them make modifications exclusively for you, or provide you
168 | with facilities for running those works, provided that you comply with
169 | the terms of this License in conveying all material for which you do
170 | not control copyright.  Those thus making or running the covered works
171 | for you must do so exclusively on your behalf, under your direction
172 | and control, on terms that prohibit them from making any copies of
173 | your copyrighted material outside their relationship with you.
174 | 
175 |   Conveying under any other circumstances is permitted solely under
176 | the conditions stated below.  Sublicensing is not allowed; section 10
177 | makes it unnecessary.
178 | 
179 |   3. Protecting Users' Legal Rights From Anti-Circumvention Law.
180 | 
181 |   No covered work shall be deemed part of an effective technological
182 | measure under any applicable law fulfilling obligations under article
183 | 11 of the WIPO copyright treaty adopted on 20 December 1996, or
184 | similar laws prohibiting or restricting circumvention of such
185 | measures.
186 | 
187 |   When you convey a covered work, you waive any legal power to forbid
188 | circumvention of technological measures to the extent such circumvention
189 | is effected by exercising rights under this License with respect to
190 | the covered work, and you disclaim any intention to limit operation or
191 | modification of the work as a means of enforcing, against the work's
192 | users, your or third parties' legal rights to forbid circumvention of
193 | technological measures.
194 | 
195 |   4. Conveying Verbatim Copies.
196 | 
197 |   You may convey verbatim copies of the Program's source code as you
198 | receive it, in any medium, provided that you conspicuously and
199 | appropriately publish on each copy an appropriate copyright notice;
200 | keep intact all notices stating that this License and any
201 | non-permissive terms added in accord with section 7 apply to the code;
202 | keep intact all notices of the absence of any warranty; and give all
203 | recipients a copy of this License along with the Program.
204 | 
205 |   You may charge any price or no price for each copy that you convey,
206 | and you may offer support or warranty protection for a fee.
207 | 
208 |   5. Conveying Modified Source Versions.
209 | 
210 |   You may convey a work based on the Program, or the modifications to
211 | produce it from the Program, in the form of source code under the
212 | terms of section 4, provided that you also meet all of these conditions:
213 | 
214 |     a) The work must carry prominent notices stating that you modified
215 |     it, and giving a relevant date.
216 | 
217 |     b) The work must carry prominent notices stating that it is
218 |     released under this License and any conditions added under section
219 |     7.  This requirement modifies the requirement in section 4 to
220 |     "keep intact all notices".
221 | 
222 |     c) You must license the entire work, as a whole, under this
223 |     License to anyone who comes into possession of a copy.  This
224 |     License will therefore apply, along with any applicable section 7
225 |     additional terms, to the whole of the work, and all its parts,
226 |     regardless of how they are packaged.  This License gives no
227 |     permission to license the work in any other way, but it does not
228 |     invalidate such permission if you have separately received it.
229 | 
230 |     d) If the work has interactive user interfaces, each must display
231 |     Appropriate Legal Notices; however, if the Program has interactive
232 |     interfaces that do not display Appropriate Legal Notices, your
233 |     work need not make them do so.
234 | 
235 |   A compilation of a covered work with other separate and independent
236 | works, which are not by their nature extensions of the covered work,
237 | and which are not combined with it such as to form a larger program,
238 | in or on a volume of a storage or distribution medium, is called an
239 | "aggregate" if the compilation and its resulting copyright are not
240 | used to limit the access or legal rights of the compilation's users
241 | beyond what the individual works permit.  Inclusion of a covered work
242 | in an aggregate does not cause this License to apply to the other
243 | parts of the aggregate.
244 | 
245 |   6. Conveying Non-Source Forms.
246 | 
247 |   You may convey a covered work in object code form under the terms
248 | of sections 4 and 5, provided that you also convey the
249 | machine-readable Corresponding Source under the terms of this License,
250 | in one of these ways:
251 | 
252 |     a) Convey the object code in, or embodied in, a physical product
253 |     (including a physical distribution medium), accompanied by the
254 |     Corresponding Source fixed on a durable physical medium
255 |     customarily used for software interchange.
256 | 
257 |     b) Convey the object code in, or embodied in, a physical product
258 |     (including a physical distribution medium), accompanied by a
259 |     written offer, valid for at least three years and valid for as
260 |     long as you offer spare parts or customer support for that product
261 |     model, to give anyone who possesses the object code either (1) a
262 |     copy of the Corresponding Source for all the software in the
263 |     product that is covered by this License, on a durable physical
264 |     medium customarily used for software interchange, for a price no
265 |     more than your reasonable cost of physically performing this
266 |     conveying of source, or (2) access to copy the
267 |     Corresponding Source from a network server at no charge.
268 | 
269 |     c) Convey individual copies of the object code with a copy of the
270 |     written offer to provide the Corresponding Source.  This
271 |     alternative is allowed only occasionally and noncommercially, and
272 |     only if you received the object code with such an offer, in accord
273 |     with subsection 6b.
274 | 
275 |     d) Convey the object code by offering access from a designated
276 |     place (gratis or for a charge), and offer equivalent access to the
277 |     Corresponding Source in the same way through the same place at no
278 |     further charge.  You need not require recipients to copy the
279 |     Corresponding Source along with the object code.  If the place to
280 |     copy the object code is a network server, the Corresponding Source
281 |     may be on a different server (operated by you or a third party)
282 |     that supports equivalent copying facilities, provided you maintain
283 |     clear directions next to the object code saying where to find the
284 |     Corresponding Source.  Regardless of what server hosts the
285 |     Corresponding Source, you remain obligated to ensure that it is
286 |     available for as long as needed to satisfy these requirements.
287 | 
288 |     e) Convey the object code using peer-to-peer transmission, provided
289 |     you inform other peers where the object code and Corresponding
290 |     Source of the work are being offered to the general public at no
291 |     charge under subsection 6d.
292 | 
293 |   A separable portion of the object code, whose source code is excluded
294 | from the Corresponding Source as a System Library, need not be
295 | included in conveying the object code work.
296 | 
297 |   A "User Product" is either (1) a "consumer product", which means any
298 | tangible personal property which is normally used for personal, family,
299 | or household purposes, or (2) anything designed or sold for incorporation
300 | into a dwelling.  In determining whether a product is a consumer product,
301 | doubtful cases shall be resolved in favor of coverage.  For a particular
302 | product received by a particular user, "normally used" refers to a
303 | typical or common use of that class of product, regardless of the status
304 | of the particular user or of the way in which the particular user
305 | actually uses, or expects or is expected to use, the product.  A product
306 | is a consumer product regardless of whether the product has substantial
307 | commercial, industrial or non-consumer uses, unless such uses represent
308 | the only significant mode of use of the product.
309 | 
310 |   "Installation Information" for a User Product means any methods,
311 | procedures, authorization keys, or other information required to install
312 | and execute modified versions of a covered work in that User Product from
313 | a modified version of its Corresponding Source.  The information must
314 | suffice to ensure that the continued functioning of the modified object
315 | code is in no case prevented or interfered with solely because
316 | modification has been made.
317 | 
318 |   If you convey an object code work under this section in, or with, or
319 | specifically for use in, a User Product, and the conveying occurs as
320 | part of a transaction in which the right of possession and use of the
321 | User Product is transferred to the recipient in perpetuity or for a
322 | fixed term (regardless of how the transaction is characterized), the
323 | Corresponding Source conveyed under this section must be accompanied
324 | by the Installation Information.  But this requirement does not apply
325 | if neither you nor any third party retains the ability to install
326 | modified object code on the User Product (for example, the work has
327 | been installed in ROM).
328 | 
329 |   The requirement to provide Installation Information does not include a
330 | requirement to continue to provide support service, warranty, or updates
331 | for a work that has been modified or installed by the recipient, or for
332 | the User Product in which it has been modified or installed.  Access to a
333 | network may be denied when the modification itself materially and
334 | adversely affects the operation of the network or violates the rules and
335 | protocols for communication across the network.
336 | 
337 |   Corresponding Source conveyed, and Installation Information provided,
338 | in accord with this section must be in a format that is publicly
339 | documented (and with an implementation available to the public in
340 | source code form), and must require no special password or key for
341 | unpacking, reading or copying.
342 | 
343 |   7. Additional Terms.
344 | 
345 |   "Additional permissions" are terms that supplement the terms of this
346 | License by making exceptions from one or more of its conditions.
347 | Additional permissions that are applicable to the entire Program shall
348 | be treated as though they were included in this License, to the extent
349 | that they are valid under applicable law.  If additional permissions
350 | apply only to part of the Program, that part may be used separately
351 | under those permissions, but the entire Program remains governed by
352 | this License without regard to the additional permissions.
353 | 
354 |   When you convey a copy of a covered work, you may at your option
355 | remove any additional permissions from that copy, or from any part of
356 | it.  (Additional permissions may be written to require their own
357 | removal in certain cases when you modify the work.)  You may place
358 | additional permissions on material, added by you to a covered work,
359 | for which you have or can give appropriate copyright permission.
360 | 
361 |   Notwithstanding any other provision of this License, for material you
362 | add to a covered work, you may (if authorized by the copyright holders of
363 | that material) supplement the terms of this License with terms:
364 | 
365 |     a) Disclaiming warranty or limiting liability differently from the
366 |     terms of sections 15 and 16 of this License; or
367 | 
368 |     b) Requiring preservation of specified reasonable legal notices or
369 |     author attributions in that material or in the Appropriate Legal
370 |     Notices displayed by works containing it; or
371 | 
372 |     c) Prohibiting misrepresentation of the origin of that material, or
373 |     requiring that modified versions of such material be marked in
374 |     reasonable ways as different from the original version; or
375 | 
376 |     d) Limiting the use for publicity purposes of names of licensors or
377 |     authors of the material; or
378 | 
379 |     e) Declining to grant rights under trademark law for use of some
380 |     trade names, trademarks, or service marks; or
381 | 
382 |     f) Requiring indemnification of licensors and authors of that
383 |     material by anyone who conveys the material (or modified versions of
384 |     it) with contractual assumptions of liability to the recipient, for
385 |     any liability that these contractual assumptions directly impose on
386 |     those licensors and authors.
387 | 
388 |   All other non-permissive additional terms are considered "further
389 | restrictions" within the meaning of section 10.  If the Program as you
390 | received it, or any part of it, contains a notice stating that it is
391 | governed by this License along with a term that is a further
392 | restriction, you may remove that term.  If a license document contains
393 | a further restriction but permits relicensing or conveying under this
394 | License, you may add to a covered work material governed by the terms
395 | of that license document, provided that the further restriction does
396 | not survive such relicensing or conveying.
397 | 
398 |   If you add terms to a covered work in accord with this section, you
399 | must place, in the relevant source files, a statement of the
400 | additional terms that apply to those files, or a notice indicating
401 | where to find the applicable terms.
402 | 
403 |   Additional terms, permissive or non-permissive, may be stated in the
404 | form of a separately written license, or stated as exceptions;
405 | the above requirements apply either way.
406 | 
407 |   8. Termination.
408 | 
409 |   You may not propagate or modify a covered work except as expressly
410 | provided under this License.  Any attempt otherwise to propagate or
411 | modify it is void, and will automatically terminate your rights under
412 | this License (including any patent licenses granted under the third
413 | paragraph of section 11).
414 | 
415 |   However, if you cease all violation of this License, then your
416 | license from a particular copyright holder is reinstated (a)
417 | provisionally, unless and until the copyright holder explicitly and
418 | finally terminates your license, and (b) permanently, if the copyright
419 | holder fails to notify you of the violation by some reasonable means
420 | prior to 60 days after the cessation.
421 | 
422 |   Moreover, your license from a particular copyright holder is
423 | reinstated permanently if the copyright holder notifies you of the
424 | violation by some reasonable means, this is the first time you have
425 | received notice of violation of this License (for any work) from that
426 | copyright holder, and you cure the violation prior to 30 days after
427 | your receipt of the notice.
428 | 
429 |   Termination of your rights under this section does not terminate the
430 | licenses of parties who have received copies or rights from you under
431 | this License.  If your rights have been terminated and not permanently
432 | reinstated, you do not qualify to receive new licenses for the same
433 | material under section 10.
434 | 
435 |   9. Acceptance Not Required for Having Copies.
436 | 
437 |   You are not required to accept this License in order to receive or
438 | run a copy of the Program.  Ancillary propagation of a covered work
439 | occurring solely as a consequence of using peer-to-peer transmission
440 | to receive a copy likewise does not require acceptance.  However,
441 | nothing other than this License grants you permission to propagate or
442 | modify any covered work.  These actions infringe copyright if you do
443 | not accept this License.  Therefore, by modifying or propagating a
444 | covered work, you indicate your acceptance of this License to do so.
445 | 
446 |   10. Automatic Licensing of Downstream Recipients.
447 | 
448 |   Each time you convey a covered work, the recipient automatically
449 | receives a license from the original licensors, to run, modify and
450 | propagate that work, subject to this License.  You are not responsible
451 | for enforcing compliance by third parties with this License.
452 | 
453 |   An "entity transaction" is a transaction transferring control of an
454 | organization, or substantially all assets of one, or subdividing an
455 | organization, or merging organizations.  If propagation of a covered
456 | work results from an entity transaction, each party to that
457 | transaction who receives a copy of the work also receives whatever
458 | licenses to the work the party's predecessor in interest had or could
459 | give under the previous paragraph, plus a right to possession of the
460 | Corresponding Source of the work from the predecessor in interest, if
461 | the predecessor has it or can get it with reasonable efforts.
462 | 
463 |   You may not impose any further restrictions on the exercise of the
464 | rights granted or affirmed under this License.  For example, you may
465 | not impose a license fee, royalty, or other charge for exercise of
466 | rights granted under this License, and you may not initiate litigation
467 | (including a cross-claim or counterclaim in a lawsuit) alleging that
468 | any patent claim is infringed by making, using, selling, offering for
469 | sale, or importing the Program or any portion of it.
470 | 
471 |   11. Patents.
472 | 
473 |   A "contributor" is a copyright holder who authorizes use under this
474 | License of the Program or a work on which the Program is based.  The
475 | work thus licensed is called the contributor's "contributor version".
476 | 
477 |   A contributor's "essential patent claims" are all patent claims
478 | owned or controlled by the contributor, whether already acquired or
479 | hereafter acquired, that would be infringed by some manner, permitted
480 | by this License, of making, using, or selling its contributor version,
481 | but do not include claims that would be infringed only as a
482 | consequence of further modification of the contributor version.  For
483 | purposes of this definition, "control" includes the right to grant
484 | patent sublicenses in a manner consistent with the requirements of
485 | this License.
486 | 
487 |   Each contributor grants you a non-exclusive, worldwide, royalty-free
488 | patent license under the contributor's essential patent claims, to
489 | make, use, sell, offer for sale, import and otherwise run, modify and
490 | propagate the contents of its contributor version.
491 | 
492 |   In the following three paragraphs, a "patent license" is any express
493 | agreement or commitment, however denominated, not to enforce a patent
494 | (such as an express permission to practice a patent or covenant not to
495 | sue for patent infringement).  To "grant" such a patent license to a
496 | party means to make such an agreement or commitment not to enforce a
497 | patent against the party.
498 | 
499 |   If you convey a covered work, knowingly relying on a patent license,
500 | and the Corresponding Source of the work is not available for anyone
501 | to copy, free of charge and under the terms of this License, through a
502 | publicly available network server or other readily accessible means,
503 | then you must either (1) cause the Corresponding Source to be so
504 | available, or (2) arrange to deprive yourself of the benefit of the
505 | patent license for this particular work, or (3) arrange, in a manner
506 | consistent with the requirements of this License, to extend the patent
507 | license to downstream recipients.  "Knowingly relying" means you have
508 | actual knowledge that, but for the patent license, your conveying the
509 | covered work in a country, or your recipient's use of the covered work
510 | in a country, would infringe one or more identifiable patents in that
511 | country that you have reason to believe are valid.
512 | 
513 |   If, pursuant to or in connection with a single transaction or
514 | arrangement, you convey, or propagate by procuring conveyance of, a
515 | covered work, and grant a patent license to some of the parties
516 | receiving the covered work authorizing them to use, propagate, modify
517 | or convey a specific copy of the covered work, then the patent license
518 | you grant is automatically extended to all recipients of the covered
519 | work and works based on it.
520 | 
521 |   A patent license is "discriminatory" if it does not include within
522 | the scope of its coverage, prohibits the exercise of, or is
523 | conditioned on the non-exercise of one or more of the rights that are
524 | specifically granted under this License.  You may not convey a covered
525 | work if you are a party to an arrangement with a third party that is
526 | in the business of distributing software, under which you make payment
527 | to the third party based on the extent of your activity of conveying
528 | the work, and under which the third party grants, to any of the
529 | parties who would receive the covered work from you, a discriminatory
530 | patent license (a) in connection with copies of the covered work
531 | conveyed by you (or copies made from those copies), or (b) primarily
532 | for and in connection with specific products or compilations that
533 | contain the covered work, unless you entered into that arrangement,
534 | or that patent license was granted, prior to 28 March 2007.
535 | 
536 |   Nothing in this License shall be construed as excluding or limiting
537 | any implied license or other defenses to infringement that may
538 | otherwise be available to you under applicable patent law.
539 | 
540 |   12. No Surrender of Others' Freedom.
541 | 
542 |   If conditions are imposed on you (whether by court order, agreement or
543 | otherwise) that contradict the conditions of this License, they do not
544 | excuse you from the conditions of this License.  If you cannot convey a
545 | covered work so as to satisfy simultaneously your obligations under this
546 | License and any other pertinent obligations, then as a consequence you may
547 | not convey it at all.  For example, if you agree to terms that obligate you
548 | to collect a royalty for further conveying from those to whom you convey
549 | the Program, the only way you could satisfy both those terms and this
550 | License would be to refrain entirely from conveying the Program.
551 | 
552 |   13. Use with the GNU Affero General Public License.
553 | 
554 |   Notwithstanding any other provision of this License, you have
555 | permission to link or combine any covered work with a work licensed
556 | under version 3 of the GNU Affero General Public License into a single
557 | combined work, and to convey the resulting work.  The terms of this
558 | License will continue to apply to the part which is the covered work,
559 | but the special requirements of the GNU Affero General Public License,
560 | section 13, concerning interaction through a network will apply to the
561 | combination as such.
562 | 
563 |   14. Revised Versions of this License.
564 | 
565 |   The Free Software Foundation may publish revised and/or new versions of
566 | the GNU General Public License from time to time.  Such new versions will
567 | be similar in spirit to the present version, but may differ in detail to
568 | address new problems or concerns.
569 | 
570 |   Each version is given a distinguishing version number.  If the
571 | Program specifies that a certain numbered version of the GNU General
572 | Public License "or any later version" applies to it, you have the
573 | option of following the terms and conditions either of that numbered
574 | version or of any later version published by the Free Software
575 | Foundation.  If the Program does not specify a version number of the
576 | GNU General Public License, you may choose any version ever published
577 | by the Free Software Foundation.
578 | 
579 |   If the Program specifies that a proxy can decide which future
580 | versions of the GNU General Public License can be used, that proxy's
581 | public statement of acceptance of a version permanently authorizes you
582 | to choose that version for the Program.
583 | 
584 |   Later license versions may give you additional or different
585 | permissions.  However, no additional obligations are imposed on any
586 | author or copyright holder as a result of your choosing to follow a
587 | later version.
588 | 
589 |   15. Disclaimer of Warranty.
590 | 
591 |   THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
592 | APPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
593 | HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
594 | OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
595 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
596 | PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
597 | IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
598 | ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
599 | 
600 |   16. Limitation of Liability.
601 | 
602 |   IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
603 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
604 | THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
605 | GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
606 | USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
607 | DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
608 | PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
609 | EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
610 | SUCH DAMAGES.
611 | 
612 |   17. Interpretation of Sections 15 and 16.
613 | 
614 |   If the disclaimer of warranty and limitation of liability provided
615 | above cannot be given local legal effect according to their terms,
616 | reviewing courts shall apply local law that most closely approximates
617 | an absolute waiver of all civil liability in connection with the
618 | Program, unless a warranty or assumption of liability accompanies a
619 | copy of the Program in return for a fee.
620 | 
621 |                      END OF TERMS AND CONDITIONS
622 | 
623 |             How to Apply These Terms to Your New Programs
624 | 
625 |   If you develop a new program, and you want it to be of the greatest
626 | possible use to the public, the best way to achieve this is to make it
627 | free software which everyone can redistribute and change under these terms.
628 | 
629 |   To do so, attach the following notices to the program.  It is safest
630 | to attach them to the start of each source file to most effectively
631 | state the exclusion of warranty; and each file should have at least
632 | the "copyright" line and a pointer to where the full notice is found.
633 | 
634 |     <one line to give the program's name and a brief idea of what it does.>
635 |     Copyright (C) <year>  <name of author>
636 | 
637 |     This program is free software: you can redistribute it and/or modify
638 |     it under the terms of the GNU General Public License as published by
639 |     the Free Software Foundation, either version 3 of the License, or
640 |     (at your option) any later version.
641 | 
642 |     This program is distributed in the hope that it will be useful,
643 |     but WITHOUT ANY WARRANTY; without even the implied warranty of
644 |     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
645 |     GNU General Public License for more details.
646 | 
647 |     You should have received a copy of the GNU General Public License
648 |     along with this program.  If not, see <https://www.gnu.org/licenses/>.
649 | 
650 | Also add information on how to contact you by electronic and paper mail.
651 | 
652 |   If the program does terminal interaction, make it output a short
653 | notice like this when it starts in an interactive mode:
654 | 
655 |     <program>  Copyright (C) <year>  <name of author>
656 |     This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
657 |     This is free software, and you are welcome to redistribute it
658 |     under certain conditions; type `show c' for details.
659 | 
660 | The hypothetical commands `show w' and `show c' should show the appropriate
661 | parts of the General Public License.  Of course, your program's commands
662 | might be different; for a GUI interface, you would use an "about box".
663 | 
664 |   You should also get your employer (if you work as a programmer) or school,
665 | if any, to sign a "copyright disclaimer" for the program, if necessary.
666 | For more information on this, and how to apply and follow the GNU GPL, see
667 | <https://www.gnu.org/licenses/>.
668 | 
669 |   The GNU General Public License does not permit incorporating your program
670 | into proprietary programs.  If your program is a subroutine library, you
671 | may consider it more useful to permit linking proprietary applications with
672 | the library.  If this is what you want to do, use the GNU Lesser General
673 | Public License instead of this License.  But first, please read
674 | <https://www.gnu.org/licenses/why-not-lgpl.html>.
675 | 


--------------------------------------------------------------------------------
/2.0/Get_MR2.0.r:
--------------------------------------------------------------------------------
   1 | library(mr.raps)
   2 | library(TwoSampleMR)
   3 | library(dplyr)
   4 | library(fs) 
   5 | library(ieugwasr)
   6 | library(plinkbinr) 
   7 | library(tidyr)
   8 | library(data.table)
   9 | library(MRPRESSO)
  10 | library(parallel)
  11 | library(doParallel)
  12 | library(pbapply)
  13 | library(stringr)
  14 | library(cause)
  15 | library(vroom)
  16 | library(MungeSumstats)
  17 | library(ldscr)
  18 | library(MRlap)
  19 | 
  20 | 
  21 | select<-dplyr::select
  22 | mcl<-makeCluster
  23 | 
  24 | 
  25 | ## 2.0更新内容
  26 | 
  27 | clean<-function(list,clean_0=FALSE){
  28 |   
  29 |   l<-pblapply(list,class)
  30 |   ll<-c()
  31 |   for(i in 1:length(l)){
  32 |     if(l[[i]][1]%in% "try-error"){}
  33 |     else{ll<-c(ll,i)}
  34 |   }
  35 |   list<-list[ll]
  36 |   
  37 |   if(clean_0==TRUE){
  38 |     list<-Filter(function(x) nrow(x)!=0, list)
  39 |   }
  40 |   return(list)
  41 | }
  42 | 
  43 | get_rsid<-function(chr,pos,version='hg38'){
  44 |   library(biomaRt)
  45 |   dat<-data.frame(CHR=chr,start=pos,end=pos)
  46 |   if(version=="hg38"){
  47 |     ver<-listEnsemblArchives()%>%subset(name!='Ensembl GRCh37')
  48 |     ur<-ver$url[1]
  49 |     snp_mart <- useEnsembl(biomart="ENSEMBL_MART_SNP", 
  50 |                            host=ur, 
  51 |                            dataset="hsapiens_snp")
  52 |     position <- apply(dat, 1, paste, collapse = ":")
  53 |     res<-getBM(attributes = c('refsnp_id', 'allele', 'chrom_start'), 
  54 |                filters = 'chromosomal_region', 
  55 |                values = position, 
  56 |                mart = snp_mart)
  57 |   }
  58 |   if(version=="hg19"){
  59 |     snp_mart <- useEnsembl(biomart="ENSEMBL_MART_SNP", 
  60 |                            host="https://grch37.ensembl.org", 
  61 |                            dataset="hsapiens_snp")
  62 |     position <- apply(dat, 1, paste, collapse = ":")
  63 |     res<-getBM(attributes = c('refsnp_id', 'allele', 'chrom_start'), 
  64 |                filters = 'chromosomal_region', 
  65 |                values = position, 
  66 |                mart = snp_mart)
  67 |   }
  68 |   
  69 |   return(res)
  70 | }
  71 | 
  72 | get_exposure = function(id,pval=5e-08,r2=0.001,kb=10000) {
  73 |   library(TwoSampleMR)
  74 |   dat<-try(data.frame(extract_instruments(id,p1=pval,clump = T,p2=1,r2=r2,kb=kb)))
  75 |   if ('try-error'%in%class(dat))   {
  76 |     dat=NA
  77 |   }
  78 |   return(dat)
  79 | }
  80 | 
  81 | get_outcome=function(id,expo){
  82 |   library(TwoSampleMR)
  83 |   dat<-try(data.frame(extract_outcome_data(expo$SNP,id,proxies = F,maf_threshold=0.4)))
  84 |   if ('try-error'%in%class(dat))   {
  85 |     dat=NA
  86 |   }
  87 |   
  88 |   try( return(dat))
  89 | }
  90 | 
  91 | get_ao<-function(a=NULL){
  92 |   ao<-available_outcomes()
  93 |   ao<-separate(ao,id,c("a","b","c"),sep="-",remove = FALSE)
  94 |   if(is.null(a)==F){ao<-subset(ao,a==a)}
  95 |   return(ao)
  96 | }
  97 | 
  98 | get_exposure_wrong_num<-function(list){
  99 |   res<-is.na(list)
 100 |   num=0
 101 |   foreach (i=1:length(res))%do%{
 102 |     if(res[i]==TRUE) num=num+1
 103 |   }
 104 |   return(num)
 105 | }
 106 | 
 107 | clean_outcome_from_exposure<-function(expo,outcome){
 108 |   snp<-lapply(expo,FUN=function(x)data.frame(SNP=x$SNP))
 109 |   snp<-bind_rows(snp)
 110 |   snp<-data.frame(SNP=unique(snp$SNP))
 111 |   outcome_adj<-merge(outcome,snp)
 112 |   return(outcome_adj)
 113 | }
 114 | 
 115 | clean_GWAS<-function(list,clean=c("bbj","eqtl")){
 116 |   clean_GWAS_logit<-function(dat,clean){
 117 |     id<-dat$id.exposure[1]
 118 |     id<-as.data.frame(id)
 119 |     id<-separate(id,"id",c('a','b','c'),sep = "-")
 120 |     if(id$a==clean){
 121 |       return(data.frame(lg=1))
 122 |     }
 123 |     else(return(data.frame(lg=0)))
 124 |   }
 125 |   print("正在获取指定数据集信息")
 126 |   for(i in 1:length(clean)){
 127 |   lg<-pblapply(list,clean[i],FUN=clean_GWAS_logit)
 128 |   
 129 |   lg<-bind_rows(lg)
 130 |   
 131 |   list<-list[which(lg$lg==0)]
 132 |   }
 133 |   return(list)
 134 |   
 135 | }
 136 | 
 137 | 
 138 | 
 139 | cyclemr<-function(dat,cl_num,type="list"){
 140 | clean<-function(list,clean_0=FALSE){
 141 |   
 142 |   l<-pblapply(list,class)
 143 |   ll<-c()
 144 |   for(i in 1:length(l)){
 145 |     if(l[[i]][1]%in% "try-error"){}
 146 |     else{ll<-c(ll,i)}
 147 |   }
 148 |   list<-list[ll]
 149 |   
 150 |   if(clean_0==TRUE){
 151 |     list<-Filter(function(x) nrow(x)!=0, list)
 152 |   }
 153 |   return(list)
 154 | }
 155 |   # base mr
 156 |   mr_base<-function(dat){
 157 |     library(TwoSampleMR)
 158 |     library(dplyr)
 159 |     try(mr<-mr(dat) )
 160 |     try(mr<-dplyr::select(mr,id.exposure,id.outcome,method,nsnp,b,se,pval)) 
 161 |     try(mr_OR<-generate_odds_ratios(mr_res = mr(dat)))
 162 |     try(mr_OR<-dplyr::select(mr_OR,id.exposure,id.outcome,method,nsnp,b,se,pval,lo_ci,up_ci,or, or_lci95,or_uci95))
 163 |     try(mr_OR<-dplyr::rename(mr_OR,b.OR="b",se.OR="se",pval.OR="pval"))
 164 |     try( mr_p_OR<-merge(mr,mr_OR))
 165 |     try(return(mr_p_OR))
 166 |   }
 167 |   
 168 |   mr_egger<-function(dat){
 169 |     library(TwoSampleMR)
 170 |     library(dplyr)
 171 |     mr_egger<-mr_pleiotropy_test(dat) 
 172 |     try( mr_egger<-dplyr::select(mr_egger,id.exposure,id.outcome,egger_intercept,se,pval))
 173 |     try( mr_egger<-dplyr::rename(mr_egger,se.egger="se",pval.egger="pval"))
 174 |     try( mr_egger[2:5,]<-NA) 
 175 |     try( return(mr_egger))
 176 |   }
 177 |   
 178 |   mr_test<-function(dat){
 179 |     library(TwoSampleMR)
 180 |     library(dplyr)
 181 |     mr_heterogeneity<-mr_heterogeneity(dat)
 182 |     try (mr_heterogeneity<-dplyr::select(mr_heterogeneity,id.exposure,id.outcome,method,Q, Q_df,Q_pval))
 183 |     try( mr_heterogeneity<-dplyr::rename(mr_heterogeneity,method.he="method"))
 184 |     try (mr_heterogeneity[3:5,]<-NA )
 185 |     try( return(mr_heterogeneity))
 186 |   }
 187 |   
 188 |   # presso
 189 |   cycle_presso<-function(dat){
 190 |     library(TwoSampleMR)
 191 |     library(MRPRESSO)
 192 |     library(dplyr) 
 193 |     nsnp_filter=6
 194 |     try (mr_presso_res<-mr_presso(BetaOutcome ="beta.outcome", BetaExposure = "beta.exposure", SdOutcome ="se.outcome", SdExposure = "se.exposure", 
 195 |                                   OUTLIERtest = TRUE,DISTORTIONtest = TRUE, data = dat,  
 196 |                                   SignifThreshold = 0.05))
 197 |     try ( mr_presso_main<-mr_presso_res$`Main MR results`)
 198 |     try(mr_presso_main)
 199 |     try ( mr_presso_main[3:5,]<-NA)
 200 |     try(  return(mr_presso_main))
 201 |   }
 202 |   
 203 |   mr_Presso<-function(dat,num=10000){
 204 |     library(TwoSampleMR)
 205 |     library(MRPRESSO)
 206 |     library(dplyr)
 207 |     
 208 |     nsnp_filter=6
 209 |     set.seed(123)
 210 |     try (mr_presso_res<-mr_presso(BetaOutcome ="beta.outcome", BetaExposure = "beta.exposure", SdOutcome ="se.outcome", SdExposure = "se.exposure", 
 211 |                                   OUTLIERtest = TRUE,DISTORTIONtest = TRUE, data = dat,  
 212 |                                   SignifThreshold = 0.05,NbDistribution = num))
 213 |     return(mr_presso_res)
 214 |     
 215 |   }
 216 |   mr_presso_pval<-function(mr_presso_res){ 
 217 |     try ( mr_presso_main<-mr_presso_res$`Main MR results`)
 218 |     try ( mr_presso_main[3:5,]<-NA) 
 219 |     return(mr_presso_main)
 220 |   }
 221 |   
 222 |   
 223 |   mr_presso_snp<-function(mr_presso_res,mr_presso_main,dat,type="list"){
 224 |     data_re<-list()
 225 |     if(type=="list"){
 226 |       for(i in 1:length(mr_presso_res)){
 227 |         res<-mr_presso_res$`MR-PRESSO results`[[i]]
 228 |         main<-mr_presso_main[[i]]
 229 |         data<-dat[[i]]
 230 |         try(if(is.na(main[2,6])==FALSE){
 231 |           outliers<-which(res$`Outlier Test`$Pvalue<0.05)
 232 |           data$mr_keep[outliers]<-FALSE
 233 |         })
 234 |         data_re[[i]]<-data
 235 |         names(data_re)[[i]]<-names(dat)[[i]]
 236 |       }
 237 |       return(data_re)
 238 |     }
 239 |     
 240 |     if(type=="data"){
 241 |       res<-mr_presso_res$`MR-PRESSO results`
 242 |       main<-mr_presso_main
 243 |       data<-dat
 244 |       try(if(is.na(main[2,6])==FALSE){
 245 |         outliers<-which(res$`Outlier Test`$Pvalue<0.05)
 246 |         data$mr_keep[outliers]<-FALSE
 247 |       })
 248 |       return(data)
 249 |     }
 250 |   }
 251 |   bind_basemr<-function(base.res,egger.res,test.res){
 252 |     res_all<-data.frame()
 253 |     for(i in 1:nrow(base.res)){
 254 |       if(i%%5==1){
 255 |         id<-base.res$id.exposure[i]
 256 |         num.bg<-which(egger.res$id.exposure==id)
 257 |         num.end<-num.bg+4
 258 |         res<-cbind(base.res[i:(i+4),],egger.res[num.bg:num.end,3:5])
 259 |         num.bg<-which(test.res$id.exposure==id)[1]
 260 |         num.end<-num.bg+4
 261 |         res<-cbind(res,test.res[num.bg:num.end,3:6])
 262 |         res_all<-rbind(res_all,res)
 263 |       }
 264 |     }
 265 |     return(res_all)
 266 |   }
 267 |   
 268 |   if(type=="list"){
 269 |     cl <- makeCluster(cl_num) 
 270 |     
 271 |     base.res<-pblapply(dat, FUN=mr_base,cl=cl)
 272 |     
 273 |     egger.res<-pblapply(dat, FUN=mr_egger)
 274 |     
 275 |     test.res<-pblapply(dat,FUN=mr_test)
 276 |     
 277 |     stopCluster(cl)
 278 |     
 279 |     base.res<-clean(base.res)
 280 |     
 281 |     egger.res<-clean(egger.res)
 282 |     
 283 |     test.res<-clean(test.res)
 284 |     
 285 |     base.res<-bind_rows (base.res)
 286 |     
 287 |     base.res<-subset(base.res,nsnp>2) ## 防止出现wald ratio，导致错误
 288 |     
 289 |     egger.res<-bind_rows (egger.res)
 290 |     
 291 |     test.res<-bind_rows (test.res)
 292 |     
 293 |     res_all<-bind_basemr(base.res,egger.res,test.res)
 294 |     
 295 |   }
 296 |   
 297 |   if(type=="data"){
 298 |     
 299 |     presso_res<-mr_Presso(dat,10000)
 300 |     
 301 |     presso_pval<-mr_presso_pval(presso_res)
 302 |     
 303 |     dat_aj<-mr_presso_snp(presso_res,presso_pval,dat,"data")
 304 |     
 305 |     base.res<-mr_base(dat)
 306 |     
 307 |     egger.res<-mr_egger(dat)
 308 |     
 309 |     test.res<-mr_test(dat)
 310 |     
 311 |     base.res<-bind_rows (base.res)
 312 |     
 313 |     res_all<-cbind(base.res,presso_pval,egger.res,test.res)
 314 |   }
 315 |   
 316 |   return(res_all)
 317 | }
 318 | 
 319 | 
 320 | bind_basemr<-function(base.res,egger.res,test.res){
 321 |   res_all<-data.frame()
 322 |   for(i in 1:nrow(base.res)){
 323 |     if(i%%5==1){
 324 |       id<-base.res$id.exposure[i]
 325 |       num.bg<-which(egger.res$id.exposure==id)
 326 |       num.end<-num.bg+4
 327 |       res<-cbind(base.res[i:(i+4),],egger.res[num.bg:num.end,3:5])
 328 |       num.bg<-which(test.res$id.exposure==id)[1]
 329 |       num.end<-num.bg+4
 330 |       res<-cbind(res,test.res[num.bg:num.end,3:6])
 331 |       res_all<-rbind(res_all,res)
 332 |     }
 333 |   }
 334 |   return(res_all)
 335 | }
 336 | 
 337 | 
 338 | bind_pressomr<-function(base_res,presso_pval,egger_res,test_res){
 339 |   
 340 |   presso_id<-names(presso_pval)
 341 |   presso_pval<-bind_rows(presso_pval)
 342 |   base_res<-bind_rows(base_res)
 343 |   egger_res<-bind_rows(egger_res)
 344 |   test_res<-bind_rows(test_res)
 345 |   
 346 |   res_all<-data.frame()
 347 |   for(i in 1:nrow(base_res)){
 348 |     if(i%%5==1){
 349 |       id<-base_res$id.exposure[i]
 350 |       
 351 |       num.bg<-5*(which(presso_id==id)[1]-1)+1
 352 |       num.end<-num.bg+4
 353 |       res<-cbind(base_res[i:(i+4),],presso_pval[num.bg:num.end,2:6])
 354 |       
 355 |       num.bg<-which(egger_res$id.exposure==id)
 356 |       num.end<-num.bg+4
 357 |       res<-cbind(res,egger_res[num.bg:num.end,3:5])
 358 |       
 359 |       num.bg<-which(test_res$id.exposure==id)[1]
 360 |       num.end<-num.bg+4
 361 |       res<-cbind(res,test_res[num.bg:num.end,3:6])
 362 |       
 363 |       res_all<-rbind(res_all,res)
 364 |     }
 365 |   }
 366 |   return(res_all)
 367 | }
 368 | 
 369 | 
 370 | bind_trait<-function(res_all,ao){
 371 |   res_all<-data.frame(trait=NA,res_all)
 372 |   for(i in 1:nrow(res_all)){
 373 |     try( num<-which(ao$id==res_all$id.exposure[i]))
 374 |     try( res_all$trait[i]<-ao$trait[num])
 375 |   }
 376 |   return(res_all)
 377 | }
 378 | 
 379 | bind_trait_outcome<-function(res_all,ao){
 380 |   res_all<-data.frame(trait=NA,res_all)
 381 |   for(i in 1:nrow(res_all)){
 382 |     num<-which(ao$id==res_all$id.outcome[i])
 383 |     res_all$trait[i]<-ao$trait[num]
 384 |   }
 385 |   return(res_all)
 386 | }
 387 | 
 388 | clean_cl<-function(){
 389 |   cl<<-makeCluster(1)
 390 |   stopCluster(cl)
 391 |   gc()
 392 | }
 393 | 
 394 | clean_GWAS_id<-function(dat,id){
 395 |   lid<-lapply(dat,FUN = function(x)x$id.exposure[1])
 396 |   lid<-unlist(lid)
 397 |   l<-c()
 398 |   for (i in 1:length(id)) {
 399 |     l1<-which(lid==id[i])
 400 |     l<-c(l,l1)
 401 |   }
 402 |   dat<-dat[l]
 403 |   return(dat)
 404 | }
 405 | 
 406 | 
 407 | ## 1.0内容
 408 | # format
 409 | format_Mun<-function(file,source="finn_r8",save_path=NULL,lift=F,ref_genome = "hg38",
 410 |                      convert_ref_genome = "hg19"){
 411 |   library(data.table)
 412 |   library(MungeSumstats)
 413 |   library(dplyr)
 414 |   if(class(file)!="data.frame"){
 415 |     dat<-fread(file)
 416 |     if(source=="finn_r8"){
 417 |       dat<-dat%>%dplyr::select(SNP=rsids,CHR=`#chrom`,BP=pos,A1=ref,
 418 |                                A2=alt,FRQ=af_alt,BETA=beta,
 419 |                                SE=sebeta,P=pval)
 420 |     }
 421 |   }
 422 |   dat<-as.data.frame(dat)
 423 |   dat<-format_sumstats(dat,return_data = TRUE)
 424 |   if(is.null(save_path)==F){setwd(save_path)}
 425 |   dat<-fwrite(dat,paste0(file,'(Mun_format',ref_genome,').gz'))
 426 |   te<-fs::dir_info( tempdir())
 427 |   te<-subset(te,size>1e+8)
 428 |   fs::file_delete(te$path)
 429 |   if(lift==T){
 430 |     dat <- MungeSumstats::liftover(sumstats_dt = dat, 
 431 |                                    ref_genome = ref_genome,
 432 |                                    convert_ref_genome = convert_ref_genome)
 433 |     dat<-fwrite(dat,paste0(file,'(Mun_format',convert_ref_genome,').gz'))
 434 |   }
 435 |   gc()
 436 | }
 437 | 
 438 | format_cyclemr<-function(data,type="exposure",source="finn_r8"){
 439 |   library(TwoSampleMR)
 440 |   library(MungeSumstats)
 441 |   
 442 |   if(source=="finn_r8"){
 443 |     data<-format_data(data,type=type,
 444 |                       id_col = "id",chr_col ="#chrom",pos="pos",
 445 |                       snp_col = "rsids",beta_col = "beta",se_col = "sebeta",
 446 |                       effect_allele_col = "alt",other_allele_col = "ref",
 447 |                       eaf_col = "af_alt",phenotype_col = "phenotype",pval_col = "pval")
 448 |   }
 449 |   
 450 |   
 451 |   
 452 |   if(source=="ukb_nosnp"){
 453 |     data<-separate(data,variant,c("chr","pos","ref","alt"),sep = ":")
 454 |     
 455 |     data<-dplyr::select(data,chr,pos,ref,alt,minor_AF,beta,se,pval)
 456 |     name<-colnames(data)
 457 |     colnames(data) <-c('CHR','BP','A1','A2','FRQ','BETA', 'SE','P')
 458 |     
 459 |     expo_rs_done<-format_sumstats(data,ref_genome = "GRCh37",return_data = TRUE)
 460 |     
 461 |     expo_rs_done$BP<-as.character(expo_rs_done$BP)
 462 |     colnames(data)<-name
 463 |     data<-merge(data,expo_rs_done,by.x=c("chr","pos"),by.y=c("CHR","BP"),all.x=TRUE)
 464 |     
 465 |     data<-format_data(data,type=type,
 466 |                       samplesize_col = "n_complete_samples",
 467 |                       snp_col = "SNP",effect_allele_col = "alt",
 468 |                       other_allele_col = "ref",eaf_col ="minor_AF",
 469 |                       beta_col="beta",se_col = "se",pval_col = "pval",
 470 |                       chr_col="chr",pos_col ="pos",id='id',phenotype_col = 'phenotype')
 471 |   }
 472 |   
 473 |   if(source=="Mun"){
 474 |     data<-format_data(data,type=type,
 475 |                       snp_col = 'SNP',
 476 |                       chr_col = 'CHR',
 477 |                       pos_col = 'BP',
 478 |                       effect_allele_col = 'A2',
 479 |                       other_allele_col = "A1",
 480 |                       se_col = 'SE',
 481 |                       beta_col= 'BETA',
 482 |                       eaf_col = 'FRQ',
 483 |                       id_col = 'id',
 484 |                       phenotype_col = "phenotype",
 485 |                       pval_col = "P"
 486 |     )
 487 |   }
 488 |   
 489 |   if(source=="covid"){
 490 |     data<-format_data(data,type=type,
 491 |                       id_col = "id",chr_col ="#CHR",pos="POS",
 492 |                       snp_col = "rsid",beta_col = "all_inv_var_meta_beta",se_col = "all_inv_var_meta_sebeta",
 493 |                       effect_allele_col = "ALT",other_allele_col = "REF",
 494 |                       eaf_col = "all_meta_AF",phenotype_col = "phenotype",
 495 |                       ncase_col = "all_inv_var_meta_cases",ncontrol_col = "all_inv_var_meta_controls",
 496 |                       pval_col = "all_inv_var_meta_p")
 497 |   }
 498 |   
 499 |   if(source=="outcome"){
 500 |     data<-format_data(data,type=type,
 501 |                       id_col = "id.outcome",chr_col ="chr.outcome",pos="pos.outcome",
 502 |                       snp_col = "SNP",beta_col = "beta.outcome",se_col = "se.outcome",
 503 |                       effect_allele_col = "effect_allele.outcome",other_allele_col = "other_allele.outcome",
 504 |                       eaf_col = "eaf.outcome",phenotype_col = "outcome",
 505 |                       samplesize_col = "samplesize.outcome",
 506 |                       pval_col = "pval.outcome")
 507 |     
 508 |   }
 509 |   if(source=="exposure"){
 510 |     data<-format_data(data,type=type,
 511 |                       id_col = "id.exposure",chr_col ="chr.exposure",pos="pos.exposure",
 512 |                       snp_col = "SNP",beta_col = "beta.exposure",se_col = "se.exposure",
 513 |                       effect_allele_col = "effect_allele.exposure",other_allele_col = "other_allele.exposure",
 514 |                       eaf_col = "eaf.exposure",phenotype_col = "exposure",
 515 |                       samplesize_col = "samplesize.exposure",
 516 |                       pval_col = "pval.exposure")
 517 |     
 518 |   }
 519 |   
 520 |   if(source=="fast_ukb"){
 521 |     data<-format_data(data,type=type,id_col="id",
 522 |                       phenotype_col = "phenotype",snp_col = "SNP",
 523 |                       effect_allele_col = "A1",other_allele_col = "A2",
 524 |                       eaf_col ="AF1",beta_col="BETA",se_col = "SE",
 525 |                       pval_col = "P",chr_col = "CHR",pos_col = "POS")
 526 |   }
 527 |   
 528 |   
 529 |   
 530 |   if(source=="bac"){
 531 |     data<-format_data(data,type=type,id_col="bac",
 532 |                       phenotype_col = "phenotype",snp_col = "rsID",
 533 |                       effect_allele_col = "eff.allele",other_allele_col = "ref.allele",
 534 |                       beta_col="beta",se_col = "SE",
 535 |                       pval_col = "P.weightedSumZ",chr_col = "chr",pos_col = "bp",samplesize_col = "N")
 536 |   }
 537 |   
 538 |   if(source=="finn_r7"){
 539 |     data<-format_data(data,type=type,snp_col = "rsids",
 540 |                       effect_allele_col = "alt",other_allele_col = "ref",
 541 |                       beta_col="beta",se_col = "sebeta",eaf_col = "af_alt",
 542 |                       pval_col = "pval",chr_col = "chrom",pos_col = "pos")
 543 |   }
 544 |   return(data)
 545 | }
 546 | 
 547 | 
 548 | format_trait<-function(list,short=FALSE,short_num="40"){
 549 |   for(i in 1:length(list)){
 550 |     expo<-data.frame(exposure=list[[i]]
 551 |                      $exposure)
 552 |     try(expo<-separate(expo,exposure,c('a','b'),sep='\\|\\|'))
 553 |     if(short==TRUE){try(expo$a <- substr(expo$a, 1, short_num))}
 554 |     try(expo<-expo$a)
 555 |     try(expo<-gsub(":","_",expo))
 556 |     try(expo<-gsub("-","_",expo))
 557 |     try(expo<-gsub(" ","_",expo))
 558 |     try(expo<-gsub(",","_",expo))
 559 |     try(expo<-gsub("/","_",expo))
 560 |     list[[i]]$exposure<-expo[1]
 561 |     
 562 |   }
 563 |   
 564 |   
 565 |   return(list)
 566 | }
 567 | 
 568 | # read
 569 | read_vcf_getmr<-function(file_name,nThread = 8,type=".gz"){
 570 |   name<-file_name
 571 |   
 572 |   for(i in 1:nrow(name)){
 573 |     dat<-read_sumstats(paste0("./",name[i]),nThread = nThread,nrow=Inf,standardise_headers = FALSE,mapping_file = sumstatsColHeaders)
 574 |     
 575 |     vroom_write(dat,paste0(name[i],type))
 576 |     
 577 |     gc()
 578 |   }
 579 |   print(i)
 580 |   
 581 | }
 582 | 
 583 | read_easy<-function(file_name,pval=5e-08){
 584 |   library(data.table)
 585 |   dat<-fread(file_name)
 586 |   dat<-subset(dat,pval.exposure<pval)
 587 |   return(dat)
 588 | }
 589 | 
 590 | 
 591 | # get
 592 | get_eaf_from_1000G<-function(dat,path,type="exposure"){
 593 |   
 594 |   corrected_eaf_expo<-function(data_MAF){
 595 |     effect=data_MAF$effect_allele.exposure
 596 |     other=data_MAF$other_allele.exposure
 597 |     A1=data_MAF$A1
 598 |     A2=data_MAF$A2
 599 |     MAF_num=data_MAF$MAF
 600 |     EAF_num=1-MAF_num
 601 |     
 602 |     harna<-is.na(data_MAF$A1)
 603 |     harna<-data_MAF$SNP[which(harna==T)]
 604 |     
 605 |     cor1<-which(data_MAF$effect_allele.exposure !=data_MAF$A1)
 606 |     
 607 |     
 608 |     data_MAF$eaf.exposure=data_MAF$MAF
 609 |     data_MAF$type="raw"
 610 |     data_MAF$eaf.exposure[cor1]=EAF_num[cor1]
 611 |     data_MAF$type[cor1]="corrected"
 612 |     cor2<-which(data_MAF$other_allele.exposure ==data_MAF$A1)
 613 |     cor21<-setdiff(cor2,cor1)
 614 |     cor12<-setdiff(cor1,cor2)
 615 |     error<-c(cor12,cor21)
 616 |     data_MAF$eaf.exposure[error]=NA
 617 |     data_MAF$type[error]="error"
 618 |     
 619 |     data_MAF<-list(data_MAF=data_MAF,cor1=cor1,harna=harna,error=error)
 620 |     
 621 |     return(data_MAF)
 622 |     
 623 |   }
 624 |   
 625 |   corrected_eaf_out<-function(data_MAF){
 626 |     effect=data_MAF$effect_allele.outcome
 627 |     other=data_MAF$other_allele.outcome
 628 |     A1=data_MAF$A1
 629 |     A2=data_MAF$A2
 630 |     MAF_num=data_MAF$MAF
 631 |     EAF_num=1-MAF_num
 632 |     
 633 |     harna<-is.na(data_MAF$A1)
 634 |     harna<-data_MAF$SNP[which(harna==T)]
 635 |     
 636 |     cor1<-which(data_MAF$effect_allele.outcome !=data_MAF$A1)
 637 |     
 638 |     
 639 |     data_MAF$eaf.outcome=data_MAF$MAF
 640 |     data_MAF$type="raw"
 641 |     data_MAF$eaf.outcome[cor1]=EAF_num[cor1]
 642 |     data_MAF$type[cor1]="corrected"
 643 |     cor2<-which(data_MAF$other_allele.outcome ==data_MAF$A1)
 644 |     cor21<-setdiff(cor2,cor1)
 645 |     cor12<-setdiff(cor1,cor2)
 646 |     error<-c(cor12,cor21)
 647 |     data_MAF$eaf.outcome[error]=NA
 648 |     data_MAF$type[error]="error"
 649 |     
 650 |     data_MAF<-list(data_MAF=data_MAF,cor1=cor1,harna=harna,error=error)
 651 |     
 652 |     return(data_MAF)
 653 |     
 654 |   }
 655 |   
 656 |   if(type=="exposure" && (is.na(dat$eaf.exposure[1])==T || is.null(dat$eaf.exposure)==T)){
 657 |     r<-nrow(dat)
 658 |     
 659 |     setwd(path)
 660 |     MAF<-fread("fileFrequency.frq",header = T)
 661 |     
 662 |     dat<-merge(dat,MAF,by.x = "SNP",by.y = "SNP",all.x = T)
 663 |     
 664 |     dat<-corrected_eaf_expo(dat)
 665 |     
 666 |     cor1<-dat$cor1
 667 |     
 668 |     harna<-dat$harna
 669 |     
 670 |     error<-dat$error
 671 |     
 672 |     dat<-dat$data_MAF
 673 |     
 674 |     print(paste0("一共有",(r-length(harna)-length(error)),"个SNP成功匹配EAF,占比",(r-length(harna)-length(error))/r*100,"%"))
 675 |     
 676 |     print(paste0("一共有",length(cor1),"个SNP是major allele，EAF被计算为1-MAF,在成功匹配数目中占比",length(cor1)/(r-length(harna)-length(error))*100,"%"))
 677 |     
 678 |     print(paste0("一共有",length(harna),"个SNP在1000G中找不到，占比",length(harna)/r*100,"%"))
 679 |     
 680 |     print(paste0("一共有",length(error),"个SNP在输入数据与1000G中效应列与参照列，将剔除eaf，占比",length(error)/r*100,"%"))
 681 |     
 682 |     print("输出数据中的type列说明：")
 683 |     print("raw：EAF直接等于1000G里的MAF数值，因为效应列是minor allele")
 684 |     print('corrected：EAF等于1000G中1-MAF，因为效应列是major allele')
 685 |     print("error：输入数据与1000G里面提供的数据完全不一致，比如这个SNP输入的效应列是C，参照列是G，但是1000G提供的是A-T，这种情况下，EAF会被清空（NA），当成匹配失败")
 686 |     
 687 |     return(dat)
 688 |   }
 689 |   
 690 |   if(type=="outcome" && (is.na(dat$eaf.outcome[1])==T || is.null(dat$eaf.outcome)==T)){
 691 |     r<-nrow(dat)
 692 |     
 693 |     setwd(path)
 694 |     MAF<-fread("fileFrequency.frq",header = T)
 695 |     
 696 |     dat<-merge(dat,MAF,by.x = "SNP",by.y = "SNP",all.x = T)
 697 |     
 698 |     dat<-corrected_eaf_out(dat)
 699 |     
 700 |     cor1<-dat$cor1
 701 |     
 702 |     harna<-dat$harna
 703 |     
 704 |     error<-dat$error
 705 |     
 706 |     dat<-dat$data_MAF
 707 |     
 708 |     print(paste0("一共有",(r-length(harna)-length(error)),"个SNP成功匹配EAF,占比",(r-length(harna)-length(error))/r*100,"%"))
 709 |     
 710 |     print(paste0("一共有",length(cor1),"个SNP是major allele，EAF被计算为1-MAF,在成功匹配数目中占比",length(cor1)/(r-length(harna)-length(error))*100,"%"))
 711 |     
 712 |     print(paste0("一共有",length(harna),"个SNP在1000G找不到，占比",length(harna)/r*100,"%"))
 713 |     
 714 |     print(paste0("一共有",length(error),"个SNP在输入数据与1000G中效应列与参照列，将剔除eaf，占比",length(error)/r*100,"%"))
 715 |     
 716 |     print("输出数据中的type列说明：")
 717 |     print("raw：EAF直接等于1000G里的MAF数值，因为效应列是minor allele")
 718 |     print('corrected：EAF等于1000G中1-MAF，因为效应列是major allele')
 719 |     print("error：输入数据与1000G里面提供的数据完全不一致，比如这个SNP输入的效应列是C，参照列是G，但是1000G提供的是A-T，这种情况下，EAF会被清空（NA），当成匹配失败")
 720 |     
 721 |     return(dat)
 722 |   }
 723 |   else{return(dat)}
 724 | }
 725 | 
 726 | get_chr_pos<-function(dat,type="exposure"){
 727 |   if(type=="exposure"){
 728 |     if(is.na(dat$eaf.exposure[1])==T || is.null(dat$eaf.exposure)==T){
 729 |       print("需要先运行get_eaf_from_1000G来匹配eaf，再匹配chr和pos")
 730 |     }
 731 |     else{
 732 |       dat_d<-dat%>%select(SNP,effect_allele.exposure,other_allele.exposure,eaf.exposure,beta.exposure,se.exposure,pval.exposure)
 733 |       colnames(dat_d) <-c('SNP','A1','A2','FRQ','BETA', 'SE','P')
 734 |       dat_d<-format_sumstats(dat_d,ref_genome = "GRCh37",return_data = TRUE)
 735 |       dat_d<-format_data(dat_d,type=type,
 736 |                          snp_col = 'SNP',
 737 |                          chr_col = 'CHR',
 738 |                          pos_col = 'BP',
 739 |                          effect_allele_col = 'A1',
 740 |                          other_allele_col = "A2",
 741 |                          se_col = 'SE',
 742 |                          beta_col= 'BETA',
 743 |                          eaf_col = 'FRQ',
 744 |                          pval_col = "P"
 745 |       )
 746 |       dat_d<-dat_d%>%select(SNP,chr.exposure,pos.exposure)
 747 |       dat<-merge(dat,dat_d,by="SNP",all.x=T)
 748 |       
 749 |       return(dat)
 750 |     }
 751 |   }
 752 |   if(type=="outcome"){
 753 |     if(is.na(dat$eaf.outcome[1])==T || is.null(dat$eaf.outcome)==T){
 754 |       print("需要先运行get_eaf_from_1000G来匹配eaf，再匹配chr和pos")
 755 |     }
 756 |     else{
 757 |       dat_d<-dat%>%select(SNP,effect_allele.outcome,other_allele.outcome,eaf.outcome,beta.outcome,se.outcome,pval.outcome)
 758 |       colnames(dat_d) <-c('SNP','A1','A2','FRQ','BETA', 'SE','P')
 759 |       dat_d<-format_sumstats(dat_d,ref_genome = "GRCh37",return_data = TRUE)
 760 |       dat_d<-format_data(dat_d,type=type,
 761 |                          snp_col = 'SNP',
 762 |                          chr_col = 'CHR',
 763 |                          pos_col = 'BP',
 764 |                          effect_allele_col = 'A1',
 765 |                          other_allele_col = "A2",
 766 |                          se_col = 'SE',
 767 |                          beta_col= 'BETA',
 768 |                          eaf_col = 'FRQ',
 769 |                          pval_col = "P"
 770 |       )
 771 |       dat_d<-dat_d%>%select(SNP,chr.outcome,pos.outcome)
 772 |       dat<-merge(dat,dat_d,by="SNP",all.x=T)
 773 |       
 774 |       return(dat)
 775 |     }
 776 |     
 777 |   }
 778 | }
 779 | 
 780 | get_f<-function(dat,F_value=10){
 781 |   log<-is.na(dat$eaf.exposure)
 782 |   log<-unique(log)
 783 |   if(length(log)==1)
 784 |   {if(log==TRUE){
 785 |     print("数据不包含eaf，无法计算F统计量")
 786 |     return(dat)}
 787 |   }
 788 |   if(is.null(dat$beta.exposure[1])==T || is.na(dat$beta.exposure[1])==T){print("数据不包含beta，无法计算F统计量")
 789 |     return(dat)}
 790 |   if(is.null(dat$se.exposure[1])==T || is.na(dat$se.exposure[1])==T){print("数据不包含se，无法计算F统计量")
 791 |     return(dat)}
 792 |   if(is.null(dat$samplesize.exposure[1])==T || is.na(dat$samplesize.exposure[1])==T){print("数据不包含samplesize(样本量)，无法计算F统计量")
 793 |     return(dat)}
 794 |   
 795 |   
 796 |   if("FALSE"%in%log && is.null(dat$beta.exposure[1])==F && is.na(dat$beta.exposure[1])==F && is.null(dat$se.exposure[1])==F && is.na(dat$se.exposure[1])==F && is.null(dat$samplesize.exposure[1])==F && is.na(dat$samplesize.exposure[1])==F){
 797 |     R2<-(2*(1-dat$eaf.exposure)*dat$eaf.exposure*(dat$beta.exposure^2))/((2*(1-dat$eaf.exposure)*dat$eaf.exposure*(dat$beta.exposure^2))+(2*(1-dat$eaf.exposure)*dat$eaf.exposure*(dat$se.exposure^2)*dat$samplesize.exposure))
 798 |     F<- (dat$samplesize.exposure-2)*R2/(1-R2)
 799 |     dat$R2<-R2
 800 |     dat$F<-F
 801 |     dat<-subset(dat,F>F_value)
 802 |     return(dat)
 803 |   }
 804 | }
 805 | 
 806 | 
 807 | # MR
 808 | cause_getmr<-function(expo,outcome,LD_file,r2=0.001,
 809 |                       kb=10000,pval=1e-05,cl=NULL){
 810 |   format_cause_expo<-function(dat){
 811 |     library(cause)
 812 |     dat<-gwas_format(dat,snp='SNP',beta_hat ='beta.exposure',
 813 |                      se='se.exposure',A1="effect_allele.exposure",
 814 |                      A2="other_allele.exposure",chrom='chr.exposure',
 815 |                      pos="pos.exposure",p_value = 'pval.exposure'
 816 |     )
 817 |     return(dat)
 818 |   }
 819 |   format_cause_out<-function(dat){
 820 |     library(cause)
 821 |     dat<-gwas_format(dat,snp='SNP',beta_hat ='beta.outcome',
 822 |                      se='se.outcome',A1="effect_allele.outcome",
 823 |                      A2="other_allele.outcome",chrom='chr.outcome',
 824 |                      pos="pos.outcome",p_value = 'pval.outcome'
 825 |     )
 826 |     return(dat)
 827 |   }
 828 |   sample_cause<-function(dat,num_snp){
 829 |     set.seed(123)
 830 |     VAR<-with(dat,sample(snp,size=num_snp,replace=FALSE))
 831 |     return(VAR)
 832 |   }
 833 |   datap_cause<-function(dat){
 834 |     dat$p1<-pnorm(abs(dat$beta_hat_1/dat$seb1),lower.tail=F)*2
 835 |     return(dat)
 836 |   }
 837 |   dat_clump_cause<-function(dat){
 838 |     for_clump<-data.frame(dat$snp,dat$p1)
 839 |     colnames(for_clump)<-c('rsid','pval')
 840 |     return(for_clump)
 841 |   }
 842 |   datap_cause<-function(dat){
 843 |     dat$p1<-pnorm(abs(dat$beta_hat_1/dat$seb1),lower.tail=F)*2
 844 |     return(dat)
 845 |   }
 846 |   ld_local<-function(dat,r2,kb,p,LD_file){
 847 |     library(plinkbinr)
 848 |     
 849 |     plink_pathway<-get_plink_exe()
 850 |     
 851 |     dat<-ld_clump(dat,clump_r2 = r2,clump_kb = kb,
 852 |                   clump_p = p,
 853 |                   plink_bin =plink_pathway , bfile =LD_file)
 854 |     
 855 |     return(dat)
 856 |   }
 857 |   if("list"%in%class(outcome)){
 858 |     single_expo=T
 859 |     single_outcome=F}
 860 |   if("list"%in%class(expo)){
 861 |     single_outcome=T
 862 |     single_expo=F}
 863 |   if(single_expo==T){
 864 |     outcome<-pblapply(outcome,FUN=format_cause_out,cl=cl)
 865 |     expo<-format_cause_expo(expo)
 866 |   }
 867 |   if(single_outcome==T){
 868 |     expo<-pblapply(expo,FUN=format_cause_expo,cl=cl)
 869 |     outcome<-format_cause_out(outcome)
 870 |   }
 871 |   
 872 |   
 873 |   if(single_outcome==T){dat<-pblapply(expo,outcome,FUN=gwas_merge,cl=cl)}
 874 |   if(single_expo==T){dat<-pblapply(outcome,expo,
 875 |                                    FUN=function(outcome,exposure)gwas_merge(exposure,outcome),
 876 |                                    cl=cl)}
 877 |   
 878 |   
 879 |   VAR<-pblapply(dat,1000000,FUN=sample_cause)
 880 |   
 881 |   est<-foreach(i=1:length(VAR)) %do%{
 882 |     library(cause)
 883 |     data<-est_cause_params(dat[[i]],VAR[[i]])
 884 |   }
 885 |   dat<-pblapply(dat,FUN=datap_cause)
 886 |   for_clump<-lapply(dat, dat_clump_cause)
 887 |   clumped<-pblapply(for_clump,0.001,10000,1e-05,LD_file,FUN=ld_local)
 888 |   top_vars<-pblapply(clumped,FUN=function(dat) return(dat$rsid))
 889 |   cause_res<-list()
 890 |   for(i in 1:length(dat)){
 891 |     library(cause)
 892 |     res<-cause(dat[[i]],est[[i]],top_vars[[i]])
 893 |     cause_res[[i]]<-res
 894 |     print(i)
 895 |   }
 896 |   cause_table<-data.frame()
 897 |   for (i in 1:length(cause_res)){
 898 |     res<-cause_res[[i]]
 899 |     elpd<-res$elpd
 900 |     elpd$p<-pnorm(-elpd$z,lower.tail = F)
 901 |     name<-paste0('file',i)
 902 |     elpd$file<-name
 903 |     cause_table<-rbind(cause_table,elpd)
 904 |   }
 905 |   return(cause_table)
 906 | }
 907 | 
 908 | # RAPS
 909 | RAPS_getmr<-function(dat,dir_figure){
 910 |   setwd(dir_figure)
 911 |   res<-try(mr.raps(dat,over.dispersion = TRUE))
 912 |   if(class(res)%in%"try-error"){}
 913 |   else{
 914 |     exposure<-dat$id.exposure
 915 |     dir_create(exposure)
 916 |     dir_in<-paste0(dir_figure,'/',exposure)
 917 |     setwd(dir_in)
 918 |     #plot_name<-paste0(exposure,'raps.pdf')
 919 |     ggsave(file='raps_plot.pdf',plot=plot(res),width=9,height=5)
 920 |     res<-data.frame(beta.raps=res$beta.hat,se.raps=res$beta.se,
 921 |                     eov=res$tau2.hat,se.eov=res$tau2.se,OR.raps=NA,
 922 |                     or_lci95.raps=NA,or_uci95.raps=NA)
 923 |     res$OR.raps<- exp(res$beta.raps)
 924 |     res$or_lci95.raps<-exp(res$beta.raps)-(res$beta.raps*1.96)
 925 |     res$or_uci95.raps<-exp(res$beta.raps)+(res$beta.raps*1.96)
 926 |     res$pval.raps<-2*pnorm(abs(res$beta.raps/res$se.raps),lower.tail=F) 
 927 |     res$pval.eov<-2*pnorm(abs(res$eov/res$se.eov),lower.tail=F)
 928 |     if(is.na(res$pval.eov)==FALSE){
 929 |       if(res$pval.eov >0.05 ){
 930 |         res1<-mr.raps(dat,over.dispersion = F)
 931 |         setwd(dir_in)
 932 |         #plot_name<-paste0(exposure,'raps.pdf')
 933 |         ggsave(file='raps_plot.pdf',plot=plot(res1),width=9,height=5)
 934 |         res1<-data.frame(beta.raps=res1$beta.hat,se.raps=res1$beta.se,
 935 |                          eov=NA,se.eov=NA)
 936 |         res1$eov<-res$eov
 937 |         res1$se.eov<-res$se.eov
 938 |         res1$OR.raps<- exp(res1$beta.raps)
 939 |         res1$or_lci95.raps<-exp(res1$beta.raps)-(res1$beta.raps*1.96)
 940 |         res1$or_uci95.raps<-exp(res1$beta.raps)+(res1$beta.raps*1.96)
 941 |         res1$pval.raps<-2*pnorm(abs(res1$beta.raps/res1$se.raps),lower.tail=F) 
 942 |         res1$pval.eov<-2*pnorm(abs(res1$eov/res1$se.eov),lower.tail=F)
 943 |         
 944 |         res<-res1
 945 |       }
 946 |     }
 947 |     
 948 |     return(res)
 949 |   }
 950 | }
 951 | 
 952 | mr_dircreate_base<-function(root_dir,project_name,date=NULL){  
 953 |   library(fs)
 954 |   dir_name<-root_dir
 955 |   setwd(dir_name)
 956 |   
 957 |   if(is.null(date)==FALSE){data_u<-date}else{data_u<-Sys.Date()}
 958 |   
 959 |   dir_name2<-project_name
 960 |   dir_name3<-paste0(dir_name2,data_u)
 961 |   dir_create(dir_name3)
 962 |   setwd(paste0(dir_name,"/",dir_name3))
 963 |   dir_name4<-"1.figure"
 964 |   dir_name5<-"2.table"
 965 |   dir_name6<-"3.figure of sig res"
 966 |   dir_name7<-"4.snp with Fval"
 967 |   
 968 |   paste<-paste0(dir_name,"/",dir_name3,"/")
 969 |   
 970 |   dir1<-paste0(paste,dir_name4)
 971 |   dir.create(dir1)
 972 |   
 973 |   dir2<-paste0(paste,dir_name5)
 974 |   dir.create(dir2)
 975 |   
 976 |   dir3<-paste0(paste,dir_name6)
 977 |   dir.create(dir3)
 978 |   
 979 |   dir4<-paste0(paste,dir_name7)
 980 |   dir.create(dir4)
 981 |   
 982 |   res<-list(paste=paste,dir1=dir1,dir2=dir2,dir3=dir3,dir4=dir4)
 983 |   
 984 |   return(res)
 985 | }
 986 | 
 987 | 
 988 | 
 989 | # PRESSO
 990 | mr_Presso<-function(dat,num=10000){
 991 |   library(TwoSampleMR)
 992 |   library(MRPRESSO)
 993 |   library(dplyr)
 994 |   
 995 |   nsnp_filter=6
 996 |   set.seed(123)
 997 |   try (mr_presso_res<-mr_presso(BetaOutcome ="beta.outcome", BetaExposure = "beta.exposure", SdOutcome ="se.outcome", SdExposure = "se.exposure", 
 998 |                                 OUTLIERtest = TRUE,DISTORTIONtest = TRUE, data = dat,  
 999 |                                 SignifThreshold = 0.05,NbDistribution = num))
1000 |   return(mr_presso_res)
1001 |   
1002 | }
1003 | mr_presso_pval<-function(mr_presso_res){ 
1004 |   try ( mr_presso_main<-mr_presso_res$`Main MR results`)
1005 |   try ( mr_presso_main[3:5,]<-NA) 
1006 |   return(mr_presso_main)
1007 | }
1008 | 
1009 | 
1010 | mr_presso_snp<-function(mr_presso_res,mr_presso_main,dat,type="list"){
1011 |   data_re<-list()
1012 |   if(type=="list"){
1013 |     for(i in 1:length(mr_presso_res)){
1014 |       res<-mr_presso_res[[i]]
1015 |       main<-mr_presso_main[[i]]
1016 |       data<-dat[[i]]
1017 |       try(if(is.na(main[2,6])==FALSE){
1018 |         outliers<-which(res$`MR-PRESSO results`$`Outlier Test`$Pvalue<0.05)
1019 |         data$mr_keep[outliers]<-FALSE
1020 |       })
1021 |       data_re[[i]]<-data
1022 |       names(data_re)[[i]]<-names(dat)[[i]]
1023 |     }
1024 |     return(data_re)
1025 |   }
1026 |   
1027 |   if(type=="data"){
1028 |     res<-mr_presso_res$`MR-PRESSO results`
1029 |     main<-mr_presso_main
1030 |     data<-dat
1031 |     try(if(is.na(main[2,6])==FALSE){
1032 |       outliers<-which(res$`MR-PRESSO results`$`Outlier Test`$Pvalue<0.05)
1033 |       data$mr_keep[outliers]<-FALSE
1034 |     })
1035 |     return(data)
1036 |   }
1037 | }
1038 | 
1039 | #MRlap
1040 | 
1041 | mr_lap<-function(expo,outcome,ld,hm3,pval,r2,kb,MR_reverse=1e-03,save_logfiles=F){
1042 |   expo<-expo%>%select(rsid=SNP,chr=chr.exposure,pos=pos.exposure,alt=effect_allele.exposure,
1043 |                       ref=other_allele.exposure,N=samplesize.exposure,beta=beta.exposure,
1044 |                       se=se.exposure)
1045 |   expo<-as.data.frame(expo)
1046 |   outcome<-outcome%>%select(rsid=SNP,chr=chr.outcome,pos=pos.outcome,alt=effect_allele.outcome,
1047 |                             ref=other_allele.outcome,N=samplesize.outcome,beta=beta.outcome,
1048 |                             se=se.outcome)
1049 |   outcome<-as.data.frame(outcome)
1050 |   n_expo<-expo$exposure[1]
1051 |   n_out<-outcome$outcome[1]
1052 |   
1053 |   try(res<-MRlap::MRlap(exposure = expo,
1054 |                         exposure_name = n_expo,
1055 |                         outcome = outcome,
1056 |                         outcome_name = n_out,
1057 |                         ld = ld,
1058 |                         hm3 = hm3,MR_threshold=pval,MR_pruning_dist=kb,
1059 |                         MR_pruning_LD=r2,MR_reverse=MR_reverse,save_logfiles=save_logfiles
1060 |   ))
1061 |   try(snp<-data.frame(SNP_MRlap=res$MRcorrection$IVs))
1062 |   try(res_c<-as.data.frame(res$MRcorrection[-4])%>%select(nsnp=m_IVs,
1063 |                                                           beta_MRlap=corrected_effect,
1064 |                                                           se_MRlap=corrected_effect_se,
1065 |                                                           p_MRlap=corrected_effect_p))
1066 |   try(res<-c(res,res_c,snp))
1067 |   
1068 |   try(return(res))
1069 | }
1070 | 
1071 | 
1072 | # clean
1073 | clean_expo<-function(expo,pval,low_af=0.5,high_af=0.5,
1074 |                      clump=TRUE,kb=10000,r2=0.001,LD_file=NULL,af_filter=FALSE){
1075 |   library(TwoSampleMR)
1076 |   dat<-subset(expo,pval.exposure<pval)
1077 |   if(af_filter==TRUE){dat<-subset(dat,eaf.exposure<low_af|eaf.exposure>high_af)}
1078 |   
1079 |   if(clump==TRUE){
1080 |     
1081 |     if(is.null(LD_file)==TRUE){
1082 |       dat<-clump_data(dat,clump_kb = kb,clump_r2 = r2)}
1083 |     else{
1084 |       library(plinkbinr)
1085 |       plink_pathway<-get_plink_exe()
1086 |       snp<-data.frame(rsid=dat$SNP,pval=dat$pval.exposure)
1087 |       snp<-try(ld_clump(snp,clump_kb = kb,clump_r2 = r2,plink_bin =plink_pathway , bfile =LD_file))
1088 |       if ("try-error"%in% class(snp)){return(dat)}
1089 |       else{
1090 |         snp<-data.frame(SNP=snp$rsid)
1091 |         dat<-merge(snp,dat,by.x="SNP",by.y='SNP')
1092 |       }
1093 |     }
1094 |     
1095 |   }
1096 |   return(dat)
1097 | }
1098 | 
1099 | clean_list<-function(list,nrow=10){
1100 |   l<-lapply(list,nrow)
1101 |   n<-data.frame()
1102 |   for(i in 1:length(l)){
1103 |     if(is.null(l[[i]])==T || l[[i]]==0 )next
1104 |     
1105 |     n1<-data.frame(i,l[[i]])
1106 |     n<-rbind(n,n1)
1107 |   }
1108 |   colnames(n)<-c('l','row')
1109 |   n<-subset(n,row>nrow)
1110 |   list<-list[n$l]
1111 |   return(list)
1112 | }
1113 | 
1114 | clean_IV_from_outsig<-function(dat,MR_reverse=1e-03){
1115 |   dat<-subset(dat,pval.outcome>MR_reverse)
1116 |   return(dat)
1117 | }
1118 | 
1119 | 
1120 | LDSC_rg<-function(expo,outcome,an,sample_prev=NA,
1121 |                   population_prev=NA,ld,wld,chr_filter=c(1:22),n_blocks=200){
1122 |   id.o<-outcome$id.outcome[1]
1123 |   id.e<-expo$id.exposure[1]
1124 |   
1125 |   expo<-expo%>%mutate(Z=beta.exposure/se.exposure)
1126 |   expo<-expo%>%select(SNP=SNP,N=samplesize.exposure,Z=Z
1127 |                       ,A1=effect_allele.exposure
1128 |                       ,A2=other_allele.exposure)
1129 |   expo<-as_tibble(expo)
1130 |   
1131 |   outcome<-outcome%>%mutate(Z=beta.outcome/se.outcome)
1132 |   outcome<-outcome%>%select(SNP=SNP,N=samplesize.outcome,Z=Z
1133 |                             ,A1=effect_allele.outcome
1134 |                             ,A2=other_allele.outcome)
1135 |   outcome<-as_tibble(outcome)
1136 |   
1137 |   
1138 |   dat<-list(expo,outcome)
1139 |   names(dat)<-c(id.e,id.o)
1140 |   
1141 |   rm(expo,outcome)
1142 |   
1143 |   
1144 |   res<-try(ldscr::ldsc_rg(dat,ancestry = an,sample_prev=sample_prev,
1145 |                           population_prev=population_prev,ld=ld,wld=wld,
1146 |                           n_blocks=n_blocks,chr_filter=chr_filter))
1147 |   
1148 |   return(res)
1149 |   
1150 | }
1151 | 
1152 | 


--------------------------------------------------------------------------------