├── model-prepare.rda ├── Identification of Immune-related lncRNAs.R ├── README.md ├── Consensus cluster.R ├── Machine Learning based intergration.R ├── All-signature-compare.R └── clinical validation.R /model-prepare.rda: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zaoqu-Liu/IRLS/HEAD/model-prepare.rda -------------------------------------------------------------------------------- /Identification of Immune-related lncRNAs.R: -------------------------------------------------------------------------------- 1 | rm(list = ls()) 2 | library(estimate) 3 | library(clusterProfiler) 4 | library(org.Hs.eg.db) 5 | library(stringr) 6 | library(ImmuLncRNA) 7 | load('/Users/CRC-data-lnc/TCGA.rda') 8 | load('/Users/CRC-data-lnc/GEO-mRNA-data/GEO-Rdata/TCGA.rda') 9 | rm(TCGA_clin) 10 | load('Purity.Rdata') 11 | CRC.lncRNA <- CRC.lncRNA[apply(CRC.lncRNA,1,function(x){sum(x>0)>ncol(CRC.lncRNA)/3}),] 12 | 13 | # ------------------------------------------------------------------------- 14 | 15 | set.seed(123456) 16 | system.time( 17 | result <- immu.LncRNA(CRC.mRNA,CRC.lncRNA,adjusted=TRUE,Tumour_purity,pathways,) 18 | ) 19 | 20 | sig <- subset(result$fgseaRes_all,padj < 0.05 & sigValue > 0.995) 21 | sig_immune_related_lncRNA <- unique(sig$lncRNA) 22 | save(result,sig,sig_immune_related_lncRNA,file = 'immune_related_result.Rdata') 23 | 24 | ggplot(bardata,aes(log2(Freq+1),reorder(Var1,Freq),fill=Var1))+ 25 | geom_bar(stat = 'identity',width = 0.7)+ 26 | theme_classic(base_rect_size = 1.5)+ 27 | ylab('')+xlab('log2(Number of immune-related lncRNAs)')+ 28 | ggtitle('ImmuLnc analysis')+ 29 | theme(axis.ticks.y = element_blank(), 30 | panel.border = element_blank(), 31 | panel.grid = element_blank(), 32 | plot.title = element_text(hjust=0.5,size=12), 33 | axis.title.x = element_text(size = 10), 34 | legend.position = 'none')+ 35 | scale_y_discrete(expand = c(0.04,0),)+ 36 | scale_x_continuous(expand = c(0,0.1))+ 37 | scale_fill_manual(values = c(pal_npg(alpha = 0.8)(10),pal_d3(alpha = 0.8)(10))[-c(17:20)]) 38 | ggsave(filename = 'Immune-lncRNA-type.pdf',width = 5,height = 5) 39 | 40 | 41 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # IRLS 2 | Machine learning-based integrative analysis develops an immune-derived lncRNA signature for improving clinical outcomes in colorectal cancer 3 | 4 | ## Consensus clustering 5 | According to the infiltration profile of various immune cells, a resampling-based method termed consensus clustering was applied for cluster discovery. This process was performed by ConsensusClusterPlus package[1]. Subsample 80% of samples at each iteration and partition each subsample into up to k (max K = 9) groups by k-means algorithm upon Euclidean distance. This process was repeated for 1,000 repetitions. Subsequently, the consensus score matrix, cumulative distribution function (CDF) curve, proportion of ambiguous clustering (PAC) score, and Nbclust were synthetically used to determine the optimal number of clusters. A higher consensus score between two samples indicates they are more likely to be grouped into the same cluster in different iterations. The consensus values range from 0 (never clustered together) to 1 (always clustered together) marked by white to dark brown. In the CDF curve of a consensus matrix, the lower left portion represents sample pairs rarely clustered together, the upper right portion represents those almost always clustered together, whereas the middle segment represents those with ambiguous assignments in different clustering runs. The "proportion of ambiguous clustering" (PAC) measure quantifies this middle segment; and is defined as the fraction of sample pairs with consensus indices falling in the interval (u1, u2) ∈ [0, 1] where u1 is a value close to 0 and u2 is a value close to 1 (for instance u1=0.1 and u2=0.9). A low value of PAC indicates a flat middle segment, and a low rate of discordant assignments across permuted clustering runs. PAC for each K is CDFk(u2) - CDFk(u1)[2]. According to his criterion, we can therefore infer the optimal number of clusters by the K value having the lowest PAC. The Nbclust uses 26 mathematic criteria to select the optimal number. 6 | 7 | ## Signature generated from machine learning based integrative approaches 8 | To develop a consensus immune-related lncRNA signature (IRLS) with high accuracy and stability performance, we integrated 10 machine learning algorithms and 101 algorithm combinations. The integrative algorithms included random survival forest (RSF), elastic network (Enet), Lasso, Ridge, stepwise Cox, CoxBoost, partial least squares regression for Cox (plsRcox), supervised principal components (SuperPC), generalized boosted regression modeling (GBM), and survival support vector machine (survival-SVM). The RSF model was implemented via the randomForestSRC package. RSF had two parameters ntree and mtry, where ntree represented the number of trees in the forest and mtry was the number of randomly selected variables for splitting at each node. We used a grid-search on ntree and mtry using leave-one-out cross-validation (LOOCV) framework. All the pairs of (ntree, mtry) are formed and the one with the best C-index value is identified as the optimized parameters. The Enet, Lasso, and Ridge were implemented via the glmnet package. The regularization parameter, λ, was determined by LOOCV, whereas the L1-L2 trade-off parameter, α, was set to 0-1 (interval =0.1). The stepwise Cox model was implemented via survival package. A stepwise algorithm using the AIC (Akaike information criterion) was applied, and the direction mode of stepwise search was set to "both", "backward", and "forward", respectively. The CoxBoost model was implemented via CoxBoost package, which is used to fit a Cox proportional hazards model by componentwise likelihood-based boosting. For the CoxBoost model, we used LOOCV routine optimCoxBoostPenalty function to first determine the optimal penalty (amount of shrinkage). Once this parameter was determined, the other tuning parameter of the algorithm, namely, the number of boosting steps to perform, was selected via the function cv.CoxBoost. The dimension of the selected multivariate Cox model was finally set by the principal routine CoxBoost. The plsRcox model was implemented via plsRcox package. The cv.plsRcox function was used to determine the number of components requested, and the plsRcox function was applied to fit a partial least squares regression generalized linear model. The SuperPC model was implemented via superpc package, is a generalization of principal component analysis, which generates a linear combination of the features or variables of interest that capture the directions of largest variation in a dataset. The superpc.cv function used a form of LOOCV to estimate the optimal feature threshold in supervised principal components. To avoid problems with fitting Cox models to small validation datasets, it uses the "pre-validation" approach. The GBM model was implemented via superpc package. Using the LOOCV, the cv.gbm function selected index for number trees with minimum cross-validation error. The gbm function was used to fit the generalized boosted regression model. The survival-SVM model was implemented via survivalsvm package. The regression approach takes censoring into account when formulating the inequality constraints of the support vector problem. 9 | 10 | ## Cells infiltration estimation 11 | The single sample gene set enrichment analysis (ssGSEA) implemented in R package GSVA was employed to quantify the relative infiltration of 28 immune cells in CRC[22]. Six other algorithms including TIMER, quanTIseq, MCP-counter, xCell, EPIC, and ESTIMATE, were further performed to verify the stability and robustness of the ssGSEA results. 12 | 13 | ## Weighted gene co-expression network analysis (WGCNA) 14 | Co-expression lncRNA networks were generated using WGCNA package. An appropriate soft threshold β was calculated to meet the criteria for scale-free network. Further, the weighted adjacency matrix was converted into a topological overlap matrix (TOM), and the corresponding dissimilarity was generated (1-TOM). The dynamic tree cutting approach was employed to conduct the module identification. To recognize lncRNAs modules significantly corelated with immune cluster, the module that displayed the highest correlation was selected for further study. lncRNAs with both high gene significance (GS) and module membership (MM) were defined as immune-related lncRNAs. 15 | 16 | ## ImmLnc analysis framework 17 | ImmLnc is an integrated algorithm for identifying lncRNA modulators of immune-related pathways. First, the ESTIMATE algorithm was used to infer tumor purity. Second, we calculated the partial correlation coefficient (PCC) between a specific lncRNA and all mRNAs by adjusting the tumor purity as a covariable. Final, all mRNAs were ranked by the correlation coefficient with a specific lncRNA, and the ranked gene list was further subjected to GSEA procedure to investigate whether the immune genes were enriched in the top or bottom of the gene list. As recommended, lncRES scores >0.995 and FDR <0.05 were considered statistically significant[9, 19]. 18 | -------------------------------------------------------------------------------- /Consensus cluster.R: -------------------------------------------------------------------------------- 1 | rm(list = ls()) 2 | library(dplyr) 3 | library(data.table) 4 | library(tidyr) 5 | library(tibble) 6 | library(GSVA) 7 | library(ConsensusClusterPlus) 8 | library(ComplexHeatmap) 9 | library(ggplot2) 10 | library(ggsci) 11 | library(ggpubr) 12 | library(ComplexHeatmap) 13 | library(circlize) 14 | load('/Users/CRC-data-lnc/TCGA.rda') 15 | rm(CRC.lncRNA) 16 | load('/Users/Tool and Data/cellMarker_ssGSEA.Rdata') 17 | 18 | # ------------------------------------------------------------------------- 19 | 20 | gsva_data <- gsva(as.matrix(CRC.mRNA),cellMarker, method = "ssgsea") 21 | ss <- gsva_data 22 | 23 | # ------------------------------------------------------------------------- 24 | 25 | dir.create('ConsensusCluster/') 26 | ## 一致性聚类 27 | results = ConsensusClusterPlus(as.matrix(ss), 28 | maxK=9, 29 | reps=100, 30 | pItem=0.8, 31 | pFeature=1, 32 | tmyPal = c('navy','darkred'), 33 | title='ConsensusCluster/', 34 | clusterAlg="km", 35 | distance="euclidean", 36 | seed=123456, 37 | plot="pdf") 38 | 39 | icl <- calcICL(results,title = 'ConsensusCluster/',plot = 'pdf') 40 | 41 | ## PAC = Proportion of ambiguous clustering 模糊聚类比例 42 | Kvec = 2:9 43 | x1 = 0.1; x2 = 0.9 44 | PAC = rep(NA,length(Kvec)) 45 | names(PAC) = paste("K=",Kvec,sep="") 46 | for(i in Kvec){ 47 | M = results[[i]]$consensusMatrix 48 | Fn = ecdf(M[lower.tri(M)]) 49 | PAC[i-1] = Fn(x2) - Fn(x1) 50 | } 51 | optK = Kvec[which.min(PAC)] 52 | optK 53 | 54 | PAC <- as.data.frame(PAC) 55 | PAC$K <- 2:9 56 | library(ggplot2) 57 | ggplot(PAC,aes(factor(K),PAC,group=1))+ 58 | geom_line()+ 59 | theme_bw(base_rect_size = 1.5)+ 60 | geom_point(size=4,shape=21,color='darkred',fill='orange')+ 61 | ggtitle('Proportion of ambiguous clustering')+ 62 | xlab('Cluster number K')+ylab(NULL)+ 63 | theme(axis.text = element_text(size=12), 64 | plot.title = element_text(hjust=0.5), 65 | axis.title = element_text(size=13)) 66 | ggsave(filename = 'ConsensusCluster/PAC.pdf',width = 3.8,height = 4) 67 | 68 | ## 保存分型信息 69 | clusterNum=2 70 | cluster=results[[clusterNum]][["consensusClass"]] 71 | 72 | sub <- data.frame(Sample=names(cluster),Cluster=cluster) 73 | sub$Cluster <- paste0('C',sub$Cluster) 74 | table(sub$Cluster) 75 | 76 | head(sub) 77 | 78 | my <- results[[2]][["ml"]] 79 | library(pheatmap) 80 | rownames(my) <- sub$Sample 81 | colnames(my) <- sub$Sample 82 | pheatmap(1-my,show_colnames = F,show_rownames = F, 83 | treeheight_row = 20,treeheight_col = 20, 84 | clustering_method = 'complete', 85 | color = colorRampPalette(c("white","#C75D30"))(50), 86 | annotation_names_row = F,annotation_names_col = F, 87 | annotation_row = data.frame(Cluster=sub$Cluster,row.names = sub$Sample), 88 | annotation_col = data.frame(Cluster=sub$Cluster,row.names = sub$Sample), 89 | annotation_colors = list(Cluster=c('C2'='#B5739D','C1'='#4E8279'))) 90 | library(export) 91 | graph2pdf(file='cluster2.pdf',width=5.5,height=4.5) 92 | 93 | # ------------------------------------------------------------------------- 94 | 95 | ss2 <- merge(sub,t(ss),by.x=1,by.y=0) 96 | ss2 <- pivot_longer(ss2,3:30,names_to = 'cell',values_to = 'value') 97 | 98 | ggplot(ss2,aes(cell,value,fill=Cluster))+ 99 | geom_boxplot(outlier.colour = NA)+ 100 | stat_compare_means(label = 'p.signif')+ 101 | theme(axis.text.x = element_text(angle=90,hjust=1,vjust=0.5)) 102 | 103 | # ------------------------------------------------------------------------- 104 | 105 | TCGA_clin <- merge(sub,TCGA_clin,by=1) 106 | my <- TCGA_clin[,c(1,2,7,8,12,13)]%>%column_to_rownames('Sample') 107 | my$Age <- ifelse(my$Age>65,'>65','≤65') 108 | table(my$Age) 109 | my$Gender <- Hmisc::capitalize(my$Gender) 110 | my <- my[order(my$Cluster,my$Age,my$Stage,my$Status,my$Gender),] 111 | 112 | ee <- t(scale(t(ee))) 113 | table(my$Cluster) 114 | 115 | # ------------------------------------------------------------------------- 116 | 117 | my[is.na(my)] <- 'NA' 118 | my$Age <- factor(my$Age,levels = c('≤65','>65')) 119 | my$Gender <- factor(my$Gender,levels = c('Female','Male')) 120 | my$Stage <- factor(my$Stage,levels = c('I','II','III','IV','NA')) 121 | my$Cluster <- factor(my$Cluster) 122 | my$Status <- factor(my$Status) 123 | 124 | # ------------------------------------------------------------------------- 125 | 126 | Cluster <- c('#4E8279','#B5739D') 127 | names(Cluster) <- levels(my$Cluster) 128 | Age <- c(pal_nejm(alpha = 0.9)(8)[3],'#CF4E27') 129 | names(Age) <- levels(my$Age) 130 | table(my$Gender) 131 | Gender <- c('#E0864A','rosybrown') 132 | names(Gender) <- levels(my$Gender) 133 | table(my$Stage) 134 | Stage <- c('cornsilk','paleturquoise','goldenrod','firebrick','White') 135 | names(Stage) <- levels(my$Stage) 136 | table(my$Recurrence) 137 | table(my$Status) 138 | Status <- c('lavenderblush','slategray') 139 | names(Status) <- levels(my$Status) 140 | 141 | # ------------------------------------------------------------------------- 142 | 143 | Top = HeatmapAnnotation(Cluster=my$Cluster, 144 | Age=my$Age, 145 | Gender=my$Gender, 146 | Stage= my$Stage, 147 | Status = my$Status, 148 | annotation_legend_param=list(labels_gp = gpar(fontsize = 10),border = T, 149 | title_gp = gpar(fontsize = 10,fontface = "bold"), 150 | ncol=1), 151 | border = T, 152 | col=list(Cluster = Cluster, 153 | Age = Age, 154 | Gender = Gender, 155 | Stage= Stage, 156 | Status = Status 157 | ), 158 | show_annotation_name = TRUE, 159 | annotation_name_side="left", 160 | annotation_name_gp = gpar(fontsize = 10)) 161 | 162 | Heatmap(ee,name='Z-score', 163 | top_annotation = Top, 164 | cluster_rows = T, 165 | col=colorRamp2(c(-2,0,2),c('#21b6af','white','#eeba4d')),#49b0d9 166 | color_space = "RGB", 167 | cluster_columns = FALSE,border = T, 168 | row_order=NULL, 169 | row_names_side = 'left', 170 | column_order=NULL, 171 | show_column_names = FALSE, 172 | row_names_gp = gpar(fontsize = 9), 173 | column_split = c(rep(1,316),rep(2,268)), 174 | gap = unit(1, "mm"), 175 | column_title = NULL, 176 | column_title_gp = gpar(fontsize = 10), 177 | show_heatmap_legend = TRUE, 178 | heatmap_legend_param=list(labels_gp = gpar(fontsize = 10), border = T, 179 | title_gp = gpar(fontsize = 10, fontface = "bold")), 180 | column_gap = unit(2,'mm') 181 | ) 182 | library(export) 183 | graph2pdf(file='cell-heatmap.pdf',width=9,height=5.5) 184 | 185 | # ------------------------------------------------------------------------- 186 | 187 | 188 | ee <- as.data.frame(ss)[,rownames(my)] 189 | tt <- cbind(Cluster=as.character(my$Cluster),t(ee)) 190 | tt <- as.data.frame(tt) 191 | tt2 <- pivot_longer(tt,cols = 2:29,names_to = 'cell',values_to = 'value') 192 | tt2$value <- as.numeric(tt2$value) 193 | 194 | # ------------------------------------------------------------------------- 195 | 196 | source("GeomSplitViolin.R") 197 | 198 | ggplot(tt2, aes(cell,value, fill = Cluster)) + 199 | geom_split_violin(draw_quantiles = c(0.25, 0.5, 0.75), #画4分位线 200 | trim = T, #是否修剪小提琴图的密度曲线 201 | linetype = "solid", #周围线的轮廓 202 | color = "black", 203 | size = 0.35, 204 | na.rm = T, 205 | position ="identity")+ #周围线粗细 206 | ylab("Relative Infiltration") + xlab(NULL) + 207 | scale_fill_manual(values = c('#21b6af','#eeba4d'))+ 208 | theme_classic()+ 209 | theme(axis.text.x = element_text(angle = 60, hjust = 1, vjust = 1,size=10), 210 | legend.position=c(0.05,0.003),legend.justification = c(0,0), 211 | axis.title.y = element_text(size=12)) 212 | ggsave(filename = 'cell-boxplot.pdf',width = 8,height=4.3) 213 | 214 | # ------------------------------------------------------------------------- 215 | 216 | save(TCGA_clin,ss,file = 'Cluster+ssgsea.Rda') 217 | 218 | 219 | -------------------------------------------------------------------------------- /Machine Learning based intergration.R: -------------------------------------------------------------------------------- 1 | rm(list = ls()) 2 | library(survival) 3 | library(randomForestSRC) 4 | library(glmnet) 5 | library(plsRcox) 6 | library(superpc) 7 | library(gbm) 8 | library(CoxBoost) 9 | library(survivalsvm) 10 | library(dplyr) 11 | library(tibble) 12 | library(BART) 13 | load('model-step.Rda') 14 | rm(rr,rr2,rr3,geneids) 15 | mm <- lapply(mm,function(x){ 16 | x[,-c(1:3)] <- scale(x[,-c(1:3)]) 17 | return(x)}) 18 | 19 | result <- data.frame() 20 | est_data <- mm$TCGA 21 | val_data_list <- mm 22 | pre_var <- colnames(est_data)[-c(1:3)] 23 | est_dd <- est_data[,c('OS.time','OS',pre_var)] 24 | val_dd_list <- lapply(val_data_list,function(x){x[,c('OS.time','OS',pre_var)]}) 25 | rm(mm) 26 | 27 | rf_nodesize <- 5 28 | seed <- 1 29 | 30 | ################################## 31 | #### 1-1.RSF #### 32 | ################################## 33 | 34 | set.seed(seed) 35 | fit <- rfsrc(Surv(OS.time,OS)~.,data = est_dd, 36 | ntree = 1000,nodesize = rf_nodesize, 37 | splitrule = 'logrank', 38 | importance = T, 39 | proximity = T, 40 | forest = T, 41 | seed = seed) 42 | rs <- lapply(val_dd_list,function(x){cbind(x[,1:2],RS=predict(fit,newdata = x)$predicted)}) 43 | cc <- data.frame(Cindex=sapply(rs,function(x){as.numeric(summary(coxph(Surv(OS.time,OS)~RS,x))$concordance[1])}))%>% 44 | rownames_to_column('ID') 45 | cc$Model <- 'RSF' 46 | result <- rbind(result,cc) 47 | 48 | ################################## 49 | #### 2-1.Enet #### 50 | ################################## 51 | 52 | x1 <- as.matrix(est_dd[,pre_var]) 53 | x2 <- as.matrix(Surv(est_dd$OS.time,est_dd$OS)) 54 | 55 | for (alpha in seq(0,1,0.1)) { 56 | set.seed(seed) 57 | fit = cv.glmnet(x1, x2,family = "cox",alpha=alpha,nfolds = 10) 58 | rs <- lapply(val_dd_list,function(x){cbind(x[,1:2],RS=as.numeric(predict(fit,type='link',newx=as.matrix(x[,-c(1,2)]),s=fit$lambda.min)))}) 59 | 60 | cc <- data.frame(Cindex=sapply(rs,function(x){as.numeric(summary(coxph(Surv(OS.time,OS)~RS,x))$concordance[1])}))%>% 61 | rownames_to_column('ID') 62 | cc$Model <- paste0('Enet','[α=',alpha,']') 63 | result <- rbind(result,cc) 64 | } 65 | 66 | ################################## 67 | #### 3-1.StepCox #### 68 | ################################## 69 | 70 | for (direction in c("both", "backward", "forward")) { 71 | fit <- step(coxph(Surv(OS.time,OS)~.,est_dd),direction = direction) 72 | rs <- lapply(val_dd_list,function(x){cbind(x[,1:2],RS=predict(fit,type = 'risk',newdata = x))}) 73 | 74 | cc <- data.frame(Cindex=sapply(rs,function(x){as.numeric(summary(coxph(Surv(OS.time,OS)~RS,x))$concordance[1])}))%>% 75 | rownames_to_column('ID') 76 | cc$Model <- paste0('StepCox','[',direction,']') 77 | result <- rbind(result,cc) 78 | } 79 | ################################## 80 | #### 3-8.StepCox+survivalsvm #### 81 | ################################## 82 | 83 | for (direction in c("both", "backward")) { 84 | fit <- step(coxph(Surv(OS.time,OS)~.,est_dd),direction = direction) 85 | rid <- names(coef(fit)) 86 | est_dd2 <- est_data[,c('OS.time','OS',rid)] 87 | val_dd_list2 <- lapply(val_data_list,function(x){x[,c('OS.time','OS',rid)]}) 88 | 89 | fit = survivalsvm(Surv(OS.time,OS)~., data= est_dd2, gamma.mu = 1) 90 | rs <- lapply(val_dd_list2,function(x){cbind(x[,1:2],RS=as.numeric(predict(fit, x)$predicted))}) 91 | cc <- data.frame(Cindex=sapply(rs,function(x){as.numeric(summary(coxph(Surv(OS.time,OS)~RS,x))$concordance[1])}))%>% 92 | rownames_to_column('ID') 93 | cc$Model <- paste0('StepCox','[',direction,']',' + survival-SVM') 94 | result <- rbind(result,cc) 95 | } 96 | 97 | 98 | ################################## 99 | #### 4-1.CoxBoost #### 100 | ################################## 101 | 102 | set.seed(seed) 103 | pen <- optimCoxBoostPenalty(est_dd[,'OS.time'],est_dd[,'OS'],as.matrix(est_dd[,-c(1,2)]), 104 | trace=TRUE,start.penalty=500,parallel = T) 105 | cv.res <- cv.CoxBoost(est_dd[,'OS.time'],est_dd[,'OS'],as.matrix(est_dd[,-c(1,2)]), 106 | maxstepno=500,K=10,type="verweij",penalty=pen$penalty) 107 | fit <- CoxBoost(est_dd[,'OS.time'],est_dd[,'OS'],as.matrix(est_dd[,-c(1,2)]), 108 | stepno=cv.res$optimal.step,penalty=pen$penalty) 109 | rs <- lapply(val_dd_list,function(x){cbind(x[,1:2],RS=as.numeric(predict(fit,newdata=x[,-c(1,2)], newtime=x[,1], newstatus=x[,2], type="lp")))}) 110 | 111 | cc <- data.frame(Cindex=sapply(rs,function(x){as.numeric(summary(coxph(Surv(OS.time,OS)~RS,x))$concordance[1])}))%>% 112 | rownames_to_column('ID') 113 | cc$Model <- paste0('CoxBoost') 114 | result <- rbind(result,cc) 115 | 116 | ################################## 117 | #### 5.plsRcox#### 118 | ################################## 119 | 120 | set.seed(seed) 121 | cv.plsRcox.res=cv.plsRcox(list(x=est_dd[,pre_var],time=est_dd$OS.time,status=est_dd$OS),nt=10,verbose = FALSE) 122 | fit <- plsRcox(est_dd[,pre_var],time=est_dd$OS.time,event=est_dd$OS,nt=as.numeric(cv.plsRcox.res[5])) 123 | rs <- lapply(val_dd_list,function(x){cbind(x[,1:2],RS=as.numeric(predict(fit,type="lp",newdata=x[,-c(1,2)])))}) 124 | 125 | cc <- data.frame(Cindex=sapply(rs,function(x){as.numeric(summary(coxph(Surv(OS.time,OS)~RS,x))$concordance[1])}))%>% 126 | rownames_to_column('ID') 127 | cc$Model <- paste0('plsRcox') 128 | result <- rbind(result,cc) 129 | 130 | ################################## 131 | #### 6.superpc#### 132 | ################################## 133 | 134 | data <- list(x=t(est_dd[,-c(1,2)]),y=est_dd$OS.time,censoring.status=est_dd$OS,featurenames=colnames(est_dd)[-c(1,2)]) 135 | set.seed(seed) 136 | fit <- superpc.train(data = data,type = 'survival',s0.perc = 0.5) #default 137 | cv.fit <- superpc.cv(fit,data,n.threshold = 20,#default 138 | n.fold = 10, 139 | n.components=3, 140 | min.features=5, 141 | max.features=nrow(data$x), 142 | compute.fullcv= TRUE, 143 | compute.preval=TRUE) 144 | rs <- lapply(val_dd_list,function(w){ 145 | test <- list(x=t(w[,-c(1,2)]),y=w$OS.time,censoring.status=w$OS,featurenames=colnames(w)[-c(1,2)]) 146 | ff <- superpc.predict(fit,data,test,threshold = cv.fit$thresholds[which.max(cv.fit[["scor"]][1,])],n.components = 1) 147 | rr <- as.numeric(ff$v.pred) 148 | rr2 <- cbind(w[,1:2],RS=rr) 149 | return(rr2) 150 | }) 151 | 152 | cc <- data.frame(Cindex=sapply(rs,function(x){as.numeric(summary(coxph(Surv(OS.time,OS)~RS,x))$concordance[1])}))%>% 153 | rownames_to_column('ID') 154 | cc$Model <- paste0('SuperPC') 155 | result <- rbind(result,cc) 156 | 157 | 158 | ################################## 159 | #### 7.GBM #### 160 | ################################## 161 | 162 | set.seed(seed) 163 | fit <- gbm(formula = Surv(OS.time,OS)~.,data = est_dd,distribution = 'coxph', 164 | n.trees = 10000, 165 | interaction.depth = 3, 166 | n.minobsinnode = 10, 167 | shrinkage = 0.001, 168 | cv.folds = 10,n.cores = 6) 169 | # find index for number trees with minimum CV error 170 | best <- which.min(fit$cv.error) 171 | set.seed(seed) 172 | fit <- gbm(formula = Surv(OS.time,OS)~.,data = est_dd,distribution = 'coxph', 173 | n.trees = best, 174 | interaction.depth = 3, 175 | n.minobsinnode = 10, 176 | shrinkage = 0.001, 177 | cv.folds = 10,n.cores = 8) 178 | rs <- lapply(val_dd_list,function(x){cbind(x[,1:2],RS=as.numeric(predict(fit,x,n.trees = best,type = 'link')))}) 179 | 180 | cc <- data.frame(Cindex=sapply(rs,function(x){as.numeric(summary(coxph(Surv(OS.time,OS)~RS,x))$concordance[1])}))%>% 181 | rownames_to_column('ID') 182 | cc$Model <- paste0('GBM') 183 | result <- rbind(result,cc) 184 | 185 | 186 | ################################## 187 | #### 8.survivalsvm #### 188 | ################################## 189 | 190 | fit = survivalsvm(Surv(OS.time,OS)~., data= est_dd, gamma.mu = 1) 191 | 192 | rs <- lapply(val_dd_list,function(x){cbind(x[,1:2],RS=as.numeric(predict(fit, x)$predicted))}) 193 | 194 | cc <- data.frame(Cindex=sapply(rs,function(x){as.numeric(summary(coxph(Surv(OS.time,OS)~RS,x))$concordance[1])}))%>% 195 | rownames_to_column('ID') 196 | cc$Model <- paste0('survival-SVM') 197 | result <- rbind(result,cc) 198 | result2 <- result 199 | result2$Model <- gsub('α','a',result2$Model) 200 | library(ggplot2) 201 | library(ggsci) 202 | library(tidyr) 203 | library(ggbreak) 204 | range(result2$Cindex) 205 | result2%>%filter(ID!='TCGA')%>% 206 | ggplot(aes(Cindex,reorder(Model,Cindex)))+ 207 | geom_bar(width = 0.7,stat = 'summary',fun='mean',fill='orange2')+ 208 | theme_classic()+ 209 | labs(y=NULL) 210 | 211 | dd <- result2%>% 212 | #filter(ID!='TCGA')%>% 213 | group_by(Model)%>% 214 | summarise(Cindex=mean(Cindex)) 215 | 216 | dd%>% 217 | ggplot(aes(Cindex,reorder(Model,Cindex)))+ 218 | geom_bar(width=0.7,stat = 'identity',fill='orange')+ 219 | scale_x_break(c(0.05,0.53),scales = 20) 220 | 221 | 222 | dd2 <- pivot_wider(result2,names_from = 'ID',values_from = 'Cindex')%>%as.data.frame() 223 | dd2[,-1] <- apply(dd2[,-1],2,as.numeric) 224 | 225 | 226 | 227 | 228 | 229 | 230 | 231 | 232 | 233 | 234 | 235 | 236 | 237 | 238 | 239 | 240 | 241 | 242 | 243 | -------------------------------------------------------------------------------- /All-signature-compare.R: -------------------------------------------------------------------------------- 1 | rm(list = ls()) 2 | library(dplyr) 3 | library(data.table) 4 | library(tidyr) 5 | library(tibble) 6 | library(survival) 7 | library(survminer) 8 | library(timeROC) 9 | library(compareC) 10 | library(ggsci) 11 | load('all-signature.rda') 12 | 13 | uni <- data.frame() 14 | for (i in names(tmp2)) { 15 | dd <- tmp2[[i]] 16 | for(j in colnames(dd)[4:ncol(dd)]){ 17 | scox <- summary(coxph(Surv(OS.time,OS)~get(j),dd)) 18 | p <- compareC(dd$OS.time,dd$OS,dd$IRLS,dd[,j])$pval 19 | uni <- rbind(uni,data.frame(ID=i,A=j, 20 | HR=scox$conf.int[,1], 21 | HR.95L=scox$conf.int[,3], 22 | HR.95R=scox$conf.int[,4], 23 | pvalue=scox$coefficients[,5], 24 | cindex=scox$concordance[1]%>%as.numeric(), 25 | cse=scox$concordance[2]%>%as.numeric(), 26 | CP=p)) 27 | } 28 | } 29 | 30 | unique(uni$ID) 31 | dd <- uni[uni$ID=='TCGA-CRC',] 32 | dd$ll <- ifelse(dd$CP<0.0001,'****',ifelse(dd$CP<0.001,'***',ifelse(dd$CP<0.01,'**',ifelse(dd$CP<0.05,'*','')))) 33 | rownames(dd) <- NULL 34 | 35 | ggplot(dd,aes(cindex,reorder(A,cindex)))+ 36 | geom_errorbarh(aes(xmax=cindex+1.5*cse,xmin=cindex-1.5*cse),color="black",height=0,size=0.7)+ 37 | geom_point(size=4,shape=21,fill=pal_nejm()(10)[1])+ 38 | ylab(NULL)+xlab(NULL)+ 39 | labs(title ="TCGA-CRC")+ 40 | geom_vline(xintercept = 0.6,linetype='dashed',size=0.5,color='grey50')+ 41 | theme_bw(base_rect_size = 1)+ 42 | theme(panel.grid =element_blank(), 43 | axis.text.y = element_text(size=12), 44 | axis.text.x = element_text(size=12), 45 | axis.title = element_text(size=13), 46 | plot.title = element_text(hjust = 0.5,size=15), 47 | legend.position = 'none', 48 | strip.text = element_text(size=14))+ 49 | geom_text(aes(x=0.89,y=A,label=ll),color='black',size=3,vjust=0.76)+ 50 | scale_x_continuous(breaks = c(0.5,0.7,0.9),limits = c(0.4,0.94)) 51 | 52 | # ------------------------------------------------------------------------- 53 | 54 | unique(uni$ID) 55 | dd <- uni[uni$ID=='GSE17536',] 56 | dd$ll <- ifelse(dd$CP<0.0001,'****',ifelse(dd$CP<0.001,'***',ifelse(dd$CP<0.01,'**',ifelse(dd$CP<0.05,'*','')))) 57 | 58 | ggplot(dd,aes(cindex,reorder(A,cindex)))+ 59 | geom_errorbarh(aes(xmax=cindex+1.5*cse,xmin=cindex-1.5*cse),color="black",height=0,size=0.7)+ 60 | geom_point(size=4,shape=21,fill=pal_nejm()(10)[2])+ 61 | ylab(NULL)+xlab(NULL)+ 62 | labs(title ="GSE17536")+ 63 | geom_vline(xintercept = 0.6,linetype='dashed',size=0.5,color='grey50')+ 64 | theme_bw(base_rect_size = 1)+ 65 | theme(panel.grid =element_blank(), 66 | axis.text.y = element_text(size=12), 67 | axis.text.x = element_text(size=12), 68 | axis.title = element_text(size=13), 69 | plot.title = element_text(hjust = 0.5,size=15), 70 | legend.position = 'none', 71 | strip.text = element_text(size=14))+ 72 | geom_text(aes(x=0.71,y=A,label=ll),color='black',size=3,vjust=0.76)+ 73 | scale_x_continuous(breaks = c(0.5,0.6,0.7)) 74 | 75 | # ------------------------------------------------------------------------- 76 | 77 | unique(uni$ID) 78 | dd <- uni[uni$ID=='GSE17537',] 79 | dd$ll <- ifelse(dd$CP<0.0001,'****',ifelse(dd$CP<0.001,'***',ifelse(dd$CP<0.01,'**',ifelse(dd$CP<0.05,'*','')))) 80 | rownames(dd) <- NULL 81 | 82 | 83 | ggplot(dd,aes(cindex,reorder(A,cindex)))+ 84 | geom_errorbarh(aes(xmax=cindex+1.5*cse,xmin=cindex-1.5*cse),color="black",height=0,size=0.7)+ 85 | geom_point(size=4,shape=21,fill=pal_nejm()(10)[3])+ 86 | ylab(NULL)+xlab(NULL)+ 87 | labs(title ="GSE17537")+ 88 | geom_vline(xintercept = 0.6,linetype='dashed',size=0.5,color='grey50')+ 89 | theme_bw(base_rect_size = 1)+ 90 | theme(panel.grid =element_blank(), 91 | axis.text.y = element_text(size=12), 92 | axis.text.x = element_text(size=12), 93 | axis.title = element_text(size=13), 94 | plot.title = element_text(hjust = 0.5,size=15), 95 | legend.position = 'none', 96 | strip.text = element_text(size=14))+ 97 | geom_text(aes(x=0.85,y=A,label=ll),color='black',size=3,vjust=0.76)+ 98 | scale_x_continuous(breaks = c(0.4,0.6,0.8),limits = c(0.38,0.88)) 99 | 100 | # ------------------------------------------------------------------------- 101 | 102 | unique(uni$ID) 103 | dd <- uni[uni$ID=='GSE29621',] 104 | dd$ll <- ifelse(dd$CP<0.0001,'****',ifelse(dd$CP<0.001,'***',ifelse(dd$CP<0.01,'**',ifelse(dd$CP<0.05,'*','')))) 105 | rownames(dd) <- NULL 106 | 107 | 108 | ggplot(dd,aes(cindex,reorder(A,cindex)))+ 109 | geom_errorbarh(aes(xmax=cindex+1.5*cse,xmin=cindex-1.5*cse),color="black",height=0,size=0.7)+ 110 | geom_point(size=4,shape=21,fill=pal_nejm()(10)[4])+ 111 | ylab(NULL)+xlab(NULL)+ 112 | labs(title ="GSE29621")+ 113 | geom_vline(xintercept = 0.6,linetype='dashed',size=0.5,color='grey50')+ 114 | theme_bw(base_rect_size = 1)+ 115 | theme(panel.grid =element_blank(), 116 | axis.text.y = element_text(size=12), 117 | axis.text.x = element_text(size=12), 118 | axis.title = element_text(size=13), 119 | plot.title = element_text(hjust = 0.5,size=15), 120 | legend.position = 'none', 121 | strip.text = element_text(size=14))+ 122 | geom_text(aes(x=0.79,y=A,label=ll),color='black',size=3,vjust=0.76)+ 123 | scale_x_continuous(breaks = c(0.4,0.6,0.8),limits = c(0.36,0.82)) 124 | 125 | 126 | unique(uni$ID) 127 | dd <- uni[uni$ID=='GSE38832',] 128 | dd$ll <- ifelse(dd$CP<0.0001,'****',ifelse(dd$CP<0.001,'***',ifelse(dd$CP<0.01,'**',ifelse(dd$CP<0.05,'*','')))) 129 | rownames(dd) <- NULL 130 | ggplot(dd,aes(cindex,reorder(A,cindex)))+ 131 | geom_errorbarh(aes(xmax=cindex+1.5*cse,xmin=cindex-1.5*cse),color="black",height=0,size=0.7)+ 132 | geom_point(size=4,shape=21,fill=pal_nejm()(10)[5])+ 133 | ylab(NULL)+xlab(NULL)+ 134 | labs(title ="GSE38832")+ 135 | geom_vline(xintercept = 0.6,linetype='dashed',size=0.5,color='grey50')+ 136 | theme_bw(base_rect_size = 1)+ 137 | theme(panel.grid =element_blank(), 138 | axis.text.y = element_text(size=12), 139 | axis.text.x = element_text(size=12), 140 | axis.title = element_text(size=13), 141 | plot.title = element_text(hjust = 0.5,size=15), 142 | legend.position = 'none', 143 | strip.text = element_text(size=14))+ 144 | geom_text(aes(x=0.78,y=A,label=ll),color='black',size=3,vjust=0.76)+ 145 | scale_x_continuous(breaks = c(0.4,0.6,0.8)) 146 | 147 | unique(uni$ID) 148 | dd <- uni[uni$ID=='GSE39582',] 149 | dd$ll <- ifelse(dd$CP<0.0001,'****',ifelse(dd$CP<0.001,'***',ifelse(dd$CP<0.01,'**',ifelse(dd$CP<0.05,'*','')))) 150 | rownames(dd) <- NULL 151 | 152 | ggplot(dd,aes(cindex,reorder(A,cindex)))+ 153 | geom_errorbarh(aes(xmax=cindex+1.5*cse,xmin=cindex-1.5*cse),color="black",height=0,size=0.7)+ 154 | geom_point(size=4,shape=21,fill=pal_nejm()(10)[6])+ 155 | ylab(NULL)+xlab(NULL)+ 156 | labs(title ="GSE39582")+ 157 | geom_vline(xintercept = 0.6,linetype='dashed',size=0.5,color='grey50')+ 158 | theme_bw(base_rect_size = 1)+ 159 | theme(panel.grid =element_blank(), 160 | axis.text.y = element_text(size=12), 161 | axis.text.x = element_text(size=12), 162 | axis.title = element_text(size=13), 163 | plot.title = element_text(hjust = 0.5,size=15), 164 | legend.position = 'none', 165 | strip.text = element_text(size=14))+ 166 | geom_text(aes(x=0.69,y=A,label=ll),color='black',size=3,vjust=0.76)+ 167 | scale_x_continuous(breaks = c(0.5,0.6,0.7)) 168 | 169 | 170 | unique(uni$ID) 171 | dd <- uni[uni$ID=='GSE72970',] 172 | dd$ll <- ifelse(dd$CP<0.0001,'****',ifelse(dd$CP<0.001,'***',ifelse(dd$CP<0.01,'**',ifelse(dd$CP<0.05,'*','')))) 173 | rownames(dd) <- NULL 174 | 175 | ggplot(dd,aes(cindex,reorder(A,cindex)))+ 176 | geom_errorbarh(aes(xmax=cindex+1.5*cse,xmin=cindex-1.5*cse),color="black",height=0,size=0.7)+ 177 | geom_point(size=4,shape=21,fill=pal_nejm()(10)[7])+ 178 | ylab(NULL)+xlab(NULL)+ 179 | labs(title ="GSE72970")+ 180 | geom_vline(xintercept = 0.6,linetype='dashed',size=0.5,color='grey50')+ 181 | theme_bw(base_rect_size = 1)+ 182 | theme(panel.grid =element_blank(), 183 | axis.text.y = element_text(size=12), 184 | axis.text.x = element_text(size=12), 185 | axis.title = element_text(size=13), 186 | plot.title = element_text(hjust = 0.5,size=15), 187 | legend.position = 'none', 188 | strip.text = element_text(size=14))+ 189 | geom_text(aes(x=0.69,y=A,label=ll),color='black',size=3,vjust=0.76)+ 190 | scale_x_continuous(breaks = c(0.5,0.6,0.7)) 191 | 192 | unique(uni$ID) 193 | dd <- uni[uni$ID=='Meta-Cohort',] 194 | dd$ll <- ifelse(dd$CP<0.0001,'****',ifelse(dd$CP<0.001,'***',ifelse(dd$CP<0.01,'**',ifelse(dd$CP<0.05,'*','')))) 195 | rownames(dd) <- NULL 196 | 197 | ggplot(dd,aes(cindex,reorder(A,cindex)))+ 198 | geom_errorbarh(aes(xmax=cindex+1.5*cse,xmin=cindex-1.5*cse),color="black",height=0,size=0.7)+ 199 | geom_point(size=4,shape=21,fill=pal_nejm()(10)[8])+ 200 | ylab(NULL)+xlab(NULL)+ 201 | labs(title ="Meta-Cohort")+ 202 | geom_vline(xintercept = 0.6,linetype='dashed',size=0.5,color='grey50')+ 203 | theme_bw(base_rect_size = 1)+ 204 | theme(panel.grid =element_blank(), 205 | axis.text.y = element_text(size=12), 206 | axis.text.x = element_text(size=12), 207 | axis.title = element_text(size=13), 208 | plot.title = element_text(hjust = 0.5,size=15), 209 | legend.position = 'none', 210 | strip.text = element_text(size=14))+ 211 | geom_text(aes(x=0.69,y=A,label=ll),color='black',size=3,vjust=0.76)+ 212 | scale_x_continuous(breaks = c(0.5,0.6,0.7)) 213 | 214 | # ------------------------------------------------------------------------- 215 | 216 | uni$x <- ifelse(uni$pvalue<0.05&uni$HR>1,'Risky', 217 | ifelse(uni$pvalue<0.05&uni$HR<1,'Protective','P > 0.05')) 218 | xx <- pivot_wider(uni[,c(1,2,10)],names_from = 'ID',values_from = 'x') 219 | xx <- column_to_rownames(xx,'A') 220 | 221 | library(ComplexHeatmap) 222 | library(circlize) 223 | 224 | ss <- read.csv('signature.csv')[,3:4] 225 | ss <- distinct(ss,Author,.keep_all = T) 226 | ss <- ss[ss$Author%in%rownames(xx),] 227 | ss[107,] <- c('lncRNA','IRLS') 228 | rownames(ss) <- ss$Author 229 | ss <- ss[order(ss$Type,ss$Author),] 230 | rownames(ss) <- NULL 231 | xx <- xx[ss$Author,] 232 | 233 | right = HeatmapAnnotation(Signature=ss$Type, 234 | annotation_legend_param=list(labels_gp = gpar(fontsize = 12),border = T, 235 | title_gp = gpar(fontsize = 12,fontface = "bold"), 236 | ncol=1), 237 | border = T,which = 'column', 238 | col=list(Signature = c('lncRNA'=pal_npg(alpha = 0.8)(10)[3],'mRNA'=pal_nejm(alpha = 0.8)(8)[3])), 239 | show_annotation_name = F, 240 | annotation_name_gp = gpar(fontsize = 12)) 241 | Heatmap(t(xx),col = c('#F2F2F2',pal_npg()(2)[2],pal_npg()(2)[1]),name = 'Type', 242 | rect_gp = gpar(col='grey70'), 243 | top_annotation = right, 244 | row_names_side = 'left', 245 | heatmap_legend_param=list(labels_gp = gpar(fontsize = 12), border = T, 246 | title_gp = gpar(fontsize = 12, fontface = "bold"))) 247 | 248 | 249 | 250 | -------------------------------------------------------------------------------- /clinical validation.R: -------------------------------------------------------------------------------- 1 | rm(list = ls()) 2 | library(dplyr) 3 | library(data.table) 4 | library(tidyr) 5 | library(tibble) 6 | library(survival) 7 | library(survminer) 8 | library(timeROC) 9 | library(pROC) 10 | library(export) 11 | library(ggsci) 12 | 13 | load('sampleclin.rda') 14 | 15 | # ------------------------------------------------------------------------- 16 | 17 | kmp <- function(data,legend,main){ 18 | mytheme <- theme_survminer(font.legend = c(14,"plain", "black"), 19 | font.x = c(14,"plain", "black"), 20 | font.y = c(14,"plain", "black"), 21 | legend = "top") 22 | cut <- surv_cutpoint(data,'OS.time','OS','RS',minprop = 0.1) 23 | cat <- surv_categorize(cut) 24 | fit <- survfit(Surv(OS.time,OS)~RS,cat) 25 | pp <- ggsurvplot(fit,data = cat, 26 | palette= pal_nejm()(2), 27 | conf.int=FALSE,size=1.3, 28 | pval=T,pval.method = T, 29 | legend.labs=c('High-risk','Low-risk'), 30 | legend.title="", 31 | legend=legend, 32 | xlab="Time(years)", 33 | ylab='Overall survival', 34 | ggtheme = mytheme) 35 | return(ggpar(pp,main = main)) 36 | } 37 | 38 | table(my$IC) 39 | kmp(my,legend = c(0.84,0.97),main = NULL) 40 | graph2pdf(file='All-patients.pdf',width=4.2,height=4) 41 | 42 | # ------------------------------------------------------------------------- 43 | 44 | kmp <- function(data,legend,main){ 45 | mytheme <- theme_survminer(font.legend = c(14,"plain", "black"), 46 | font.x = c(14,"plain", "black"), 47 | font.y = c(14,"plain", "black"), 48 | legend = "top") 49 | cut <- surv_cutpoint(data,'RFS.time','RFS','RS',minprop = 0.1) 50 | cat <- surv_categorize(cut) 51 | fit <- survfit(Surv(RFS.time,RFS)~RS,cat) 52 | pp <- ggsurvplot(fit,data = cat, 53 | palette= pal_nejm()(2), 54 | conf.int=FALSE,size=1.3, 55 | pval=T,pval.method = T, 56 | legend.labs=c('High-risk','Low-risk'), 57 | legend.title="", 58 | legend=legend, 59 | xlab="Time(years)", 60 | ylab='Recurrence-free survival', 61 | ggtheme = mytheme) 62 | return(ggpar(pp,main = main)) 63 | } 64 | 65 | kmp(my,legend = c(0.84,0.97),main = NULL) 66 | graph2pdf(file='All-patients-rfs.pdf',width=4.2,height=4) 67 | 68 | #kmp(my[my$IC==1,],legend = c(0.84,0.97),main = 'Patients with ICI (n = 65)') 69 | #graph2pdf(file='Patients with immunotherapy-rfs.pdf',width=4.2,height=4) 70 | 71 | #kmp(my[my$IC==0,],legend = c(0.84,0.97),main = 'Patients without ICI (n = 167)') 72 | #graph2pdf(file='Patients without immunotherapy-rfs.pdf',width=4.2,height=4) 73 | 74 | # ------------------------------------------------------------------------- 75 | 76 | x <- summary(coxph(Surv(OS.time,OS)~.,my[,c(1,2,5:13)])) 77 | y <- data.frame(id=rownames(x$coefficients), 78 | HR=x$coefficients[,2], 79 | HR.95L=x$conf.int[,"lower .95"], 80 | HR.95H=x$conf.int[,'upper .95'], 81 | pvalue=x$coefficients[,"Pr(>|z|)"]) 82 | 83 | y <- y[c(1:6,10,7:9),] 84 | y[,-1] <- apply(y[,-1],2,as.numeric) 85 | 86 | rt <- y 87 | gene <- rt$id 88 | hr <- sprintf("%.3f",rt$"HR") 89 | hrLow <- sprintf("%.3f",rt$"HR.95L") 90 | hrHigh <- sprintf("%.3f",rt$"HR.95H") 91 | Hazard.ratio <- paste0(hr," (",hrLow,"-",hrHigh,")") 92 | pVal <- ifelse(rt$pvalue<0.001, "<0.001", sprintf("%.3f", rt$pvalue)) 93 | 94 | if(T){ 95 | n <- nrow(rt) 96 | nRow <- n+1 97 | ylim <- c(1,nRow) 98 | layout(matrix(c(1,2),nc=2),width=c(3,2)) 99 | 100 | #绘制森林图左边的基因信息 101 | xlim = c(0,2.6) 102 | par(mar=c(4,2.5,2,0)) 103 | plot(1,xlim=xlim,ylim=ylim,type="n",axes=F,xlab="",ylab="") 104 | text.cex=1 105 | text(0.5,n:1,gene,adj=0,cex=text.cex) 106 | text(1.65,n:1,pVal,adj=1,cex=text.cex); 107 | text(1.5+0.2,n+1,'P-value',cex=text.cex,font=2,adj=1) 108 | text(2.6,n:1,Hazard.ratio,adj=1,cex=text.cex) 109 | text(2.45,n+1,'HR (95% CI)',cex=text.cex,font=2,adj=1,) 110 | 111 | #绘制森林图 112 | par(mar=c(4,0,2,1),mgp=c(2,0.5,0)) 113 | xlim = c(0,6) 114 | plot(1,xlim=xlim,ylim=ylim,type="n",axes=F,ylab="",xaxs="i",xlab="") 115 | arrows(as.numeric(hrLow),n:1,as.numeric(hrHigh),n:1,angle=0,code=3,length=0.05,col="black",lwd=2.5) 116 | abline(v=1,col="gray",lty=2,lwd=1.5) 117 | boxcolor = '#67AB9F' 118 | points(as.numeric(hr), n:1, pch = 15, cex=2,col = boxcolor) 119 | axis(1) 120 | } 121 | library(export) 122 | graph2pdf(file="inhouse-multicox.pdf", width = 10,height = 4.7) 123 | 124 | # ------------------------------------------------------------------------- 125 | 126 | x <- summary(coxph(Surv(RFS.time,RFS)~.,my[,c(3,4,5:13)])) 127 | y <- data.frame(id=rownames(x$coefficients), 128 | HR=x$coefficients[,2], 129 | HR.95L=x$conf.int[,"lower .95"], 130 | HR.95H=x$conf.int[,'upper .95'], 131 | pvalue=x$coefficients[,"Pr(>|z|)"]) 132 | y <- y[c(1:6,10,7:9),] 133 | 134 | rt <- y 135 | gene <- rt$id 136 | hr <- sprintf("%.3f",rt$"HR") 137 | hrLow <- sprintf("%.3f",rt$"HR.95L") 138 | hrHigh <- sprintf("%.3f",rt$"HR.95H") 139 | Hazard.ratio <- paste0(hr," (",hrLow,"-",hrHigh,")") 140 | pVal <- ifelse(rt$pvalue<0.001, "<0.001", sprintf("%.3f", rt$pvalue)) 141 | 142 | if(T){ 143 | n <- nrow(rt) 144 | nRow <- n+1 145 | ylim <- c(1,nRow) 146 | layout(matrix(c(1,2),nc=2),width=c(3,2)) 147 | 148 | #绘制森林图左边的基因信息 149 | xlim = c(0,2.6) 150 | par(mar=c(4,2.5,2,0)) 151 | plot(1,xlim=xlim,ylim=ylim,type="n",axes=F,xlab="",ylab="") 152 | text.cex=1 153 | text(0.5,n:1,gene,adj=0,cex=text.cex) 154 | text(1.65,n:1,pVal,adj=1,cex=text.cex); 155 | text(1.5+0.2,n+1,'P-value',cex=text.cex,font=2,adj=1) 156 | text(2.6,n:1,Hazard.ratio,adj=1,cex=text.cex) 157 | text(2.45,n+1,'HR (95% CI)',cex=text.cex,font=2,adj=1,) 158 | 159 | #绘制森林图 160 | par(mar=c(4,0,2,1),mgp=c(2,0.5,0)) 161 | xlim = c(0,5) 162 | plot(1,xlim=xlim,ylim=ylim,type="n",axes=F,ylab="",xaxs="i",xlab="") 163 | arrows(as.numeric(hrLow),n:1,as.numeric(hrHigh),n:1,angle=0,code=3,length=0.05,col="black",lwd=2.5) 164 | abline(v=1,col="gray",lty=2,lwd=1.5) 165 | boxcolor = '#67AB9F' 166 | points(as.numeric(hr), n:1, pch = 15, cex=2,col = boxcolor) 167 | axis(1) 168 | } 169 | graph2pdf(file="inhouse-multicox-rfs.pdf", width = 10,height = 4.7) 170 | 171 | # ------------------------------------------------------------------------- 172 | 173 | # ------------------------------------------------------------------------- 174 | 175 | tt <- timeROC(my$OS.time,my$OS,my$RS,cause = 1,weighting = 'marginal',times = c(1,3,5),ROC = T) 176 | tp <- tt$TP%>%as.data.frame()%>%pivot_longer(cols = 1:3,names_to = 'time',values_to = 'tp') 177 | fp <- tt$FP%>%as.data.frame()%>%pivot_longer(cols = 1:3,names_to = 'time',values_to = 'fp') 178 | 179 | dd <- tp 180 | dd$fp <- fp$fp 181 | dd$time <- ifelse(dd$time=='t=1',"1-Year = 0.840", 182 | ifelse(dd$time=='t=3','3-Year = 0.776','5-Year = 0.818')) 183 | 184 | ggplot(dd,aes(fp,tp,color=time))+ 185 | geom_line(size=1)+ 186 | labs(x='1-Specificity',y='Sensitivity',color=NULL)+ 187 | theme_bw(base_rect_size = 1.5)+ 188 | geom_abline(slope = 1,color='grey70')+ 189 | theme(panel.grid =element_blank(), 190 | axis.text = element_text(size=11), 191 | axis.title = element_text(size=13), 192 | legend.text = element_text(size=12), 193 | legend.position = c(0.995,0.012), 194 | legend.justification = c(1,0))+ 195 | scale_color_nejm()+ 196 | scale_x_continuous(expand = c(0.01,0.01))+ 197 | scale_y_continuous(expand = c(0.01,0.01)) 198 | ggsave(filename = 'inhouse-timeROC.pdf',width = 4.3,height = 4) 199 | 200 | # ------------------------------------------------------------------------- 201 | 202 | fit <- roc(x$RR,x$RS,auc=T) 203 | 204 | dd <- data.frame(x=1-fit$specificities,y=fit$sensitivities) 205 | dd <- dd[order(dd$y,dd$x),] 206 | dd$AUC <- 'IRLS' 207 | tmp <- dd 208 | ggplot(tmp,aes(x,y))+ 209 | geom_line(aes(group=AUC)) 210 | 211 | tmp$AUC2 <- ifelse(tmp$AUC=='IRLS','IRLS= 0.897',ifelse(tmp$AUC=='PD-L1','PD-L1 = 0.686 ***','CD8A= 0.725 **')) 212 | 213 | tmp$AUC2 <- factor(tmp$AUC2,levels = unique(tmp$AUC2)) 214 | ggplot(tmp,aes(x,y,color=AUC2))+ 215 | geom_line(size=1)+ 216 | labs(x='1-Specificity',y='Sensitivity',color=NULL)+ 217 | theme_bw(base_rect_size = 1.5)+ 218 | geom_abline(slope = 1,color='grey70')+ 219 | theme(panel.grid =element_blank(), 220 | axis.text = element_text(size=11), 221 | axis.title = element_text(size=13), 222 | legend.text = element_text(size=12), 223 | legend.position = c(0.995,0.012), 224 | legend.justification = c(1,0))+ 225 | scale_color_nejm()+ 226 | scale_x_continuous(expand = c(0.01,0.01))+ 227 | scale_y_continuous(expand = c(0.01,0.01)) 228 | ggsave(filename = 'IC-inhouse-timeROC.pdf',width = 4.3,height = 4) 229 | 230 | 231 | 232 | if(F){ 233 | plot.roc(fit,print.thres = T,print.auc = T) 234 | pdf(file = 'IC-roc.pdf',width = 4,height = 4) 235 | plot.roc(fit, 236 | axes=T, ## 是否显示xy轴 237 | legacy.axes=T, 238 | main=NULL, ## Title 239 | col= "steelblue", ## 曲线颜色 240 | lty=1, ## 曲线形状 241 | lwd=3, ## 曲线粗细 242 | identity=T, ## 是否显示对角线 243 | identity.col="grey60", ## 对角线颜色 244 | identity.lty=2, ## 对角线形状 245 | identity.lwd=2, ## 对角线粗细 246 | print.thres=F, ## 是否输出cut-off值 247 | print.thres.pch=20, ## cut-off点的形状 248 | print.thres.col="red", ## cut-off点和文本的颜色 249 | print.thres.cex=1.2, 250 | print.auc=T, ## 是否显示AUC 251 | print.auc.pattern="AUC = 0.896", ## 展示AUC的格式 252 | auc.polygon.border='darkred', 253 | print.auc.x=0.48, ## AUC值的X位置 254 | print.auc.y=0.13, ## AUC值的Y位置 255 | print.auc.cex=1.2, ## AUC值的放大倍数 256 | print.auc.col='black', ## ACU值的颜色 257 | auc.polygon=TRUE, ## 是否将ROC下面积上色 258 | auc.polygon.col='skyblue', 259 | max.auc.polygon=TRUE, 260 | max.auc.polygon.col='WhiteSmoke', 261 | max.auc.polygon.lty=1 262 | ) 263 | dev.off() 264 | } 265 | # ------------------------------------------------------------------------- 266 | 267 | # ------------------------------------------------------------------------- 268 | 269 | library(plyr) 270 | ggdata <- table(x$gg,x$RR)%>%as.data.frame() 271 | ggdata2 <- ddply(ggdata,'Var1',transform,percent_Freq=Freq/sum(Freq)*100) 272 | ggdata2 <- ddply(ggdata2,'Var1',transform,lable=cumsum(percent_Freq)-0.5*percent_Freq) 273 | ggdata2$ll <- paste0(round(ggdata2$percent_Freq/100,2)*100,'%') 274 | ggdata2$Var2 <- factor(ifelse(ggdata2$Var2==0,'NR','R'),levels = c('R','NR')) 275 | 276 | ggplot(ggdata2,aes(Var1,percent_Freq,fill=Var2))+ 277 | geom_bar(stat = 'identity',width = 0.85)+ 278 | xlab(NULL)+ylab('Fraction (%)')+ 279 | geom_text(aes(label=ll),size=3.8, 280 | position = position_stack(vjust = 0.5), 281 | color='white')+ 282 | scale_fill_manual(values = pal_npg(alpha = 0.9)(2)[1:2])+ 283 | theme_classic()+ 284 | scale_x_discrete(expand = c(0.3,0.2))+ 285 | scale_y_continuous(expand = c(0.01,0.0))+ 286 | ggtitle('****')+ 287 | theme(legend.position = 'right', 288 | plot.title = element_text(hjust=0.5,face = 'plain'), 289 | axis.title = element_text(size=13), 290 | axis.text.y = element_text(size=10), 291 | axis.text.x = element_text(size=12))+ 292 | labs(fill=NULL) 293 | ggsave(filename = 'IC-bar.pdf',height = 4,width = 3.2) 294 | 295 | chisq.test(table(x$gg,x$RR)) 296 | 297 | # ------------------------------------------------------------------------- 298 | 299 | ggplot(x2,aes(reorder(ID,RS),RS,fill=Res))+ 300 | geom_bar(stat = 'identity',width = 0.7,color='grey30',position = position_dodge2(width = 0.9))+ 301 | scale_fill_nejm()+labs(y='IRLS score')+ 302 | theme_classic(base_rect_size = 2)+ 303 | theme(axis.line.x = element_blank(), 304 | axis.ticks.x = element_blank(), 305 | axis.text.x = element_blank(), 306 | axis.title.x = element_blank(), 307 | legend.title = element_blank(), 308 | axis.text.y = element_text(size=12), 309 | axis.title.y = element_text(size=13), 310 | legend.text = element_text(size=12), 311 | legend.position = c(0.01,1), 312 | legend.justification = c(0,1)) 313 | 314 | # ------------------------------------------------------------------------- 315 | 316 | library(compareC) 317 | tt <- my 318 | dd <- data.frame() 319 | for (i in colnames(my)[5:13]) { 320 | fit <- summary(coxph(Surv(OS.time,OS)~get(i),tt)) 321 | CC <- fit$concordance[1]%>%as.numeric() 322 | se <- fit$concordance[2]%>%as.numeric() 323 | p <- compareC(tt$OS.time,tt$OS,tt$RS,tt[,i])$pval 324 | dd <- rbind(dd,data.frame(ID=i,C=CC,SE=se,P=p)) 325 | } 326 | 327 | ggplot(dd,aes(ID,C,fill=ID))+ 328 | geom_bar(stat='identity',position=position_dodge(0.8),width=0.6)+ 329 | geom_errorbar(aes(ymax=C+1.5*SE,ymin=C-1.5*SE), 330 | width=0.1,position = position_dodge(0.8),size=0.6)+ 331 | theme_bw(base_rect_size = 1.5)+ 332 | ggtitle('C-index (Compared with IRLS)')+ 333 | theme(axis.title = element_blank(), 334 | axis.text = element_text(size=12), 335 | legend.position = 'none', 336 | panel.grid = element_blank(), 337 | plot.title = element_text(hjust = 0.5,size=14), 338 | strip.text = element_text(size=12), 339 | axis.ticks.x = element_blank())+ 340 | scale_fill_npg()+ 341 | scale_y_continuous(expand = c(0,0.01),limits = c(0,0.85))+ 342 | geom_text(aes(y=0.80,label=ll),size=5) 343 | 344 | 345 | 346 | 347 | 348 | 349 | 350 | 351 | 352 | 353 | 354 | 355 | --------------------------------------------------------------------------------