├── model-prepare.rda
├── Identification of Immune-related lncRNAs.R
├── README.md
├── Consensus cluster.R
├── Machine Learning based intergration.R
├── All-signature-compare.R
└── clinical validation.R


/model-prepare.rda:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Zaoqu-Liu/IRLS/HEAD/model-prepare.rda


--------------------------------------------------------------------------------
/Identification of Immune-related lncRNAs.R:
--------------------------------------------------------------------------------
 1 | rm(list = ls())
 2 | library(estimate)
 3 | library(clusterProfiler)
 4 | library(org.Hs.eg.db)
 5 | library(stringr)
 6 | library(ImmuLncRNA)
 7 | load('/Users/CRC-data-lnc/TCGA.rda')
 8 | load('/Users/CRC-data-lnc/GEO-mRNA-data/GEO-Rdata/TCGA.rda')
 9 | rm(TCGA_clin)
10 | load('Purity.Rdata')
11 | CRC.lncRNA <- CRC.lncRNA[apply(CRC.lncRNA,1,function(x){sum(x>0)>ncol(CRC.lncRNA)/3}),]
12 | 
13 | # -------------------------------------------------------------------------
14 | 
15 | set.seed(123456)
16 | system.time(
17 |   result <- immu.LncRNA(CRC.mRNA,CRC.lncRNA,adjusted=TRUE,Tumour_purity,pathways,)
18 | )
19 | 
20 | sig <- subset(result$fgseaRes_all,padj < 0.05 & sigValue > 0.995)
21 | sig_immune_related_lncRNA <- unique(sig$lncRNA)
22 | save(result,sig,sig_immune_related_lncRNA,file = 'immune_related_result.Rdata')
23 | 
24 | ggplot(bardata,aes(log2(Freq+1),reorder(Var1,Freq),fill=Var1))+
25 |   geom_bar(stat = 'identity',width = 0.7)+
26 |   theme_classic(base_rect_size = 1.5)+
27 |   ylab('')+xlab('log2(Number of immune-related lncRNAs)')+
28 |   ggtitle('ImmuLnc analysis')+
29 |   theme(axis.ticks.y = element_blank(),
30 |         panel.border = element_blank(),
31 |         panel.grid = element_blank(),
32 |         plot.title = element_text(hjust=0.5,size=12),
33 |         axis.title.x = element_text(size = 10),
34 |         legend.position = 'none')+
35 |   scale_y_discrete(expand = c(0.04,0),)+
36 |   scale_x_continuous(expand = c(0,0.1))+
37 |   scale_fill_manual(values = c(pal_npg(alpha = 0.8)(10),pal_d3(alpha = 0.8)(10))[-c(17:20)])
38 | ggsave(filename = 'Immune-lncRNA-type.pdf',width = 5,height = 5)
39 | 
40 | 
41 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # IRLS
 2 | Machine learning-based integrative analysis develops an immune-derived lncRNA signature for improving clinical outcomes in colorectal cancer
 3 | 
 4 | ## Consensus clustering
 5 | According to the infiltration profile of various immune cells, a resampling-based method termed consensus clustering was applied for cluster discovery. This process was performed by ConsensusClusterPlus package[1]. Subsample 80% of samples at each iteration and partition each subsample into up to k (max K = 9) groups by k-means algorithm upon Euclidean distance. This process was repeated for 1,000 repetitions. Subsequently, the consensus score matrix, cumulative distribution function (CDF) curve, proportion of ambiguous clustering (PAC) score, and Nbclust were synthetically used to determine the optimal number of clusters. A higher consensus score between two samples indicates they are more likely to be grouped into the same cluster in different iterations. The consensus values range from 0 (never clustered together) to 1 (always clustered together) marked by white to dark brown. In the CDF curve of a consensus matrix, the lower left portion represents sample pairs rarely clustered together, the upper right portion represents those almost always clustered together, whereas the middle segment represents those with ambiguous assignments in different clustering runs. The "proportion of ambiguous clustering" (PAC) measure quantifies this middle segment; and is defined as the fraction of sample pairs with consensus indices falling in the interval (u1, u2) ∈ [0, 1] where u1 is a value close to 0 and u2 is a value close to 1 (for instance u1=0.1 and u2=0.9). A low value of PAC indicates a flat middle segment, and a low rate of discordant assignments across permuted clustering runs. PAC for each K is CDFk(u2) - CDFk(u1)[2]. According to his criterion, we can therefore infer the optimal number of clusters by the K value having the lowest PAC. The Nbclust uses 26 mathematic criteria to select the optimal number.
 6 | 
 7 | ## Signature generated from machine learning based integrative approaches   
 8 | To develop a consensus immune-related lncRNA signature (IRLS) with high accuracy and stability performance, we integrated 10 machine learning algorithms and 101 algorithm combinations. The integrative algorithms included random survival forest (RSF), elastic network (Enet), Lasso, Ridge, stepwise Cox, CoxBoost, partial least squares regression for Cox (plsRcox), supervised principal components (SuperPC), generalized boosted regression modeling (GBM), and survival support vector machine (survival-SVM). The RSF model was implemented via the randomForestSRC package. RSF had two parameters ntree and mtry, where ntree represented the number of trees in the forest and mtry was the number of randomly selected variables for splitting at each node. We used a grid-search on ntree and mtry using leave-one-out cross-validation (LOOCV) framework. All the pairs of (ntree, mtry) are formed and the one with the best C-index value is identified as the optimized parameters. The Enet, Lasso, and Ridge were implemented via the glmnet package. The regularization parameter, λ, was determined by LOOCV, whereas the L1-L2 trade-off parameter, α, was set to 0-1 (interval =0.1). The stepwise Cox model was implemented via survival package. A stepwise algorithm using the AIC (Akaike information criterion) was applied, and the direction mode of stepwise search was set to "both", "backward", and "forward", respectively. The CoxBoost model was implemented via CoxBoost package, which is used to fit a Cox proportional hazards model by componentwise likelihood-based boosting. For the CoxBoost model, we used LOOCV routine optimCoxBoostPenalty function to first determine the optimal penalty (amount of shrinkage). Once this parameter was determined, the other tuning parameter of the algorithm, namely, the number of boosting steps to perform, was selected via the function cv.CoxBoost. The dimension of the selected multivariate Cox model was finally set by the principal routine CoxBoost. The plsRcox model was implemented via plsRcox package. The cv.plsRcox function was used to determine the number of components requested, and the plsRcox function was applied to fit a partial least squares regression generalized linear model. The SuperPC model was implemented via superpc package, is a generalization of principal component analysis, which generates a linear combination of the features or variables of interest that capture the directions of largest variation in a dataset. The superpc.cv function used a form of LOOCV to estimate the optimal feature threshold in supervised principal components. To avoid problems with fitting Cox models to small validation datasets, it uses the "pre-validation" approach. The GBM model was implemented via superpc package. Using the LOOCV, the cv.gbm function selected index for number trees with minimum cross-validation error. The gbm function was used to fit the generalized boosted regression model. The survival-SVM model was implemented via survivalsvm package. The regression approach takes censoring into account when formulating the inequality constraints of the support vector problem.
 9 | 
10 | ## Cells infiltration estimation
11 | The single sample gene set enrichment analysis (ssGSEA) implemented in R package GSVA was employed to quantify the relative infiltration of 28 immune cells in CRC[22]. Six other algorithms including TIMER, quanTIseq, MCP-counter, xCell, EPIC, and ESTIMATE, were further performed to verify the stability and robustness of the ssGSEA results.
12 | 
13 | ## Weighted gene co-expression network analysis (WGCNA)
14 | Co-expression lncRNA networks were generated using WGCNA package. An appropriate soft threshold β was calculated to meet the criteria for scale-free network. Further, the weighted adjacency matrix was converted into a topological overlap matrix (TOM), and the corresponding dissimilarity was generated (1-TOM). The dynamic tree cutting approach was employed to conduct the module identification. To recognize lncRNAs modules significantly corelated with immune cluster, the module that displayed the highest correlation was selected for further study. lncRNAs with both high gene significance (GS) and module membership (MM) were defined as immune-related lncRNAs.
15 | 
16 | ## ImmLnc analysis framework
17 | ImmLnc is an integrated algorithm for identifying lncRNA modulators of immune-related pathways. First, the ESTIMATE algorithm was used to infer tumor purity. Second, we calculated the partial correlation coefficient (PCC) between a specific lncRNA and all mRNAs by adjusting the tumor purity as a covariable. Final, all mRNAs were ranked by the correlation coefficient with a specific lncRNA, and the ranked gene list was further subjected to GSEA procedure to investigate whether the immune genes were enriched in the top or bottom of the gene list. As recommended, lncRES scores >0.995 and FDR <0.05 were considered statistically significant[9, 19].
18 | 


--------------------------------------------------------------------------------
/Consensus cluster.R:
--------------------------------------------------------------------------------
  1 | rm(list = ls())
  2 | library(dplyr)
  3 | library(data.table)
  4 | library(tidyr)
  5 | library(tibble)
  6 | library(GSVA)
  7 | library(ConsensusClusterPlus)
  8 | library(ComplexHeatmap)
  9 | library(ggplot2)
 10 | library(ggsci)
 11 | library(ggpubr)
 12 | library(ComplexHeatmap)
 13 | library(circlize)
 14 | load('/Users/CRC-data-lnc/TCGA.rda')
 15 | rm(CRC.lncRNA)
 16 | load('/Users/Tool and Data/cellMarker_ssGSEA.Rdata')
 17 | 
 18 | # -------------------------------------------------------------------------
 19 | 
 20 | gsva_data <- gsva(as.matrix(CRC.mRNA),cellMarker, method = "ssgsea")
 21 | ss <- gsva_data
 22 | 
 23 | # -------------------------------------------------------------------------
 24 | 
 25 | dir.create('ConsensusCluster/')
 26 | ## 一致性聚类
 27 | results = ConsensusClusterPlus(as.matrix(ss),
 28 |                                maxK=9,
 29 |                                reps=100,
 30 |                                pItem=0.8,
 31 |                                pFeature=1,
 32 |                                tmyPal = c('navy','darkred'),
 33 |                                title='ConsensusCluster/',
 34 |                                clusterAlg="km",
 35 |                                distance="euclidean",
 36 |                                seed=123456,
 37 |                                plot="pdf")
 38 | 
 39 | icl <- calcICL(results,title = 'ConsensusCluster/',plot = 'pdf')
 40 | 
 41 | ## PAC = Proportion of ambiguous clustering 模糊聚类比例
 42 | Kvec = 2:9
 43 | x1 = 0.1; x2 = 0.9 
 44 | PAC = rep(NA,length(Kvec)) 
 45 | names(PAC) = paste("K=",Kvec,sep="") 
 46 | for(i in Kvec){
 47 |   M = results[[i]]$consensusMatrix
 48 |   Fn = ecdf(M[lower.tri(M)])
 49 |   PAC[i-1] = Fn(x2) - Fn(x1)
 50 | }
 51 | optK = Kvec[which.min(PAC)]
 52 | optK
 53 | 
 54 | PAC <- as.data.frame(PAC)
 55 | PAC$K <- 2:9
 56 | library(ggplot2)
 57 | ggplot(PAC,aes(factor(K),PAC,group=1))+
 58 |   geom_line()+
 59 |   theme_bw(base_rect_size = 1.5)+
 60 |   geom_point(size=4,shape=21,color='darkred',fill='orange')+
 61 |   ggtitle('Proportion of ambiguous clustering')+
 62 |   xlab('Cluster number K')+ylab(NULL)+
 63 |   theme(axis.text = element_text(size=12),
 64 |         plot.title = element_text(hjust=0.5),
 65 |         axis.title = element_text(size=13))
 66 | ggsave(filename = 'ConsensusCluster/PAC.pdf',width = 3.8,height = 4)
 67 | 
 68 | ## 保存分型信息
 69 | clusterNum=2      
 70 | cluster=results[[clusterNum]][["consensusClass"]]
 71 | 
 72 | sub <- data.frame(Sample=names(cluster),Cluster=cluster)
 73 | sub$Cluster <- paste0('C',sub$Cluster)
 74 | table(sub$Cluster)
 75 | 
 76 | head(sub)
 77 | 
 78 | my <- results[[2]][["ml"]]
 79 | library(pheatmap)
 80 | rownames(my) <- sub$Sample
 81 | colnames(my) <- sub$Sample
 82 | pheatmap(1-my,show_colnames = F,show_rownames = F,
 83 |          treeheight_row = 20,treeheight_col = 20,
 84 |          clustering_method = 'complete',
 85 |          color = colorRampPalette(c("white","#C75D30"))(50),
 86 |          annotation_names_row = F,annotation_names_col = F,
 87 |          annotation_row = data.frame(Cluster=sub$Cluster,row.names = sub$Sample),
 88 |          annotation_col = data.frame(Cluster=sub$Cluster,row.names = sub$Sample),
 89 |          annotation_colors = list(Cluster=c('C2'='#B5739D','C1'='#4E8279')))
 90 | library(export)
 91 | graph2pdf(file='cluster2.pdf',width=5.5,height=4.5)
 92 | 
 93 | # -------------------------------------------------------------------------
 94 | 
 95 | ss2 <- merge(sub,t(ss),by.x=1,by.y=0)
 96 | ss2 <- pivot_longer(ss2,3:30,names_to = 'cell',values_to = 'value')
 97 | 
 98 | ggplot(ss2,aes(cell,value,fill=Cluster))+
 99 |   geom_boxplot(outlier.colour = NA)+
100 |   stat_compare_means(label = 'p.signif')+
101 |   theme(axis.text.x = element_text(angle=90,hjust=1,vjust=0.5))
102 | 
103 | # -------------------------------------------------------------------------
104 | 
105 | TCGA_clin <- merge(sub,TCGA_clin,by=1)
106 | my <- TCGA_clin[,c(1,2,7,8,12,13)]%>%column_to_rownames('Sample')
107 | my$Age <- ifelse(my$Age>65,'>65','≤65')
108 | table(my$Age)
109 | my$Gender <- Hmisc::capitalize(my$Gender)
110 | my <- my[order(my$Cluster,my$Age,my$Stage,my$Status,my$Gender),]
111 | 
112 | ee <- t(scale(t(ee)))
113 | table(my$Cluster)
114 | 
115 | # -------------------------------------------------------------------------
116 | 
117 | my[is.na(my)] <- 'NA'
118 | my$Age <- factor(my$Age,levels = c('≤65','>65'))
119 | my$Gender <- factor(my$Gender,levels = c('Female','Male'))
120 | my$Stage <- factor(my$Stage,levels = c('I','II','III','IV','NA'))
121 | my$Cluster <- factor(my$Cluster)
122 | my$Status <- factor(my$Status)
123 | 
124 | # -------------------------------------------------------------------------
125 | 
126 | Cluster <- c('#4E8279','#B5739D')
127 | names(Cluster) <- levels(my$Cluster)
128 | Age <- c(pal_nejm(alpha = 0.9)(8)[3],'#CF4E27')
129 | names(Age) <- levels(my$Age)
130 | table(my$Gender)
131 | Gender <- c('#E0864A','rosybrown')
132 | names(Gender) <- levels(my$Gender)
133 | table(my$Stage)
134 | Stage <- c('cornsilk','paleturquoise','goldenrod','firebrick','White')
135 | names(Stage) <- levels(my$Stage)
136 | table(my$Recurrence)
137 | table(my$Status)
138 | Status <- c('lavenderblush','slategray')
139 | names(Status) <- levels(my$Status)
140 | 
141 | # -------------------------------------------------------------------------
142 | 
143 | Top = HeatmapAnnotation(Cluster=my$Cluster,
144 |                         Age=my$Age,
145 |                         Gender=my$Gender,
146 |                         Stage= my$Stage,
147 |                         Status = my$Status,
148 |                         annotation_legend_param=list(labels_gp = gpar(fontsize = 10),border = T,
149 |                                                      title_gp = gpar(fontsize = 10,fontface = "bold"),
150 |                                                      ncol=1),
151 |                         border = T,
152 |                         col=list(Cluster = Cluster,
153 |                                  Age = Age,
154 |                                  Gender = Gender,
155 |                                  Stage= Stage,
156 |                                  Status = Status
157 |                         ),
158 |                         show_annotation_name = TRUE,
159 |                         annotation_name_side="left",
160 |                         annotation_name_gp = gpar(fontsize = 10))
161 | 
162 | Heatmap(ee,name='Z-score',
163 |         top_annotation = Top,
164 |         cluster_rows = T,
165 |         col=colorRamp2(c(-2,0,2),c('#21b6af','white','#eeba4d')),#49b0d9
166 |         color_space = "RGB",
167 |         cluster_columns = FALSE,border = T,
168 |         row_order=NULL,
169 |         row_names_side = 'left',
170 |         column_order=NULL,
171 |         show_column_names = FALSE,
172 |         row_names_gp = gpar(fontsize = 9),
173 |         column_split = c(rep(1,316),rep(2,268)),
174 |         gap = unit(1, "mm"),
175 |         column_title = NULL,
176 |         column_title_gp = gpar(fontsize = 10),
177 |         show_heatmap_legend = TRUE,
178 |         heatmap_legend_param=list(labels_gp = gpar(fontsize = 10), border = T,
179 |                                   title_gp = gpar(fontsize = 10, fontface = "bold")),
180 |         column_gap = unit(2,'mm')
181 | ) 
182 | library(export)
183 | graph2pdf(file='cell-heatmap.pdf',width=9,height=5.5)
184 | 
185 | # -------------------------------------------------------------------------
186 | 
187 | 
188 | ee <- as.data.frame(ss)[,rownames(my)]
189 | tt <- cbind(Cluster=as.character(my$Cluster),t(ee))
190 | tt <- as.data.frame(tt)
191 | tt2 <- pivot_longer(tt,cols = 2:29,names_to = 'cell',values_to = 'value')
192 | tt2$value <- as.numeric(tt2$value)
193 | 
194 | # -------------------------------------------------------------------------
195 | 
196 | source("GeomSplitViolin.R")
197 | 
198 | ggplot(tt2, aes(cell,value, fill = Cluster)) + 
199 |   geom_split_violin(draw_quantiles = c(0.25, 0.5, 0.75), #画4分位线
200 |                     trim = T, #是否修剪小提琴图的密度曲线
201 |                     linetype = "solid", #周围线的轮廓
202 |                     color = "black", 
203 |                     size = 0.35,
204 |                     na.rm = T,
205 |                     position ="identity")+ #周围线粗细
206 |   ylab("Relative Infiltration") + xlab(NULL) +
207 |   scale_fill_manual(values = c('#21b6af','#eeba4d'))+
208 |   theme_classic()+
209 |   theme(axis.text.x = element_text(angle = 60, hjust = 1, vjust = 1,size=10),
210 |         legend.position=c(0.05,0.003),legend.justification = c(0,0),
211 |         axis.title.y = element_text(size=12))
212 | ggsave(filename = 'cell-boxplot.pdf',width = 8,height=4.3)
213 | 
214 | # -------------------------------------------------------------------------
215 | 
216 | save(TCGA_clin,ss,file = 'Cluster+ssgsea.Rda')
217 | 
218 | 
219 | 


--------------------------------------------------------------------------------
/Machine Learning based intergration.R:
--------------------------------------------------------------------------------
  1 | rm(list = ls())
  2 | library(survival)
  3 | library(randomForestSRC)
  4 | library(glmnet)
  5 | library(plsRcox)
  6 | library(superpc)
  7 | library(gbm)
  8 | library(CoxBoost)
  9 | library(survivalsvm)
 10 | library(dplyr)
 11 | library(tibble)
 12 | library(BART)
 13 | load('model-step.Rda')
 14 | rm(rr,rr2,rr3,geneids)
 15 | mm <- lapply(mm,function(x){
 16 |   x[,-c(1:3)] <- scale(x[,-c(1:3)])
 17 |   return(x)})
 18 | 
 19 | result <- data.frame()
 20 | est_data <- mm$TCGA
 21 | val_data_list <- mm
 22 | pre_var <- colnames(est_data)[-c(1:3)]
 23 | est_dd <- est_data[,c('OS.time','OS',pre_var)]
 24 | val_dd_list <- lapply(val_data_list,function(x){x[,c('OS.time','OS',pre_var)]})
 25 | rm(mm)
 26 | 
 27 | rf_nodesize <- 5
 28 | seed <- 1
 29 | 
 30 | ##################################
 31 | #### 1-1.RSF ####
 32 | ##################################
 33 | 
 34 | set.seed(seed)
 35 | fit <- rfsrc(Surv(OS.time,OS)~.,data = est_dd,
 36 |              ntree = 1000,nodesize = rf_nodesize,
 37 |              splitrule = 'logrank',
 38 |              importance = T,
 39 |              proximity = T,
 40 |              forest = T,
 41 |              seed = seed)
 42 | rs <- lapply(val_dd_list,function(x){cbind(x[,1:2],RS=predict(fit,newdata = x)$predicted)})
 43 | cc <- data.frame(Cindex=sapply(rs,function(x){as.numeric(summary(coxph(Surv(OS.time,OS)~RS,x))$concordance[1])}))%>%
 44 |   rownames_to_column('ID')
 45 | cc$Model <- 'RSF'
 46 | result <- rbind(result,cc)
 47 | 
 48 | ##################################
 49 | #### 2-1.Enet ####
 50 | ##################################
 51 | 
 52 | x1 <- as.matrix(est_dd[,pre_var])
 53 | x2 <- as.matrix(Surv(est_dd$OS.time,est_dd$OS))
 54 | 
 55 | for (alpha in seq(0,1,0.1)) {
 56 |   set.seed(seed)
 57 |   fit = cv.glmnet(x1, x2,family = "cox",alpha=alpha,nfolds = 10)
 58 |   rs <- lapply(val_dd_list,function(x){cbind(x[,1:2],RS=as.numeric(predict(fit,type='link',newx=as.matrix(x[,-c(1,2)]),s=fit$lambda.min)))})
 59 |   
 60 |   cc <- data.frame(Cindex=sapply(rs,function(x){as.numeric(summary(coxph(Surv(OS.time,OS)~RS,x))$concordance[1])}))%>%
 61 |     rownames_to_column('ID')
 62 |   cc$Model <- paste0('Enet','[α=',alpha,']')
 63 |   result <- rbind(result,cc)
 64 | }
 65 | 
 66 | ##################################
 67 | #### 3-1.StepCox ####
 68 | ##################################
 69 | 
 70 | for (direction in c("both", "backward", "forward")) {
 71 |   fit <- step(coxph(Surv(OS.time,OS)~.,est_dd),direction = direction)
 72 |   rs <- lapply(val_dd_list,function(x){cbind(x[,1:2],RS=predict(fit,type = 'risk',newdata = x))})
 73 |   
 74 |   cc <- data.frame(Cindex=sapply(rs,function(x){as.numeric(summary(coxph(Surv(OS.time,OS)~RS,x))$concordance[1])}))%>%
 75 |     rownames_to_column('ID')
 76 |   cc$Model <- paste0('StepCox','[',direction,']')
 77 |   result <- rbind(result,cc)
 78 | }
 79 | ##################################
 80 | #### 3-8.StepCox+survivalsvm ####
 81 | ##################################
 82 | 
 83 | for (direction in c("both", "backward")) {
 84 |   fit <- step(coxph(Surv(OS.time,OS)~.,est_dd),direction = direction)
 85 |   rid <- names(coef(fit))
 86 |   est_dd2 <- est_data[,c('OS.time','OS',rid)]
 87 |   val_dd_list2 <- lapply(val_data_list,function(x){x[,c('OS.time','OS',rid)]})
 88 |   
 89 |   fit = survivalsvm(Surv(OS.time,OS)~., data= est_dd2, gamma.mu = 1)
 90 |   rs <- lapply(val_dd_list2,function(x){cbind(x[,1:2],RS=as.numeric(predict(fit, x)$predicted))})
 91 |   cc <- data.frame(Cindex=sapply(rs,function(x){as.numeric(summary(coxph(Surv(OS.time,OS)~RS,x))$concordance[1])}))%>%
 92 |     rownames_to_column('ID')
 93 |   cc$Model <- paste0('StepCox','[',direction,']',' + survival-SVM')
 94 |   result <- rbind(result,cc)
 95 | }
 96 | 
 97 | 
 98 | ##################################
 99 | #### 4-1.CoxBoost ####
100 | ##################################
101 | 
102 | set.seed(seed)
103 | pen <- optimCoxBoostPenalty(est_dd[,'OS.time'],est_dd[,'OS'],as.matrix(est_dd[,-c(1,2)]),
104 |                             trace=TRUE,start.penalty=500,parallel = T)
105 | cv.res <- cv.CoxBoost(est_dd[,'OS.time'],est_dd[,'OS'],as.matrix(est_dd[,-c(1,2)]),
106 |                       maxstepno=500,K=10,type="verweij",penalty=pen$penalty)
107 | fit <- CoxBoost(est_dd[,'OS.time'],est_dd[,'OS'],as.matrix(est_dd[,-c(1,2)]),
108 |                 stepno=cv.res$optimal.step,penalty=pen$penalty)
109 | rs <- lapply(val_dd_list,function(x){cbind(x[,1:2],RS=as.numeric(predict(fit,newdata=x[,-c(1,2)], newtime=x[,1], newstatus=x[,2], type="lp")))})
110 | 
111 | cc <- data.frame(Cindex=sapply(rs,function(x){as.numeric(summary(coxph(Surv(OS.time,OS)~RS,x))$concordance[1])}))%>%
112 |   rownames_to_column('ID')
113 | cc$Model <- paste0('CoxBoost')
114 | result <- rbind(result,cc)
115 | 
116 | ##################################
117 | #### 5.plsRcox####
118 | ##################################
119 | 
120 | set.seed(seed)
121 | cv.plsRcox.res=cv.plsRcox(list(x=est_dd[,pre_var],time=est_dd$OS.time,status=est_dd$OS),nt=10,verbose = FALSE)
122 | fit <- plsRcox(est_dd[,pre_var],time=est_dd$OS.time,event=est_dd$OS,nt=as.numeric(cv.plsRcox.res[5]))
123 | rs <- lapply(val_dd_list,function(x){cbind(x[,1:2],RS=as.numeric(predict(fit,type="lp",newdata=x[,-c(1,2)])))})
124 | 
125 | cc <- data.frame(Cindex=sapply(rs,function(x){as.numeric(summary(coxph(Surv(OS.time,OS)~RS,x))$concordance[1])}))%>%
126 |   rownames_to_column('ID')
127 | cc$Model <- paste0('plsRcox')
128 | result <- rbind(result,cc)
129 | 
130 | ##################################
131 | #### 6.superpc####
132 | ##################################
133 | 
134 | data <- list(x=t(est_dd[,-c(1,2)]),y=est_dd$OS.time,censoring.status=est_dd$OS,featurenames=colnames(est_dd)[-c(1,2)])
135 | set.seed(seed)
136 | fit <- superpc.train(data = data,type = 'survival',s0.perc = 0.5) #default
137 | cv.fit <- superpc.cv(fit,data,n.threshold = 20,#default 
138 |                      n.fold = 10,
139 |                      n.components=3,
140 |                      min.features=5,
141 |                      max.features=nrow(data$x),
142 |                      compute.fullcv= TRUE,
143 |                      compute.preval=TRUE)
144 | rs <- lapply(val_dd_list,function(w){
145 |   test <- list(x=t(w[,-c(1,2)]),y=w$OS.time,censoring.status=w$OS,featurenames=colnames(w)[-c(1,2)])
146 |   ff <- superpc.predict(fit,data,test,threshold = cv.fit$thresholds[which.max(cv.fit[["scor"]][1,])],n.components = 1)
147 |   rr <- as.numeric(ff$v.pred)
148 |   rr2 <- cbind(w[,1:2],RS=rr)
149 |   return(rr2)
150 | })
151 | 
152 | cc <- data.frame(Cindex=sapply(rs,function(x){as.numeric(summary(coxph(Surv(OS.time,OS)~RS,x))$concordance[1])}))%>%
153 |   rownames_to_column('ID')
154 | cc$Model <- paste0('SuperPC')
155 | result <- rbind(result,cc)
156 | 
157 | 
158 | ##################################
159 | #### 7.GBM ####
160 | ##################################
161 | 
162 | set.seed(seed)
163 | fit <- gbm(formula = Surv(OS.time,OS)~.,data = est_dd,distribution = 'coxph',
164 |            n.trees = 10000,
165 |            interaction.depth = 3,
166 |            n.minobsinnode = 10,
167 |            shrinkage = 0.001,
168 |            cv.folds = 10,n.cores = 6)
169 | # find index for number trees with minimum CV error
170 | best <- which.min(fit$cv.error)
171 | set.seed(seed)
172 | fit <- gbm(formula = Surv(OS.time,OS)~.,data = est_dd,distribution = 'coxph',
173 |            n.trees = best,
174 |            interaction.depth = 3,
175 |            n.minobsinnode = 10,
176 |            shrinkage = 0.001,
177 |            cv.folds = 10,n.cores = 8)
178 | rs <- lapply(val_dd_list,function(x){cbind(x[,1:2],RS=as.numeric(predict(fit,x,n.trees = best,type = 'link')))})
179 | 
180 | cc <- data.frame(Cindex=sapply(rs,function(x){as.numeric(summary(coxph(Surv(OS.time,OS)~RS,x))$concordance[1])}))%>%
181 |   rownames_to_column('ID')
182 | cc$Model <- paste0('GBM')
183 | result <- rbind(result,cc)
184 | 
185 | 
186 | ##################################
187 | #### 8.survivalsvm ####
188 | ##################################
189 | 
190 | fit = survivalsvm(Surv(OS.time,OS)~., data= est_dd, gamma.mu = 1)
191 | 
192 | rs <- lapply(val_dd_list,function(x){cbind(x[,1:2],RS=as.numeric(predict(fit, x)$predicted))})
193 | 
194 | cc <- data.frame(Cindex=sapply(rs,function(x){as.numeric(summary(coxph(Surv(OS.time,OS)~RS,x))$concordance[1])}))%>%
195 |   rownames_to_column('ID')
196 | cc$Model <- paste0('survival-SVM')
197 | result <- rbind(result,cc)
198 | result2 <- result
199 | result2$Model <- gsub('α','a',result2$Model)
200 | library(ggplot2)
201 | library(ggsci)
202 | library(tidyr)
203 | library(ggbreak)
204 | range(result2$Cindex)
205 | result2%>%filter(ID!='TCGA')%>%
206 |   ggplot(aes(Cindex,reorder(Model,Cindex)))+
207 |   geom_bar(width = 0.7,stat = 'summary',fun='mean',fill='orange2')+
208 |   theme_classic()+
209 |   labs(y=NULL)
210 | 
211 | dd <- result2%>%
212 |   #filter(ID!='TCGA')%>%
213 |   group_by(Model)%>%
214 |   summarise(Cindex=mean(Cindex))
215 | 
216 | dd%>%
217 |   ggplot(aes(Cindex,reorder(Model,Cindex)))+
218 |   geom_bar(width=0.7,stat = 'identity',fill='orange')+
219 |   scale_x_break(c(0.05,0.53),scales = 20)
220 | 
221 | 
222 | dd2 <- pivot_wider(result2,names_from = 'ID',values_from = 'Cindex')%>%as.data.frame()
223 | dd2[,-1] <- apply(dd2[,-1],2,as.numeric)
224 | 
225 | 
226 | 
227 | 
228 | 
229 | 
230 | 
231 | 
232 | 
233 | 
234 | 
235 | 
236 | 
237 | 
238 | 
239 | 
240 | 
241 | 
242 | 
243 | 


--------------------------------------------------------------------------------
/All-signature-compare.R:
--------------------------------------------------------------------------------
  1 | rm(list = ls())
  2 | library(dplyr)
  3 | library(data.table)
  4 | library(tidyr)
  5 | library(tibble)
  6 | library(survival)
  7 | library(survminer)
  8 | library(timeROC)
  9 | library(compareC)
 10 | library(ggsci)
 11 | load('all-signature.rda')
 12 | 
 13 | uni <- data.frame()
 14 | for (i in names(tmp2)) {
 15 |   dd <- tmp2[[i]]
 16 |   for(j in colnames(dd)[4:ncol(dd)]){
 17 |     scox <- summary(coxph(Surv(OS.time,OS)~get(j),dd))
 18 |     p <- compareC(dd$OS.time,dd$OS,dd$IRLS,dd[,j])$pval
 19 |     uni <- rbind(uni,data.frame(ID=i,A=j,
 20 |                                 HR=scox$conf.int[,1],
 21 |                                 HR.95L=scox$conf.int[,3],
 22 |                                 HR.95R=scox$conf.int[,4],
 23 |                                 pvalue=scox$coefficients[,5],
 24 |                                 cindex=scox$concordance[1]%>%as.numeric(),
 25 |                                 cse=scox$concordance[2]%>%as.numeric(),
 26 |                                 CP=p))
 27 |   }
 28 | }
 29 | 
 30 | unique(uni$ID)
 31 | dd <- uni[uni$ID=='TCGA-CRC',]
 32 | dd$ll <- ifelse(dd$CP<0.0001,'****',ifelse(dd$CP<0.001,'***',ifelse(dd$CP<0.01,'**',ifelse(dd$CP<0.05,'*',''))))
 33 | rownames(dd) <- NULL
 34 | 
 35 | ggplot(dd,aes(cindex,reorder(A,cindex)))+
 36 |   geom_errorbarh(aes(xmax=cindex+1.5*cse,xmin=cindex-1.5*cse),color="black",height=0,size=0.7)+
 37 |   geom_point(size=4,shape=21,fill=pal_nejm()(10)[1])+
 38 |   ylab(NULL)+xlab(NULL)+
 39 |   labs(title ="TCGA-CRC")+
 40 |   geom_vline(xintercept = 0.6,linetype='dashed',size=0.5,color='grey50')+
 41 |   theme_bw(base_rect_size = 1)+
 42 |   theme(panel.grid =element_blank(),
 43 |         axis.text.y = element_text(size=12),
 44 |         axis.text.x = element_text(size=12),
 45 |         axis.title = element_text(size=13),
 46 |         plot.title = element_text(hjust = 0.5,size=15),
 47 |         legend.position = 'none',
 48 |         strip.text = element_text(size=14))+ 
 49 |   geom_text(aes(x=0.89,y=A,label=ll),color='black',size=3,vjust=0.76)+
 50 |   scale_x_continuous(breaks = c(0.5,0.7,0.9),limits = c(0.4,0.94))
 51 | 
 52 | # -------------------------------------------------------------------------
 53 | 
 54 | unique(uni$ID)
 55 | dd <- uni[uni$ID=='GSE17536',]
 56 | dd$ll <- ifelse(dd$CP<0.0001,'****',ifelse(dd$CP<0.001,'***',ifelse(dd$CP<0.01,'**',ifelse(dd$CP<0.05,'*',''))))
 57 | 
 58 | ggplot(dd,aes(cindex,reorder(A,cindex)))+
 59 |   geom_errorbarh(aes(xmax=cindex+1.5*cse,xmin=cindex-1.5*cse),color="black",height=0,size=0.7)+
 60 |   geom_point(size=4,shape=21,fill=pal_nejm()(10)[2])+
 61 |   ylab(NULL)+xlab(NULL)+
 62 |   labs(title ="GSE17536")+
 63 |   geom_vline(xintercept = 0.6,linetype='dashed',size=0.5,color='grey50')+
 64 |   theme_bw(base_rect_size = 1)+
 65 |   theme(panel.grid =element_blank(),
 66 |         axis.text.y = element_text(size=12),
 67 |         axis.text.x = element_text(size=12),
 68 |         axis.title = element_text(size=13),
 69 |         plot.title = element_text(hjust = 0.5,size=15),
 70 |         legend.position = 'none',
 71 |         strip.text = element_text(size=14))+ 
 72 |   geom_text(aes(x=0.71,y=A,label=ll),color='black',size=3,vjust=0.76)+
 73 |   scale_x_continuous(breaks = c(0.5,0.6,0.7))
 74 | 
 75 | # -------------------------------------------------------------------------
 76 | 
 77 | unique(uni$ID)
 78 | dd <- uni[uni$ID=='GSE17537',]
 79 | dd$ll <- ifelse(dd$CP<0.0001,'****',ifelse(dd$CP<0.001,'***',ifelse(dd$CP<0.01,'**',ifelse(dd$CP<0.05,'*',''))))
 80 | rownames(dd) <- NULL
 81 | 
 82 | 
 83 | ggplot(dd,aes(cindex,reorder(A,cindex)))+
 84 |   geom_errorbarh(aes(xmax=cindex+1.5*cse,xmin=cindex-1.5*cse),color="black",height=0,size=0.7)+
 85 |   geom_point(size=4,shape=21,fill=pal_nejm()(10)[3])+
 86 |   ylab(NULL)+xlab(NULL)+
 87 |   labs(title ="GSE17537")+
 88 |   geom_vline(xintercept = 0.6,linetype='dashed',size=0.5,color='grey50')+
 89 |   theme_bw(base_rect_size = 1)+
 90 |   theme(panel.grid =element_blank(),
 91 |         axis.text.y = element_text(size=12),
 92 |         axis.text.x = element_text(size=12),
 93 |         axis.title = element_text(size=13),
 94 |         plot.title = element_text(hjust = 0.5,size=15),
 95 |         legend.position = 'none',
 96 |         strip.text = element_text(size=14))+ 
 97 |   geom_text(aes(x=0.85,y=A,label=ll),color='black',size=3,vjust=0.76)+
 98 |   scale_x_continuous(breaks = c(0.4,0.6,0.8),limits = c(0.38,0.88))
 99 | 
100 | # -------------------------------------------------------------------------
101 | 
102 | unique(uni$ID)
103 | dd <- uni[uni$ID=='GSE29621',]
104 | dd$ll <- ifelse(dd$CP<0.0001,'****',ifelse(dd$CP<0.001,'***',ifelse(dd$CP<0.01,'**',ifelse(dd$CP<0.05,'*',''))))
105 | rownames(dd) <- NULL
106 | 
107 | 
108 | ggplot(dd,aes(cindex,reorder(A,cindex)))+
109 |   geom_errorbarh(aes(xmax=cindex+1.5*cse,xmin=cindex-1.5*cse),color="black",height=0,size=0.7)+
110 |   geom_point(size=4,shape=21,fill=pal_nejm()(10)[4])+
111 |   ylab(NULL)+xlab(NULL)+
112 |   labs(title ="GSE29621")+
113 |   geom_vline(xintercept = 0.6,linetype='dashed',size=0.5,color='grey50')+
114 |   theme_bw(base_rect_size = 1)+
115 |   theme(panel.grid =element_blank(),
116 |         axis.text.y = element_text(size=12),
117 |         axis.text.x = element_text(size=12),
118 |         axis.title = element_text(size=13),
119 |         plot.title = element_text(hjust = 0.5,size=15),
120 |         legend.position = 'none',
121 |         strip.text = element_text(size=14))+ 
122 |   geom_text(aes(x=0.79,y=A,label=ll),color='black',size=3,vjust=0.76)+
123 |   scale_x_continuous(breaks = c(0.4,0.6,0.8),limits = c(0.36,0.82))
124 | 
125 | 
126 | unique(uni$ID)
127 | dd <- uni[uni$ID=='GSE38832',]
128 | dd$ll <- ifelse(dd$CP<0.0001,'****',ifelse(dd$CP<0.001,'***',ifelse(dd$CP<0.01,'**',ifelse(dd$CP<0.05,'*',''))))
129 | rownames(dd) <- NULL
130 | ggplot(dd,aes(cindex,reorder(A,cindex)))+
131 |   geom_errorbarh(aes(xmax=cindex+1.5*cse,xmin=cindex-1.5*cse),color="black",height=0,size=0.7)+
132 |   geom_point(size=4,shape=21,fill=pal_nejm()(10)[5])+
133 |   ylab(NULL)+xlab(NULL)+
134 |   labs(title ="GSE38832")+
135 |   geom_vline(xintercept = 0.6,linetype='dashed',size=0.5,color='grey50')+
136 |   theme_bw(base_rect_size = 1)+
137 |   theme(panel.grid =element_blank(),
138 |         axis.text.y = element_text(size=12),
139 |         axis.text.x = element_text(size=12),
140 |         axis.title = element_text(size=13),
141 |         plot.title = element_text(hjust = 0.5,size=15),
142 |         legend.position = 'none',
143 |         strip.text = element_text(size=14))+ 
144 |   geom_text(aes(x=0.78,y=A,label=ll),color='black',size=3,vjust=0.76)+
145 |   scale_x_continuous(breaks = c(0.4,0.6,0.8))
146 | 
147 | unique(uni$ID)
148 | dd <- uni[uni$ID=='GSE39582',]
149 | dd$ll <- ifelse(dd$CP<0.0001,'****',ifelse(dd$CP<0.001,'***',ifelse(dd$CP<0.01,'**',ifelse(dd$CP<0.05,'*',''))))
150 | rownames(dd) <- NULL
151 | 
152 | ggplot(dd,aes(cindex,reorder(A,cindex)))+
153 |   geom_errorbarh(aes(xmax=cindex+1.5*cse,xmin=cindex-1.5*cse),color="black",height=0,size=0.7)+
154 |   geom_point(size=4,shape=21,fill=pal_nejm()(10)[6])+
155 |   ylab(NULL)+xlab(NULL)+
156 |   labs(title ="GSE39582")+
157 |   geom_vline(xintercept = 0.6,linetype='dashed',size=0.5,color='grey50')+
158 |   theme_bw(base_rect_size = 1)+
159 |   theme(panel.grid =element_blank(),
160 |         axis.text.y = element_text(size=12),
161 |         axis.text.x = element_text(size=12),
162 |         axis.title = element_text(size=13),
163 |         plot.title = element_text(hjust = 0.5,size=15),
164 |         legend.position = 'none',
165 |         strip.text = element_text(size=14))+ 
166 |   geom_text(aes(x=0.69,y=A,label=ll),color='black',size=3,vjust=0.76)+
167 |   scale_x_continuous(breaks = c(0.5,0.6,0.7))
168 | 
169 | 
170 | unique(uni$ID)
171 | dd <- uni[uni$ID=='GSE72970',]
172 | dd$ll <- ifelse(dd$CP<0.0001,'****',ifelse(dd$CP<0.001,'***',ifelse(dd$CP<0.01,'**',ifelse(dd$CP<0.05,'*',''))))
173 | rownames(dd) <- NULL
174 | 
175 | ggplot(dd,aes(cindex,reorder(A,cindex)))+
176 |   geom_errorbarh(aes(xmax=cindex+1.5*cse,xmin=cindex-1.5*cse),color="black",height=0,size=0.7)+
177 |   geom_point(size=4,shape=21,fill=pal_nejm()(10)[7])+
178 |   ylab(NULL)+xlab(NULL)+
179 |   labs(title ="GSE72970")+
180 |   geom_vline(xintercept = 0.6,linetype='dashed',size=0.5,color='grey50')+
181 |   theme_bw(base_rect_size = 1)+
182 |   theme(panel.grid =element_blank(),
183 |         axis.text.y = element_text(size=12),
184 |         axis.text.x = element_text(size=12),
185 |         axis.title = element_text(size=13),
186 |         plot.title = element_text(hjust = 0.5,size=15),
187 |         legend.position = 'none',
188 |         strip.text = element_text(size=14))+ 
189 |   geom_text(aes(x=0.69,y=A,label=ll),color='black',size=3,vjust=0.76)+
190 |   scale_x_continuous(breaks = c(0.5,0.6,0.7))
191 | 
192 | unique(uni$ID)
193 | dd <- uni[uni$ID=='Meta-Cohort',]
194 | dd$ll <- ifelse(dd$CP<0.0001,'****',ifelse(dd$CP<0.001,'***',ifelse(dd$CP<0.01,'**',ifelse(dd$CP<0.05,'*',''))))
195 | rownames(dd) <- NULL
196 | 
197 | ggplot(dd,aes(cindex,reorder(A,cindex)))+
198 |   geom_errorbarh(aes(xmax=cindex+1.5*cse,xmin=cindex-1.5*cse),color="black",height=0,size=0.7)+
199 |   geom_point(size=4,shape=21,fill=pal_nejm()(10)[8])+
200 |   ylab(NULL)+xlab(NULL)+
201 |   labs(title ="Meta-Cohort")+
202 |   geom_vline(xintercept = 0.6,linetype='dashed',size=0.5,color='grey50')+
203 |   theme_bw(base_rect_size = 1)+
204 |   theme(panel.grid =element_blank(),
205 |         axis.text.y = element_text(size=12),
206 |         axis.text.x = element_text(size=12),
207 |         axis.title = element_text(size=13),
208 |         plot.title = element_text(hjust = 0.5,size=15),
209 |         legend.position = 'none',
210 |         strip.text = element_text(size=14))+ 
211 |   geom_text(aes(x=0.69,y=A,label=ll),color='black',size=3,vjust=0.76)+
212 |   scale_x_continuous(breaks = c(0.5,0.6,0.7))
213 | 
214 | # -------------------------------------------------------------------------
215 | 
216 | uni$x <- ifelse(uni$pvalue<0.05&uni$HR>1,'Risky',
217 |                 ifelse(uni$pvalue<0.05&uni$HR<1,'Protective','P > 0.05'))
218 | xx <- pivot_wider(uni[,c(1,2,10)],names_from = 'ID',values_from = 'x')
219 | xx <- column_to_rownames(xx,'A')
220 | 
221 | library(ComplexHeatmap)
222 | library(circlize)
223 | 
224 | ss <- read.csv('signature.csv')[,3:4]
225 | ss <- distinct(ss,Author,.keep_all = T)
226 | ss <- ss[ss$Author%in%rownames(xx),]
227 | ss[107,] <- c('lncRNA','IRLS')
228 | rownames(ss) <- ss$Author
229 | ss <- ss[order(ss$Type,ss$Author),]
230 | rownames(ss) <- NULL
231 | xx <- xx[ss$Author,]
232 | 
233 | right = HeatmapAnnotation(Signature=ss$Type,
234 |                           annotation_legend_param=list(labels_gp = gpar(fontsize = 12),border = T,
235 |                                                        title_gp = gpar(fontsize = 12,fontface = "bold"),
236 |                                                        ncol=1),
237 |                           border = T,which = 'column',
238 |                           col=list(Signature = c('lncRNA'=pal_npg(alpha = 0.8)(10)[3],'mRNA'=pal_nejm(alpha = 0.8)(8)[3])),
239 |                           show_annotation_name = F,
240 |                           annotation_name_gp = gpar(fontsize = 12))
241 | Heatmap(t(xx),col = c('#F2F2F2',pal_npg()(2)[2],pal_npg()(2)[1]),name = 'Type',
242 |         rect_gp = gpar(col='grey70'),
243 |         top_annotation = right,
244 |         row_names_side = 'left',
245 |         heatmap_legend_param=list(labels_gp = gpar(fontsize = 12), border = T,
246 |                                   title_gp = gpar(fontsize = 12, fontface = "bold")))
247 | 
248 | 
249 | 
250 | 


--------------------------------------------------------------------------------
/clinical validation.R:
--------------------------------------------------------------------------------
  1 | rm(list = ls())
  2 | library(dplyr)
  3 | library(data.table)
  4 | library(tidyr)
  5 | library(tibble)
  6 | library(survival)
  7 | library(survminer)
  8 | library(timeROC)
  9 | library(pROC)
 10 | library(export)
 11 | library(ggsci)
 12 | 
 13 | load('sampleclin.rda')
 14 | 
 15 | # -------------------------------------------------------------------------
 16 | 
 17 | kmp <- function(data,legend,main){
 18 |   mytheme <- theme_survminer(font.legend = c(14,"plain", "black"),
 19 |                              font.x = c(14,"plain", "black"),
 20 |                              font.y = c(14,"plain", "black"),
 21 |                              legend = "top")
 22 |   cut <- surv_cutpoint(data,'OS.time','OS','RS',minprop = 0.1)
 23 |   cat <- surv_categorize(cut)
 24 |   fit <- survfit(Surv(OS.time,OS)~RS,cat)
 25 |   pp <- ggsurvplot(fit,data = cat,
 26 |                    palette= pal_nejm()(2),
 27 |                    conf.int=FALSE,size=1.3,
 28 |                    pval=T,pval.method = T,
 29 |                    legend.labs=c('High-risk','Low-risk'), 
 30 |                    legend.title="",
 31 |                    legend=legend,
 32 |                    xlab="Time(years)",
 33 |                    ylab='Overall survival',
 34 |                    ggtheme = mytheme)
 35 |   return(ggpar(pp,main = main))
 36 | }
 37 | 
 38 | table(my$IC)
 39 | kmp(my,legend = c(0.84,0.97),main = NULL)
 40 | graph2pdf(file='All-patients.pdf',width=4.2,height=4)
 41 | 
 42 | # -------------------------------------------------------------------------
 43 | 
 44 | kmp <- function(data,legend,main){
 45 |   mytheme <- theme_survminer(font.legend = c(14,"plain", "black"),
 46 |                              font.x = c(14,"plain", "black"),
 47 |                              font.y = c(14,"plain", "black"),
 48 |                              legend = "top")
 49 |   cut <- surv_cutpoint(data,'RFS.time','RFS','RS',minprop = 0.1)
 50 |   cat <- surv_categorize(cut)
 51 |   fit <- survfit(Surv(RFS.time,RFS)~RS,cat)
 52 |   pp <- ggsurvplot(fit,data = cat,
 53 |                    palette= pal_nejm()(2),
 54 |                    conf.int=FALSE,size=1.3,
 55 |                    pval=T,pval.method = T,
 56 |                    legend.labs=c('High-risk','Low-risk'), 
 57 |                    legend.title="",
 58 |                    legend=legend,
 59 |                    xlab="Time(years)",
 60 |                    ylab='Recurrence-free survival',
 61 |                    ggtheme = mytheme)
 62 |   return(ggpar(pp,main = main))
 63 | }
 64 | 
 65 | kmp(my,legend = c(0.84,0.97),main = NULL)
 66 | graph2pdf(file='All-patients-rfs.pdf',width=4.2,height=4)
 67 | 
 68 | #kmp(my[my$IC==1,],legend = c(0.84,0.97),main = 'Patients with ICI (n = 65)')
 69 | #graph2pdf(file='Patients with immunotherapy-rfs.pdf',width=4.2,height=4)
 70 | 
 71 | #kmp(my[my$IC==0,],legend = c(0.84,0.97),main = 'Patients without ICI (n = 167)')
 72 | #graph2pdf(file='Patients without immunotherapy-rfs.pdf',width=4.2,height=4)
 73 | 
 74 | # -------------------------------------------------------------------------
 75 | 
 76 | x <- summary(coxph(Surv(OS.time,OS)~.,my[,c(1,2,5:13)]))
 77 | y <- data.frame(id=rownames(x$coefficients),
 78 |                 HR=x$coefficients[,2],
 79 |                 HR.95L=x$conf.int[,"lower .95"],
 80 |                 HR.95H=x$conf.int[,'upper .95'],
 81 |                 pvalue=x$coefficients[,"Pr(>|z|)"])
 82 | 
 83 | y <- y[c(1:6,10,7:9),]
 84 | y[,-1] <- apply(y[,-1],2,as.numeric)
 85 | 
 86 | rt <- y
 87 | gene <- rt$id
 88 | hr <- sprintf("%.3f",rt$"HR")
 89 | hrLow  <- sprintf("%.3f",rt$"HR.95L")
 90 | hrHigh <- sprintf("%.3f",rt$"HR.95H")
 91 | Hazard.ratio <- paste0(hr," (",hrLow,"-",hrHigh,")")
 92 | pVal <- ifelse(rt$pvalue<0.001, "<0.001", sprintf("%.3f", rt$pvalue))
 93 | 
 94 | if(T){
 95 |   n <- nrow(rt)
 96 |   nRow <- n+1
 97 |   ylim <- c(1,nRow)
 98 |   layout(matrix(c(1,2),nc=2),width=c(3,2))
 99 |   
100 |   #绘制森林图左边的基因信息
101 |   xlim = c(0,2.6)
102 |   par(mar=c(4,2.5,2,0))
103 |   plot(1,xlim=xlim,ylim=ylim,type="n",axes=F,xlab="",ylab="")
104 |   text.cex=1
105 |   text(0.5,n:1,gene,adj=0,cex=text.cex)
106 |   text(1.65,n:1,pVal,adj=1,cex=text.cex);
107 |   text(1.5+0.2,n+1,'P-value',cex=text.cex,font=2,adj=1)
108 |   text(2.6,n:1,Hazard.ratio,adj=1,cex=text.cex)
109 |   text(2.45,n+1,'HR (95% CI)',cex=text.cex,font=2,adj=1,)
110 |   
111 |   #绘制森林图
112 |   par(mar=c(4,0,2,1),mgp=c(2,0.5,0))
113 |   xlim = c(0,6)
114 |   plot(1,xlim=xlim,ylim=ylim,type="n",axes=F,ylab="",xaxs="i",xlab="")
115 |   arrows(as.numeric(hrLow),n:1,as.numeric(hrHigh),n:1,angle=0,code=3,length=0.05,col="black",lwd=2.5)
116 |   abline(v=1,col="gray",lty=2,lwd=1.5)
117 |   boxcolor = '#67AB9F'
118 |   points(as.numeric(hr), n:1, pch = 15, cex=2,col = boxcolor)
119 |   axis(1)
120 | }
121 | library(export)
122 | graph2pdf(file="inhouse-multicox.pdf", width = 10,height = 4.7)
123 | 
124 | # -------------------------------------------------------------------------
125 | 
126 | x <- summary(coxph(Surv(RFS.time,RFS)~.,my[,c(3,4,5:13)]))
127 | y <- data.frame(id=rownames(x$coefficients),
128 |                 HR=x$coefficients[,2],
129 |                 HR.95L=x$conf.int[,"lower .95"],
130 |                 HR.95H=x$conf.int[,'upper .95'],
131 |                 pvalue=x$coefficients[,"Pr(>|z|)"])
132 | y <- y[c(1:6,10,7:9),]
133 | 
134 | rt <- y
135 | gene <- rt$id
136 | hr <- sprintf("%.3f",rt$"HR")
137 | hrLow  <- sprintf("%.3f",rt$"HR.95L")
138 | hrHigh <- sprintf("%.3f",rt$"HR.95H")
139 | Hazard.ratio <- paste0(hr," (",hrLow,"-",hrHigh,")")
140 | pVal <- ifelse(rt$pvalue<0.001, "<0.001", sprintf("%.3f", rt$pvalue))
141 | 
142 | if(T){
143 |   n <- nrow(rt)
144 |   nRow <- n+1
145 |   ylim <- c(1,nRow)
146 |   layout(matrix(c(1,2),nc=2),width=c(3,2))
147 |   
148 |   #绘制森林图左边的基因信息
149 |   xlim = c(0,2.6)
150 |   par(mar=c(4,2.5,2,0))
151 |   plot(1,xlim=xlim,ylim=ylim,type="n",axes=F,xlab="",ylab="")
152 |   text.cex=1
153 |   text(0.5,n:1,gene,adj=0,cex=text.cex)
154 |   text(1.65,n:1,pVal,adj=1,cex=text.cex);
155 |   text(1.5+0.2,n+1,'P-value',cex=text.cex,font=2,adj=1)
156 |   text(2.6,n:1,Hazard.ratio,adj=1,cex=text.cex)
157 |   text(2.45,n+1,'HR (95% CI)',cex=text.cex,font=2,adj=1,)
158 |   
159 |   #绘制森林图
160 |   par(mar=c(4,0,2,1),mgp=c(2,0.5,0))
161 |   xlim = c(0,5)
162 |   plot(1,xlim=xlim,ylim=ylim,type="n",axes=F,ylab="",xaxs="i",xlab="")
163 |   arrows(as.numeric(hrLow),n:1,as.numeric(hrHigh),n:1,angle=0,code=3,length=0.05,col="black",lwd=2.5)
164 |   abline(v=1,col="gray",lty=2,lwd=1.5)
165 |   boxcolor = '#67AB9F'
166 |   points(as.numeric(hr), n:1, pch = 15, cex=2,col = boxcolor)
167 |   axis(1)
168 | }
169 | graph2pdf(file="inhouse-multicox-rfs.pdf", width = 10,height = 4.7)
170 | 
171 | # -------------------------------------------------------------------------
172 | 
173 | # -------------------------------------------------------------------------
174 | 
175 | tt <- timeROC(my$OS.time,my$OS,my$RS,cause = 1,weighting = 'marginal',times = c(1,3,5),ROC = T)
176 | tp <- tt$TP%>%as.data.frame()%>%pivot_longer(cols = 1:3,names_to = 'time',values_to = 'tp')
177 | fp <- tt$FP%>%as.data.frame()%>%pivot_longer(cols = 1:3,names_to = 'time',values_to = 'fp')
178 | 
179 | dd <- tp
180 | dd$fp <- fp$fp
181 | dd$time <- ifelse(dd$time=='t=1',"1-Year = 0.840",
182 |                   ifelse(dd$time=='t=3','3-Year = 0.776','5-Year = 0.818'))
183 | 
184 | ggplot(dd,aes(fp,tp,color=time))+
185 |   geom_line(size=1)+
186 |   labs(x='1-Specificity',y='Sensitivity',color=NULL)+
187 |   theme_bw(base_rect_size = 1.5)+
188 |   geom_abline(slope = 1,color='grey70')+
189 |   theme(panel.grid =element_blank(),
190 |         axis.text = element_text(size=11),
191 |         axis.title = element_text(size=13),
192 |         legend.text = element_text(size=12),
193 |         legend.position = c(0.995,0.012),
194 |         legend.justification = c(1,0))+
195 |   scale_color_nejm()+
196 |   scale_x_continuous(expand = c(0.01,0.01))+
197 |   scale_y_continuous(expand = c(0.01,0.01))
198 | ggsave(filename = 'inhouse-timeROC.pdf',width = 4.3,height = 4)
199 | 
200 | # -------------------------------------------------------------------------
201 | 
202 | fit <- roc(x$RR,x$RS,auc=T)
203 | 
204 | dd <- data.frame(x=1-fit$specificities,y=fit$sensitivities)
205 | dd <- dd[order(dd$y,dd$x),]
206 | dd$AUC <- 'IRLS'
207 | tmp <- dd
208 | ggplot(tmp,aes(x,y))+
209 |   geom_line(aes(group=AUC))
210 | 
211 | tmp$AUC2 <- ifelse(tmp$AUC=='IRLS','IRLS= 0.897',ifelse(tmp$AUC=='PD-L1','PD-L1 = 0.686 ***','CD8A= 0.725 **'))
212 | 
213 | tmp$AUC2 <- factor(tmp$AUC2,levels = unique(tmp$AUC2))
214 | ggplot(tmp,aes(x,y,color=AUC2))+
215 |   geom_line(size=1)+
216 |   labs(x='1-Specificity',y='Sensitivity',color=NULL)+
217 |   theme_bw(base_rect_size = 1.5)+
218 |   geom_abline(slope = 1,color='grey70')+
219 |   theme(panel.grid =element_blank(),
220 |         axis.text = element_text(size=11),
221 |         axis.title = element_text(size=13),
222 |         legend.text = element_text(size=12),
223 |         legend.position = c(0.995,0.012),
224 |         legend.justification = c(1,0))+
225 |   scale_color_nejm()+
226 |   scale_x_continuous(expand = c(0.01,0.01))+
227 |   scale_y_continuous(expand = c(0.01,0.01))
228 | ggsave(filename = 'IC-inhouse-timeROC.pdf',width = 4.3,height = 4)
229 | 
230 | 
231 | 
232 | if(F){
233 |   plot.roc(fit,print.thres = T,print.auc = T)
234 |   pdf(file = 'IC-roc.pdf',width = 4,height = 4)
235 |   plot.roc(fit,
236 |            axes=T, ## 是否显示xy轴
237 |            legacy.axes=T,
238 |            main=NULL, ## Title
239 |            col= "steelblue", ## 曲线颜色
240 |            lty=1, ## 曲线形状
241 |            lwd=3, ## 曲线粗细
242 |            identity=T, ## 是否显示对角线
243 |            identity.col="grey60", ## 对角线颜色
244 |            identity.lty=2, ## 对角线形状
245 |            identity.lwd=2, ## 对角线粗细
246 |            print.thres=F, ## 是否输出cut-off值
247 |            print.thres.pch=20, ## cut-off点的形状
248 |            print.thres.col="red", ## cut-off点和文本的颜色
249 |            print.thres.cex=1.2, 
250 |            print.auc=T, ## 是否显示AUC
251 |            print.auc.pattern="AUC = 0.896", ## 展示AUC的格式
252 |            auc.polygon.border='darkred',
253 |            print.auc.x=0.48, ## AUC值的X位置
254 |            print.auc.y=0.13, ## AUC值的Y位置
255 |            print.auc.cex=1.2, ## AUC值的放大倍数
256 |            print.auc.col='black', ## ACU值的颜色
257 |            auc.polygon=TRUE, ## 是否将ROC下面积上色
258 |            auc.polygon.col='skyblue', 
259 |            max.auc.polygon=TRUE,
260 |            max.auc.polygon.col='WhiteSmoke',
261 |            max.auc.polygon.lty=1
262 |   )
263 |   dev.off()
264 | }
265 | # -------------------------------------------------------------------------
266 | 
267 | # -------------------------------------------------------------------------
268 | 
269 | library(plyr)
270 | ggdata <- table(x$gg,x$RR)%>%as.data.frame()
271 | ggdata2 <- ddply(ggdata,'Var1',transform,percent_Freq=Freq/sum(Freq)*100)
272 | ggdata2 <- ddply(ggdata2,'Var1',transform,lable=cumsum(percent_Freq)-0.5*percent_Freq)
273 | ggdata2$ll <- paste0(round(ggdata2$percent_Freq/100,2)*100,'%')
274 | ggdata2$Var2 <- factor(ifelse(ggdata2$Var2==0,'NR','R'),levels = c('R','NR'))
275 | 
276 | ggplot(ggdata2,aes(Var1,percent_Freq,fill=Var2))+
277 |   geom_bar(stat = 'identity',width = 0.85)+
278 |   xlab(NULL)+ylab('Fraction (%)')+
279 |   geom_text(aes(label=ll),size=3.8,
280 |             position = position_stack(vjust = 0.5),
281 |             color='white')+
282 |   scale_fill_manual(values = pal_npg(alpha = 0.9)(2)[1:2])+
283 |   theme_classic()+
284 |   scale_x_discrete(expand = c(0.3,0.2))+
285 |   scale_y_continuous(expand = c(0.01,0.0))+
286 |   ggtitle('****')+
287 |   theme(legend.position = 'right',
288 |         plot.title = element_text(hjust=0.5,face = 'plain'),
289 |         axis.title = element_text(size=13),
290 |         axis.text.y = element_text(size=10),
291 |         axis.text.x = element_text(size=12))+
292 |   labs(fill=NULL)
293 | ggsave(filename = 'IC-bar.pdf',height = 4,width = 3.2)
294 | 
295 | chisq.test(table(x$gg,x$RR))
296 | 
297 | # -------------------------------------------------------------------------
298 | 
299 | ggplot(x2,aes(reorder(ID,RS),RS,fill=Res))+
300 |   geom_bar(stat = 'identity',width = 0.7,color='grey30',position = position_dodge2(width = 0.9))+
301 |   scale_fill_nejm()+labs(y='IRLS score')+
302 |   theme_classic(base_rect_size = 2)+
303 |   theme(axis.line.x = element_blank(),
304 |         axis.ticks.x = element_blank(),
305 |         axis.text.x = element_blank(),
306 |         axis.title.x = element_blank(),
307 |         legend.title = element_blank(),
308 |         axis.text.y = element_text(size=12),
309 |         axis.title.y = element_text(size=13),
310 |         legend.text = element_text(size=12),
311 |         legend.position = c(0.01,1),
312 |         legend.justification = c(0,1))
313 | 
314 | # -------------------------------------------------------------------------
315 | 
316 | library(compareC)
317 | tt <- my
318 | dd <- data.frame()
319 | for (i in colnames(my)[5:13]) {
320 |   fit <- summary(coxph(Surv(OS.time,OS)~get(i),tt))
321 |   CC <- fit$concordance[1]%>%as.numeric()
322 |   se <- fit$concordance[2]%>%as.numeric()
323 |   p <- compareC(tt$OS.time,tt$OS,tt$RS,tt[,i])$pval
324 |   dd <- rbind(dd,data.frame(ID=i,C=CC,SE=se,P=p))
325 | }
326 | 
327 | ggplot(dd,aes(ID,C,fill=ID))+
328 |   geom_bar(stat='identity',position=position_dodge(0.8),width=0.6)+
329 |   geom_errorbar(aes(ymax=C+1.5*SE,ymin=C-1.5*SE),
330 |                 width=0.1,position = position_dodge(0.8),size=0.6)+
331 |   theme_bw(base_rect_size = 1.5)+
332 |   ggtitle('C-index (Compared with IRLS)')+
333 |   theme(axis.title = element_blank(),
334 |         axis.text = element_text(size=12),
335 |         legend.position = 'none',
336 |         panel.grid = element_blank(),
337 |         plot.title = element_text(hjust = 0.5,size=14),
338 |         strip.text = element_text(size=12),
339 |         axis.ticks.x = element_blank())+
340 |   scale_fill_npg()+
341 |   scale_y_continuous(expand = c(0,0.01),limits = c(0,0.85))+
342 |   geom_text(aes(y=0.80,label=ll),size=5)
343 | 
344 | 
345 | 
346 | 
347 | 
348 | 
349 | 
350 | 
351 | 
352 | 
353 | 
354 | 
355 | 


--------------------------------------------------------------------------------