├── License.md ├── README.md ├── docs ├── index.html ├── javascripts │ └── scale.fix.js └── stylesheets │ ├── pygment_trac.css │ └── styles.css ├── etc ├── Lymphocyte_Percentage.png ├── TILAb_Score.png ├── TILAb_Score_train.png ├── TILAb_Score_valid.png ├── flow_diagram.png └── results.png ├── models └── MobileNet.hdf5 ├── results ├── TILAb_Score_train.png └── TILAb_Score_valid.png ├── src ├── inference.py ├── survival_analysis.r ├── survival_utils.r ├── til_quantification.py └── training.py └── survival_data └── README.md /License.md: -------------------------------------------------------------------------------- 1 | TILAb score quantifies the abundance of Tumour Infiltrating Lymphocytes (TIL) in a Whole Slide Image. 2 | 3 | Copyright (C) 2019 TIA-Lab 4 | 5 | This program is free software: you can redistribute it and/or modify 6 | it under the terms of the GNU General Public License as published by 7 | the Free Software Foundation, either version 3 of the License, or 8 | any later version. 9 | 10 | This program is distributed in the hope that it will be useful, 11 | but WITHOUT ANY WARRANTY; without even the implied warranty of 12 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 13 | GNU General Public License for more details. 14 | 15 | You should have received a copy of the GNU General Public License 16 | along with this program. If not, see . 17 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ![IF FORKED DO NOT REMOVE](etc/flow_diagram.png) 2 | 3 | # [A Novel Digital Score for Abundance of Tumour Infiltrating Lymphocytes Predicts Disease Free Survival in Oral Squamous Cell Carcinoma](https://tia-lab.github.io/TILAb-Score/) 4 | 5 | ### Table of Contents 6 | 0. [Introduction](#introduction) 7 | 0. [Citation](#citation) 8 | 0. [Dataset](#Dataset) 9 | 0. [Model](#model) 10 | 0. [Prerequisites](#prerequisites) 11 | 0. 
[License](#License) 12 | 13 | ### Introduction 14 | 15 | This repository contains the implementation of TILAb-score as described in the paper. 16 | 17 | ### Citation 18 | 19 | The journal paper on this work has been published in [Nature Scientific Reports](https://www.nature.com/articles/s41598-019-49710-z#Sec17). If you use this code in your research, please cite this work: 20 | 21 | @article{shaban2019novel, 22 | title={A novel Digital Score for Abundance of Tumour Infiltrating Lymphocytes predicts Disease free Survival in oral Squamous cell carcinoma}, 23 | author={Shaban, Muhammad and Khurram, Syed Ali and Fraz, Muhammad Moazam and Alsubaie, Najah and Masood, Iqra and Mushtaq, Sajid and Hassan, Mariam and Loya, Asif and Rajpoot, Nasir M}, 24 | journal={Scientific reports}, 25 | volume={9}, 26 | number={1}, 27 | pages={1--13}, 28 | year={2019}, 29 | publisher={Nature Publishing Group} 30 | } 31 | 32 | ### Dataset 33 | The dataset for training should be organized in the following hierarchy: 34 | ``` 35 | dataset 36 | -- train 37 | -- 0_Stroma 38 | -- 1_Non_ROI 39 | -- 2_Tumour 40 | -- 3_Lymphocyte 41 | -- valid 42 | -- 0_Stroma 43 | -- 1_Non_ROI 44 | -- 2_Tumour 45 | -- 3_Lymphocyte 46 | ``` 47 | Please contact Prof. Nasir Rajpoot (n.m.rajpoot@warwick.ac.uk) for dataset-related queries. 48 | 49 | ### Training 50 | The `training.py` file in the `src/` directory trains the model using the dataset in the `dataset/` directory. You may need to tune the hyperparameters when training on your own dataset to obtain an optimal model. 51 | 52 | ### Model 53 | The trained model used to produce the results in the paper is available in the `models/` directory. 
54 | 55 | ### Prerequisites 56 | The following software packages are required to run this code: 57 | 58 | ``` 59 | -- Python 3.5 60 | -- tensorflow-gpu=1.8.0 61 | -- keras=2.1.6 62 | -- openslide 63 | -- opencv_python 64 | -- scipy 65 | -- R packages 66 | -- survival 67 | -- survMisc 68 | -- gdata 69 | -- ggplot2 70 | -- survminer 71 | -- rms 72 | ``` 73 | ## Authors 74 | 75 | See the list of [contributors](https://github.com/TIA-Lab/TILAb_Score/graphs/contributors) who participated in this project. 76 | 77 | ## License 78 | 79 | This project is licensed under the GNU General Public License - see the [LICENSE.md](https://github.com/TIA-Lab/TILAb_Score/blob/master/License.md) file for details. 80 | -------------------------------------------------------------------------------- /docs/index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | TILAb-Score 7 | 8 | 9 | 10 | 11 | 14 | 15 | 16 |
17 |
18 |

TILAb-Score

19 |

A measure to quantify TIL abundance in Histology Images

20 |

View the Project on GitHub TIA-Lab/TILAb-Score

21 | 26 | 27 | Table of Contents 28 |
Introduction
29 |
Dataset
30 |
Method
31 |
Results
32 |
Citation
33 | 34 |
35 |
36 |

A Novel Digital Score for Abundance of Tumour Infiltrating Lymphocytes Predicts Disease Free Survival in Oral Squamous Cell Carcinoma

37 | 38 |

The paper has been published in Nature Scientific Reports journal.

39 | 40 |

Introduction

41 |

Tumour Infiltrating Lymphocytes (TILs) are lymphocytes found within or in the vicinity of a tumour. Numerous studies have reported the correlation of TIL density with improved overall survival (OS) and longer disease-free survival (DFS). TILAb-Score is a prognostic biomarker that quantifies TIL abundance based on the lymphocyte-to-tumour ratio and their colocalization.

42 | 43 |

Dataset

44 |

The dataset consists of 70 cases, including 60 OSCC and 10 control cases. These cases were collected from patients at Shaukat Khanum Memorial Cancer Hospital & Research Centre (SKMT), Lahore, Pakistan. The malignant cases were split into two equal-sized subsets, one for modelling and the other for testing. For the classification of biologically significant regions, more than half a million regions (belonging to different classes such as tumour, lymphocytes, and stroma) were marked by an expert pathologist in all WSIs of the modelling cohort. The annotations were then used for training and validation of the proposed method.

45 |

Please contact Prof. Nasir Rajpoot (n.m.rajpoot@warwick.ac.uk) for dataset related queries.

46 |

Method

47 |

Whole slide images (WSIs) are multi-gigapixel images and cannot be used directly for image analysis tasks, particularly training a deep-learning-based classifier. Therefore, we divide the WSIs into small regions (patches) for processing. A deep-learning-based classifier is applied to the patches to identify whether each patch contains tumour, lymphocytes or other histological primitives. However, the regions where lymphocytes infiltrate the tumour may not be confined within a single patch. Besides, there is considerable variation in the size of TIL regions, making the quantification of TILs a non-trivial task. We address this issue by adopting the widely accepted definition of TILs, i.e., lymphocytes that lie in the neighbourhood of tumour areas. The patch labels predicted as lymphocytes or tumour are then used to compute a statistical measure of co-localization, which is further incorporated into the computation of the TILAb score of lymphocytic infiltration.
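As a rough illustration of that last step, the sketch below combines a co-localization term with a lymphocyte-to-tumour ratio, following the form of the expressions in `src/til_quantification.py`. The function name `tilab_from_cells` and the plain-list inputs are assumptions made for this example (the repository code works on a NumPy probability map), and both fractions are assumed to be normalized by the same per-cell total.

```python
def tilab_from_cells(tumour_frac, lymph_frac):
    """Sketch of a TILAb-style score from per-grid-cell fractions.

    tumour_frac, lymph_frac: equal-length lists holding the fraction of
    patches in each grid cell predicted as tumour / lymphocyte.
    (Hypothetical helper, not part of the repository.)
    """
    # Drop grid cells that contain neither tumour nor lymphocytes.
    cells = [(t, l) for t, l in zip(tumour_frac, lymph_frac) if t + l > 0]
    if not cells:
        return 1.0  # same fallback value as the repository code

    # Normalize so that tumour + lymphocyte fractions sum to 1 in each cell.
    t = [a / (a + b) for a, b in cells]
    l = [b / (a + b) for a, b in cells]

    if sum(t) == 0:
        return 1.0  # only lymphocytes are present

    # Morisita-Horn style co-localization term.
    coloc = 2.0 * sum(a * b for a, b in zip(t, l)) / (
        sum(a * a for a in t) + sum(b * b for b in l)
    )
    l2t_ratio = sum(l) / sum(t)  # lymphocyte-to-tumour ratio
    return 0.5 * coloc * l2t_ratio


if __name__ == "__main__":
    # One grid cell with equal tumour and lymphocyte content, one empty cell.
    print(tilab_from_cells([0.5, 0.0], [0.5, 0.0]))
```

A cell that holds only tumour contributes nothing to the co-localization term, so a slide with no lymphocytes near tumour scores 0 under this sketch.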

48 |

49 | 50 |

51 |

Results

52 |

The prognostic significance of the TILAb score for DFS is investigated using Kaplan-Meier (KM) curves. The Kaplan-Meier curves in the following figures show that the proposed TILAb score is significantly associated with long-term (low-risk) DFS of OSCC patients (p = 0.0039). However, the lymphocytic percentage in a WSI, considered without any relation to tumour, does not show statistical significance.
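For readers unfamiliar with KM curves, the following is a minimal pure-Python sketch of the Kaplan-Meier estimator. It is not the R code used to produce the figures; the function name `kaplan_meier` is hypothetical, the example data are made up, and events are assumed to be coded as 1 for an observed event and 0 for a censored case.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times:  follow-up time of each patient
    events: 1 if the event was observed at that time, 0 if censored
    (Hypothetical helper for illustration only.)
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0  # events observed at time t
        leaving = 0   # patients leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Made-up data: four patients, the second one censored.
    for t, s in kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]):
        print(t, s)
```

Running the estimator separately on patients above and below a score threshold gives the two curves that a log-rank test then compares.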

53 |

54 | 55 |

56 |

Citation

57 | The journal paper on this work has been published in Nature Scientific Reports and is publicly available. If you use the code in your research, please cite this work: 58 |
@article{shaban2019novel,
59 |   title={A novel Digital Score for Abundance of Tumour Infiltrating Lymphocytes predicts Disease free Survival in oral Squamous cell carcinoma},
60 |   author={Shaban, Muhammad and Khurram, Syed Ali and Fraz, Muhammad Moazam and Alsubaie, Najah and Masood, Iqra and Mushtaq, Sajid and Hassan, Mariam and Loya, Asif and Rajpoot, Nasir M},
61 |   journal={Scientific reports},
62 |   volume={9},
63 |   number={1},
64 |   pages={1--13},
65 |   year={2019},
66 |   publisher={Nature Publishing Group}
67 | }
68 | 69 |
70 | 73 |
74 | 75 | 76 | 77 | -------------------------------------------------------------------------------- /docs/javascripts/scale.fix.js: -------------------------------------------------------------------------------- 1 | var metas = document.getElementsByTagName('meta'); 2 | var i; 3 | if (navigator.userAgent.match(/iPhone/i)) { 4 | for (i=0; i 0: # process the remaining patches 51 | probs = model.predict(np.array(batch)) 52 | patch_prob = np.vstack([patch_prob, probs]) 53 | 54 | # coverting patch_probs into prob_map where third dimension represents the probabilities of four classes 55 | prob_map = np.reshape(patch_prob, (np.int64(np.floor(rows/patch_size)), np.int64(np.floor(cols/patch_size)),4)) 56 | 57 | # wsi name without extension 58 | wsi_name, _ = os.path.splitext(wsi_name) 59 | 60 | # saving prob_maps as npy for future use 61 | np.save('../results/%s_prob_map.npy' % wsi_name, prob_map) 62 | 63 | # saving prob_maps as png images for ease of visualization 64 | cv.imwrite('../results/%s_stroma.png' % wsi_name, np.uint8(prob_map[:, :, 0] * 100)) 65 | cv.imwrite('../results/%s_non_roi.png' % wsi_name, np.uint8(prob_map[:, :, 1] * 100)) 66 | cv.imwrite('../results/%s_tumour.png' % wsi_name, np.uint8(prob_map[:, :, 2] * 100)) 67 | cv.imwrite('../results/%s_lymphocyte.png' % wsi_name, np.uint8(prob_map[:, :, 3] * 100)) 68 | 69 | # TIL quantification through TILAb score using given neighbourhood 70 | tilab_score = tq.TILAb_Score(prob_map, cell_size=16) # cell sizes used in the paper are: 4, 6, 8 ,10, 12, 14, 16, and 18 71 | print('TILAb Score: %0.04f' % tilab_score) 72 | -------------------------------------------------------------------------------- /src/survival_analysis.r: -------------------------------------------------------------------------------- 1 | ## remove (almost) everything in the working environment. 2 | ## You will get no warning, so don't do this unless you are really sure. 
3 | rm(list = ls()) 4 | 5 | # add utility functions to the path 6 | source("survival_utils.r") 7 | 8 | output_dir <- "../results/" 9 | legend_title = "TILAb Score:" 10 | legend = c("High Risk", "Low Risk") 11 | 12 | # loading survival data required for analysis 13 | # CSV file should consist of three columns (Event, Time, TILAb_Score) 14 | train <- read.csv(file="../survival_data/dummy_train.csv", header=TRUE, sep=",") 15 | valid <- read.csv(file="../survival_data/dummy_valid.csv", header=TRUE, sep=",") 16 | 17 | Uni_Variate_Analysis(train, valid, save_plot = FALSE, output_dir=output_dir, feature_name = "TILAb_Score", legend_title = legend_title, legend_string = legend) -------------------------------------------------------------------------------- /src/survival_utils.r: -------------------------------------------------------------------------------- 1 | # load packages required for the code in file 2 | 3 | library(survival) 4 | library(survMisc) 5 | library(gdata) 6 | library(ggplot2) 7 | library(survminer) 8 | library(rms) 9 | 10 | ########################################################################################### 11 | # Utility Functions # 12 | ########################################################################################### 13 | 14 | best_threshold <- function(feature, time, event, no_quantiles = 13) { 15 | 16 | testrange=seq(0.01, 0.99, len=no_quantiles) 17 | p <- sapply(testrange, function(q){ 18 | dat <- data.frame(x=feature>quantile(feature, q, na.rm=T),S=Surv(time,event)) 19 | fit <- survfit(Surv(time,event) ~ x, data=dat) 20 | test <- survdiff(Surv(time,event) ~ x, data=dat, rho=0) 21 | p.val <- 1 - pchisq(test$chisq, length(test$n) - 1) 22 | p.val}) 23 | 24 | # get best threshold 25 | q <- testrange[which.min(p)] 26 | th <- quantile(feature, q, na.rm=T) 27 | 28 | return(th) 29 | } 30 | 31 | KM_curve <- function(feature, time, event, threshold, save_plot = FALSE, filename = "", legend_title = "", legend_string = "") { 32 | 33 | temp <- 
feature > threshold 34 | temp <- as.numeric(temp) 35 | 36 | surv_data = data.frame(feature,time,event,temp) 37 | 38 | S <- Surv(time,event) 39 | test <- survdiff(S ~ temp, data= surv_data) 40 | best_p_value <- pchisq(test$chisq, length(test$n)-1, lower.tail = FALSE) 41 | 42 | res.cox <- coxph(Surv(time,event) ~ temp, data = surv_data) 43 | x <- summary(res.cox) 44 | 45 | HR <-signif(x$coef[2], digits=2)#exp(beta) 46 | CIlower <- signif(x$conf.int[,"lower .95"], 2) 47 | CIupper <- signif(x$conf.int[,"upper .95"], 2) 48 | LogLikely <- signif(x$logtest['pvalue'],3) 49 | LogRank <- signif(x$sctest['pvalue'],3) 50 | Wald <-signif(x$wald['pvalue'],3) 51 | 52 | 53 | if(save_plot == TRUE){ 54 | # survfit(Surv(time,event) ~ temp, data=surv_data) 55 | survp <- ggsurvplot(survfit(Surv(time,event) ~ temp, data=surv_data), data=as.data.frame(surv_data), 56 | legend.labs = legend_string, pval = TRUE, conf.int = FALSE, legend.title = legend_title, 57 | font.main=16,font.x = 16,font.y = 16,font.legend=16,font.tickslab=16, 58 | # Add risk table 59 | risk.table = FALSE, 60 | tables.height = 0.25, 61 | ggtheme = theme_bw()) 62 | print(survp, newpage = FALSE) 63 | dev.copy(png,filename) 64 | dev.off() 65 | } 66 | stats <- c(threshold, HR, CIlower, CIupper, LogLikely, LogRank, Wald) 67 | names(stats) <- c("Threshold","HR","CI(95%)-low","CI(95%)-high","LogLikely-pvalue","LogRank-pvalue","Wald-pvalue") 68 | return(stats) 69 | } 70 | 71 | # This function takes one feature along with survival event and time data for both the train and valid splits 72 | # The output of this function is the KM curve and other survival statistics, e.g. HR, CI, and p-values. 
73 | Uni_Variate_Analysis <- function(train, valid, save_plot = FALSE, output_dir="", feature_name = "", legend_title = "", legend_string = "") { 74 | 75 | thr <- best_threshold(feature = train[,3], time = train[,2], event = train[,1]) 76 | 77 | filename=sprintf("%s/%s_train.png", output_dir, feature_name) 78 | train_stats <- KM_curve(feature = train[,3], time = train[,2], event = train[,1], thr, save_plot = save_plot, filename = filename, legend_title = legend_title, legend_string = legend_string) 79 | print("Train Stats") 80 | print(train_stats) 81 | 82 | filename=sprintf("%s/%s_valid.png", output_dir, feature_name) 83 | valid_stats <- KM_curve(feature = valid[,3], time = valid[,2], event = valid[,1], thr, save_plot = save_plot, filename = filename, legend_title = legend_title, legend_string = legend_string) 84 | print("Test Stats") 85 | print(valid_stats) 86 | } -------------------------------------------------------------------------------- /src/til_quantification.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | def TILAb_Score(prob_map, cell_size): 4 | # prob_map: MxNx4 numpy array containing the class probabilities 5 | # cell_size: number of patches to be considered as one grid cell 6 | 7 | pred_map = np.argmax(prob_map, axis=2) 8 | 9 | T = np.int8(pred_map == 2) # patches predicted as tumour 10 | L = np.int8(pred_map == 3) # patches predicted as lymphocyte 11 | 12 | [rows, cols] = T.shape 13 | stride = np.int32(cell_size / 2) 14 | 15 | t = np.zeros(len(range(0, rows - cell_size + 1, stride))*len(range(0, cols - cell_size + 1, stride))) 16 | l = np.zeros(len(range(0, rows - cell_size + 1, stride))*len(range(0, cols - cell_size + 1, stride))) 17 | k = 0 18 | 19 | # probability of tumour and lymphocytes in each grid cell 20 | for i in range(0, rows - cell_size + 1, stride): 21 | for j in range(0, cols - cell_size + 1, stride): 22 | t[k] = np.mean(np.mean(T[i:i + cell_size, j:j + cell_size])) 23 | l[k] = 
np.mean(np.mean(L[i:i + cell_size, j:j + cell_size])) 24 | k += 1 25 | 26 | # removing grid cells with no tumour and lymphocytes regions 27 | index = np.logical_and(t == 0, l == 0) 28 | index = np.where(index)[0] 29 | 30 | t = np.delete(t, index) 31 | l = np.delete(l, index) 32 | 33 | tilab_score = 0.0 34 | if len(t) == 0: # ideally each WSI should have some tumour or lymphocyte region to get a sensible TILAb-score 35 | tilab_score = 1 # if there is no tumour then it's good for the patient's long-term survival 36 | else: 37 | # normalizing the percentage of tumour and lymphocyte to the range [0-1] in a grid-cell 38 | total = t + l # computed once so that both fractions share the same denominator 39 | t, l = t / total, l / total 40 | 41 | # Morisita-Horn Index based colocalization score 42 | coloc_score = (2 * sum(t*l)) / (sum(t**2) + sum(l**2)) 43 | if np.sum(t) == 0: 44 | tilab_score = 1 # when only lymphocytes are present 45 | else: 46 | l2t_ratio = np.sum(l) / np.sum(t) # lymphocyte to tumour ratio 47 | tilab_score = 0.5 * coloc_score * l2t_ratio # final TILAb-score 48 | 49 | return tilab_score -------------------------------------------------------------------------------- /src/training.py: -------------------------------------------------------------------------------- 1 | import keras 2 | ####################################################################################################################### 3 | # The hierarchy of the dataset directory should be as follows: 4 | # dataset 5 | # --> train 6 | # --> 0_Stroma 7 | # --> 1_Non_ROI 8 | # --> 2_Tumour 9 | # --> 3_Lymphocyte 10 | # --> valid 11 | # --> 0_Stroma 12 | # --> 1_Non_ROI 13 | # --> 2_Tumour 14 | # --> 3_Lymphocyte 15 | train_dataset_path = './dataset/train' 16 | valid_dataset_path = './dataset/valid' 17 | 18 | # training data generator 19 | train_datagen = keras.preprocessing.image.ImageDataGenerator(zca_whitening=False, horizontal_flip=True, vertical_flip=True) 20 | train_generator = train_datagen.flow_from_directory(train_dataset_path, target_size=(224, 224), batch_size=64, 
shuffle=True, seed=10, class_mode='categorical') 21 | 22 | # validation data generator (reads from the validation split, not the training split) 23 | valid_datagen = keras.preprocessing.image.ImageDataGenerator(zca_whitening=False, horizontal_flip=False, vertical_flip=False) 24 | valid_generator = valid_datagen.flow_from_directory(valid_dataset_path, target_size=(224, 224), batch_size=64, shuffle=False, seed=10, class_mode='categorical') 25 | 26 | # model training and validation 27 | model = keras.applications.mobilenet.MobileNet(include_top=True, weights=None, classes=4) 28 | model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy']) 29 | model.fit_generator(train_generator, epochs=2, verbose=1, validation_data=valid_generator) 30 | model.save('../models/MobileNet.hdf5') -------------------------------------------------------------------------------- /survival_data/README.md: -------------------------------------------------------------------------------- 1 | This directory should contain the CSV files with survival data for the train and valid sets. 2 | Each CSV should contain three columns with the column headers Event, Time, TILAb_Score. --------------------------------------------------------------------------------
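Because `Uni_Variate_Analysis` reads the columns by position (`train[,1]` as Event, `train[,2]` as Time, `train[,3]` as TILAb_Score), the column order matters as well as the header names. The sketch below checks a CSV against that layout before it is passed to the R scripts. The function name `parse_survival_csv` is hypothetical, the sample values are made up, and coding Event as 1 for an observed event and 0 for a censored case is an assumption.

```python
import csv
import io

# Column order expected by the positional indexing in survival_utils.r.
EXPECTED_HEADER = ["Event", "Time", "TILAb_Score"]


def parse_survival_csv(text):
    """Parse CSV text into a list of (event, time, score) tuples.

    Raises ValueError if the header does not match EXPECTED_HEADER.
    (Hypothetical helper for illustration only.)
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    if header != EXPECTED_HEADER:
        raise ValueError("unexpected header: %r" % (header,))
    rows = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        event, time, score = row
        rows.append((int(event), float(time), float(score)))
    return rows


if __name__ == "__main__":
    # Made-up example content for a file such as dummy_train.csv.
    sample = "Event,Time,TILAb_Score\n1,12.5,0.21\n0,60,0.84\n"
    print(parse_survival_csv(sample))
```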