├── Analysis of Glioma Immunotherpay scRNA-Seq Libraries
│   ├── 01- Extracting Myeloid Cells Matrix from the published Seurat Object
│   ├── 02- Calculating usages of the recurrent myeloid programs in the published immunotherapy dataset
│   ├── 03- Generating quadrant plot for the myeloid cells in the published glioma dataset to indicate myeloid cells in responders vs non-responders and for SIGLEC9 expression
│   ├── 04 - Generating boxplots for the published glioma immunotherapy dataset for percentage of cells positive for SIGLEC9 expression or immunomodulatory programs usage
│   └── 05 - Generating boxplot for percentage of cells double positive for SIGLEC9 expression and immunomodulatory program usage in responders vs non-responders
├── Bulk ATAC-Seq Analysis
│   ├── 01- Trimming fastq files to remove nextera transposase adaptors
│   ├── 02- Mapping Trimmed Fastqs using STAR
│   ├── 03- Removing Duplicate Reads from mapped libraries
│   ├── 04- Remove reads mapping to chrM
│   ├── 05- Final sorting of processed mapped files
│   ├── 06- Creating normalized bigwigs for visualization
│   ├── 07- Shifting loci to correct for tn5 bias
│   ├── 08- Calling peaks using the bed files with corrected loci
│   ├── 09- Determining Differential Accessible Sites using HOMER
│   ├── 10- Identifying motifs enriched in differential accessible sites
│   └── 11- Creating deeptools heatmap for the differential accessible sites between DMSO and p300i
├── Creation of discretized scRNA-Seq Expression matrix
│   ├── 01- Align 10X V3 Publiashed normal brain scRNA-Seq Libraries
│   ├── 02- Seurat for Processing Adult Normal Brain scRNA-Seq libraries
│   ├── 03 - Seurat for combining Discrete cells from MGB cohort with normal brain cells
│   ├── 04- Extracting Discrete Myeloid Cells for Subsequent Marker Identification using COMET and SCENIC
│   ├── 05- COMET for Marker Identification
│   └── 06- SCENIC Analysis for identification of Regulons governing molecular circuitry in myeloid immunomodulatory programs
├── Deconvolution of Bulk Datasets
│   ├── 1- Creating Gene Sets
│   ├── 2- Calculating Module scores using Seurat in TCGA Glioma Cohorts
│   ├── 3- Calculating Module scores using Seurat in GLASS Glioma Cohorts
│   ├── 4- Calculating Module scores using Seurat in G-SAM Glioblastoma Cohorts
│   ├── 5- Preparing CIBERSORTx single cell reference matrix
│   ├── 6- Estimate cell types fractions in TCGA Matrix
│   ├── 7- Estimate cell types fractions in GLASS Matrix
│   ├── 8- Estimate cell types fractions in G-SAM Matrix
│   ├── 9- Normalization of the Module Scores
│   ├── CIBERSORTx_Input
│   │   ├── Discrete_LowR_Cells.txt
│   │   └── Readme
│   ├── Gene Sets
│   │   ├── Complement_Immunosuppressive.txt
│   │   ├── Endothelial.txt
│   │   ├── IL1B_Inflamm.txt
│   │   ├── Inflamm_Microglia.txt
│   │   ├── Instructions
│   │   ├── Macrophage.txt
│   │   ├── Malignant2.txt
│   │   ├── Malignant3.txt
│   │   ├── Malignant4.txt
│   │   ├── Malignant6.txt
│   │   ├── Malignant7.txt
│   │   ├── Memory_Like_Tcells.txt
│   │   ├── Microglia.txt
│   │   ├── Monocyte.txt
│   │   ├── Neutrophils.txt
│   │   ├── Oligo.txt
│   │   ├── Pericytes.txt
│   │   ├── Scavenger.txt
│   │   ├── Terminal_Effector_Tcells.txt
│   │   ├── Treg.txt
│   │   └── cDC.txt
│   └── x10- Survival Analysis
├── Figure 1 Visualizations
│   ├── MGB Cohort Heatmap myeloid programs Gene Expression
│   ├── Mcgill Cohort Heatmap myeloid programs Gene Expression
│   ├── Quadrant plot generation with dots
│   └── Quadrant plot generation with piecharts
├── Identifying recurrent programs in Myeloid Cells in Gliomas (Related to Figure 1)
│   ├── 01- Identifying Variable Genes in MGB Cohort (Round 1)
│   ├── 02- Round 1 cNMF for myeloid cells in MGB cohort
│   ├── 03- Identifying Variable Genes in Houston Cohort (Round 1)
│   ├── 04- Round 1 cNMF for myeloid cells in Houston cohort
│   ├── 05- Identifying Variable Genes in Jackson's Cohort (Round 1)
│   ├── 06- Round 1 cNMF for myeloid cells in Jackson's cohort
│   ├── 07- Identifying union gene lists suitable for Round 2 NMF from all the cohorts
│   ├── 08- Round 2 cNMF for myeloid cells in MGB cohort
│   ├── 09- Round 2 cNMF for myeloid cells in Houston cohort
│   ├── 10- Round 2 cNMF for myeloid cells in Jackson's cohort
│   ├── 11- Identifying consensus programs
│   ├── 12- Averaging the spectra and calculating usages of the consensus programs in all myeloid cells of all cohorts and
│   └── 13 - Calculating the usages of the consensus myeloid programs in the validation Mcgill cohort
├── LICENSE
├── MAESTER
│   ├── 01- Trimming R2 reads to include high quality bases only
│   ├── 02- Extracting barcode and UMI sequences from R1 and transferring them to read names in processed R2 fastqs
│   ├── 03- Mapping Maester libraries
│   ├── 04- Make the STAR aligner bam file compatible with MAEGATK
│   ├── 05- Run MAEGATK to obtain single cell level values of counts and coverages
│   ├── 06 - Obtain Counts Matrix and Coverage Matrix at single cell level from MAEGATK output
│   ├── 07- Low Resolution Pesduobulking of count matrices
│   ├── 08- Low Resolution Pesduobulking of Coverage matrices
│   ├── 09- Calculating VAFs from pseudobulked counts and coverage matrices
│   ├── 10- Selection of Variants of Interest
│   ├── 11- High-resolution Pseudobulking of Myeloid cell population in the Primary Tumor
│   ├── 12 - Calculating GSVA enrichment for variants categories in myeloid cell identities in tumor microenvironment
│   ├── 13- Visualizations (Dotplot)
│   └── 14- Visualizations (Stacked Columns)
├── Processing of GBO scRNA-Seq libraries (Related to Figure 7)
│   ├── 01- Aligning all GBO Seq-Well scRNA-Seq libraries
│   ├── 02- Seurat Processing for BWH911 to obtain Raw Counts Matrix
│   ├── 03- Calculating the usage of all cell types NMF programs in BWH911 GBO to extract non-doublet myeloid myeloid cells
│   ├── 04- Calculating the usage of myeloid NMF programs in Myeloid cells of BWH911 GBO
│   ├── 05- Seurat Processing for GBOs treated with DMSO and GNE to obtain Raw Counts Matrix
│   ├── 06- Calculating the usage of all cell types NMF programs in GBOs treated with DMSO and GNE to extract non-doublet myeloid cells
│   └── 07- Calculating the usage of myeloid NMF programs in Myeloid cells of GBOs treated with DMSO or GNE
├── Processing of scRNA-Seq Files (Related to Figure 1)
│   ├── 01- Align SeqWell scRNA-Seq Libraries
│   ├── 02- Align 10X V3 scRNA-Seq Libraries
│   ├── 03- Align 10X V2 scRNA-Seq Libraries
│   ├── 04- Seurat for Processing MGB Cohort.R
│   ├── 05- Identifying Variable Genes for NMF in MGB Cohort
│   ├── 06- cNMF for annotating cells in MGB cohort
│   ├── 07- Copy number Variation Analysis
│   ├── 08- Seurat with Batch correction for Myeloid Cells in MGB cohort
│   ├── 09- Seurat for Processing Houston Cohort.R
│   ├── 10- Calculate usage matrix in Houston Cohort for cNMF cell annotation programs identified in MGB cohort
│   ├── 11- Seurat for Processing Jackson's Cohort.R
│   ├── 12- Calculate usage matrix in Jackson's Cohort for cNMF cell annotation programs identified in MGB cohort
│   ├── 13- Seurat for Processing Mcgill Cohort.R
│   ├── 14- Calculate usage matrix in Mcgill Cohort for cNMF cell annotation programs identified in MGB cohort
│   ├── 15- Annotation and Doublet Detection.pdf
│   └── Reference Cells
│       ├── Jackson's reference Cells
│       ├── MGB Cohort Reference Cells
│       ├── McGill Cohort Reference Cells
│       └── Methodist Cohort Reference Cells
├── README.md
├── Spatial_transcriptomics
│   ├── 01-make_adata.ipynb
│   ├── 02-select_genes.ipynb
│   ├── 03-merge_adata.ipynb
│   ├── 04-cnmf
│   ├── 05-meta_programs_usage.ipynb
│   ├── 06-RCTD_sc_reference_v2.R
│   ├── 07-RCTD_make_pucks_all.R
│   ├── 08-run_RCTD_all.R
│   ├── 09-RCTD_tocsv_all_patients.R
│   ├── 10-integrate_external_and_distances_all_samples.ipynb
│   ├── 11-corr_env_rctd.ipynb
│   ├── 12-spatial_enrichment_regression_env.ipynb
│   ├── 13-spatial_enrichment_regression_rctd_no_lim_network_plot.ipynb
│   └── 14- ScatterPie Visualization of the niches
└── scATAC-Seq Analyses
    ├── 01- Processing GBM C3L_02705 snATAC-Seq library
    ├── 02- Processing GBM C3L_03405 snATAC-Seq library
    ├── 03- Processing GBM C3L_03968 snATAC-Seq library
    ├── 04- Processing GBM C3N_00662 snATAC-Seq library
    ├── 05 - Processing GBM C3N_00663 snATAC-Seq library
    ├── 06- Processing GBM C3N_01334 snATAC-Seq library
    ├── 07- Processing GBM C3N_01518 snATAC-Seq library
    ├── 08- Processing GBM C3N_01798 snATAC-Seq library
    ├── 09- Processing GBM C3N_01814 snATAC-Seq library
    ├── 10- Processing GBM C3N_01816 snATAC-Seq library
    ├── 11- Processing GBM C3N_01818 snATAC-Seq library
    ├── 12- Processing GBM C3N_02181 snATAC-Seq library
    ├── 13- Processing GBM C3N_02186 snATAC-Seq library
    ├── 14- Processing GBM C3N_02188 snATAC-Seq library
    ├── 15- Processing GBM C3N_02769 snATAC-Seq library
    ├── 16- Processing GBM C3N_02783 snATAC-Seq library
    ├── 17- Processing GBM C3N_02784 snATAC-Seq library
    ├── 18- Processing GBM C3N_03186 snATAC-Seq library
    ├── 19- Processing GBML018G1 snATAC-Seq library
    ├── 20- Processing GBML019G1 snATAC-Seq library
    ├── 21- Merging all processed snATAC-Seq libraries into one object
    ├── 22- Calculating all cell types programs usages in the combined gene activities matrix
    ├── 23- Extracting Myeloid Cells from the Combined snATAC Object and Calculating Gene Activities
    ├── 24- Calculating myeloid program usages in myeloid cells of the combined snATAC object
    ├── 25- Extracting Discrete Myeloid Cells from the combined GBM snATAC object
    ├── 26- Generating Normalized bigwigs for the pseudobulked discrete myeloid cells
    ├── 27- Generating Deeptools heatmaps for specific peaks for the immunomodulatory discrete myeloid cells
    └── 28- Identifying enriched motifs in specific immunomodulatory peaks using monaLISA

/Analysis of Glioma Immunotherpay scRNA-Seq Libraries/01- Extracting Myeloid Cells Matrix from the published Seurat Object:
--------------------------------------------------------------------------------
####### Seurat Object obtained from: Mei, Y., Wang, X., Zhang, J. et al. Siglec-9 acts as an immune-checkpoint molecule on macrophages in glioblastoma, restricting T-cell priming and immunotherapy response. Nat Cancer 4, 1273–1291 (2023). https://doi.org/10.1038/s43018-023-00598-9 #######

####### Seurat object for the study was downloaded from https://figshare.com/articles/dataset/Single-cell_and_spatial_transcriptomic_profiling_of_human_glioblastomas/22434341.
####### The name of the file is "GBM.RNA.integrated.24.rds" #######

############Inside R####################

library(dplyr)
library(Seurat)

Immunotherapy <- readRDS("GBM.RNA.integrated.24.rds")

Immunotherapy <- UpdateSeuratObject(Immunotherapy)

write.table(Immunotherapy@meta.data, file="./GBM_Immunotherapy_MetaData.txt", sep="\t", col.names=NA, quote=FALSE)

####### %in% tests each cell's annotation for membership; == against a vector recycles the vector and silently drops matching cells #######
Myeloid <- subset(x = Immunotherapy, subset = anno_ident %in% c("Macrophages","Microglial", "Monocytes", "cDCs"))

Myeloid_Matrix <- GetAssayData(Myeloid, slot = "counts")

Myeloid_Genes <- read.table("/seq/epiprod02/Chadi/Glioblastoma/Myeloid_NMF_Average_Gene_Spectra.txt", sep="\t", head=TRUE, row.names=1)

Myeloid_Genes2 <- rownames(Myeloid_Genes)

Myeloid_Matrix2 <- Myeloid_Matrix[rownames(Myeloid_Matrix) %in% Myeloid_Genes2,]

write.table(t(as.matrix(Myeloid_Matrix2)), file="./GBM_Immunotherapy_Myeloid_cNMF.txt", sep="\t", col.names=NA, quote=FALSE)

dim(Myeloid_Matrix2) ######### to find out the number for --numgenes in the next step (calculation script (2225)) #################

--------------------------------------------------------------------------------
/Analysis of Glioma Immunotherpay scRNA-Seq Libraries/02- Calculating usages of the recurrent myeloid programs in the published immunotherapy dataset:
--------------------------------------------------------------------------------
cnmf prepare --output-dir ./Calculate_Usage_Myeloid/ --name Calculate_Usage_Myeloid -c GBM_Immunotherapy_Myeloid_cNMF.txt -k 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 --n-iter 500 --total-workers 1 --seed 14 --numgenes 2225;

python;

################################## Inside Python #################################

import sklearn
import sklearn.decomposition
from sklearn.decomposition import non_negative_factorization
import numpy as np
import scanpy as sc
import csv
import scipy
import pandas as pd


H = pd.read_table("Myeloid_NMF_Average_Gene_Spectra.txt", sep="\t", index_col=0)

H2 = H.T

X = sc.read_h5ad('Calculate_Usage_Myeloid/Calculate_Usage_Myeloid/cnmf_tmp/Calculate_Usage_Myeloid.norm_counts.h5ad')

X2 = X.X.toarray()

X3 = pd.DataFrame(data=X2, columns = X.var_names , index = X.obs.index)

H3 = H2.filter(items = X3.columns)

H4 = H3.to_numpy()

X5 = X2.astype(np.float64)

# Refit the usages (W) with the gene spectra (H) held fixed (update_H=False)
test = sklearn.decomposition.non_negative_factorization(X5, W=None, H=H4, n_components= 14, init='random', update_H=False, solver='cd', beta_loss='frobenius', tol=0.0001, max_iter=1000, alpha=0.0, alpha_W=0.0, alpha_H='same', l1_ratio=0.0, regularization=None, random_state=None, verbose=0, shuffle=False)

test2 = list(test)

pd.DataFrame(test2[0], columns= H.columns, index=X.obs.index).to_csv(path_or_buf="./GBM_ImmunotherapyCN_Myeloid_cNMF_Usages.txt", sep="\t", quoting=csv.QUOTE_NONE)

--------------------------------------------------------------------------------
/Analysis of Glioma Immunotherpay scRNA-Seq Libraries/03- Generating quadrant plot for the myeloid cells in the published glioma dataset to indicate myeloid cells in responders vs non-responders and for SIGLEC9 expression:
--------------------------------------------------------------------------------
######## "Quadrant_Plot_Immunotherapy.txt" contains all non-doublet myeloid cells in the published data sets with the usage values of the immunomodulatory programs. It also contains SIGLEC9 expression.
######## Xaxis is calculated by subtracting the usage of Complement Immunosuppression from the usage of IL1B pro-inflammatory ( Usage of IL1B Inflam - Usage of Complement )
######## Yaxis is calculated by subtracting the usage of Scavenger Immunosuppression from the usage of RHOB pro-inflammatory ( Usage of RHOB Inflammatory - Usage of Scavenger )

library(ggplot2)
library(dplyr)
library(scatterpie)

data4 <- read.table("Quadrant_Plot_Immunotherapy.txt", sep="\t", head=TRUE, row.names=1)

data5 <- data4[sample(nrow(data4)), ]


pdf("Immunotherapy_NC_responders.pdf", height = 6, width = 7.5)

ggplot(data5, aes(Xaxis, Yaxis)) + geom_point(aes(colour = Treatment), size=0.05) + scale_color_manual(values=c("nonresponder"="gray90", "responder"="black")) + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank()) + labs(color=NULL) + scale_x_continuous(limits = c(-100, 100)) + scale_y_continuous(limits = c(-100, 100))

dev.off()


pdf("Immunotherapy_NC_non_responders.pdf", height = 6, width = 7.5)

ggplot(data5, aes(Xaxis, Yaxis)) + geom_point(aes(colour = Treatment), size=0.05) + scale_color_manual(values=c("responder"="gray90", "nonresponder"="black")) + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank()) + labs(color=NULL) + scale_x_continuous(limits = c(-100, 100)) + scale_y_continuous(limits = c(-100, 100))

dev.off()


pdf("Immunotherapy_NC_SIGLEC9_Expression_Quadrant_V7_Blackgrey.pdf", height = 6, width = 7.5)

ggplot(data4 %>% arrange(SIGLEC9), aes(Xaxis, Yaxis)) + geom_point(aes(colour = SIGLEC9), size=0.05) + scale_color_gradient2(low="grey90", mid="grey90", high="black", midpoint = 0.5, space="Lab", limit=c(0,2), na.value="black") + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank()) + labs(color=NULL) + scale_x_continuous(limits = c(-100, 100)) + scale_y_continuous(limits = c(-100, 100))

dev.off()

--------------------------------------------------------------------------------
/Analysis of Glioma Immunotherpay scRNA-Seq Libraries/04 - Generating boxplots for the published glioma immunotherapy dataset for percentage of cells positive for SIGLEC9 expression or immunomodulatory programs usage:
--------------------------------------------------------------------------------
######## Immunotherapy_SIGLEC9_Boxplot.txt contains the percentage of myeloid cells in each responder and non-responder tumors that are positive for SIGLEC9 expression and the usage of the four immunomodulatory programs

library(tidyr)
library(ggplot2)


data <- read.table("Immunotherapy_SIGLEC9_Boxplot.txt", head=TRUE, row.names=1, sep="\t")

data$RowName <- rownames(data)


long_df <- gather(data, key = "NMF", value = "Value", -RowName, -Treatment)

data2 <- long_df[long_df$NMF %in% c("Percentage_SIGLEC9","Scavenger","Complement","Rhob","IL1B"),]


data2$NMF <- factor(data2$NMF, levels = c("Percentage_SIGLEC9","Scavenger","Complement","Rhob","IL1B"))

data2$Treatment <- factor(data2$Treatment, levels = c("responder", "nonresponder"))


ggplot(data2, aes(x = factor(NMF), y = Value, fill = Treatment)) +
 geom_boxplot(width = 0.7, position = position_dodge(width = 0.7)) + geom_point(position = position_dodge(width = 0.7), aes(y = Value), color = "black", size = 1) + stat_boxplot(geom ='errorbar', width = 0.35, position = position_dodge(width = 0.7)) + theme(panel.background = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.border = element_rect(color = "black", fill = NA)) + scale_fill_manual(values = c("responder" = "white", "nonresponder" = "#808080"))
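
The two input tables used by scripts 03 and 04 are read in ready-made, so the sketch below illustrates how such values could be derived from a per-cell usage table. The axis definitions follow the comments in script 03 (IL1B minus Complement, RHOB minus Scavenger). Everything else is an assumption made for illustration: the dictionary keys, the `tumor` field, and in particular the positivity `threshold`, which the scripts do not state.

```python
from collections import defaultdict


def quadrant_axes(usage):
    """Return (Xaxis, Yaxis) for one cell.

    `usage` maps program name -> usage value. The key names are
    hypothetical; the subtraction follows the comments in script 03.
    """
    x = usage["IL1B"] - usage["Complement"]   # IL1B inflammatory minus Complement
    y = usage["RHOB"] - usage["Scavenger"]    # RHOB inflammatory minus Scavenger
    return x, y


def percent_positive(cells, key, threshold):
    """Percentage of cells per tumor whose `key` value exceeds `threshold`.

    `cells` is a list of dicts with a 'tumor' field (hypothetical layout).
    The threshold is an assumption; the original cutoff is not given.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for cell in cells:
        totals[cell["tumor"]] += 1
        if cell[key] > threshold:
            positives[cell["tumor"]] += 1
    return {tumor: 100.0 * positives[tumor] / totals[tumor] for tumor in totals}


if __name__ == "__main__":
    example = [
        {"tumor": "T1", "SIGLEC9": 1.2},
        {"tumor": "T1", "SIGLEC9": 0.0},
        {"tumor": "T2", "SIGLEC9": 0.0},
    ]
    print(quadrant_axes({"IL1B": 60.0, "Complement": 10.0, "RHOB": 5.0, "Scavenger": 25.0}))
    print(percent_positive(example, "SIGLEC9", threshold=0.0))
```

Per-tumor percentages computed this way would then be joined with each tumor's responder or non-responder label before plotting.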
--------------------------------------------------------------------------------
/Analysis of Glioma Immunotherpay scRNA-Seq Libraries/05 - Generating boxplot for percentage of cells double positive for SIGLEC9 expression and immunomodulatory program usage in responders vs non-responders:
--------------------------------------------------------------------------------
######## Double_Positive_SIGLEC9_NMF_Boxplot.txt contains the percentage of myeloid cells in each responder and non-responder tumors that are double positive for SIGLEC9 expression and the usage of the four immunomodulatory programs


library(tidyr)
library(ggplot2)


data <- read.table("Double_Positive_SIGLEC9_NMF_Boxplot.txt", head=TRUE, row.names=1, sep="\t")

data$RowName <- rownames(data)


long_df <- gather(data, key = "NMF", value = "Value", -RowName, -Treatment)

data2 <- long_df[long_df$NMF %in% c("Scavenger","Complement","RHOB","IL1B"),]


data2$NMF <- factor(data2$NMF, levels = c("Scavenger","Complement","RHOB","IL1B"))

data2$Treatment <- factor(data2$Treatment, levels = c("responder", "nonresponder"))


ggplot(data2, aes(x = factor(NMF), y = Value, fill = Treatment)) +
 geom_boxplot(width = 0.7, position = position_dodge(width = 0.7)) + geom_point(position = position_dodge(width = 0.7), aes(y = Value), color = "black", size = 1) + stat_boxplot(geom ='errorbar', width = 0.35, position = position_dodge(width = 0.7)) + theme(panel.background = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.border = element_rect(color = "black", fill = NA)) + scale_fill_manual(values = c("responder" = "white", "nonresponder" = "#808080"))


group1 <- data2[data2$Treatment == "responder" & data2$NMF == "Scavenger",]
group2 <- data2[data2$Treatment == "nonresponder" & data2$NMF == "Scavenger",]

result <- wilcox.test(group1$Value, group2$Value)

p_value <- result$p.value

print(p_value) ######## 0.007455462


group1 <- data2[data2$Treatment == "responder" & data2$NMF == "Complement",]
group2 <- data2[data2$Treatment == "nonresponder" & data2$NMF == "Complement",]

result <- wilcox.test(group1$Value, group2$Value)

p_value <- result$p.value

print(p_value) ####### 0.2941863


group1 <- data2[data2$Treatment == "responder" & data2$NMF == "IL1B",]
group2 <- data2[data2$Treatment == "nonresponder" & data2$NMF == "IL1B",]

result <- wilcox.test(group1$Value, group2$Value)

p_value <- result$p.value

print(p_value) ####### 0.7878788


group1 <- data2[data2$Treatment == "responder" & data2$NMF == "RHOB",]
group2 <- data2[data2$Treatment == "nonresponder" & data2$NMF == "RHOB",]

result <- wilcox.test(group1$Value, group2$Value)

p_value <- result$p.value

print(p_value) #### 0.1217582


####### Order of the p-values below: Scavenger, Complement, RHOB, IL1B #######
pvalues <- c(0.007455462,
 0.2941863,
 0.1217582,
 0.7878788)


adjusted_p_value <- p.adjust(pvalues, method = "fdr")

--------------------------------------------------------------------------------
/Bulk ATAC-Seq Analysis/01- Trimming fastq files to remove nextera transposase adaptors:
--------------------------------------------------------------------------------
trim_galore --phred33 --paired --fastqc --output_dir Trimmed/ DMSO_S1_R1_001.fastq.gz DMSO_S1_R2_001.fastq.gz;

trim_galore --phred33 --paired --fastqc --output_dir Trimmed/ P300I_S2_R1_001.fastq.gz P300I_S2_R2_001.fastq.gz;

--------------------------------------------------------------------------------
/Bulk ATAC-Seq Analysis/02- Mapping Trimmed Fastqs using STAR:
--------------------------------------------------------------------------------
STAR --genomeDir /seq/epiprod02/Chadi/Genomes/STAR/2.7.10b/hg38 --readFilesIn DMSO_S1_R1_001_val_1.fq.gz DMSO_S1_R2_001_val_2.fq.gz --readFilesCommand zcat --outFileNamePrefix ./BAMS/DMSO --outSAMtype BAM SortedByCoordinate --outFilterMultimapNmax 1000 --outFilterScoreMinOverLread 0.25 --alignIntronMax 1 --alignEndsType EndToEnd;


STAR --genomeDir /seq/epiprod02/Chadi/Genomes/STAR/2.7.10b/hg38 --readFilesIn P300I_S2_R1_001_val_1.fq.gz P300I_S2_R2_001_val_2.fq.gz --readFilesCommand zcat --outFileNamePrefix ./BAMS/p300i --outSAMtype BAM SortedByCoordinate --outFilterMultimapNmax 1000 --outFilterScoreMinOverLread 0.25 --alignIntronMax 1 --alignEndsType EndToEnd;

--------------------------------------------------------------------------------
/Bulk ATAC-Seq Analysis/03- Removing Duplicate Reads from mapped libraries:
--------------------------------------------------------------------------------
java -Xmx50g -jar PICARD MarkDuplicates I=DMSOAligned.sortedByCoord.out.bam O=./dup/DMSOAligned.sortedByCoord.out.bam M=./dup/DMSOAligned.sortedByCoord.txt REMOVE_DUPLICATES=TRUE;

java -Xmx50g -jar PICARD MarkDuplicates I=p300iAligned.sortedByCoord.out.bam O=./dup/p300iAligned.sortedByCoord.out.bam M=./dup/p300iAligned.sortedByCoord.txt REMOVE_DUPLICATES=TRUE;

--------------------------------------------------------------------------------
/Bulk ATAC-Seq Analysis/04- Remove reads mapping to chrM:
--------------------------------------------------------------------------------
for file in *.bam;do samtools index $file;done

mkdir filter;

samtools view -b DMSOAligned.sortedByCoord.out.bam chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY > ./filter/DMSOAligned.sortedByCoord.out.bam;

samtools view -b p300iAligned.sortedByCoord.out.bam chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY > ./filter/p300iAligned.sortedByCoord.out.bam;

--------------------------------------------------------------------------------
/Bulk ATAC-Seq Analysis/05- Final sorting of processed mapped files:
--------------------------------------------------------------------------------
samtools sort -o sorted_filter/DMSOAligned.sortedByCoord.out.bam DMSOAligned.sortedByCoord.out.bam;

samtools sort -o sorted_filter/p300iAligned.sortedByCoord.out.bam p300iAligned.sortedByCoord.out.bam;

--------------------------------------------------------------------------------
/Bulk ATAC-Seq Analysis/06- Creating normalized bigwigs for visualization:
--------------------------------------------------------------------------------
for file in *.bam;do samtools index $file;done

bamCoverage -b DMSOAligned.sortedByCoord.out.bam --extendReads --normalizeUsing RPGC --effectiveGenomeSize 2747877702 -o ./BW/DMSO_GBO.bw;

bamCoverage -b p300iAligned.sortedByCoord.out.bam --extendReads --normalizeUsing RPGC --effectiveGenomeSize 2747877702 -o ./BW/p300i_GBO.bw;

--------------------------------------------------------------------------------
/Bulk ATAC-Seq Analysis/07- Shifting loci to correct for tn5 bias:
--------------------------------------------------------------------------------
bedtools bamtobed -i DMSOAligned.sortedByCoord.out.bam > ./BEDs/DMSO_GBO.bed;

bedtools bamtobed -i p300iAligned.sortedByCoord.out.bam > ./BEDs/p300i_GBO.bed;

cd BEDs;

cat DMSO_GBO.bed | awk -F $'\t' 'BEGIN {OFS = FS}{ if ($6 == "+") {$2 = $2 + 4} else if ($6 == "-") {$3 = $3 - 5} print $0}' >| DMSO_GBO_tn5_pe.bed;

cat p300i_GBO.bed | awk -F $'\t' 'BEGIN {OFS = FS}{ if ($6 == "+") {$2 = $2 + 4} else if ($6 == "-") {$3 = $3 - 5} print $0}' >| p300i_GBO_tn5_pe.bed;

--------------------------------------------------------------------------------
/Bulk ATAC-Seq Analysis/08- Calling peaks using the bed files with corrected loci:
--------------------------------------------------------------------------------
macs2 callpeak -t DMSO_GBO_tn5_pe.bed -g hs -f BED -q 0.01 --nomodel --shift -75 --extsize 150 --keep-dup all -B --SPMR --outdir macs2_pk -n DMSO_GBO;

macs2 callpeak -t p300i_GBO_tn5_pe.bed -g hs -f BED -q 0.01 --nomodel --shift -75 --extsize 150 --keep-dup all -B --SPMR --outdir macs2_pk -n p300i_GBO;

--------------------------------------------------------------------------------
/Bulk ATAC-Seq Analysis/09- Determining Differential Accessible Sites using HOMER:
--------------------------------------------------------------------------------
###### Made the xls output of macs2 compatible with HOMER ###########

###### Merging peaks of DMSO and p300i into one file ###########

mergePeaks -d given DMSO_GBO_Ready_peaks.bed p300i_GBO_Ready_peaks.bed > GBO_Myeloid_Peaks_Merged.bed


####### Creating Tag Directories using the bed files with corrected loci ########

makeTagDirectory ./TagDirectories/DMSO_GBO/ -format bed DMSO_GBO_tn5_pe.bed

makeTagDirectory ./TagDirectories/p300i_GBO/ -format bed p300i_GBO_tn5_pe.bed


####### Differential peak analysis ##########

getDifferentialPeaks GBO_Myeloid_Peaks_Merged.bed ./TagDirectories/p300i_GBO/ ./TagDirectories/DMSO_GBO/ -F 2 > ./Upregulated_in_p300i.txt

getDifferentialPeaks GBO_Myeloid_Peaks_Merged.bed ./TagDirectories/DMSO_GBO/ ./TagDirectories/p300i_GBO/ -F 2 > ./Upregulated_in_DMSO.txt

--------------------------------------------------------------------------------
/Bulk ATAC-Seq Analysis/10- Identifying motifs enriched in differential accessible sites:
--------------------------------------------------------------------------------
####### Made the outputs of getDifferentialPeaks compatible with rtracklayer::import option #######

R;

############################ Inside R ##############################
library(monaLisa)
library(GenomicRanges)
library(SummarizedExperiment)
library(JASPAR2024)
library(TFBSTools)
library(BSgenome.Hsapiens.UCSC.hg38)
library(ComplexHeatmap)
library(circlize)


DMSO <- rtracklayer::import(con = "Upregulated_in_DMSO.bed", format = "bed")

p300i <- rtracklayer::import(con = "Upregulated_in_p300i.bed", format = "bed")


Peaks <- c(DMSO, p300i)

Peaks2 <- trim(resize(Peaks, width = median(width(Peaks)), fix = "center"))
summary(width(Peaks2))


bins2 <- rep(c("DMSO", "p300i"), c(length(DMSO), length(p300i)))

desired_order <- c("DMSO", "p300i")

bins2 <- factor(bins2, levels = desired_order)

table(bins2)

Peakseqs <- getSeq(BSgenome.Hsapiens.UCSC.hg38, Peaks2)


db <- file.path(system.file("extdata", package="JASPAR2024"),
 "JASPAR2024.sqlite")
opts <- list()
opts[["tax_group"]] <- "vertebrates"
opts[["matrixtype"]] <- "PWM"
opts[["collection"]] <- "CORE"
pwms <- getMatrixSet(db, opts)


hg38 <- Hsapiens


se2 <- calcBinnedMotifEnrR(seqs = Peakseqs, bins = bins2, pwmL = pwms, background = "genome", genome = hg38, genome.oversample = 50)


Motifs <- scan("Motif_OIbulk.txt", what="")

seSel2 <- se2[Motifs, ]

pdf("Bulk_ATAC_DMSO_p300i_LISA_Heatmap.pdf", height=18, width=40)
plotMotifHeatmaps(x = seSel2, which.plots = c("log2enr", "negLog10Padj"),
 width = 4, cluster = FALSE, maxEnr = 2, maxSig = 300,
 show_dendrogram = TRUE, show_seqlogo = TRUE,
 width.seqlogo = 1.75, show_motif_GC = TRUE)
dev.off()


matrix_pvalue <- assay(se2, "negLog10Padj")
write.table(matrix_pvalue, file="Bulk_ATAC_GBO_LISA_Motifs_log10pvalues.txt", col.names=NA, sep="\t", quote=FALSE)


background_matrix <- assay(se2, "log2enr")
write.table(background_matrix, file="Bulk_ATAC_GBO_LISA_Motifs_Enrichment.txt", col.names=NA, sep="\t", quote=FALSE)

--------------------------------------------------------------------------------
/Bulk ATAC-Seq Analysis/11- Creating deeptools heatmap for the differential accessible sites between DMSO and p300i:
--------------------------------------------------------------------------------

computeMatrix reference-point -S DMSO_GBO.bw p300i_GBO.bw -R Upregulated_in_DMSO.bed Upregulated_in_p300i.bed -b 1000 -a 1000 --skipZeros --smartLabels -o Bulk_ATAC_GBO_Differential_Peaks_Center.gz --referencePoint center;


plotHeatmap -m Bulk_ATAC_GBO_Differential_Peaks_Center.gz -o Bulk_ATAC_GBO_Differential_Peaks_Center_Better_Pipeline.pdf --startLabel 5 --endLabel 3 --whatToShow "heatmap and colorbar" --colorMap Blues --outFileSortedRegions Bulk_ATAC_GBO_Differential_Peaks_Center.bed;

--------------------------------------------------------------------------------
/Creation of discretized scRNA-Seq Expression matrix/01- Align 10X V3 Publiashed normal brain scRNA-Seq Libraries:
--------------------------------------------------------------------------------
####### STARsolo needs --soloCBwhitelist for --soloType CB_UMI_Simple and --readFilesCommand zcat for gzipped fastqs; both were added here and /path/to/10X_V3_whitelist.txt is a placeholder #######
/path/to/STAR --genomeDir /path/to/genome/dir/ --readFilesIn Read2_Lane1.fastq.gz,Read2_Lane2.fastq.gz,Read2_Lane3.fastq.gz Read1_Lane1.fastq.gz,Read1_Lane2.fastq.gz,Read1_Lane3.fastq.gz --readFilesCommand zcat --soloType CB_UMI_Simple --soloCBwhitelist /path/to/10X_V3_whitelist.txt --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 12 --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts --soloUMIfiltering MultiGeneUMI_CR --soloUMIdedup 1MM_CR --clipAdapterType CellRanger4 --outFilterScoreMin 30 --outSAMtype BAM SortedByCoordinate --outSAMattributes CR UR CY UY CB UB

--------------------------------------------------------------------------------
/Creation of discretized scRNA-Seq Expression matrix/04- Extracting Discrete Myeloid Cells for Subsequent Marker Identification using COMET and SCENIC:
--------------------------------------------------------------------------------
####### Discrete Scavenger, Complement, Tissue Resident and Systemic myeloid cells were extracted and their IDs placed in the Cells text files shown below

############################ Inside R ############################

library(dplyr)
library(Seurat)
options(bitmapType='cairo')
options(future.globals.maxSize = 8000 * 1024^2)


DefaultAssay(Discrete4) <- "RNA"


Matrix <- GetAssayData(Discrete4, slot = "counts")


Cells2 <- scan("Cells_For_Scenic.txt", what="")

SCENIC <- Matrix[,colnames(Matrix) %in% Cells2]

write.table(as.matrix(SCENIC), file="Discrete_Suppressive_Inflammatory_Myeloid_For_SCENIC.txt", sep="\t", quote=FALSE, col.names=NA)


DefaultAssay(Discrete4) <- "RNA"


Matrix <- GetAssayData(Discrete4, slot = "counts")

Cells <- scan("Cells_For_COMET.txt", what="")

Myeloid <- subset(Discrete4, cells= Cells)

all.genes <- rownames(Myeloid)


pdf("Discrete4_Myeloid_Suppressive_QC_AF.pdf", height = 6, width = 20)
VlnPlot(Myeloid, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
dev.off()

library(sctransform)
Myeloid <- SCTransform(Myeloid, vars.to.regress = "percent.mt", verbose = TRUE)
Myeloid <- RunPCA(Myeloid)
pdf("Discrete4_Myeloid_Suppressive_ElbowPlot.pdf", height = 6, width = 6)
ElbowPlot(Myeloid, ndims=50)
dev.off()

write.table(Myeloid@meta.data, file="Discrete4_Myeloid_Suppressive_Integrated_MetaData.txt", sep="\t", col.names=NA, quote=FALSE)


Myeloid <- RunUMAP(Myeloid, reduction = "pca", dims = 1:16)
Myeloid <- FindNeighbors(Myeloid, dims = 1:16)
Myeloid <- FindClusters(Myeloid, resolution = 0.3)
pdf("Discrete4_Myeloid_Suppressive_UMAP_Clusters.pdf", height= 6, width = 7)
DimPlot(Myeloid, reduction = "umap")
dev.off()

pdf("Discrete4_Myeloid_Suppressive_Clusters_With_Labels.pdf", height= 6, width = 7)
DimPlot(Myeloid, reduction = "umap", label=TRUE)
dev.off()

pdf("Discrete4_Myeloid_Suppressive_UMAP_Patient_ID.pdf", height= 6, width = 9)
DimPlot(Myeloid, reduction = "umap", group.by="orig.ident")
dev.off()


pdf("Discrete4_Myeloid_Suppressive_UMAP_IDH_Status.pdf", height= 6, width = 7)
DimPlot(Myeloid, reduction = "umap", group.by="IDH_Status")
dev.off()


pdf("Discrete4_Myeloid_Suppressive_UMAP_Annotation.pdf", height= 6, width = 7)
DimPlot(Myeloid, reduction = "umap", group.by="Annotation")
dev.off()


pdf("Discrete4_Myeloid_Suppressive_UMAP_Annotation_Labelled.pdf", height= 6, width = 7)
DimPlot(Myeloid, reduction = "umap", group.by="Annotation", label=TRUE)
dev.off()


DefaultAssay(Myeloid) <- "RNA"
Myeloid <- NormalizeData(Myeloid)
Myeloid <- ScaleData(Myeloid, features = all.genes)


write.table(Myeloid@meta.data, file="Discrete4_Myeloid_Suppressive_All_MetaData.txt", sep="\t", col.names=NA, quote=FALSE)


library(SCOPfunctions)

Matrix <- GetAssayData(Myeloid, slot = "data")

Matrix2 <- as.matrix(Matrix)

write.table(Matrix2, file="./Discrete4_Myeloid_Suppressive_Normalized_Matrix.txt", sep="\t", col.names=NA, quote=FALSE)


umap2 <- Embeddings(object = Myeloid, reduction = "umap")

write.table(umap2, file="Discrete4_Myeloid_Suppressive_UMAP.txt", sep="\t", col.names=NA, quote=FALSE)


Matrix <- GetAssayData(Myeloid, slot = "data")

Genes <- scan("Human_Surface_Markers.txt", what="")

data2 <- Matrix[rownames(Matrix) %in% Genes,]


data2 <- as.matrix(data2)

write.table(data2, file="./Discrete4_Myeloid_Suppressive_Normalized_Matrix_Human_Surface_Markers.txt", sep="\t", col.names=NA, quote=FALSE)
-------------------------------------------------------------------------------- /Creation of discretized scRNA-Seq Expression matrix/05- COMET for Marker Identification: -------------------------------------------------------------------------------- 1 | Comet -Abbrev 2 Discrete4_Myeloid_Suppressive_Normalized_Matrix_Human_Surface_Markers.txt Discrete4_Myeloid_Suppressive_UMAP.txt Cells_For_COMET_Supp.txt Discrete4_Myeloid_Suppressive_Surface_Markers_Output/; 2 | 3 | 4 | Comet -Abbrev 2 Discrete4_Myeloid_Suppressive_Normalized_Matrix.txt Discrete4_Myeloid_Suppressive_UMAP.txt Cells_For_COMET_Supp.txt Discrete4_Myeloid_Suppressive_Output/; 5 | -------------------------------------------------------------------------------- /Deconvolution of Bulk Datasets/1- Creating Gene Sets: -------------------------------------------------------------------------------- 1 | ###### The top 50 genes were obtained for each myeloid program by ranking them in the merged gene spectra output. 2 | 3 | ###### For Tcells and Malignant Cells, we ranked the gene spectra from their respective cNMF outputs to obtain the top 50 genes for each program. 4 | 5 | ###### For the other cell types, we used the gene spectra output of the cNMF of all cell types. We ranked and obtained the top 50 genes for the Pericytes, Endothelial and Oligo programs. 6 | 7 | ###### To obtain cleaner signals, we removed any gene from each list appearing in the top 100 genes in all the other programs. 
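The gene-set procedure described in "1- Creating Gene Sets" can be sketched in R as follows. This is only an illustrative sketch, not the authors' script: the toy `spectra` matrix, the helper names `top_genes` and `make_gene_sets`, and the orientation (genes in rows, programs in columns) are all assumptions. It also assumes that a gene is dropped when it appears in the top 100 of any other program, which is one possible reading of the description above.

```r
# Hypothetical toy gene spectra (assumption: rows are genes, columns are programs).
set.seed(1)
spectra <- matrix(
  runif(600 * 3), nrow = 600,
  dimnames = list(paste0("Gene", 1:600), c("ProgA", "ProgB", "ProgC"))
)

# Names of the n highest-scoring genes for one program.
top_genes <- function(spectra, program, n) {
  ranked <- sort(spectra[, program], decreasing = TRUE)
  names(ranked)[seq_len(n)]
}

# Top n_top genes per program, minus genes found in the top n_exclude
# of any other program (assumed interpretation of the cleaning step).
make_gene_sets <- function(spectra, n_top = 50, n_exclude = 100) {
  programs <- colnames(spectra)
  sets <- lapply(programs, function(p) {
    own <- top_genes(spectra, p, n_top)
    others <- unlist(lapply(setdiff(programs, p),
                            function(q) top_genes(spectra, q, n_exclude)))
    setdiff(own, others)
  })
  names(sets) <- programs
  sets
}

gene_sets <- make_gene_sets(spectra)
```

Because of the exclusion step, each resulting list can contain fewer than 50 genes, which would be consistent with the varying lengths of the files in the "Gene Sets" folder.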
8 | -------------------------------------------------------------------------------- /Deconvolution of Bulk Datasets/2- Calculating Module scores using Seurat in TCGA Glioma Cohorts: -------------------------------------------------------------------------------- 1 | library("Seurat") 2 | library("dplyr") 3 | 4 | ###### Load TCGA Glioma gene expression matrix (It is normalized and log-transformed ###### 5 | 6 | data <- read.table("../TCGA.GBMLGG.sampleMap_HiSeqV2", sep="\t", head=TRUE, row.names=1) 7 | 8 | TCGA <- CreateSeuratObject(counts = data, project = "TCGA", min.cells = 1, min.features = 1) 9 | 10 | DefaultAssay(TCGA) <- "RNA" 11 | 12 | all.genes <- rownames(TCGA) 13 | TCGA <- ScaleData(TCGA, features = all.genes) 14 | 15 | 16 | TCGA <- FindVariableFeatures(TCGA, selection.method = "vst", nfeatures = 2000) 17 | 18 | ######## Load gene sets (Obtained as described in Step 1) ########### 19 | 20 | Microglia <- scan("Microglia.txt", what="") 21 | Macrophage <- scan("Macrophage.txt", what="") 22 | Monocyte <- scan("Monocyte.txt", what="") 23 | cDC <- scan("cDC.txt", what="") 24 | Neutrophils <- scan("Neutrophils.txt", what="") 25 | 26 | 27 | IL1B_Inflamm <- scan("IL1B_Inflamm.txt", what="") 28 | Inflamm_Microglia <- scan("Inflamm_Microglia.txt", what="") 29 | Complement_Immunosuppressive <- scan("Complement_Immunosuppressive.txt", what="") 30 | Scavenger <- scan("Scavenger.txt", what="") 31 | 32 | 33 | Memory_Like_Tcells <- scan("Memory_Like_Tcells.txt", what="") 34 | Terminal_Effector_Tcells <- scan("Terminal_Effector_Tcells.txt", what="") 35 | Treg <- scan("Treg.txt", what="") 36 | 37 | Oligo <- scan("Oligo.txt", what="") 38 | Pericytes <- scan("Pericytes.txt", what="") 39 | Endothelial <- scan("Endothelial.txt", what="") 40 | 41 | Malignant2 <- scan("Malignant2.txt", what="") 42 | Malignant3 <- scan("Malignant3.txt", what="") 43 | Malignant4 <- scan("Malignant4.txt", what="") 44 | Malignant6 <- scan("Malignant6.txt", what="") 45 | Malignant7 <- 
scan("Malignant7.txt", what="") 46 | 47 | ############ Calculate the Module Scores and output the results ########## 48 | 49 | Features <- list(Microglia, Macrophage, Monocyte, cDC, Neutrophils, IL1B_Inflamm, Inflamm_Microglia, Complement_Immunosuppressive, Scavenger, Memory_Like_Tcells, Terminal_Effector_Tcells, Treg, Oligo, Pericytes, Endothelial, Malignant2, Malignant3, Malignant4, Malignant6, Malignant7) 50 | 51 | TCGA <- AddModuleScore(object = TCGA, features = Features, name = c("Microglia", "Macrophage", "Monocyte", "cDC", "Neutrophils", "IL1B_Inflamm", "Inflamm_Microglia", "Complement_Immunosuppressive", "Scavenger", "Memory_Like_Tcells", "Terminal_Effector_Tcells", "Treg", "Oligo", "Pericytes", "Endothelial", "Malignant2", "Malignant3", "Malignant4", "Malignant6", "Malignant7")) 52 | 53 | 54 | 55 | write.table(TCGA@meta.data, file="TCGA_Glioma_MetaData.txt", sep="\t", col.names=NA, quote=FALSE) 56 | -------------------------------------------------------------------------------- /Deconvolution of Bulk Datasets/3- Calculating Module scores using Seurat in GLASS Glioma Cohorts: -------------------------------------------------------------------------------- 1 | setwd("~/Desktop/Chadi/Bioinformatics/Glioblastoma/Seurat_Bulk_Fig4/230515") 2 | 3 | library("Seurat") 4 | library("dplyr") 5 | 6 | ###### Load GLASS Glioma gene expression matrix (It is normalized but not log-transformed ###### 7 | 8 | data <- read.table("GLASS_Normalized_All_Genes.txt", sep="\t", head=TRUE, row.names=1) 9 | 10 | ##### Log-transform the data ###### 11 | 12 | data <- log1p(data) 13 | 14 | GLASS <- CreateSeuratObject(counts = data, project = "GLASS", min.cells = 1, min.features = 1) 15 | 16 | DefaultAssay(GLASS) <- "RNA" 17 | 18 | all.genes <- rownames(GLASS) 19 | GLASS <- ScaleData(GLASS, features = all.genes) 20 | 21 | 22 | GLASS <- FindVariableFeatures(GLASS, selection.method = "vst", nfeatures = 2000) 23 | 24 | ######## Load gene sets (Obtained as described in Step 1) ########### 
25 | 26 | 27 | Microglia <- scan("Microglia.txt", what="") 28 | Macrophage <- scan("Macrophage.txt", what="") 29 | Monocyte <- scan("Monocyte.txt", what="") 30 | cDC <- scan("cDC.txt", what="") 31 | Neutrophils <- scan("Neutrophils.txt", what="") 32 | 33 | 34 | IL1B_Inflamm <- scan("IL1B_Inflamm.txt", what="") 35 | Inflamm_Microglia <- scan("Inflamm_Microglia.txt", what="") 36 | Complement_Immunosuppressive <- scan("Complement_Immunosuppressive.txt", what="") 37 | Scavenger <- scan("Scavenger.txt", what="") 38 | 39 | 40 | Memory_Like_Tcells <- scan("Memory_Like_Tcells.txt", what="") 41 | Terminal_Effector_Tcells <- scan("Terminal_Effector_Tcells.txt", what="") 42 | Treg <- scan("Treg.txt", what="") 43 | 44 | Oligo <- scan("Oligo.txt", what="") 45 | Pericytes <- scan("Pericytes.txt", what="") 46 | Endothelial <- scan("Endothelial.txt", what="") 47 | 48 | Malignant2 <- scan("Malignant2.txt", what="") 49 | Malignant3 <- scan("Malignant3.txt", what="") 50 | Malignant4 <- scan("Malignant4.txt", what="") 51 | Malignant6 <- scan("Malignant6.txt", what="") 52 | Malignant7 <- scan("Malignant7.txt", what="") 53 | 54 | 55 | ############ Calculate the Module Scores and output the results ########## 56 | 57 | Features <- list(Microglia, Macrophage, Monocyte, cDC, Neutrophils, IL1B_Inflamm, Inflamm_Microglia, Complement_Immunosuppressive, Scavenger, Memory_Like_Tcells, Terminal_Effector_Tcells, Treg, Oligo, Pericytes, Endothelial, Malignant2, Malignant3, Malignant4, Malignant6, Malignant7) 58 | 59 | GLASS <- AddModuleScore(object = GLASS, features = Features, name = c("Microglia", "Macrophage", "Monocyte", "cDC", "Neutrophils", "IL1B_Inflamm", "Inflamm_Microglia", "Complement_Immunosuppressive", "Scavenger", "Memory_Like_Tcells", "Terminal_Effector_Tcells", "Treg", "Oligo", "Pericytes", "Endothelial", "Malignant2", "Malignant3", "Malignant4", "Malignant6", "Malignant7")) 60 | 61 | 62 | write.table(GLASS@meta.data, file="GLASS_Glioma_MetaData.txt", sep="\t", col.names=NA, 
quote=FALSE) 63 | -------------------------------------------------------------------------------- /Deconvolution of Bulk Datasets/4- Calculating Module scores using Seurat in G-SAM Glioblastoma Cohorts: -------------------------------------------------------------------------------- 1 | library(edgeR) 2 | library("Seurat") 3 | library("dplyr") 4 | 5 | ###### Load G-SAM Glioma gene expression matrix (It is raw counts) ###### 6 | 7 | raw_counts <- read.table("gsam.rnaseq.expression-322.txt", head=TRUE, sep="\t", row.names=1) 8 | 9 | 10 | ####### CPM-normalization and log-transformation ########### 11 | 12 | d <- DGEList(counts = raw_counts) 13 | 14 | d <- calcNormFactors(d) 15 | 16 | cpm_matrix <- cpm(d) 17 | 18 | write.table(cpm_matrix, file="GSAM_Normalized_Full_Matrix.txt", sep="\t", col.names=NA, quote=FALSE) 19 | 20 | 21 | data2 <- log1p(cpm_matrix) 22 | 23 | GSAM <- CreateSeuratObject(counts = data2, project = "GSAM", min.cells = 1, min.features = 1) 24 | 25 | 26 | 27 | all.genes <- rownames(GSAM) 28 | GSAM <- ScaleData(GSAM, features = all.genes) 29 | 30 | 31 | GSAM <- FindVariableFeatures(GSAM, selection.method = "vst", nfeatures = 2000) 32 | 33 | 34 | 35 | DefaultAssay(GSAM) <- "RNA" 36 | 37 | 38 | ######## Load gene sets (Obtained as described in Step 1) ########### 39 | 40 | Microglia <- scan("Microglia.txt", what="") 41 | Macrophage <- scan("Macrophage.txt", what="") 42 | Monocyte <- scan("Monocyte.txt", what="") 43 | cDC <- scan("cDC.txt", what="") 44 | Neutrophils <- scan("Neutrophils.txt", what="") 45 | 46 | 47 | IL1B_Inflamm <- scan("IL1B_Inflamm.txt", what="") 48 | Inflamm_Microglia <- scan("Inflamm_Microglia.txt", what="") 49 | Complement_Immunosuppressive <- scan("Complement_Immunosuppressive.txt", what="") 50 | Scavenger <- scan("Scavenger.txt", what="") 51 | 52 | 53 | Memory_Like_Tcells <- scan("Memory_Like_Tcells.txt", what="") 54 | Terminal_Effector_Tcells <- scan("Terminal_Effector_Tcells.txt", what="") 55 | Treg <- scan("Treg.txt", what="") 
56 | 57 | Oligo <- scan("Oligo.txt", what="") 58 | Pericytes <- scan("Pericytes.txt", what="") 59 | Endothelial <- scan("Endothelial.txt", what="") 60 | 61 | Malignant2 <- scan("Malignant2.txt", what="") 62 | Malignant3 <- scan("Malignant3.txt", what="") 63 | Malignant4 <- scan("Malignant4.txt", what="") 64 | Malignant6 <- scan("Malignant6.txt", what="") 65 | Malignant7 <- scan("Malignant7.txt", what="") 66 | 67 | 68 | ############ Calculate the Module Scores and output the results ########## 69 | 70 | Features <- list(Microglia, Macrophage, Monocyte, cDC, Neutrophils, IL1B_Inflamm, Inflamm_Microglia, Complement_Immunosuppressive, Scavenger, Memory_Like_Tcells, Terminal_Effector_Tcells, Treg, Oligo, Pericytes, Endothelial, Malignant2, Malignant3, Malignant4, Malignant6, Malignant7) 71 | 72 | GSAM <- AddModuleScore(object = GSAM, features = Features, name = c("Microglia", "Macrophage", "Monocyte", "cDC", "Neutrophils", "IL1B_Inflamm", "Inflamm_Microglia", "Complement_Immunosuppressive", "Scavenger", "Memory_Like_Tcells", "Terminal_Effector_Tcells", "Treg", "Oligo", "Pericytes", "Endothelial", "Malignant2", "Malignant3", "Malignant4", "Malignant6", "Malignant7")) 73 | 74 | 75 | 76 | write.table(GSAM@meta.data, file="GSAM_Glioma_MetaData.txt", sep="\t", col.names=NA, quote=FALSE) 77 | -------------------------------------------------------------------------------- /Deconvolution of Bulk Datasets/5- Preparing CIBERSORTx single cell reference matrix: -------------------------------------------------------------------------------- 1 | ############Inside R#################### 2 | 3 | ########### Discrete_LowR_Cells.txt is a data frame containing cell barcodes in one column. 
The other column indicates the discrete annotation of the cell ###### 4 | 5 | ########### Cells were annotated as Malignant, Oligo, Vasculature, Myeloid, Tcells, or other immune cells 6 | 7 | ########## We used the Usage output of the cNMF run on all cell types 8 | 9 | ########### For a cell to be included in Discrete_LowR_Cells.txt, it had to: (a) have below 10% usage for every broad category other than its annotation and (b) have a usage for its annotated category more than 2.5-fold higher than the second highest usage. 10 | 11 | library(dplyr) 12 | library(Seurat) 13 | library(SCOPfunctions) 14 | options(bitmapType='cairo') 15 | options(future.globals.maxSize = 8000 * 1024^2) 16 | 17 | 18 | ###### Load the discrete cells that passed the criteria 19 | 20 | Discrete_LowR_Cells <- read.table("Discrete_LowR_Cells.txt", sep="\t", row.names=1) 21 | 22 | 23 | ##### Load the Seurat object of all cells from the MGB cohort (Tumors.combined) before this step ###### 24 | 25 | ID <- subset(x = Tumors.combined, cells = rownames(Discrete_LowR_Cells)) 26 | 27 | ID.data <- GetAssayData(object = ID, slot="counts") 28 | 29 | ######### Order the columns of the count matrix to match the row order of the annotation data frame 30 | 31 | order <- match(rownames(Discrete_LowR_Cells), colnames(ID.data)) 32 | 33 | ID.data2 <- ID.data[ , order] 34 | 35 | ######### Replace the cell names (colnames) with the annotations, as required by CIBERSORTx, and save the matrix ######## 36 | 37 | colnames(ID.data2) <- Discrete_LowR_Cells$V2 38 | 39 | 40 | Matrix2 <- utils_big_as.matrix(ID.data2, n_slices_init = 18, verbose = T) 41 | 42 | 43 | write.table(Matrix2, file="./MGB_LowR_Cibersort_Ready_Full_Raw_Expression.txt", sep="\t", col.names=NA, quote=FALSE) 44 | -------------------------------------------------------------------------------- /Deconvolution of Bulk Datasets/6- Estimate cell types fractions in TCGA Matrix: -------------------------------------------------------------------------------- 1 | #### Inside R ####### 2 | 3 | ###### Matrix has to be normalized without log-transformation ###### 4 | 5 | 6 | #### Load 
log-transformed and normalized TCGA expression table to remove the log-transformation ##### 7 | TCGA <- read.table("TCGA.GBMLGG.sampleMap_HiSeqV2", sep="\t", head=TRUE, row.names=1) 8 | 9 | TCGA3 <- 2^TCGA 10 | 11 | TCGA4 <- TCGA3-1 12 | 13 | write.table(TCGA4, file="TCGA_Normalized_Full_GBMLGG.txt", sep="\t", quote=FALSE, col.names=NA) 14 | 15 | q() 16 | 17 | ############ Exit R ############## 18 | 19 | 20 | cibersortx/fractions \ 21 | --username ~{username} \ 22 | --token ~{token} \ 23 | --mixture ./TCGA_Normalized_Full_GBMLGG.txt \ 24 | --single_cell TRUE \ 25 | --refsample ./MGB_LowR_Cibersort_Ready_Full_Raw_Expression.txt \ 26 | --outdir . 27 | 28 | 29 | ###### --username and --token are obtained from the CIBERSORTx website by creating an account and contacting the authors ###### 30 | 31 | ###### Use the CIBERSORTx website to obtain the docker image of CIBERSORTx, or, after obtaining a username and token, use the Terra image for more RAM ######## 32 | -------------------------------------------------------------------------------- /Deconvolution of Bulk Datasets/7- Estimate cell types fractions in GLASS Matrix: -------------------------------------------------------------------------------- 1 | ###### Matrix has to be normalized without log-transformation ###### 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | cibersortx/fractions \ 10 | --username ~{username} \ 11 | --token ~{token} \ 12 | --mixture ./GLASS_Normalized_All_Genes.txt \ 13 | --single_cell TRUE \ 14 | --refsample ./MGB_LowR_Cibersort_Ready_Full_Raw_Expression.txt \ 15 | --outdir . 
16 | 17 | 18 | ###### --username and --token are obtained from the CIBERSORTx website by creating an account and contacting the authors ###### 19 | 20 | ###### Use the CIBERSORTx website to obtain the docker image of CIBERSORTx, or, after obtaining a username and token, use the Terra image for more RAM ######## 21 | -------------------------------------------------------------------------------- /Deconvolution of Bulk Datasets/8- Estimate cell types fractions in G-SAM Matrix: -------------------------------------------------------------------------------- 1 | #### Inside R ####### 2 | 3 | ###### Matrix has to be normalized without log-transformation ###### 4 | 5 | 6 | library(edgeR) 7 | library("Seurat") 8 | library("dplyr") 9 | 10 | ###### Load G-SAM Glioma gene expression matrix (It is raw counts) ###### 11 | 12 | raw_counts <- read.table("gsam.rnaseq.expression-322.txt", head=TRUE, sep="\t", row.names=1) 13 | 14 | 15 | ####### CPM-normalization only (no log-transformation, as required for the CIBERSORTx mixture) ########### 16 | 17 | d <- DGEList(counts = raw_counts) 18 | 19 | d <- calcNormFactors(d) 20 | 21 | cpm_matrix <- cpm(d) 22 | 23 | write.table(cpm_matrix, file="GSAM_Normalized_Full_Matrix.txt", sep="\t", col.names=NA, quote=FALSE) 24 | 25 | q() 26 | 27 | ############ Exit R ############## 28 | 29 | cibersortx/fractions \ 30 | --username ~{username} \ 31 | --token ~{token} \ 32 | --mixture ./GSAM_Normalized_Full_Matrix.txt \ 33 | --single_cell TRUE \ 34 | --refsample ./MGB_LowR_Cibersort_Ready_Full_Raw_Expression.txt \ 35 | --outdir . 
36 | 37 | 38 | ###### --username and --token are obtained from the CIBERSORTx website by creating an account and contacting the authors ###### 39 | 40 | ###### Use the CIBERSORTx website to obtain the docker image of CIBERSORTx, or, after obtaining a username and token, use the Terra image for more RAM ######## 41 | -------------------------------------------------------------------------------- /Deconvolution of Bulk Datasets/9- Normalization of the Module Scores: -------------------------------------------------------------------------------- 1 | To normalize the module scores in each bulk library, the module score of each program is divided by the imputed fraction of its corresponding cell type in the CIBERSORTx output. 2 | 3 | (Module score value / Imputed CIBERSORTx value for the corresponding category) 4 | 5 | (e.g. Scavenger Module Score / Myeloid Imputed Fraction) 6 | 7 | 8 | The normalized module scores were then correlated with published clinical data available online, including IDH mutation status and molecular grade. 9 | 10 | 11 | For correlation with clinical data, the normalized module scores were log10-transformed while preserving their sign, using the following formula: 12 | 13 | if x >= 0: LOG10[Absolute(x) + 1] 14 | if x < 0: - LOG10[Absolute(x) + 1] 15 | 16 | where "x" is the normalized module score value. 17 | -------------------------------------------------------------------------------- /Deconvolution of Bulk Datasets/CIBERSORTx_Input/Readme: -------------------------------------------------------------------------------- 1 | Inputs for CIBERSORTx: 2 | 3 | "Discrete_LowR_Cells.txt" is required for step 5 "5- Preparing CIBERSORTx single cell reference matrix". 
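The normalization and signed log10 transformation described in "9- Normalization of the Module Scores" can be sketched in R as follows. This is a minimal sketch under stated assumptions, not the authors' code: the function names `normalize_module_score` and `signed_log10`, the column names, and the toy values are all hypothetical.

```r
# Divide a program's module score by the imputed CIBERSORTx fraction
# of its corresponding cell type (hypothetical helper name).
normalize_module_score <- function(module_score, cell_fraction) {
  module_score / cell_fraction
}

# Signed log10 transform:  log10(|x| + 1) for x >= 0,
#                         -log10(|x| + 1) for x <  0.
signed_log10 <- function(x) {
  sign(x) * log10(abs(x) + 1)
}

# Toy values (assumptions chosen for easy arithmetic, not real data).
scores <- data.frame(
  Scavenger        = c(2.25, -2.25),
  Myeloid_Fraction = c(0.25, 0.25)
)

scores$Scavenger_Normalized <- normalize_module_score(scores$Scavenger,
                                                      scores$Myeloid_Fraction)
scores$Scavenger_Log10 <- signed_log10(scores$Scavenger_Normalized)
```

Note that a library with an imputed fraction of zero would produce an infinite or undefined normalized score; how such libraries should be handled is not stated in the description above, so this sketch leaves them untreated.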
4 | -------------------------------------------------------------------------------- /Deconvolution of Bulk Datasets/Gene Sets/Complement_Immunosuppressive.txt: -------------------------------------------------------------------------------- 1 | C1QA 2 | C1QB 3 | MS4A6A 4 | PLTP 5 | TMEM176B 6 | RASSF4 7 | TMEM176A 8 | TNFSF13 9 | GAA 10 | GNG10 11 | MS4A4A 12 | NPC2 13 | SIGLEC10 14 | CD14 15 | HLA-DOA 16 | HLA-DRB5 17 | EMB 18 | MARCH1 19 | DSE 20 | GOLIM4 21 | AKR1B1 22 | TMEM163 23 | FAM20A 24 | MSR1 25 | TMEM70 26 | C2 27 | GAL3ST4 28 | GPR155 -------------------------------------------------------------------------------- /Deconvolution of Bulk Datasets/Gene Sets/Endothelial.txt: -------------------------------------------------------------------------------- 1 | ADGRL4 2 | VWF 3 | CLDN5 4 | CLEC14A 5 | ECSCR 6 | TM4SF18 7 | ADGRF5 8 | ROBO4 9 | PTPRB 10 | EMCN 11 | MMRN2 12 | CYYR1 13 | ABCG2 14 | ESAM 15 | FLT1 16 | ACVRL1 17 | CD34 18 | CDH5 19 | ERG 20 | TIE1 21 | KDR 22 | DIPK2B 23 | CAVIN2 24 | MECOM 25 | EGFL7 26 | NOSTRIN 27 | SOX18 28 | ABCB1 29 | APOLD1 30 | ANGPT2 31 | TM4SF1 32 | HSPG2 33 | SLCO2A1 34 | PALMD 35 | MPZL2 36 | TMEM204 37 | EDN1 38 | ACE 39 | TEK 40 | BTNL9 41 | SLC38A5 42 | SEMA3G 43 | ITM2A 44 | GNG11 45 | KANK3 46 | PLVAP 47 | ESM1 48 | MFSD2A -------------------------------------------------------------------------------- /Deconvolution of Bulk Datasets/Gene Sets/IL1B_Inflamm.txt: -------------------------------------------------------------------------------- 1 | CD83 2 | IL1B 3 | NR4A3 4 | NR4A1 5 | ABL2 6 | OLR1 7 | NR4A2 8 | BCL2A1 9 | KDM6B 10 | IER3 11 | RASGEF1B 12 | DUSP2 13 | PLEK 14 | MAFF 15 | CXCL8 16 | ATF3 17 | EGR3 18 | IL1A 19 | NFKBIZ 20 | NFKBID 21 | EGR2 22 | ICAM1 23 | PLAUR 24 | STX11 25 | SELENOK 26 | TNF 27 | SERPINB9 28 | ELL2 29 | CSRNP1 30 | PTGS2 -------------------------------------------------------------------------------- /Deconvolution of Bulk Datasets/Gene Sets/Inflamm_Microglia.txt: 
-------------------------------------------------------------------------------- 1 | PDK4 2 | P2RY13 3 | USP53 4 | RHOB 5 | CTTNBP2 6 | GSTM3 7 | CH25H 8 | SIGLEC8 9 | AC253572.2 10 | CXCR4 11 | PIK3IP1 12 | LINC01480 13 | CSGALNACT1 14 | SPRY1 15 | SRSF7 16 | IGF1 17 | MTUS1 18 | MTSS1 19 | KHDRBS3 20 | PDGFB 21 | PAPOLG 22 | GPM6A 23 | TNFRSF13C 24 | CXCL12 25 | ADRB2 26 | AP005530.1 27 | TAL1 -------------------------------------------------------------------------------- /Deconvolution of Bulk Datasets/Gene Sets/Instructions: -------------------------------------------------------------------------------- 1 | These are the Gene sets used for calculating the module scores (Refer to Step 1 - Creating Gene Sets). 2 | -------------------------------------------------------------------------------- /Deconvolution of Bulk Datasets/Gene Sets/Macrophage.txt: -------------------------------------------------------------------------------- 1 | ACP5 2 | PLA2G7 3 | LGMN 4 | VAT1 5 | GCHFR 6 | FUCA1 7 | LIPA 8 | APOC1 9 | CYP27A1 10 | IFI30 11 | GRN 12 | CAPG 13 | CD68 14 | CD63 15 | PLD3 16 | ALDH1A1 17 | CTSZ 18 | BLVRB 19 | TSPAN4 20 | PRDX1 21 | CD9 22 | ATP6V1F 23 | HS3ST2 24 | OTOA 25 | CTSA 26 | CFD 27 | NCEH1 28 | RMDN3 29 | KLHDC8B 30 | IQGAP2 31 | RRAGD 32 | HTRA4 33 | NPL 34 | PLPP3 35 | PKD2L1 36 | CD59 37 | MGST3 -------------------------------------------------------------------------------- /Deconvolution of Bulk Datasets/Gene Sets/Malignant2.txt: -------------------------------------------------------------------------------- 1 | PTMS 2 | SOX4 3 | BASP1 4 | PXDN 5 | EIF2B4 6 | HIST1H4C 7 | ELAVL4 8 | ANK3 9 | CNTN1 10 | ANKS1B 11 | NSG2 12 | NDUFA9 13 | SALL3 14 | ING4 15 | KLRC2 16 | DCX 17 | IGKC 18 | CD27-AS1 19 | EYA1 20 | KCND2 21 | CACNA1E 22 | MAP2 23 | C12orf57 24 | ATN1 25 | PKIA 26 | CELF5 27 | C12orf4 28 | OPCML 29 | GPC2 30 | REV3L 31 | IGHG3 32 | NETO1 33 | GRM5 34 | SNAP25 35 | CDK5R1 36 | PDGFRA 37 | MYCN 38 | SYT16 39 | MYCNOS 40 | ELAVL3 41 
| RAB3C 42 | NRXN1 43 | SYT13 44 | EPB41 45 | PHB2 46 | PCDH15 47 | KLRC4 48 | RTN1 -------------------------------------------------------------------------------- /Deconvolution of Bulk Datasets/Gene Sets/Malignant3.txt: -------------------------------------------------------------------------------- 1 | MAOB 2 | AQP4 3 | APLNR 4 | NTRK2 5 | ID4 6 | LTF 7 | HSPB8 8 | ID3 9 | HLA-DRA 10 | RNF19A 11 | GJA1 12 | FAM107A 13 | CSRP1 14 | GFAP 15 | KCNN3 16 | LRRC2 17 | SERPINA3 18 | SYNM 19 | TJP2 20 | CP 21 | ACTN1 22 | ANOS1 23 | ATP1B1 24 | ECM2 25 | MAN1C1 26 | SLC1A4 27 | EFEMP1 28 | C1S 29 | EZR 30 | CRB2 31 | GALNT15 32 | LRRC55 33 | CRISPLD1 34 | LINC01094 35 | CTSH 36 | SERPING1 37 | ADCYAP1R1 38 | PLAAT4 39 | MGST1 40 | DCLK1 41 | RAMP3 42 | VCL -------------------------------------------------------------------------------- /Deconvolution of Bulk Datasets/Gene Sets/Malignant4.txt: -------------------------------------------------------------------------------- 1 | EGLN3 2 | CA9 3 | VEGFA 4 | LOX 5 | PFKFB4 6 | IGFBP2 7 | STC1 8 | NRN1 9 | AKAP12 10 | TMEM158 11 | SESN2 12 | CA12 13 | GPRC5A 14 | TRIB3 15 | IGFBP5 16 | PLP2 17 | STC2 18 | TFRC 19 | ARL4C 20 | SLC2A3 21 | ITPR1 22 | SLC7A5 23 | SLITRK4 24 | TNFRSF12A 25 | PYGL 26 | IGFBP3 27 | PPP1R14C 28 | SLC39A14 -------------------------------------------------------------------------------- /Deconvolution of Bulk Datasets/Gene Sets/Malignant6.txt: -------------------------------------------------------------------------------- 1 | SNX10 2 | CSRP2 3 | CHI3L1 4 | CNN3 5 | C11orf96 6 | F3 7 | CCL2 8 | MAP1B 9 | RELL1 10 | CHST2 11 | FLNC 12 | SPOCK2 13 | PCSK1 14 | LAMA2 15 | CD99 16 | SRPX2 17 | CCN1 18 | TNXB 19 | FAM20C 20 | NRCAM 21 | BICD1 22 | RASSF8 23 | MMP19 24 | LZTS1 25 | FOSL2 26 | SOCS3 27 | FNDC4 28 | MDK 29 | ABCA1 30 | LAMB1 31 | MT2A 32 | SPRY2 33 | CSF1 34 | PTGFRN 35 | BAALC 36 | BHLHE40 37 | CLU 38 | TM7SF3 39 | GAP43 40 | PALM2-AKAP2 41 | GADD45A 42 | LIF 
-------------------------------------------------------------------------------- /Deconvolution of Bulk Datasets/Gene Sets/Malignant7.txt: -------------------------------------------------------------------------------- 1 | SMOC1 2 | BCAN 3 | FERMT1 4 | PHYHIPL 5 | SOX8 6 | FXYD6 7 | HES6 8 | OLIG1 9 | LINC00632 10 | SHD 11 | TNR 12 | ATCAY 13 | GRIA2 14 | OLIG2 15 | RPS12 16 | ALCAM 17 | HIP1R 18 | NEU4 19 | ANGPTL2 20 | GALNT13 21 | LRRN1 22 | BMP2 23 | MEGF11 24 | MARCKS 25 | ZCCHC24 26 | LRP4 27 | AC009041.2 28 | GRIA4 29 | RPS24 30 | RAP2A 31 | OMG 32 | LRATD2 33 | ZDHHC22 34 | SAPCD2 35 | DNER 36 | IDI1 37 | COL20A1 38 | TNK2 39 | SCN3A 40 | GABRB3 41 | ARFGEF3 42 | ASIC1 43 | SOX6 44 | ATOH8 45 | DLGAP1 46 | PLPPR1 47 | FAM110B -------------------------------------------------------------------------------- /Deconvolution of Bulk Datasets/Gene Sets/Memory_Like_Tcells.txt: -------------------------------------------------------------------------------- 1 | IL7R 2 | MALAT1 3 | AHNAK 4 | ANXA1 5 | TCF7 6 | FOS 7 | RPS29 8 | MTRNR2L12 9 | SYNE2 10 | CD69 11 | CCR7 12 | ITGA6 13 | TOB1 14 | MT-ATP6 15 | RPLP0 16 | PDE3B 17 | MT-CO3 18 | MT-CO2 19 | MTRNR2L8 20 | PGGHG 21 | LEF1 22 | DDX3Y 23 | MT-CYB 24 | AL627171.2 25 | NKTR 26 | MT-ND1 27 | TMSB10 28 | GIMAP7 29 | ATM 30 | TRABD2A 31 | SERINC5 32 | BCL2 33 | SATB1 34 | DPP4 35 | MT-ND4 36 | MT-ND5 37 | GIMAP4 -------------------------------------------------------------------------------- /Deconvolution of Bulk Datasets/Gene Sets/Microglia.txt: -------------------------------------------------------------------------------- 1 | TMIGD3 2 | CYFIP1 3 | APOC2 4 | SYNDIG1 5 | TMEM119 6 | PIH1D1 7 | WIPF3 8 | MCF2L 9 | TGFBR1 10 | NAV3 11 | BIN1 12 | ZMAT3 13 | LINC01736 14 | GLDN 15 | RTTN 16 | AL078590.2 17 | PDPN 18 | FSCN1 19 | CACNB4 20 | PTCRA 21 | ARMH4 22 | HPGDS 23 | CEBPA 24 | GYPC 25 | RAMP1 26 | IPCEF1 27 | LINC01235 -------------------------------------------------------------------------------- 
/Deconvolution of Bulk Datasets/Gene Sets/Monocyte.txt: -------------------------------------------------------------------------------- 1 | FCN1 2 | LYZ 3 | CD300E 4 | CD44 5 | MCEMP1 6 | TIMP1 7 | S100A6 8 | S100A8 9 | SH3BGRL3 10 | AC020656.1 11 | FLNA 12 | CFP 13 | ANPEP 14 | S100A12 15 | CCR2 16 | LTA4H 17 | SMIM25 18 | THBS1 19 | STXBP2 20 | UPP1 21 | LILRA5 22 | LILRB2 23 | MYO1G 24 | SGMS2 25 | EREG 26 | CSTA 27 | MPEG1 28 | S100A4 29 | DMXL2 30 | AP1S2 31 | MAP3K20 32 | GLIPR2 33 | CYP1B1 34 | CDA 35 | CLEC12A 36 | NRG1 37 | JARID2 -------------------------------------------------------------------------------- /Deconvolution of Bulk Datasets/Gene Sets/Neutrophils.txt: -------------------------------------------------------------------------------- 1 | IFITM2 2 | FCGR3B 3 | SMCHD1 4 | VNN2 5 | CXCR2 6 | MMP25 7 | MEGF9 8 | MME 9 | MGAM 10 | CXCR1 11 | IGF2R 12 | CLEC4E 13 | IL18RAP 14 | LRRK2 15 | IL1R2 16 | MXD1 17 | ICAM3 18 | TNFRSF10C 19 | RESF1 20 | TMEM154 21 | CPD 22 | S100P 23 | IVNS1ABP 24 | CMTM2 25 | ACSL1 26 | FPR2 27 | NCF1 28 | TUBA4A 29 | IL18R1 30 | CRISPLD2 31 | MSRB1 32 | LITAF 33 | CYP4F3 34 | ADGRE5 35 | QPCT 36 | LILRB3 37 | DGAT2 -------------------------------------------------------------------------------- /Deconvolution of Bulk Datasets/Gene Sets/Oligo.txt: -------------------------------------------------------------------------------- 1 | MAG 2 | ERMN 3 | MOG 4 | SPOCK3 5 | MOBP 6 | CNDP1 7 | TF 8 | TMEM144 9 | CARNS1 10 | KLK6 11 | PEX5L 12 | CLDN11 13 | MYRF 14 | ENPP2 15 | GJB1 16 | FOLH1 17 | PLP1 18 | TPPP 19 | SLC24A2 20 | RAPGEF5 21 | CNTNAP4 22 | EDIL3 23 | AATK 24 | PLEKHH1 25 | PRR18 26 | CAPN3 27 | SOX10 28 | NECAB1 29 | SEPTIN4 30 | HHIP 31 | CNTN2 32 | MAP7 33 | NKAIN2 34 | PPP1R14A 35 | PCSK6 36 | FGFR2 37 | BCAS1 38 | NKX6.2 39 | DBNDD2 40 | GPR37 41 | TMEM125 42 | LGI3 43 | AK5 44 | TUBB4A 45 | PTGDS 46 | BOK 47 | EFHD1 -------------------------------------------------------------------------------- 
/Deconvolution of Bulk Datasets/Gene Sets/Pericytes.txt: -------------------------------------------------------------------------------- 1 | MYH11 2 | FHL5 3 | LMOD1 4 | ACTA2 5 | ADIRF 6 | RGS5 7 | SLIT3 8 | CASQ2 9 | TINAGL1 10 | MUSTN1 11 | SYNPO2 12 | AOC3 13 | SLC38A11 14 | FRZB 15 | PDE5A 16 | ACTG2 17 | RERGL 18 | MYOCD 19 | PDE3A 20 | MYL9 21 | PLN 22 | TBX18 23 | COL14A1 24 | GJA4 25 | HIGD1B 26 | CNN1 27 | CCDC3 28 | MRVI1 29 | ITGA8 30 | HEYL 31 | NOTCH3 32 | NDUFA4L2 33 | NR2F2 34 | MSRB3 35 | HRC 36 | SMOC2 37 | EDNRA 38 | GUCY1A1 39 | BGN 40 | LRRC32 41 | MCAM 42 | MYLK 43 | ITGA1 44 | AVPR1A 45 | TPM2 46 | PDGFRB -------------------------------------------------------------------------------- /Deconvolution of Bulk Datasets/Gene Sets/Scavenger.txt: -------------------------------------------------------------------------------- 1 | TGFBI 2 | ASPH 3 | SLC16A10 4 | NRP1 5 | ANKH 6 | IGFBP4 7 | RNASE1 8 | ENG 9 | ECM1 10 | MYO5A 11 | FNDC3B 12 | PLXND1 13 | MRC1 14 | LYVE1 15 | ME2 16 | MMP14 17 | PMP22 18 | TNFRSF11A 19 | CLEC5A 20 | ITGB5 21 | GPRIN3 22 | CTSL 23 | AGFG1 24 | SLC39A8 25 | SH3PXD2B 26 | NIBAN2 27 | CEMIP2 28 | GNG12 29 | SPARC 30 | TTYH3 31 | FPR3 32 | ARHGAP18 33 | HIF1A 34 | LTBP2 35 | OLFML2B 36 | LGI2 37 | SEPTIN11 38 | SASH1 39 | ANXA2 40 | EMILIN2 -------------------------------------------------------------------------------- /Deconvolution of Bulk Datasets/Gene Sets/Terminal_Effector_Tcells.txt: -------------------------------------------------------------------------------- 1 | TYROBP 2 | KLRD1 3 | NCAM1 4 | IL2RB 5 | SYK 6 | SH2D1B 7 | B3GNT7 8 | PRF1 9 | CTSW 10 | ITGAX 11 | HOPX 12 | TMIGD2 13 | TRDC 14 | NKG7 15 | KLRC1 16 | GZMB 17 | GNLY 18 | KRT81 19 | KIR2DL4 20 | PTPN12 21 | GSTP1 22 | MATK 23 | GNPTAB 24 | PIK3AP1 25 | IRF8 26 | KLRK1 27 | LAT2 28 | KRT86 29 | NCR1 30 | CD38 31 | GFOD1 32 | XCL1 33 | KLRF1 34 | MAPK1 35 | MAP3K8 36 | CXXC5 37 | CTBP2 38 | LYN 39 | AOAH 40 | ZFHX3 41 | XCL2 42 | GAS7 43 | KLRC3 
-------------------------------------------------------------------------------- /Deconvolution of Bulk Datasets/Gene Sets/Treg.txt: -------------------------------------------------------------------------------- 1 | FOXP3 2 | ENTPD1 3 | IL2RA 4 | CTLA4 5 | TNFRSF1B 6 | IKZF2 7 | CCR8 8 | BATF 9 | TNFRSF18 10 | LAYN 11 | TNFRSF4 12 | SLAMF1 13 | ACSL4 14 | TBC1D4 15 | ICOS 16 | IL1R1 17 | CD4 18 | VDR 19 | TNFRSF9 20 | UGP2 21 | GLRX 22 | LAPTM4B 23 | RTKN2 24 | CCNG2 25 | TNFRSF8 26 | CXCR6 27 | STAM 28 | F5 29 | TYMP 30 | UCP2 31 | ICA1 32 | IL32 33 | MYL6 34 | ZC2HC1A 35 | SAT1 36 | FANK1 37 | CD177 38 | PHTF2 39 | RNF213 40 | SPATS2L 41 | CRLF2 42 | DUSP16 -------------------------------------------------------------------------------- /Deconvolution of Bulk Datasets/Gene Sets/cDC.txt: -------------------------------------------------------------------------------- 1 | CLNK 2 | HLA-DQA1 3 | LGALS2 4 | SLAMF7 5 | IDO1 6 | XCR1 7 | P2RY14 8 | HLA-DOB 9 | P2RY10 10 | FCER1A 11 | FLT3 12 | PARM1 13 | CPNE3 14 | WDFY4 15 | PPA1 16 | CST7 17 | CD1C 18 | SLC38A1 19 | DAPP1 20 | CD1E 21 | TACSTD2 22 | CNN2 23 | MCOLN2 24 | CLEC10A 25 | JAML 26 | RAB11FIP1 27 | CCR6 28 | ZNF366 29 | CCSER1 30 | NAAA 31 | LSP1 32 | GPR157 33 | CRIP1 34 | CBFA2T3 35 | DUSP5 -------------------------------------------------------------------------------- /Figure 1 Visualizations/Quadrant plot generation with piecharts: -------------------------------------------------------------------------------- 1 | ######## "matrix_piedot.txt" contains all non-doublet myeloid cells. 
######## Xaxis is calculated by subtracting the usage of Complement Immunosuppression from the usage of IL1B pro-inflammatory (Usage of IL1B Inflammatory - Usage of Complement)
######## Yaxis is calculated by subtracting the usage of Scavenger Immunosuppression from the usage of RHOB pro-inflammatory (Usage of RHOB Inflammatory - Usage of Scavenger)
######## The matrix contains the usage of each immunomodulatory program and the summed percentage of all remaining programs (including identity programs) as an "Others" category

library(ggplot2)
library(dplyr)
library(scatterpie)
library(Cairo)

data <- read.table("matrix_piedot.txt", sep="\t", header=TRUE, row.names=1)

# One pie per cell, positioned by the two usage differences and split by program usage
my_plot <- ggplot() +
  geom_scatterpie(aes(x=Xaxis, y=Yaxis, r=0.5), data=data,
                  cols=c("Others", "Scavenger_Immunosuppressive", "IL1B_Inflammatory",
                         "Inflammatory_microglia", "Complement_Immunosuppressive"),
                  color=NA) +
  coord_equal() +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank()) +
  labs(color=NULL) +
  scale_fill_manual(values = c("Others" = "gray90",
                               "IL1B_Inflammatory" = "#AB0800",
                               "Complement_Immunosuppressive" = "#007AFF",
                               "Inflammatory_microglia" = "#FF6961",
                               "Scavenger_Immunosuppressive" = "#0700C4"))

ggsave(filename="Myeloid_Activities_Scatterpie_Full_Version_V2_OtherInident_05.png", plot = my_plot, height = 8, width = 10, dpi=400)
--------------------------------------------------------------------------------
/Identifying recurrent programs in Myeloid Cells in Gliomas (Related to Figure 1)/01- Identifying Variable Genes in MGB Cohort (Round 1):
--------------------------------------------------------------------------------
library(dplyr)
library(Seurat)
options(bitmapType='cairo')
options(future.globals.maxSize = 8000 * 1024^2)

########## High-quality, non-doublet myeloid cells in the MGB cohort were identified as discussed in the "Processing of scRNA-Seq Files" section. Their barcodes are listed in "HQ_Myeloid.txt" (one barcode per line) ############

Myeloid_Cells <- scan("HQ_Myeloid.txt", what="")

######## Tumors.combined is the Seurat object generated for all cells ###########

Myeloid <- subset(x = Tumors.combined, cells = Myeloid_Cells)

Myeloid.data <- GetAssayData(object = Myeloid, slot="counts")

All <- CreateSeuratObject(counts = Myeloid.data, project = "All", min.cells = 3, min.features = 200)
# Human mitochondrial genes carry the "MT-" prefix
All[["percent.mt"]] <- PercentageFeatureSet(All, pattern = "^MT-")

All <- NormalizeData(All)

All_Matrix2 <- GetAssayData(object = All)
# utils_big_as.matrix is not a Seurat function and has to be available in the session;
# it converts the sparse matrix to a dense one in slices
All_Matrix <- utils_big_as.matrix(All_Matrix2, n_slices_init = 10, verbose = T)

########## Keep genes with normalized expression >= 0.1 in at least 20 cells ###############
All_Matrix3 <- All_Matrix[apply(All_Matrix, 1, function(x) sum(x >= 0.1, na.rm=TRUE) > 19),]
All.data2 <- Myeloid.data[rownames(Myeloid.data) %in% rownames(All_Matrix3),]
All <- CreateSeuratObject(counts = All.data2, project = "All", min.cells = 3, min.features = 200)
All[["percent.mt"]] <- PercentageFeatureSet(All, pattern = "^MT-")

############ Identifying Variable Genes #################

All <- NormalizeData(All)

All <- FindVariableFeatures(All, selection.method="vst", nfeatures = 2000)

Var <- HVFInfo(object = All, selection.method="vst", assay = "RNA")

write.table(Var, file="MGB_Myeloid_Full_Gene_List_Variable_Score_230206.txt", sep="\t", quote=FALSE, col.names=NA)

#### Identify the variable genes from Var (mean expression of at least 0.001, then the top 2000 by standardized variance) and place them in a list, one gene per line ########

Var2 <- scan("MGB_Variable_Round1.txt", what="")

# Subset the raw count matrix to the selected genes and transpose it to cells x genes for cNMF
All_Matrix4 <- Myeloid.data[rownames(Myeloid.data) %in% Var2,]

All_Matrix5 <- t(as.matrix(All_Matrix4))
write.table(All_Matrix5, file="MGB_Myeloid_Matrix_Filtered_For_NMF_Round1.txt", sep="\t", quote=FALSE, col.names=NA)
--------------------------------------------------------------------------------
/Identifying recurrent programs in Myeloid Cells in Gliomas (Related to Figure 1)/02- Round 1 cNMF for myeloid cells in MGB cohort:
--------------------------------------------------------------------------------
cnmf prepare --output-dir ./All/ --name All -c MGB_Myeloid_Matrix_Filtered_For_NMF_Round1.txt \
    -k 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 \
    --n-iter 500 --total-workers 1 --seed 14 --numgenes 2000

cnmf factorize --output-dir ./All/ --name All --worker-index 0;

cnmf combine --output-dir ./All/ --name All;

rm ./All/All/cnmf_tmp/All.spectra.k_*.iter_*.df.npz;

cnmf k_selection_plot --output-dir All --name All;

##### Based on the K plot, we select K=22 ########

cnmf consensus --output-dir All --name All --components 22 --local-density-threshold 0.02 --show-clustering

########## The cnmf consensus step outputs a usage matrix, which is then normalized per row to percentages (in each cell, the usages of the programs sum to 100) ################
########## The gene spectra score matrix is used for the annotation of the programs #############
########## The first round was used to identify cells using programs that are not myeloid in nature (i.e. a different cell type identity) or programs used by fewer than 100 myeloid cells. We remove cells using these programs (> 20% usage) ##########
--------------------------------------------------------------------------------
/Identifying recurrent programs in Myeloid Cells in Gliomas (Related to Figure 1)/03- Identifying Variable Genes in Houston Cohort (Round 1):
--------------------------------------------------------------------------------
library(dplyr)
library(Seurat)
options(bitmapType='cairo')
options(future.globals.maxSize = 8000 * 1024^2)

########## High-quality, non-doublet myeloid cells in the Houston cohort were identified as discussed in the "Processing of scRNA-Seq Files" section. Their barcodes are listed in "HQ_Myeloid.txt" (one barcode per line) ############

Myeloid_Cells <- scan("HQ_Myeloid.txt", what="")

######## Tumors.combined is the Seurat object generated for all cells ###########

Myeloid <- subset(x = Tumors.combined, cells = Myeloid_Cells)

Myeloid.data <- GetAssayData(object = Myeloid, slot="counts")

All <- CreateSeuratObject(counts = Myeloid.data, project = "All", min.cells = 3, min.features = 200)
# Human mitochondrial genes carry the "MT-" prefix
All[["percent.mt"]] <- PercentageFeatureSet(All, pattern = "^MT-")

All <- NormalizeData(All)

All_Matrix2 <- GetAssayData(object = All)
# utils_big_as.matrix is not a Seurat function and has to be available in the session
All_Matrix <- utils_big_as.matrix(All_Matrix2, n_slices_init = 10, verbose = T)

########## Keep genes with normalized expression >= 0.1 in at least 20 cells ###############
All_Matrix3 <- All_Matrix[apply(All_Matrix, 1, function(x) sum(x >= 0.1, na.rm=TRUE) > 19),]
All.data2 <- Myeloid.data[rownames(Myeloid.data) %in% rownames(All_Matrix3),]
All <- CreateSeuratObject(counts = All.data2, project = "All", min.cells = 3, min.features = 200)
All[["percent.mt"]] <- PercentageFeatureSet(All, pattern = "^MT-")

############ Identifying Variable Genes #################

All <- NormalizeData(All)

All <-
    FindVariableFeatures(All, selection.method="vst", nfeatures = 2000)

Var <- HVFInfo(object = All, selection.method="vst", assay = "RNA")

write.table(Var, file="Houston_Myeloid_Full_Gene_List_Variable_Score_230206.txt", sep="\t", quote=FALSE, col.names=NA)

#### Identify the variable genes from Var (mean expression of at least 0.001, then the top 2000 by standardized variance) and place them in a list, one gene per line ########

Var2 <- scan("Houston_Variable_Round1.txt", what="")

# Subset the raw count matrix to the selected genes and transpose it to cells x genes for cNMF
All_Matrix4 <- Myeloid.data[rownames(Myeloid.data) %in% Var2,]

All_Matrix5 <- t(as.matrix(All_Matrix4))

write.table(All_Matrix5, file="Houston_Myeloid_Matrix_Filtered_For_NMF_Round1.txt", sep="\t", quote=FALSE, col.names=NA)
--------------------------------------------------------------------------------
/Identifying recurrent programs in Myeloid Cells in Gliomas (Related to Figure 1)/04- Round 1 cNMF for myeloid cells in Houston cohort:
--------------------------------------------------------------------------------
cnmf prepare --output-dir ./All/ --name All -c Houston_Myeloid_Matrix_Filtered_For_NMF_Round1.txt \
    -k 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 \
    --n-iter 500 --total-workers 1 --seed 14 --numgenes 2000

cnmf factorize --output-dir ./All/ --name All --worker-index 0;

cnmf combine --output-dir ./All/ --name All;

rm ./All/All/cnmf_tmp/All.spectra.k_*.iter_*.df.npz;

cnmf k_selection_plot --output-dir All --name All;

##### Based on the K plot, we select K=23 ########

cnmf consensus --output-dir All --name All --components 23 --local-density-threshold 0.02 --show-clustering

########## The cnmf consensus step outputs a usage matrix, which is then normalized per row to percentages (in each cell, the usages of the programs sum to 100) ################
########## The gene spectra score matrix is used for the annotation of the programs #############
########## The first round was used to identify cells using programs that are not myeloid in nature (i.e. a different cell type identity) or programs used by fewer than 100 myeloid cells. We remove cells using these programs (> 20% usage) ##########
--------------------------------------------------------------------------------
/Identifying recurrent programs in Myeloid Cells in Gliomas (Related to Figure 1)/05- Identifying Variable Genes in Jackson's Cohort (Round 1):
--------------------------------------------------------------------------------
library(dplyr)
library(Seurat)
options(bitmapType='cairo')
options(future.globals.maxSize = 8000 * 1024^2)

########## High-quality, non-doublet myeloid cells in Jackson's cohort were identified as discussed in the "Processing of scRNA-Seq Files" section. Their barcodes are listed in "HQ_Myeloid.txt" (one barcode per line) ############

Myeloid_Cells <- scan("HQ_Myeloid.txt", what="")

######## Tumors.combined is the Seurat object generated for all cells ###########

Myeloid <- subset(x = Tumors.combined, cells = Myeloid_Cells)

Myeloid.data <- GetAssayData(object = Myeloid, slot="counts")

All <- CreateSeuratObject(counts = Myeloid.data, project = "All", min.cells = 3, min.features = 200)
# Human mitochondrial genes carry the "MT-" prefix
All[["percent.mt"]] <- PercentageFeatureSet(All, pattern = "^MT-")

All <- NormalizeData(All)

All_Matrix2 <- GetAssayData(object = All)
# utils_big_as.matrix is not a Seurat function and has to be available in the session
All_Matrix <- utils_big_as.matrix(All_Matrix2, n_slices_init = 10, verbose = T)

########## Keep genes with normalized expression >= 0.1 in at least 20 cells ###############
All_Matrix3 <- All_Matrix[apply(All_Matrix, 1, function(x) sum(x >= 0.1, na.rm=TRUE) > 19),]
All.data2 <- Myeloid.data[rownames(Myeloid.data) %in% rownames(All_Matrix3),]
All <- CreateSeuratObject(counts = All.data2, project = "All", min.cells
                          = 3, min.features = 200)
All[["percent.mt"]] <- PercentageFeatureSet(All, pattern = "^MT-")

############ Identifying Variable Genes #################

All <- NormalizeData(All)

All <- FindVariableFeatures(All, selection.method="vst", nfeatures = 2000)

Var <- HVFInfo(object = All, selection.method="vst", assay = "RNA")

write.table(Var, file="Jackson_Myeloid_Full_Gene_List_Variable_Score_230206.txt", sep="\t", quote=FALSE, col.names=NA)

#### Identify the variable genes from Var (mean expression of at least 0.001, then the top 2000 by standardized variance) and place them in a list, one gene per line ########

# The gene list is assumed to be named after this cohort, following the MGB and Houston scripts
Var2 <- scan("Jackson_Variable_Round1.txt", what="")

# Subset the raw count matrix to the selected genes and transpose it to cells x genes for cNMF
All_Matrix4 <- Myeloid.data[rownames(Myeloid.data) %in% Var2,]

All_Matrix5 <- t(as.matrix(All_Matrix4))

write.table(All_Matrix5, file="Jackson_Myeloid_Matrix_Filtered_For_NMF_Round1.txt", sep="\t", quote=FALSE, col.names=NA)
--------------------------------------------------------------------------------
/Identifying recurrent programs in Myeloid Cells in Gliomas (Related to Figure 1)/06- Round 1 cNMF for myeloid cells in Jackson's cohort:
--------------------------------------------------------------------------------
cnmf prepare --output-dir ./All/ --name All -c Jackson_Myeloid_Matrix_Filtered_For_NMF_Round1.txt \
    -k 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 \
    --n-iter 500 --total-workers 1 --seed 14 --numgenes 2000

cnmf factorize --output-dir ./All/ --name All --worker-index 0;

cnmf combine --output-dir ./All/ --name All;

rm ./All/All/cnmf_tmp/All.spectra.k_*.iter_*.df.npz;

cnmf k_selection_plot --output-dir All --name All;

##### Based on the K plot, we select K=14 ########

cnmf consensus --output-dir All --name All --components 14 --local-density-threshold 0.02 --show-clustering

########## The cnmf consensus step outputs a usage matrix, which is then normalized per row to percentages (in each cell, the usages of the programs sum to 100) ################
########## The gene spectra score matrix is used for the annotation of the programs #############
########## The first round was used to identify cells using programs that are not myeloid in nature (i.e. a different cell type identity) or programs used by fewer than 100 myeloid cells. We remove cells using these programs (> 20% usage) ##########
--------------------------------------------------------------------------------
/Identifying recurrent programs in Myeloid Cells in Gliomas (Related to Figure 1)/07- Identifying union gene lists suitable for Round 2 NMF from all the cohorts:
--------------------------------------------------------------------------------
library(dplyr)
library(Seurat)
options(bitmapType='cairo')
options(future.globals.maxSize = 8000 * 1024^2)

##### Filtering the MGB cohort matrix to include myeloid cells cleaned using Round 1 cNMF ##################

MGB <- read.table("All_MGB_220818_Raw_Expression.txt", sep="\t", header=TRUE, row.names=1)

Myeloid4 <- scan("HQ_Myeloid_MGB.txt", what="")

MGB2 <- MGB[,colnames(MGB) %in% Myeloid4]

##### Filtering Jackson's cohort matrix to include myeloid cells cleaned using Round 1 cNMF ##################

Jackson <- read.table("Jackson_All_Tumors_Raw_Expression.txt", sep="\t", header=TRUE, row.names=1)

Myeloid2 <- scan("HQ_Myeloid_Jackson.txt", what="")

Jackson2 <- Jackson[,colnames(Jackson) %in% Myeloid2]

##### Filtering the Houston cohort matrix to include myeloid cells cleaned using Round 1 cNMF ##################

Houston <- read.table("All_Houston_220826_Raw_Expression.txt", sep="\t", header=TRUE, row.names=1)

Myeloid3 <- scan("HQ_Myeloid_Houston.txt", what="")

Houston2 <- Houston[,colnames(Houston) %in% Myeloid3]

######### Creating a Seurat object for each cohort and merging the objects into one ##########

MGB_Object <- CreateSeuratObject(counts = MGB2, project = "MGB", min.cells = 3, min.features = 200)

# Human mitochondrial genes carry the "MT-" prefix
MGB_Object[["percent.mt"]] <- PercentageFeatureSet(MGB_Object, pattern = "^MT-")

Jackson_Object <- CreateSeuratObject(counts = Jackson2, project = "Jackson", min.cells = 3, min.features = 200)

Jackson_Object[["percent.mt"]] <- PercentageFeatureSet(Jackson_Object, pattern = "^MT-")

Houston_Object <- CreateSeuratObject(counts = Houston2, project = "Houston", min.cells = 3, min.features = 200)

Houston_Object[["percent.mt"]] <- PercentageFeatureSet(Houston_Object, pattern = "^MT-")

Myeloid <- merge(MGB_Object, y = c(Jackson_Object, Houston_Object), add.cell.ids = c("MGB", "Jackson", "Houston"))

all.genes <- rownames(Myeloid)

######### Normalization and identification of variable genes suitable for Round 2 cNMF for each cohort #############

DefaultAssay(Myeloid) <- "RNA"
Myeloid <- NormalizeData(Myeloid)

Myeloid <- FindVariableFeatures(Myeloid, selection.method="vst", nfeatures = 2000)

Var <- HVFInfo(object = Myeloid, selection.method="vst", assay = "RNA")

write.table(Var, file="Combined_Myeloid_Full_Gene_List_Variable_Score_Union_Based.txt", sep="\t", quote=FALSE, col.names=NA)

all.genes <- rownames(Myeloid)
Myeloid <- ScaleData(Myeloid, features = all.genes)

write.table(Myeloid@meta.data, file="Glioma_Combined_Myeloid_TAM_All_MetaData_Union_Based.txt", sep="\t", col.names=NA, quote=FALSE)

DefaultAssay(Myeloid) <- "RNA"

Myeloid.data <- GetAssayData(object = Myeloid, slot="counts")

#################### The union variable gene list was obtained by filtering Combined_Myeloid_Full_Gene_List_Variable_Score_Union_Based.txt for a minimum mean expression of 0.01 and a minimum standardized variance of 1 ################################

Variable <- scan("union_filtered_Variable.txt", what="")

#################### Filter the cleaned myeloid matrix of each cohort to include only these genes of interest for Round 2 cNMF for each cohort #############

MGB3 <- MGB[rownames(MGB) %in% Variable,colnames(MGB) %in% Myeloid4]

write.table(t(MGB3), file="MGB_Myeloid_Matrix_Filtered_For_NMF_Round2.txt", sep="\t", quote=FALSE, col.names=NA)

Houston3 <- Houston[rownames(Houston) %in% Variable,colnames(Houston) %in% Myeloid3]

write.table(t(Houston3), file="Houston_Myeloid_Matrix_Filtered_For_NMF_Round2.txt", sep="\t", quote=FALSE, col.names=NA)

Jackson3 <- Jackson[rownames(Jackson) %in% Variable,colnames(Jackson) %in% Myeloid2]

write.table(t(Jackson3), file="Jackson_Myeloid_Matrix_Filtered_For_NMF_Round2.txt", sep="\t", quote=FALSE, col.names=NA)
--------------------------------------------------------------------------------
/Identifying recurrent programs in Myeloid Cells in Gliomas (Related to Figure 1)/08- Round 2 cNMF for myeloid cells in MGB cohort:
--------------------------------------------------------------------------------
######### The input matrix includes only the genes identified in Step 7 ########

cnmf prepare --output-dir ./All/ --name All -c MGB_Myeloid_Matrix_Filtered_For_NMF_Round2.txt \
    -k 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 \
    --n-iter 500 --total-workers 1 --seed 14 --numgenes 2276

cnmf factorize --output-dir ./All/ --name All --worker-index 0;

cnmf combine --output-dir ./All/ --name All;

rm ./All/All/cnmf_tmp/All.spectra.k_*.iter_*.df.npz;

cnmf k_selection_plot --output-dir All --name All;

##### Based on the K plot, we select K=18 ########

cnmf consensus --output-dir All --name All --components 18 \
    --local-density-threshold 0.02 --show-clustering

########## The cnmf consensus step outputs a usage matrix, which is then normalized per row to percentages (in each cell, the usages of the programs sum to 100) ################
########## The gene spectra score matrix is used for the annotation of the programs #############
--------------------------------------------------------------------------------
/Identifying recurrent programs in Myeloid Cells in Gliomas (Related to Figure 1)/09- Round 2 cNMF for myeloid cells in Houston cohort:
--------------------------------------------------------------------------------
######### The input matrix includes only the genes identified in Step 7 ########

cnmf prepare --output-dir ./All/ --name All -c Houston_Myeloid_Matrix_Filtered_For_NMF_Round2.txt \
    -k 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 \
    --n-iter 500 --total-workers 1 --seed 14 --numgenes 2276

cnmf factorize --output-dir ./All/ --name All --worker-index 0;

cnmf combine --output-dir ./All/ --name All;

rm ./All/All/cnmf_tmp/All.spectra.k_*.iter_*.df.npz;

cnmf k_selection_plot --output-dir All --name All;

##### Based on the K plot, we select K=19 ########

cnmf consensus --output-dir All --name All --components 19 --local-density-threshold 0.02 --show-clustering

########## The cnmf consensus step outputs a usage matrix, which is then normalized per row to percentages (in each cell, the usages of the programs sum to 100) ################
########## The gene spectra score matrix is used for the annotation of the programs #############
--------------------------------------------------------------------------------
/Identifying recurrent programs in Myeloid Cells in Gliomas (Related to Figure 1)/10- Round 2 cNMF for myeloid cells in Jackson's cohort:
--------------------------------------------------------------------------------
######### The input matrix includes only the genes identified in Step 7 ########

cnmf prepare --output-dir ./All/ --name All -c Jackson_Myeloid_Matrix_Filtered_For_NMF_Round2.txt \
    -k 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 \
    --n-iter 500 --total-workers 1 --seed 14 --numgenes 2276

cnmf factorize --output-dir ./All/ --name All --worker-index 0;

cnmf combine --output-dir ./All/ --name All;

rm ./All/All/cnmf_tmp/All.spectra.k_*.iter_*.df.npz;

cnmf k_selection_plot --output-dir All --name All;

##### Based on the K plot, we select K=18 ########

cnmf consensus --output-dir All --name All --components 18 --local-density-threshold 0.02 --show-clustering

########## The cnmf consensus step outputs a usage matrix, which is then normalized per row to percentages (in each cell, the usages of the programs sum to 100) ################
########## The gene spectra score matrix is used for the annotation of the programs #############
--------------------------------------------------------------------------------
/Identifying recurrent programs in Myeloid Cells in Gliomas (Related to Figure 1)/11- Identifying consensus programs:
--------------------------------------------------------------------------------
####### E3 is obtained by merging the gene spectra outputs of the three cohorts #######
####### cosine() from lsa compares the columns of a matrix, so each program should be one column of E3 #######

library(jacpop)
library(lsa)

Cos <- cosine(E3)

# Convert cosine similarity to a distance for clustering
Cosine_dist <- as.dist(1-Cos)
--------------------------------------------------------------------------------
/Identifying recurrent programs in Myeloid Cells in Gliomas (Related to Figure 1)/13 - Calculating the usages of the consensus myeloid programs in the validation Mcgill cohort:
--------------------------------------------------------------------------------
####### After obtaining the spectra for all consensus programs in Step 12, we placed the values in a data frame, "consensus_spectra.txt" ######

############## Filter the Mcgill cohort matrix to include only non-doublet myeloid cells and the genes that were used for Round 2 cNMF in the discovery cohorts #######

############## Inside R ################################
library(dplyr)
library(Seurat)
options(bitmapType='cairo')
options(future.globals.maxSize = 8000 * 1024^2)

Matrix <- GetAssayData(Tumors.combined, slot = "counts")

Genes <- read.table("union_filtered_Variable.txt", sep="\t", header=TRUE, row.names=1)

Genes2 <- rownames(Genes)

Matrix2 <- Matrix[rownames(Matrix) %in% Genes2,]

write.table(t(as.matrix(Matrix2)), file="./OPK_10X_Raw_Counts_Variable_for_Myeloid_cNMF.txt", sep="\t", col.names=NA, quote=FALSE)

dim(Matrix2) ######### The number of rows (genes) is the value for --numgenes below #################

q()

##################################### Exit R ###########################################
conda activate cnmf_env

cnmf prepare --output-dir ./Calculate_Usage_Myeloid/ --name Calculate_Usage_Myeloid -c OPK_10X_Raw_Counts_Variable_for_Myeloid_cNMF.txt \
    -k 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 \
    --n-iter 500 --total-workers 1 --seed 14 --numgenes 2278;

python;

################################## Inside Python #################################

import sklearn
import sklearn.decomposition
from sklearn.decomposition import non_negative_factorization
import numpy as np
import scanpy as sc
import csv
import scipy
import pandas as pd

# Consensus spectra: genes in rows, programs in columns
H = pd.read_table("consensus_spectra.txt", sep="\t", index_col=0)

H2 = H.T

# Normalized counts written by "cnmf prepare"
X = sc.read_h5ad('Calculate_Usage_Myeloid/Calculate_Usage_Myeloid/cnmf_tmp/Calculate_Usage_Myeloid.norm_counts.h5ad')

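# ---------------------------------------------------------------------------
# Editorial sketch, not part of the original pipeline.
# The closing note of this script says the usage values can afterwards be
# converted to per-cell percentages. A minimal helper for that conversion might
# look like the function below. The name usage_to_percent and the choice to
# leave all-zero rows as zeros are assumptions made for illustration only.
# ---------------------------------------------------------------------------
import numpy as np

def usage_to_percent(usage):
    """Scale every row (cell) of a cells x programs array so that it sums to 100."""
    usage = np.asarray(usage, dtype=np.float64)
    row_sums = usage.sum(axis=1, keepdims=True)
    # Rows whose usages sum to zero are divided by 1 and therefore stay zero.
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    return 100.0 * usage / safe_sums

# Hypothetical use once the usages have been fitted further down in this script:
#   usage_pct = usage_to_percent(test2[0])
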
X2 = X.X.toarray()

X3 = pd.DataFrame(data=X2, columns = X.var_names , index = X.obs.index)

# Keep the spectra columns (genes) in the same order as the genes of the count matrix
H3 = H2.filter(items = X3.columns)

H4 = H3.to_numpy()

X5 = X2.astype(np.float64)

# With update_H=False the consensus spectra H4 stay fixed and only the usages are fitted.
# n_components must equal the number of consensus programs (rows of H4).
# The alpha and regularization arguments are accepted only by older scikit-learn releases
# (deprecated in 1.0, removed in 1.2).
test = sklearn.decomposition.non_negative_factorization(
    X5, W=None, H=H4, n_components= 14, init='random', update_H=False,
    solver='cd', beta_loss='frobenius', tol=0.0001, max_iter=1000,
    alpha=0.0, alpha_W=0.0, alpha_H='same', l1_ratio=0.0, regularization=None,
    random_state=None, verbose=0, shuffle=False)

test2 = list(test)

pd.DataFrame(test2[0], columns= H.columns, index=X.obs.index).to_csv(path_or_buf="./OPK_10X_Myeloid_Programs_Usage.txt", sep="\t", quoting=csv.QUOTE_NONE)

############ The output "OPK_10X_Myeloid_Programs_Usage.txt" contains the usage values of all cells. These can then be normalized to percentages: in each row (cell), the usage values are rescaled so that the program usages always sum to 100 ##############
--------------------------------------------------------------------------------
/MAESTER/01- Trimming R2 reads to include high quality bases only:
--------------------------------------------------------------------------------
mkdir Trimmed;

cp *_R2_*.fastq.gz ./Trimmed/;

cd ./Trimmed/;

for file in *;do gunzip "$file";done

for file in *.fastq;do homerTools trim -5 24 "$file";done;

###### This copies the R2 files to another folder, unzips them, and trims the first 24 bases of each read
###### The outputs can be renamed, gzipped, and placed in the same folder as the R1 reads for the subsequent processing and analysis steps
--------------------------------------------------------------------------------
/MAESTER/02- Extracting barcode and UMI sequences from R1 and transferring them to read names in processed R2 fastqs:
--------------------------------------------------------------------------------
Assemble_fastq.R /Path/To/Fastqs Maester barcodes.txt 12 8

#### Assemble_fastq.R is obtained from https://github.com/petervangalen/MAESTER-2021/blob/main/Pre-processing/Assemble_fastq.R #####

#### /Path/To/Fastqs is the folder that contains the R1 and R2 fastqs

##### Maester is the name of the sample (located inside the folder) that will be processed

###### barcodes.txt is the list of barcodes from the scRNA-Seq library that pass the QC threshold
--------------------------------------------------------------------------------
/MAESTER/03- Mapping Maester libraries:
--------------------------------------------------------------------------------
STAR --genomeDir /seq/epiprod02/Chadi/Genomes/STAR/hg38_MitoMasked/ \
    --readFilesIn Maester_Processed_R2.fastq.gz \
    --outFileNamePrefix ./Mapped/Maester \
    --outSAMtype BAM SortedByCoordinate \
    --outFilterMultimapNmax 2 --outFilterScoreMinOverLread 0.25 \
    --readFilesCommand zcat;
--------------------------------------------------------------------------------
/MAESTER/04- Make the STAR aligner bam file compatible with MAEGATK:
--------------------------------------------------------------------------------
Tag_CB_UMI.sh Maester.bam

#### Tag_CB_UMI.sh is obtained from https://github.com/petervangalen/MAESTER-2021/blob/main/Pre-processing/Tag_CB_UMI.sh ###
--------------------------------------------------------------------------------
/MAESTER/05- Run MAEGATK to obtain single cell level values of counts and coverages:
--------------------------------------------------------------------------------
maegatk bcall --input=${bam_input} --output=${maegatk_outputs} --mito-genome=chrM.fa --min-reads=3 --ncores=20 --snake-stdout

##### ${bam_input} is the output of the Tag_CB_UMI.sh script ####
##### Make sure that chrM.fa is indexed using BWA and that all files can be located by maegatk
##### ${maegatk_outputs} is the folder where the maegatk.rds R object will be generated; this object is required for the next steps of the analysis
--------------------------------------------------------------------------------
/MAESTER/06 - Obtain Counts Matrix and Coverage Matrix at single cell level from MAEGATK output:
--------------------------------------------------------------------------------
################# Inside R ###########

#~~~~~~~~~~~~~~~~~~~~~#
#### Load Prerequisites ####
#~~~~~~~~~~~~~~~~~~~~~#

options(stringsAsFactors = FALSE)
options(scipen = 999)

library(tidyverse)
library(SummarizedExperiment)
library(Seurat)
library(data.table)
library(Matrix)
library(ComplexHeatmap)
library(gdata)
library(stringr)
library(ggforce)

###### Read MAEGATK Output #############

maegatk.rse <- readRDS("maegatk.rds")

message("computeAFMutMatrix()")
# Builds a variant x cell matrix of alternative-allele counts (forward + reverse strand)
computeAFMutMatrix <- function(SE){
  ref_allele <- as.character(rowRanges(SE)$refAllele)

  getMutMatrix <- function(letter){
    mat <- (assays(SE)[[paste0(letter, "_counts_fw")]] + assays(SE)[[paste0(letter, "_counts_rev")]])
    rownames(mat) <- paste0(as.character(1:dim(mat)[1]), "_", toupper(ref_allele), ">", letter)
    # Drop positions where this letter is the reference allele
    return(mat[toupper(ref_allele) != letter,])
  }

  rbind(getMutMatrix("A"), getMutMatrix("C"), getMutMatrix("G"), getMutMatrix("T"))
}

af.dm2 <- data.matrix(computeAFMutMatrix(maegatk.rse))

############# Filter the matrix to include annotated barcodes from the respective scRNA-Seq library ##########

seu <- read.table("CellTypes_Coarse_V10.txt", sep="\t", header=TRUE, row.names=1)

common.cells <- intersect(rownames(seu), colnames(af.dm2))

seu <- seu[common.cells,]

af.dm3 <- af.dm2[,common.cells]

write.table(af.dm3, file="Maester_Counts.txt", sep="\t", col.names=NA,
            quote=FALSE)

############# Obtain Coverage Matrix ##########

cov <- assays(maegatk.rse)[["coverage"]]

cov2 <- as.matrix(cov)

seu <- read.table("MGH915_CellTypes_Coarse_V10.txt", sep="\t", header=TRUE, row.names=1)

common.cells <- intersect(rownames(seu), colnames(cov2))

seu <- seu[common.cells,]

cov3 <- cov2[,common.cells]

write.table(cov3,file="Coverage.txt", sep="\t", quote=FALSE, col.names=NA)

######## To be performed similarly for the primary tumor and PBMC libraries ###########
--------------------------------------------------------------------------------
/MAESTER/07- Low Resolution Pesduobulking of count matrices:
--------------------------------------------------------------------------------
######## We combine the single-cell count matrices of the primary tumor (PT) and PBMC libraries #################

############################ Inside R ##########################################

data_A <- read.table("PT_Maester_Counts.txt",header=TRUE, sep="\t", row.names=1)

data_PBMC <- read.table("PBMC_Maester_Counts.txt",header=TRUE, sep="\t", row.names=1)

# Merge on the variant names (row names) and restore them as row names
data_Final <- merge(data_A, data_PBMC, by="row.names")

rownames(data_Final) <- data_Final$Row.names

data_Final <- data_Final[,-1]

library(dplyr)

######## The annotation file includes the filtered barcodes and their low-resolution annotation for both the PT and PBMC libraries ##########
######## Filter the merged matrix to include annotated barcodes only (cells that pass RNA-Seq QC) ########

annotation <- read.table("CellTypes_Coarse_V10.txt", header=TRUE, row.names = 1, sep="\t")

common.cells <- intersect(rownames(annotation), colnames(data_Final))

annotation <- annotation[common.cells,]

data_Final <- data_Final[,common.cells]

######### Subset the matrix to generate one matrix per annotation #########

cells.tib <- tibble(cell = common.cells,
                    orig.ident = annotation$orig.ident,
                    CellType_RNA = annotation$CellType)

CellSubsets.ls <- list(unionCells = cells.tib$cell,
                       TAM = filter(cells.tib, CellType_RNA == "Myeloid")$cell,
                       Malignant = filter(cells.tib, CellType_RNA == "Malignant")$cell,
                       Stromal = filter(cells.tib, CellType_RNA == "Stromal")$cell,
                       Oligo = filter(cells.tib, CellType_RNA == "Oligo")$cell,
                       Tcells = filter(cells.tib, CellType_RNA == "Tcells")$cell,
                       Myeloid_PBMC = filter(cells.tib, CellType_RNA == "Myeloid_PBMC")$cell
                       )

data_Malignant <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Malignant]
data_Tcells <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Tcells]
data_TAM <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$TAM]
data_Oligo <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Oligo]
data_Stromal <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Stromal]
data_Myeloid_PBMC <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Myeloid_PBMC]

############# Perform pseudobulking by summing the values of each variant within each category, then output the results as a pseudobulked matrix #########

Group_Names <- c("Tcells", "TAM", "Oligo", "Stromal", "Myeloid_PBMC", "Malignant", "All_Cells")

E <- matrix(data=0, nrow = nrow(data_Final), ncol=length(Group_Names));

for (i in 1:nrow(data_Final)){

  data_Tcells_G1 <- data_Tcells[i,,drop=FALSE]

  Test1 <- sum(as.numeric(data_Tcells_G1))

  data_TAM_G1 <- data_TAM[i,,drop=FALSE]

  Test2 <- sum(as.numeric(data_TAM_G1))

  data_Oligo_G1 <- data_Oligo[i,,drop=FALSE]

  Test3 <- sum(as.numeric(data_Oligo_G1))

  data_Stromal_G1 <- data_Stromal[i,,drop=FALSE]

  Test4 <- sum(as.numeric(data_Stromal_G1))

data_Myeloid_PBMC_G1 <- data_Myeloid_PBMC[i,,drop=FALSE] 89 | 90 | Test6 <- sum(as.numeric(data_Myeloid_PBMC_G1)) 91 | 92 | 93 | 94 | 95 | data_Malignant_G1 <- data_Malignant[i,,drop=FALSE] 96 | 97 | Test7 <- sum(as.numeric(data_Malignant_G1)) 98 | 99 | 100 | 101 | data_All_G1 <- data_Final[i,,drop=FALSE] 102 | 103 | Test8 <- sum(as.numeric(data_All_G1)) 104 | 105 | 106 | E[i,] <- c(Test1, Test2, Test3, Test4, Test6, Test7, Test8) 107 | 108 | } 109 | 110 | 111 | rownames(E) <- rownames(data_Final) 112 | 113 | colnames(E) <- Group_Names 114 | 115 | write.table(E, file="Sums_Counts_Per_CellType_V10.txt", sep="\t", col.names=NA, quote=FALSE) 116 | 117 | colnames(E) 118 | 119 | -------------------------------------------------------------------------------- /MAESTER/08- Low Resolution Pesduobulking of Coverage matrices: -------------------------------------------------------------------------------- 1 | ######## We combine the single-cell coverage matrix for primary tumor and PBMC libraries ################# 2 | 3 | ############################ Inside R ########################################## 4 | 5 | data_A <- read.table("MGH915_PT_Coverage.txt",head=TRUE, sep="\t", row.names=1) 6 | 7 | 8 | data_PBMC <- read.table("MGH915_PBMC_Coverage.txt",head=TRUE, sep="\t", row.names=1) 9 | 10 | 11 | data3 <- merge(data_A, data_PBMC, by="row.names") 12 | 13 | data_Final <- data3 14 | 15 | rownames(data_Final) <- data_Final$Row.names 16 | 17 | data_Final <- data_Final[,-c(1,2)] 18 | 19 | library(dplyr) 20 | 21 | ######## The annotation file includes the filtered barcodes and respective low-resolution annotation for both PT and PBMC libraries ########## 22 | ######## Filter the merged matrix to include annotated barcodes only (cells that pass RNA-Seq QC ######## 23 | 24 | annotation <- read.table("CellTypes_Coarse_V10.txt", head=TRUE, row.names = 1, sep="\t") 25 | 26 | common.cells <- intersect(rownames(annotation), colnames(data_Final)) 27 | 28 | annotation <- 
annotation[common.cells,] 29 | 30 | data_Final <- data_Final[,common.cells] 31 | 32 | 33 | ######### Subset the matrix to generate a matrix for each annotation ######### 34 | 35 | 36 | cells.tib <- tibble(cell = common.cells, 37 | orig.ident = annotation$orig.ident, 38 | CellType_RNA = annotation$CellType) 39 | 40 | CellSubsets.ls <- list(unionCells = cells.tib$cell, 41 | TAM = filter(cells.tib, CellType_RNA == "Myeloid")$cell, 42 | Malignant = filter(cells.tib, CellType_RNA == "Malignant")$cell, 43 | Stromal = filter(cells.tib, CellType_RNA == "Stromal")$cell, 44 | Oligo = filter(cells.tib, CellType_RNA == "Oligo")$cell, 45 | Tcells = filter(cells.tib, CellType_RNA == "Tcells")$cell, 46 | Myeloid_PBMC = filter(cells.tib, CellType_RNA == "Myeloid_PBMC")$cell 47 | ) 48 | 49 | 50 | data_Malignant <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Malignant] 51 | data_Tcells <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Tcells] 52 | data_TAM <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$TAM] 53 | data_Oligo <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Oligo] 54 | data_Stromal <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Stromal] 55 | data_Myeloid_PBMC <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Myeloid_PBMC] 56 | 57 | 58 | ############# Perform Pseduobulking by summing the values for each position in each annotation and then output the results in a pseudobulked matrix ######### 59 | 60 | Group_Names <- c("Tcells", "TAM", "Oligo", "Stromal", "Myeloid_PBMC", "Malignant", "All_Cells") 61 | 62 | 63 | E <- matrix(data=0, nrow = nrow(data_Final), ncol=length(Group_Names)); 64 | 65 | for (i in 1:nrow(data_Final)){ 66 | 67 | 68 | 69 | data_Tcells_G1 <- data_Tcells[i,,drop=FALSE] 70 | 71 | Test1 <- sum(as.numeric(data_Tcells_G1)) 72 | 73 | 74 | data_TAM_G1 <- data_TAM[i,,drop=FALSE] 75 | 76 | Test2 <- sum(as.numeric(data_TAM_G1)) 77 | 78 | 79 | data_Oligo_G1 <- data_Oligo[i,,drop=FALSE] 80 | 81 | Test3 <- 
sum(as.numeric(data_Oligo_G1)) 82 | 83 | 84 | data_Stromal_G1 <- data_Stromal[i,,drop=FALSE] 85 | 86 | Test4 <- sum(as.numeric(data_Stromal_G1)) 87 | 88 | 89 | 90 | data_Myeloid_PBMC_G1 <- data_Myeloid_PBMC[i,,drop=FALSE] 91 | 92 | Test6 <- sum(as.numeric(data_Myeloid_PBMC_G1)) 93 | 94 | 95 | 96 | 97 | data_Malignant_G1 <- data_Malignant[i,,drop=FALSE] 98 | 99 | Test7 <- sum(as.numeric(data_Malignant_G1)) 100 | 101 | 102 | 103 | data_All_G1 <- data_Final[i,,drop=FALSE] 104 | 105 | Test8 <- sum(as.numeric(data_All_G1)) 106 | 107 | 108 | E[i,] <- c(Test1, Test2, Test3, Test4, Test6, Test7, Test8) 109 | 110 | } 111 | 112 | 113 | rownames(E) <- rownames(data_Final) 114 | 115 | colnames(E) <- Group_Names 116 | 117 | write.table(E, file="MGH915_Sums_Coverage_Per_CellType_V10.txt", sep="\t", col.names=NA, quote=FALSE) 118 | 119 | colnames(E) 120 | -------------------------------------------------------------------------------- /MAESTER/09- Calculating VAFs from pseudobulked counts and coverage matrices: -------------------------------------------------------------------------------- 1 | ############################ Inside R ########################################## 2 | 3 | data <- read.table("Sums_Counts_Per_CellType_V10.txt", head=TRUE, row.names=1, sep="\t") 4 | 5 | data2 <- read.table("Sums_Coverage_Per_CellType_V10.txt", head=TRUE, row.names=1, sep="\t") 6 | 7 | ####### Add a pseudo-count to the coverage matrix to prevent errors ############# 8 | 9 | data2 <- data2 + 0.000001 10 | 11 | 12 | ####### Adjust the counts data frame to enable the calculation script to work ######## 13 | 14 | data$Position <- rownames(data) 15 | 16 | data$Position <- gsub("_...","",data$Position) 17 | 18 | data2$Position <- rownames(data2) 19 | 20 | data$Variant <- rownames(data) 21 | 22 | d = matrix(ncol=7) 23 | 24 | ######### Calculate the VAFs ######## 25 | 26 | for (n in rownames(data2)){a<-(data[data$Position==n,c(1:7)]);b<-as.numeric(data2[n,c(1:7)]); c <- t(a)/b;d<-rbind(d,t(c))} 27 
| 28 | d2 <- na.omit(d) 29 | 30 | 31 | d3 <- as.data.frame(d2) 32 | 33 | d4 <- d3*100 34 | 35 | write.table(d4, file="Pseudobulked_VAFs_V10.txt", col.names=NA, quote=FALSE, sep="\t") 36 | -------------------------------------------------------------------------------- /MAESTER/10- Selection of Variants of Interest: -------------------------------------------------------------------------------- 1 | #### To identify variants specific to the myeloid cells in the tumor microenvironment, the variant has to meet the following criteria: 2 | (a) meet the minimum VAF requirement for TAMs for the coverages in TAMs and Myeloid_PBMC categories. 3 | (b) VAF = 0 in the Myeloid_PBMC category. 4 | (c) VAF > minimum required for TAMs. 5 | 6 | ##### To identify variants specific to the myeloid cells in PBMC, the variant has to meet the following criteria: 7 | (a) meet the minimum VAF requirement for Myeloid_PBMC for the coverages in Malignant, TAMs and Myeloid_PBMC categories. 8 | (b) VAF = 0 in the Malignant category. (If the tumor library is enriched for malignant cells, this criterion can be replaced with: VAF in the Myeloid_PBMC category is 20 times higher than in the Malignant category.) 9 | (c) VAF > 0 in the TAM category. 
10 | (d) VAF > minimum required in Myeloid_PBMC category 11 | 12 | 13 | The binomial test for detecting the minimum VAF required at a particular coverage was calculated using python as follows: 14 | 15 | 16 | 17 | 18 | -------------------------------------------------------------------------------- /MAESTER/11- High-resolution Pseudobulking of Myeloid cell population in the Primary Tumor: -------------------------------------------------------------------------------- 1 | #################inside R########### 2 | 3 | data_A <- read.table("PT_Maester_Counts.txt",head=TRUE, sep="\t", row.names=1) 4 | 5 | 6 | data_Final <- data_A 7 | 8 | 9 | library(dplyr) 10 | 11 | ######## The annotation file includes the filtered barcodes and respective high-resolution annotation for myeloid cells in Primary Tumor libraries ########## 12 | ######## Annotation criteria shown in the methods section ######## 13 | 14 | annotation <- read.table("CellTypes_Fine_V10.txt", head=TRUE, row.names = 1, sep="\t") 15 | 16 | common.cells <- intersect(rownames(annotation), colnames(data_Final)) 17 | 18 | annotation <- annotation[common.cells,] 19 | 20 | data_Final <- data_Final[,common.cells] 21 | 22 | 23 | ######### Subset the matrix to generate a matrix for each high-resolution annotation ######### 24 | 25 | 26 | cells.tib <- tibble(cell = common.cells, 27 | CellType_RNA = annotation$Identity) 28 | 29 | CellSubsets.ls <- list(unionCells = cells.tib$cell, 30 | Macrophage = filter(cells.tib, CellType_RNA == "Macrophages")$cell, 31 | Monocyte = filter(cells.tib, CellType_RNA == "Monocytes")$cell, 32 | Mono_Macro = filter(cells.tib, CellType_RNA == "Mono_Macro")$cell, 33 | Microglia_Like = filter(cells.tib, CellType_RNA == "Microglia_Like")$cell, 34 | cDC = filter(cells.tib, CellType_RNA == "cDC")$cell, 35 | Microglia = filter(cells.tib, CellType_RNA == "Microglia")$cell, 36 | Neutrophil = filter(cells.tib, CellType_RNA == "Neutrophils")$cell 37 | ) 38 | 39 | 40 | 41 | data_Macrophage <- 
data_Final[,colnames(data_Final) %in% CellSubsets.ls$Macrophage] 42 | data_Monocyte <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Monocyte] 43 | data_Mono_Macro <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Mono_Macro] 44 | data_Microglia_Like <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Microglia_Like] 45 | data_cDC <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$cDC] 46 | data_Microglia <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Microglia] 47 | data_Neutrophil <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Neutrophil] 48 | 49 | ############# Perform Pseduobulking by counting the number of cells that include each variant for each high-resolution identity and then output the results in a pseudobulked matrix ######### 50 | 51 | 52 | Group_Names <- c("Macrophage", "Monocyte", "Mono_Macro", "Microglia_Like", "cDC", "Microglia", "Neutrophil") 53 | 54 | E <- matrix(data=0, nrow = nrow(data_Final), ncol=length(Group_Names)); 55 | 56 | for (i in 1:nrow(data_Final)){ 57 | 58 | data_Macrophage_G1 <- data_Macrophage[i,,drop=FALSE] 59 | data_Macrophage_G1 <- data_Macrophage_G1[,apply(data_Macrophage_G1,2,function(x) sum(x > 0))] 60 | Test1 <- ncol(as.matrix(data_Macrophage_G1)) 61 | 62 | data_Monocyte_G1 <- data_Monocyte[i,,drop=FALSE] 63 | data_Monocyte_G1 <- data_Monocyte_G1[,apply(data_Monocyte_G1,2,function(x) sum(x > 0))] 64 | Test2 <- ncol(as.matrix(data_Monocyte_G1)) 65 | 66 | 67 | data_Mono_Macro_G1 <- data_Mono_Macro[i,,drop=FALSE] 68 | data_Mono_Macro_G1 <- data_Mono_Macro_G1[,apply(data_Mono_Macro_G1,2,function(x) sum(x > 0))] 69 | Test3 <- ncol(as.matrix(data_Mono_Macro_G1)) 70 | 71 | 72 | 73 | data_Microglia_Like_G1 <- data_Microglia_Like[i,,drop=FALSE] 74 | data_Microglia_Like_G1 <- data_Microglia_Like_G1[,apply(data_Microglia_Like_G1,2,function(x) sum(x > 0))] 75 | Test4 <- ncol(as.matrix(data_Microglia_Like_G1)) 76 | 77 | 78 | data_cDC_G1 <- data_cDC[i,,drop=FALSE] 79 | data_cDC_G1 <- 
data_cDC_G1[,apply(data_cDC_G1,2,function(x) sum(x > 0))] 80 | Test5 <- ncol(as.matrix(data_cDC_G1)) 81 | 82 | 83 | data_Microglia_G1 <- data_Microglia[i,,drop=FALSE] 84 | data_Microglia_G1 <- data_Microglia_G1[,apply(data_Microglia_G1,2,function(x) sum(x > 0))] 85 | Test6 <- ncol(as.matrix(data_Microglia_G1)) 86 | 87 | 88 | data_Neutrophil_G1 <- data_Neutrophil[i,,drop=FALSE] 89 | data_Neutrophil_G1 <- data_Neutrophil_G1[,apply(data_Neutrophil_G1,2,function(x) sum(x > 0))] 90 | Test7 <- ncol(as.matrix(data_Neutrophil_G1)) 91 | 92 | 93 | 94 | E[i,] <- c(Test1, Test2, Test3, Test4, Test5, Test6, Test7) 95 | 96 | } 97 | 98 | 99 | rownames(E) <- rownames(data_Final) 100 | 101 | colnames(E) <- Group_Names 102 | 103 | 104 | write.table(E, file="Cell_Counts_Per_CellType_Fine_V10.txt", sep="\t", col.names=NA, quote=FALSE) 105 | 106 | colnames(E) 107 | 108 | 109 | -------------------------------------------------------------------------------- /MAESTER/12 - Calculating GSVA enrichment for variants categories in myeloid cell identities in tumor microenvironment: -------------------------------------------------------------------------------- 1 | ###################### Inside R ####################### 2 | 3 | ##### read the high-resolution pseudobulked table ####### 4 | 5 | E <- read.table("Cell_Counts_Per_CellType_V10.txt", sep="\t", head=TRUE, row.names=1) 6 | 7 | ##### Read PBMC-specific variants for myeloid cells ###### 8 | 9 | Myeloid_PBMC_Not_Malignant <- scan("PBMC_Not_Malignant2.txt", what="") 10 | 11 | 12 | ##### Read Tumor microenvironment-specific variants of myeloid cells as a list ###### 13 | 14 | 15 | TAM_Not_PBMC <- scan("TAM_Not_PBMC.txt", what="") 16 | 17 | 18 | ##### Place all the variants of interest in a list ###### 19 | Combined_Groups <- scan("Combined_Groups6.txt", what="") 20 | 21 | ######## Filter the pseudo bulked data frame to include only variants of interest 22 | 23 | E2 <- E[rownames(E) %in% Combined_Groups,] 24 | 25 | 26 | ####### Remove 
variants not detected in any category ##### 27 | 28 | E5 <- E2[rowSums(E2) > 0,] 29 | 30 | 31 | library(ComplexHeatmap) 32 | 33 | library("GSVA") 34 | 35 | Lists <- list(Myeloid_PBMC_Not_Malignant, TAM_Not_PBMC) 36 | 37 | Test <- gsva(as.matrix(E5), Lists, kcdf="Poisson") 38 | 39 | rownames(Test) <- c("Myeloid_PBMC_Not_Malignant", "TAM_Not_PBMC") 40 | 41 | 42 | write.table(Test, file="GSVA_Scores_Identities.txt", col.names=NA, quote=FALSE, sep="\t") 43 | 44 | ###### The cell number and fraction of TAMs were added to the table manually, and any identity with fewer than 10 cells contributing to the GSVA score was removed to obtain reliable enrichments ######### 45 | -------------------------------------------------------------------------------- /MAESTER/13- Visualizations (Dotplot): -------------------------------------------------------------------------------- 1 | #### Generate the dotplot ##### 2 | 3 | ##### Inside R ######## 4 | 5 | library(ggplot2) 6 | 7 | ######## GSVA is calculated as (GSVA PBMC-Specific enrichment - GSVA TME-Specific enrichment) ### See step 12 ######### 8 | ######## The data frame includes the fraction of cells annotated as each identity in the scRNA-Seq libraries ### 9 | data <- read.table("Identities_Dotplot_V10.txt", head=TRUE, sep="\t") 10 | 11 | 12 | scaled_data <- (data$GSVA - min(data$GSVA)) / (max(data$GSVA) - min(data$GSVA)) 13 | 14 | scaled_data <- scaled_data * 2 - 1 15 | 16 | 17 | data2 <- data 18 | 19 | data2$GSVA <- scaled_data 20 | 21 | 22 | factor_order <- c("Neutrophils", "cDCs", "Monocytes", "Mono_Macro", "Macrophage", "Microglia_Like", "Microglia") 23 | 24 | 25 | data2$TAM <- factor(data2$TAM, levels = factor_order) 26 | 27 | 28 | ggplot(data2, aes(x=GSVA, y=TAM, size=Fraction2*10)) + geom_point(data = subset(data2, Fraction2 != 0)) + labs(y="Identity", x="Enrichment of PBMC Variants - Enrichment of TME Variants") + scale_size_area(breaks = c(0.1,0.5, 1), max_size = 18) + scale_x_continuous(limits = c(-1.1,1.1)) + labs(col="TAM") + 
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line.x = element_line(colour = "black"), axis.text.y = element_text(face="bold", size=14), axis.text.x = element_text(face="bold", size=14), legend.text=element_text(size=12)) + geom_vline(xintercept = 0, linetype = "dashed", color = "black") 29 | -------------------------------------------------------------------------------- /MAESTER/14- Visualizations (Stacked Columns): -------------------------------------------------------------------------------- 1 | ####### Inside R ############## 2 | ####### The value is the average usage of the four immunomodulatory programs in the four tumors (Maester libraries - scRNA-Seq part) ###### 3 | ######## Others represent the average usage of all other programs (including identities) ######### 4 | 5 | library(ggplot2) 6 | 7 | data <- read.table("Maester_Stack_V12.txt", head=TRUE, sep="\t") 8 | 9 | 10 | 11 | factor_order <- c("Neutrophil", "cDC", "Monocyte", "Mono_Macro", "Macrophage", "Microglia_Like", "Microglia") 12 | 13 | data$Identity <- factor(data$Identity, levels = factor_order) 14 | 15 | data$Program <- factor(data$Program, levels = c("Others", "Scavenger", "Complement", "RHOB", "IL1B")) 16 | 17 | 18 | ggplot(data, aes(x = Value, y = Identity, fill = Program)) + geom_bar(position="fill", stat="identity", width=0.45) + scale_fill_manual(values = c("Others" = "gray90", "IL1B" = "#AB0800", "Complement"="#007AFF", "RHOB"="#FF6961", "Scavenger" = "#0700C4")) + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank()) 19 | -------------------------------------------------------------------------------- /Processing of GBO scRNA-Seq libraries (Related to Figure 7)/01- Aligning all GBO Seq-Well scRNA-Seq libraries: -------------------------------------------------------------------------------- 1 | /path/to/STAR --genomeDir /path/to/genome/dir/ --readFilesIn 
Read2_Lane1.fastq.gz,Read2_Lane2.fastq.gz,Read2_Lane3.fastq.gz Read1_Lane1.fastq.gz,Read1_Lane2.fastq.gz,Read1_Lane3.fastq.gz --soloType CB_UMI_Simple --soloCBstart 1 --soloCBlen 12 --soloUMIstart 13 --soloUMIlen 8 --outSAMtype BAM SortedByCoordinate --outSAMattributes CR UR CY UY CB UB 2 | -------------------------------------------------------------------------------- /Processing of GBO scRNA-Seq libraries (Related to Figure 7)/02- Seurat Processing for BWH911 to obtain Raw Counts Matrix: -------------------------------------------------------------------------------- 1 | library(dplyr) 2 | library(Seurat) 3 | options(bitmapType='cairo') 4 | options(future.globals.maxSize = 8000 * 1024^2) 5 | 6 | 7 | 8 | DMSO1.data <- Read10X("DMSO1/") 9 | 10 | DMSO1 <- CreateSeuratObject(counts = DMSO1.data, project = "DMSO1", min.cells = 3, min.features = 200) 11 | 12 | DMSO1[["percent.mt"]] <- PercentageFeatureSet(DMSO1, pattern = "^MT.") 13 | 14 | 15 | DMSO2.data <- Read10X("DMSO2/") 16 | 17 | DMSO2 <- CreateSeuratObject(counts = DMSO2.data, project = "DMSO2", min.cells = 3, min.features = 200) 18 | 19 | DMSO2[["percent.mt"]] <- PercentageFeatureSet(DMSO2, pattern = "^MT.") 20 | 21 | 22 | 23 | DMSO3.data <- Read10X("DMSO3/") 24 | 25 | DMSO3 <- CreateSeuratObject(counts = DMSO3.data, project = "DMSO3", min.cells = 3, min.features = 200) 26 | 27 | DMSO3[["percent.mt"]] <- PercentageFeatureSet(DMSO3, pattern = "^MT.") 28 | 29 | 30 | 31 | Dex1.data <- Read10X("Dex1/") 32 | 33 | Dex1 <- CreateSeuratObject(counts = Dex1.data, project = "Dex1", min.cells = 3, min.features = 200) 34 | 35 | Dex1[["percent.mt"]] <- PercentageFeatureSet(Dex1, pattern = "^MT.") 36 | 37 | 38 | Dex2.data <- Read10X("Dex2/") 39 | 40 | Dex2 <- CreateSeuratObject(counts = Dex2.data, project = "Dex2", min.cells = 3, min.features = 200) 41 | 42 | Dex2[["percent.mt"]] <- PercentageFeatureSet(Dex2, pattern = "^MT.") 43 | 44 | 45 | Dex3.data <- Read10X("Dex3/") 46 | 47 | Dex3 <- CreateSeuratObject(counts = 
Dex3.data, project = "Dex3", min.cells = 3, min.features = 200) 48 | 49 | Dex3[["percent.mt"]] <- PercentageFeatureSet(Dex3, pattern = "^MT.") 50 | 51 | 52 | 53 | BWH911_GBO <- merge(DMSO1, y = c(DMSO2, DMSO3, Dex1, Dex2, Dex3), add.cell.ids = c("BWH911_GBO_DMSO1", "BWH911_GBO_DMSO2", "BWH911_GBO_DMSO3", "BWH911_GBO_Dex1", "BWH911_GBO_Dex2", "BWH911_GBO_Dex3"), project = "BWH911_GBO") 54 | 55 | 56 | pdf("BWH911_GBO_QC_BF.pdf", height = 6, width = 20) 57 | VlnPlot(BWH911_GBO, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3) 58 | dev.off() 59 | 60 | BWH911_GBO <- subset(BWH911_GBO, subset = nFeature_RNA > 500 & nFeature_RNA < 6000 & nCount_RNA > 1000 & percent.mt < 15) 61 | 62 | 63 | pdf("BWH911_GBO_QC_AF.pdf", height = 6, width = 20) 64 | VlnPlot(BWH911_GBO, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3) 65 | dev.off() 66 | 67 | library(sctransform) 68 | BWH911_GBO <- SCTransform(BWH911_GBO, vars.to.regress = "percent.mt", verbose = TRUE) 69 | BWH911_GBO <- RunPCA(BWH911_GBO) 70 | pdf("BWH911_GBO_ElbowPlot.pdf", height = 6, width = 6) 71 | ElbowPlot(BWH911_GBO, ndims=50) 72 | dev.off() 73 | 74 | write.table(BWH911_GBO@meta.data, file="BWH911_GBO_Integrated_MetaData.txt", sep="\t", col.names=NA, quote=FALSE) 75 | 76 | 77 | Treatment <- scan("Treatment.txt", what="") 78 | 79 | BWH911_GBO@meta.data$Treatment <- Treatment 80 | 81 | 82 | 83 | BWH911_GBO <- RunUMAP(BWH911_GBO, reduction = "pca", dims = 1:15) 84 | BWH911_GBO <- FindNeighbors(BWH911_GBO, dims = 1:15) 85 | BWH911_GBO <- FindClusters(BWH911_GBO, resolution = 0.3) 86 | 87 | pdf("BWH911_GBO_UMAP_Clusters.pdf", height= 6, width = 7) 88 | DimPlot(BWH911_GBO, reduction = "umap") 89 | dev.off() 90 | 91 | pdf("BWH911_GBO_Clusters_With_Labels.pdf", height= 6, width = 7) 92 | DimPlot(BWH911_GBO, reduction = "umap", label=TRUE) 93 | dev.off() 94 | 95 | pdf("BWH911_GBO_UMAP_PatientID.pdf", height= 6, width = 9) 96 | DimPlot(BWH911_GBO, reduction = "umap", group.by="orig.ident") 97 
| dev.off() 98 | 99 | pdf("BWH911_GBO_UMAP_Treatment.pdf", height= 6, width = 9) 100 | DimPlot(BWH911_GBO, reduction = "umap", group.by="Treatment") 101 | dev.off() 102 | 103 | 104 | write.table(BWH911_GBO@meta.data, file="BWH911_GBO_Integrated_MetaData.txt", sep="\t", col.names=NA, quote=FALSE) 105 | 106 | 107 | s.genes <- cc.genes$s.genes 108 | g2m.genes <- cc.genes$g2m.genes 109 | 110 | BWH911_GBO <- CellCycleScoring(BWH911_GBO, s.features = s.genes, g2m.features = g2m.genes, set.ident = FALSE) 111 | 112 | 113 | write.table(BWH911_GBO@meta.data, file="BWH911_GBO_All_MetaData.txt", sep="\t", col.names=NA, quote=FALSE) 114 | 115 | Matrix <- GetAssayData(object = BWH911_GBO, slot = "counts") 116 | 117 | write.table(as.matrix(Matrix), file="./BWH911_GBO_Raw_Counts.txt", sep="\t", col.names=NA, quote=FALSE) 118 | 119 | 120 | 121 | saveRDS(BWH911_GBO, file="BWH911_GBO_Glioma.rds") 122 | -------------------------------------------------------------------------------- /Processing of GBO scRNA-Seq libraries (Related to Figure 7)/03- Calculating the usage of all cell types NMF programs in BWH911 GBO to extract non-doublet myeloid myeloid cells: -------------------------------------------------------------------------------- 1 | ############### Python Scripts ##################### 2 | import sklearn 3 | import sklearn.decomposition 4 | from sklearn.decomposition import non_negative_factorization 5 | import numpy as np 6 | import scanpy as sc 7 | import csv 8 | import scipy 9 | import pandas as pd 10 | 11 | X = pd.read_table("BWH911_GBO_Raw_Counts.txt", index_col=0, sep='\t') 12 | 13 | X2 = X.T 14 | 15 | H = np.load("cnmf_run.spectra.k_18.dt_0_015.consensus.df.npz", allow_pickle=True) 16 | 17 | H2 = pd.DataFrame(H['data'], columns = H['columns'], index = H['index']) 18 | 19 | H3 = H2.filter(items = X2.columns) 20 | 21 | X4 = X2.filter(items = H3.columns) 22 | 23 | H4 = H3.to_numpy() 24 | 25 | X5 = X4.values 26 | 27 | X6 = X5.astype(np.float64) 28 | 29 | test = 
sklearn.decomposition.non_negative_factorization(X6, W=None, H=H4, n_components= 18, init='random', update_H=False, solver='cd', beta_loss='frobenius', tol=0.0001, max_iter=1000, alpha=0.0, alpha_W=0.0, alpha_H='same', l1_ratio=0.0, regularization=None, random_state=None, verbose=0, shuffle=False) 30 | 31 | test2 = list(test) 32 | 33 | processed = pd.DataFrame(test2[0], columns= H['index'], index=X4.index) 34 | 35 | row_sums = processed.sum(axis=1) 36 | 37 | processed_data = (processed.div(row_sums, axis=0) * 100) 38 | 39 | new_column_names = ['Tcells', 'AC', 'NPC1_OPC', 'Microglia', 'MES2', 'Vascular_MES1', 'Oligodendrocytes', 'MES1', 'CD14_Mono', 'cDC', 'Neutrophils', 'NPC2', 'Giant_Cell_GBM', 'Cycling', 'Pericytes', 'Plasma', 'Endothelial', 'Mast'] 40 | 41 | processed_data.columns = new_column_names 42 | 43 | processed_data.to_csv(path_or_buf="./BWH911_GBO_All_CellType_Usages.txt", sep="\t", quoting=csv.QUOTE_NONE) 44 | -------------------------------------------------------------------------------- /Processing of GBO scRNA-Seq libraries (Related to Figure 7)/04- Calculating the usage of myeloid NMF programs in Myeloid cells of BWH911 GBO: -------------------------------------------------------------------------------- 1 | ########## We extracted the myeloid cells from the all-cell-type usage output as follows: 2 | ########## The usage scores of the 4 myeloid programs ('Microglia', 'CD14_Mono', 'cDC', 'Neutrophils') were summed to create the “myeloid usage” per cell. Other categories were summed in the same way (e.g. 'AC', 'NPC1_OPC', 'MES2', 'MES1', 'NPC2', 'Giant_Cell_GBM' as Malignant). 3 | ########## Each cell was then annotated with the cell type category that had the highest summed usage. Myeloid cells are the cells whose highest summed usage was the myeloid usage. 
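########## The annotation rule described above can be sketched as follows. This is a minimal illustration on toy numbers, not the code used for the analysis: the function name annotate_by_top_category, the toy table, and the choice to treat every ungrouped usage column as its own category are assumptions made here. Only the Myeloid and Malignant groupings come from the comment above.

```python
import pandas as pd

# Groupings stated in the comment above. Any usage column that is not listed
# here is treated as its own category (an assumption of this sketch).
CATEGORY_COLUMNS = {
    "Myeloid": ["Microglia", "CD14_Mono", "cDC", "Neutrophils"],
    "Malignant": ["AC", "NPC1_OPC", "MES2", "MES1", "NPC2", "Giant_Cell_GBM"],
}


def annotate_by_top_category(usages, category_columns=CATEGORY_COLUMNS):
    """Sum program usages per category and label each cell (row) with the
    category that has the highest summed usage."""
    grouped = {col for cols in category_columns.values() for col in cols}
    summed = pd.DataFrame(index=usages.index)
    for name, cols in category_columns.items():
        summed[name] = usages[cols].sum(axis=1)
    for col in usages.columns:
        if col not in grouped:
            summed[col] = usages[col]
    top = summed.idxmax(axis=1)  # computed before the label column is added
    summed["TopCategory"] = top
    return summed


# Toy usage table (cells x programs, percentages); values are invented.
toy_usages = pd.DataFrame(
    {
        "Microglia": [40.0, 5.0, 0.0],
        "CD14_Mono": [30.0, 5.0, 10.0],
        "cDC": [5.0, 0.0, 0.0],
        "Neutrophils": [5.0, 0.0, 0.0],
        "AC": [5.0, 50.0, 0.0],
        "NPC1_OPC": [5.0, 20.0, 0.0],
        "MES2": [0.0, 10.0, 0.0],
        "MES1": [5.0, 5.0, 0.0],
        "NPC2": [0.0, 5.0, 0.0],
        "Giant_Cell_GBM": [5.0, 0.0, 0.0],
        "Tcells": [0.0, 0.0, 90.0],
    },
    index=["cell_1", "cell_2", "cell_3"],
)

toy_annotation = annotate_by_top_category(toy_usages)
myeloid_cells = toy_annotation.index[toy_annotation["TopCategory"] == "Myeloid"]
```

########## In practice the input would be the table written in step 03 (BWH911_GBO_All_CellType_Usages.txt), read with pd.read_table(..., index_col=0), and the category dictionary would need to cover the remaining columns as appropriate.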
4 | 5 | ############### Python Scripts ##################### 6 | 7 | import sklearn 8 | import sklearn.decomposition 9 | from sklearn.decomposition import non_negative_factorization 10 | import numpy as np 11 | import scanpy as sc 12 | import csv 13 | import scipy 14 | import pandas as pd 15 | 16 | X = pd.read_table("BWH911_GBO_Raw_Counts.txt", index_col=0, sep='\t') 17 | 18 | X2 = X.T 19 | 20 | H = pd.read_table("Myeloid_NMF_Average_Gene_Spectra.txt", sep="\t", index_col=0) 21 | 22 | H2 = H.T 23 | 24 | X3 = X2.filter(items = H2.columns) 25 | 26 | H3 = H2.filter(items = X3.columns) 27 | 28 | H4 = H3.to_numpy() 29 | 30 | X4 = X3.values 31 | 32 | X5 = X4.astype(np.float64) 33 | 34 | test = sklearn.decomposition.non_negative_factorization(X5, W=None, H=H4, n_components= 14, init='random', update_H=False, solver='cd', beta_loss='frobenius', tol=0.0001, max_iter=1000, alpha=0.0, alpha_W=0.0, alpha_H='same', l1_ratio=0.0, regularization=None, random_state=None, verbose=0, shuffle=False) 35 | 36 | test2 = list(test) 37 | 38 | processed = pd.DataFrame(test2[0], columns= H.columns, index=X3.index) 39 | 40 | row_sums = processed.sum(axis=1) 41 | 42 | processed_data = (processed.div(row_sums, axis=0) * 100) 43 | 44 | processed_data.to_csv(path_or_buf="./BWH911_GBO_Myeloid_Programs_Usages.txt", sep="\t", quoting=csv.QUOTE_NONE) 45 | -------------------------------------------------------------------------------- /Processing of GBO scRNA-Seq libraries (Related to Figure 7)/05- Seurat Processing for GBOs treated with DMSO and GNE to obtain Raw Counts Matrix: -------------------------------------------------------------------------------- 1 | library(dplyr) 2 | library(Seurat) 3 | options(bitmapType='cairo') 4 | options(future.globals.maxSize = 8000 * 1024^2) 5 | 6 | 7 | MGH253_GBO.data <- read.table("MGH253_GBO_Cleaned_Raw_Expression.txt", head=TRUE, row.names=1, sep="\t") 8 | MGH253_GBO <- CreateSeuratObject(counts = MGH253_GBO.data, project = "MGH253_GBO", min.cells = 3, 
min.features = 200) 9 | MGH253_GBO[["percent.mt"]] <- PercentageFeatureSet(MGH253_GBO, pattern = "^MT.") 10 | 11 | MGH314_GBO.data <- read.table("MGH314_GBO_Cleaned_Raw_Expression.txt", head=TRUE, row.names=1, sep="\t") 12 | MGH314_GBO <- CreateSeuratObject(counts = MGH314_GBO.data, project = "MGH314_GBO", min.cells = 3, min.features = 200) 13 | MGH314_GBO[["percent.mt"]] <- PercentageFeatureSet(MGH314_GBO, pattern = "^MT.") 14 | 15 | MGH630_GBO.data <- read.table("MGH630_GBO_Cleaned_Raw_Expression.txt", head=TRUE, row.names=1, sep="\t") 16 | MGH630_GBO <- CreateSeuratObject(counts = MGH630_GBO.data, project = "MGH630_GBO", min.cells = 3, min.features = 200) 17 | MGH630_GBO[["percent.mt"]] <- PercentageFeatureSet(MGH630_GBO, pattern = "^MT.") 18 | 19 | var3 <- intersect(rownames(MGH253_GBO), rownames(MGH314_GBO)) 20 | 21 | all.genes <- intersect(var3, rownames(MGH630_GBO)) 22 | 23 | 24 | GBO.combined <- merge(MGH253_GBO, y = c(MGH314_GBO, MGH630_GBO), add.cell.ids = c("MGH253_GBO", "MGH314_GBO", "MGH630_GBO"), project = "GBO") 25 | 26 | 27 | pdf("SeqWell_GBO_QC_AF.pdf", height = 6, width = 20) 28 | VlnPlot(GBO.combined, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3) 29 | dev.off() 30 | 31 | library(sctransform) 32 | GBO.combined <- SCTransform(GBO.combined, vars.to.regress = "percent.mt", verbose = TRUE) 33 | GBO.combined <- RunPCA(GBO.combined) 34 | pdf("SeqWell_GBO_ElbowPlot.pdf", height = 6, width = 6) 35 | ElbowPlot(GBO.combined, ndims=50) 36 | dev.off() 37 | 38 | write.table(GBO.combined@meta.data, file="SeqWell_GBO_Integrated_MetaData.txt", sep="\t", col.names=NA, quote=FALSE) 39 | 40 | 41 | GBO.combined <- RunUMAP(GBO.combined, reduction = "pca", dims = 1:20) 42 | GBO.combined <- FindNeighbors(GBO.combined, dims = 1:20) 43 | GBO.combined <- FindClusters(GBO.combined, resolution = 0.3) 44 | pdf("SeqWell_GBO_UMAP_Clusters.pdf", height= 6, width = 7) 45 | DimPlot(GBO.combined, reduction = "umap") 46 | dev.off() 47 | 48 | 
pdf("SeqWell_GBO_Clusters_With_Labels.pdf", height= 6, width = 7) 49 | DimPlot(GBO.combined, reduction = "umap", label=TRUE) 50 | dev.off() 51 | 52 | pdf("SeqWell_GBO_UMAP_Patient_ID.pdf", height= 6, width = 9) 53 | DimPlot(GBO.combined, reduction = "umap", group.by="orig.ident") 54 | dev.off() 55 | 56 | Treatment <- scan("Treatment.txt", what="") 57 | 58 | GBO.combined@meta.data$Treatment <- Treatment 59 | 60 | pdf("SeqWell_GBO_UMAP_Treatment.pdf", height= 6, width = 9) 61 | DimPlot(GBO.combined, reduction = "umap", group.by="Treatment") 62 | dev.off() 63 | 64 | write.table(GBO.combined@meta.data, file="SeqWell_GBO_Integrated_MetaData.txt", sep="\t", col.names=NA, quote=FALSE) 65 | 66 | s.genes <- cc.genes$s.genes 67 | g2m.genes <- cc.genes$g2m.genes 68 | 69 | GBO.combined <- CellCycleScoring(GBO.combined, s.features = s.genes, g2m.features = g2m.genes, set.ident = FALSE) 70 | 71 | Matrix <- GetAssayData(object = GBO.combined, slot = "counts") 72 | 73 | write.table(as.matrix(Matrix), file="./GBO_DMSO_GNE_Raw_Counts.txt", sep="\t", col.names=NA, quote=FALSE) 74 | 75 | saveRDS(GBO.combined, file="GBO_DMSO_GNE_Glioma.rds") 76 | -------------------------------------------------------------------------------- /Processing of GBO scRNA-Seq libraries (Related to Figure 7)/06- Calculating the usage of all cell types NMF programs in GBOs treated with DMSO and GNE to extract non-doublet myeloid cells: -------------------------------------------------------------------------------- 1 | ############### Python Scripts (uses the same imports as step 03: sklearn.decomposition, numpy as np, pandas as pd, csv) ##################### 2 | 3 | X = pd.read_table("GBO_DMSO_GNE_Raw_Counts.txt", index_col=0, sep='\t') 4 | 5 | X2 = X.T 6 | 7 | H = np.load("cnmf_run.spectra.k_18.dt_0_015.consensus.df.npz", allow_pickle=True) 8 | 9 | H2 = pd.DataFrame(H['data'], columns = H['columns'], index = H['index']) 10 | 11 | H3 = H2.filter(items = X2.columns) 12 | 13 | X4 = X2.filter(items = H3.columns) 14 | 15 | H4 = H3.to_numpy() 16 | 17 | X5 = X4.values 18 | 19 | X6 = 
X5.astype(np.float64) 20 | 21 | test = sklearn.decomposition.non_negative_factorization(X6, W=None, H=H4, n_components= 18, init='random', update_H=False, solver='cd', beta_loss='frobenius', tol=0.0001, max_iter=1000, alpha=0.0, alpha_W=0.0, alpha_H='same', l1_ratio=0.0, regularization=None, random_state=None, verbose=0, shuffle=False) 22 | 23 | test2 = list(test) 24 | 25 | processed = pd.DataFrame(test2[0], columns= H['index'], index=X4.index) 26 | 27 | row_sums = processed.sum(axis=1) 28 | 29 | processed_data = (processed.div(row_sums, axis=0) * 100) 30 | 31 | new_column_names = ['Tcells', 'AC', 'NPC1_OPC', 'Microglia', 'MES2', 'Vascular_MES1', 'Oligodendrocytes', 'MES1', 'CD14_Mono', 'cDC', 'Neutrophils', 'NPC2', 'Giant_Cell_GBM', 'Cycling', 'Pericytes', 'Plasma', 'Endothelial', 'Mast'] 32 | 33 | processed_data.columns = new_column_names 34 | 35 | processed_data.to_csv(path_or_buf="./GBO_DMSO_GNE_All_CellType_Usages.txt", sep="\t", quoting=csv.QUOTE_NONE) 36 | -------------------------------------------------------------------------------- /Processing of GBO scRNA-Seq libraries (Related to Figure 7)/07- Calculating the usage of myeloid NMF programs in Myeloid cells of GBOs treated with DMSO or GNE: -------------------------------------------------------------------------------- 1 | ########## We extracted the myeloid cells from the output of all cell types usage as follows: 2 | ########## The usage scores for the 4 myeloid programs ('Microglia', 'CD14_Mono', 'cDC', 'Neutrophils') were summed to create the “myeloid usage” per cell. Other categories were summed in the same way (i.e. 'AC', 'NPC1_OPC', 'MES2', 'MES1', 'NPC2', 'Giant_Cell_GBM' as Malignant). 3 | ########## Each cell was then annotated with the category that had the highest summed usage; myeloid cells are those whose highest usage was the myeloid usage. 
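########## A minimal sketch of the annotation step described above, not the authors' script. It assumes a usage table whose columns carry the cell type names assigned in step 06. Keeping every program outside the myeloid and malignant groups as its own category is an assumption, since the grouping of those programs is not stated here. The helper name annotate_by_top_usage and the toy values are hypothetical.

```python
import pandas as pd

# Program groupings taken from the comments above.
MYELOID = ["Microglia", "CD14_Mono", "cDC", "Neutrophils"]
MALIGNANT = ["AC", "NPC1_OPC", "MES2", "MES1", "NPC2", "Giant_Cell_GBM"]


def annotate_by_top_usage(usages: pd.DataFrame) -> pd.Series:
    """Return one category label per cell (row) from per-program usages."""
    grouped = pd.DataFrame(index=usages.index)
    grouped["Myeloid"] = usages[MYELOID].sum(axis=1)
    grouped["Malignant"] = usages[MALIGNANT].sum(axis=1)
    # ASSUMPTION: every remaining program is kept as its own category.
    others = [c for c in usages.columns if c not in MYELOID + MALIGNANT]
    grouped = pd.concat([grouped, usages[others]], axis=1)
    # idxmax picks the category with the highest summed usage in each row.
    return grouped.idxmax(axis=1)


# Toy usage table (percent per cell); the numbers are made up for illustration.
toy = pd.DataFrame(
    {
        "Microglia": [40, 5, 10],
        "CD14_Mono": [30, 5, 5],
        "cDC": [5, 0, 5],
        "Neutrophils": [5, 0, 0],
        "AC": [5, 50, 10],
        "NPC1_OPC": [5, 10, 0],
        "MES2": [0, 10, 0],
        "MES1": [0, 5, 0],
        "NPC2": [0, 5, 0],
        "Giant_Cell_GBM": [0, 0, 0],
        "Tcells": [10, 10, 70],
    },
    index=["cell_1", "cell_2", "cell_3"],
)

labels = annotate_by_top_usage(toy)
# Barcodes of the cells whose top category is the summed myeloid usage.
myeloid_cells = labels.index[labels == "Myeloid"].tolist()
print(labels.to_dict())
print(myeloid_cells)
```

########## On real data, the toy table would be replaced by the step 06 output, for example pd.read_table("GBO_DMSO_GNE_All_CellType_Usages.txt", index_col=0), and myeloid_cells could then be used to subset the raw counts matrix before the myeloid program usages are computed.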
4 | 5 | ############### Python Scripts ##################### 6 | 7 | X = pd.read_table("GBO_DMSO_GNE_Raw_Counts.txt", index_col=0, sep='\t') 8 | 9 | X2 = X.T 10 | 11 | H = pd.read_table("Myeloid_NMF_Average_Gene_Spectra.txt", sep="\t", index_col=0) 12 | 13 | H2 = H.T 14 | 15 | X3 = X2.filter(items = H2.columns) 16 | 17 | H3 = H2.filter(items = X3.columns) 18 | 19 | H4 = H3.to_numpy() 20 | 21 | X4 = X3.values 22 | 23 | X5 = X4.astype(np.float64) 24 | 25 | test = sklearn.decomposition.non_negative_factorization(X5, W=None, H=H4, n_components= 14, init='random', update_H=False, solver='cd', beta_loss='frobenius', tol=0.0001, max_iter=1000, alpha=0.0, alpha_W=0.0, alpha_H='same', l1_ratio=0.0, regularization=None, random_state=None, verbose=0, shuffle=False) 26 | 27 | test2 = list(test) 28 | 29 | processed = pd.DataFrame(test2[0], columns= H.columns, index=X3.index) 30 | 31 | row_sums = processed.sum(axis=1) 32 | 33 | processed_data = (processed.div(row_sums, axis=0) * 100) 34 | 35 | processed_data.to_csv(path_or_buf="./GBO_DMSO_GNE_Myeloid_Programs_Usages.txt", sep="\t", quoting=csv.QUOTE_NONE) 36 | -------------------------------------------------------------------------------- /Processing of scRNA-Seq Files (Related to Figure 1)/01- Align SeqWell scRNA-Seq Libraries: -------------------------------------------------------------------------------- 1 | /path/to/STAR --genomeDir /path/to/genome/dir/ --readFilesIn Read2_Lane1.fastq.gz,Read2_Lane2.fastq.gz,Read2_Lane3.fastq.gz Read1_Lane1.fastq.gz,Read1_Lane2.fastq.gz,Read1_Lane3.fastq.gz --soloType CB_UMI_Simple --soloCBstart 1 --soloCBlen 12 --soloUMIstart 13 --soloUMIlen 8 --outSAMtype BAM SortedByCoordinate --outSAMattributes CR UR CY UY CB UB 2 | -------------------------------------------------------------------------------- /Processing of scRNA-Seq Files (Related to Figure 1)/02- Align 10X V3 scRNA-Seq Libraries: -------------------------------------------------------------------------------- 1 | /path/to/STAR 
--genomeDir /path/to/genome/dir/ --readFilesIn Read2_Lane1.fastq.gz,Read2_Lane2.fastq.gz,Read2_Lane3.fastq.gz Read1_Lane1.fastq.gz,Read1_Lane2.fastq.gz,Read1_Lane3.fastq.gz --soloType CB_UMI_Simple --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 12 --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts --soloUMIfiltering MultiGeneUMI_CR --soloUMIdedup 1MM_CR --clipAdapterType CellRanger4 --outFilterScoreMin 30 --outSAMtype BAM SortedByCoordinate --outSAMattributes CR UR CY UY CB UB --soloCBwhitelist 3M-february-2018_TRU.txt --readFilesCommand zcat --outFileNamePrefix /Path/To/Output/{sampleName} 2 | 3 | 4 | ###### "3M-february-2018_TRU.txt" can be obtained from CellRanger tool barcode folder 5 | -------------------------------------------------------------------------------- /Processing of scRNA-Seq Files (Related to Figure 1)/03- Align 10X V2 scRNA-Seq Libraries: -------------------------------------------------------------------------------- 1 | /path/to/STAR --genomeDir /path/to/genome/dir/ --readFilesIn Read2_Lane1.fastq.gz,Read2_Lane2.fastq.gz,Read2_Lane3.fastq.gz Read1_Lane1.fastq.gz,Read1_Lane2.fastq.gz,Read1_Lane3.fastq.gz --soloType CB_UMI_Simple --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 10 --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts --soloUMIfiltering MultiGeneUMI_CR --soloUMIdedup 1MM_CR --clipAdapterType CellRanger4 --outFilterScoreMin 30 --outSAMtype BAM SortedByCoordinate --outSAMattributes CR UR CY UY CB UB --soloCBwhitelist 737k-august-2016.txt --readFilesCommand zcat --outFileNamePrefix /Path/To/Output/{sampleName} 2 | 3 | ###### "737k-august-2016.txt" can be obtained from CellRanger tool barcode folder 4 | -------------------------------------------------------------------------------- /Processing of scRNA-Seq Files (Related to Figure 1)/04- Seurat for Processing MGB Cohort.R: -------------------------------------------------------------------------------- 1 | library(dplyr) 2 | library(Seurat) 3 | 
options(bitmapType='cairo') 4 | options(future.globals.maxSize = 8000 * 1024^2) 5 | 6 | ################### We converted all the raw STARsolo outputs into a tab-delimited text matrix (genes in rows, cells in columns) and merged all these matrices to form a single matrix. The barcodes for each tumor were prefixed with "TumorID_" ######### 7 | 8 | data <- read.table("All_SeqWell_220818_Raw_Expression.txt", sep="\t", head=TRUE, row.names=1) 9 | 10 | Tumors.combined <- CreateSeuratObject(counts = data, project = "SeqWell", min.cells = 3, min.features = 200) 11 | 12 | Tumors.combined[["percent.mt"]] <- PercentageFeatureSet(Tumors.combined, pattern = "^MT.") 13 | 14 | all.genes <- rownames(Tumors.combined) 15 | 16 | 17 | ######## Filtering Low Quality Cells ############## 18 | 19 | Tumors.combined <- subset(Tumors.combined, subset = nFeature_RNA > 500 & nFeature_RNA < 6000 & nCount_RNA > 1000 & percent.mt < 25) 20 | 21 | pdf("SeqWell_WT_Mutant_Tumors_QC_AF.pdf", height = 6, width = 20) 22 | VlnPlot(Tumors.combined, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3) 23 | dev.off() 24 | 25 | ######### Normalization, Scaling, Identification of Variable Genes and Regression of % of mito genes, PCA ######## 26 | 27 | library(sctransform) 28 | Tumors.combined <- SCTransform(Tumors.combined, vars.to.regress = "percent.mt", verbose = TRUE) 29 | Tumors.combined <- RunPCA(Tumors.combined) 30 | pdf("SeqWell_WT_Mutant_Tumors_ElbowPlot.pdf", height = 6, width = 6) 31 | ElbowPlot(Tumors.combined, ndims=50) 32 | dev.off() 33 | 34 | write.table(Tumors.combined@meta.data, file="SeqWell_WT_Mutant_Tumors_Integrated_MetaData.txt", sep="\t", col.names=NA, quote=FALSE) 35 | 36 | 37 | ################# Louvain Clustering and UMAP Generation ################### 38 | 39 | 40 | Tumors.combined <- RunUMAP(Tumors.combined, reduction = "pca", dims = 1:24) 41 | Tumors.combined <- FindNeighbors(Tumors.combined, dims = 1:24) 42 | Tumors.combined <- FindClusters(Tumors.combined, 
resolution = 0.3) 43 | 44 | 45 | pdf("SeqWell_WT_Mutant_Tumors_Clusters_With_Labels.pdf", height= 6, width = 7) 46 | DimPlot(Tumors.combined, reduction = "umap", label=TRUE) 47 | dev.off() 48 | 49 | pdf("SeqWell_WT_Mutant_Tumors_UMAP_Patient_ID.pdf", height= 6, width = 9) 50 | DimPlot(Tumors.combined, reduction = "umap", group.by="orig.ident") 51 | dev.off() 52 | 53 | 54 | write.table(Tumors.combined@meta.data, file="SeqWell_WT_Mutant_Tumors_Integrated_MetaData.txt", sep="\t", col.names=NA, quote=FALSE) 55 | 56 | 57 | saveRDS(Tumors.combined, file="SeqWell_Brain_Tumors.rds") 58 | -------------------------------------------------------------------------------- /Processing of scRNA-Seq Files (Related to Figure 1)/05- Identifying Variable Genes for NMF in MGB Cohort: -------------------------------------------------------------------------------- 1 | library(dplyr) 2 | library(Seurat) 3 | options(bitmapType='cairo') 4 | options(future.globals.maxSize = 8000 * 1024^2) 5 | 6 | ################### We converted all the raw STARsolo outputs into a tab-delimited text matrix (genes in rows, cells in columns) and merged all these matrices to form a single matrix. 
The barcodes for each tumor were prefixed with "TumorID_" ######### 7 | 8 | data <- read.table("All_SeqWell_220818_Raw_Expression.txt", sep="\t", head=TRUE, row.names=1) 9 | 10 | Tumors.combined <- CreateSeuratObject(counts = data, project = "SeqWell", min.cells = 3, min.features = 200) 11 | 12 | Tumors.combined[["percent.mt"]] <- PercentageFeatureSet(Tumors.combined, pattern = "^MT.") 13 | 14 | all.genes <- rownames(Tumors.combined) 15 | 16 | 17 | ######## Filtering Low Quality Cells ############## 18 | 19 | Tumors.combined <- subset(Tumors.combined, subset = nFeature_RNA > 500 & nFeature_RNA < 6000 & nCount_RNA > 1000 & percent.mt < 25) 20 | 21 | pdf("SeqWell_WT_Mutant_Tumors_QC_AF.pdf", height = 6, width = 20) 22 | VlnPlot(Tumors.combined, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3) 23 | dev.off() 24 | 25 | ############### Normalization ################## 26 | 27 | Tumors.combined <- NormalizeData(Tumors.combined) 28 | all.genes <- rownames(Tumors.combined) 29 | Tumors.combined <- ScaleData(Tumors.combined, features = all.genes) 30 | 31 | 32 | ######## Calculating Variable Scores for each gene in the matrix and outputting the results ############## 33 | 34 | Tumors.combined <- FindVariableFeatures(Tumors.combined, selection.method="vst", nfeatures = 2000) 35 | 36 | Var <- HVFInfo(object = Tumors.combined, selection.method="vst", assay = "RNA") 37 | 38 | write.table(Var, file="SeqWell_Full_Gene_List_Variable_Score.txt", sep="\t", quote=FALSE, col.names=NA) 39 | 40 | ########## Identify the top 3000 variable genes ################# 41 | 42 | Var <- HVFInfo(object = Tumors.combined, selection.method="vst", assay = "RNA") 43 | 44 | Var3 <- Var[order(Var$variance.standardized, decreasing = TRUE),] 45 | 46 | Var4 <- Var3[c(1:3000),] 47 | 48 | 49 | Matrix <- as.matrix(GetAssayData(Tumors.combined, slot = "counts")) 50 | 51 | 52 | SeqWell3 <- Matrix[rownames(Matrix) %in% rownames(Var4),] 53 | 54 | write.table(t(SeqWell3), 
file="SeqWell_Matrix_Filtered_For_NMF.txt", sep="\t", quote=FALSE, col.names=NA) 55 | 56 | -------------------------------------------------------------------------------- /Processing of scRNA-Seq Files (Related to Figure 1)/06- cNMF for annotating cells in MGB cohort: -------------------------------------------------------------------------------- 1 | cnmf prepare --output-dir ./All/ --name All -c SeqWell_Matrix_Filtered_For_NMF.txt -k 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 --n-iter 500 --total-workers 1 --seed 14 --numgenes 3000 2 | 3 | cnmf factorize --output-dir ./All/ --name All --worker-index 0; 4 | 5 | cnmf combine --output-dir ./All/ --name All; 6 | 7 | rm ./All/All/cnmf_tmp/All.spectra.k_*.iter_*.df.npz; 8 | 9 | cnmf k_selection_plot --output-dir All --name All; 10 | 11 | 12 | ##### Based on the K plot, we select K=18 ######## 13 | 14 | cnmf consensus --output-dir All --name All --components 18 --local-density-threshold 0.015 --show-clustering 15 | 16 | 17 | ########## The cnmf consensus script outputs usage matrix which is normalized per row to percentages (in each cell, the usages of the programs sums up to 100) ################ 18 | ########## The gene spectra score matrix is used for annotation of the programs ############# 19 | -------------------------------------------------------------------------------- /Processing of scRNA-Seq Files (Related to Figure 1)/07- Copy number Variation Analysis: -------------------------------------------------------------------------------- 1 | inferCNV.R \ 2 | --raw_counts_matrix Input_Matrix.txt \ 3 | --annotations_file Annotation_File.txt \ 4 | --gene_order_file GRCh38-2020-A_gen_pos.txt \ 5 | --num_threads 16 \ 6 | --out_dir infercnv_Output \ 7 | --denoise --HMM --cluster_by_groups --cutoff 0.1 \ 8 | 9 | 10 | ######## We selected a group of reference cells which are not annotated as any of the malignant programs from various tumors (i.e. 
a mix of Myeloid, Tcells, Oligos and Vasculature Cells) 11 | ######## We extracted and merged the raw counts of these reference cells into a single matrix. 12 | ######## In the annotation file, we included the reference cells and annotated the cells of each tumor 13 | ######## We merged the raw matrix of each tumor with the raw matrix of the reference cells 14 | ######## Gene order file was constructed using "gtf_to_position_file.py" script provided by infercnv package 15 | 16 | ######## gtf_to_position_file.py genes.gtf GRCh38-2020-A_gen_pos.txt 17 | 18 | ######## To obtain CNV values for each cell, use the "add_to_seurat("./infercnv_Output/", seurat_obj = NULL)" ###### 19 | -------------------------------------------------------------------------------- /Processing of scRNA-Seq Files (Related to Figure 1)/09- Seurat for Processing Houston Cohort.R: -------------------------------------------------------------------------------- 1 | library(dplyr) 2 | library(Seurat) 3 | options(bitmapType='cairo') 4 | options(future.globals.maxSize = 8000 * 1024^2) 5 | 6 | ################### We converted all the raw STARsolo outputs into a tab-delimited text matrix (genes in rows, cells in columns) and merged all these matrices to form a single matrix. 
The barcodes for each tumor were prefixed with "TumorID_" ######### 7 | 8 | data <- read.table("All_Houston_220826_Raw_Expression.txt", sep="\t", head=TRUE, row.names=1) 9 | 10 | Tumors.combined <- CreateSeuratObject(counts = data, project = "Houston", min.cells = 3, min.features = 200) 11 | 12 | Tumors.combined[["percent.mt"]] <- PercentageFeatureSet(Tumors.combined, pattern = "^MT.") 13 | 14 | all.genes <- rownames(Tumors.combined) 15 | 16 | 17 | ######## Filtering Low Quality Cells ############## 18 | 19 | Tumors.combined <- subset(Tumors.combined, subset = nFeature_RNA > 500 & nFeature_RNA < 6000 & nCount_RNA > 1000 & percent.mt < 25) 20 | 21 | pdf("Houston_WT_Mutant_Tumors_QC_AF.pdf", height = 6, width = 20) 22 | VlnPlot(Tumors.combined, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3) 23 | dev.off() 24 | 25 | ######### Normalization, Scaling, Identification of Variable Genes and Regression of % of mito genes, PCA ######## 26 | 27 | library(sctransform) 28 | Tumors.combined <- SCTransform(Tumors.combined, vars.to.regress = "percent.mt", verbose = TRUE) 29 | Tumors.combined <- RunPCA(Tumors.combined) 30 | pdf("Houston_WT_Mutant_Tumors_ElbowPlot.pdf", height = 6, width = 6) 31 | ElbowPlot(Tumors.combined, ndims=50) 32 | dev.off() 33 | 34 | write.table(Tumors.combined@meta.data, file="Houston_WT_Mutant_Tumors_Integrated_MetaData.txt", sep="\t", col.names=NA, quote=FALSE) 35 | 36 | 37 | ################# Louvain Clustering and UMAP Generation ################### 38 | 39 | 40 | Tumors.combined <- RunUMAP(Tumors.combined, reduction = "pca", dims = 1:19) 41 | Tumors.combined <- FindNeighbors(Tumors.combined, dims = 1:19) 42 | Tumors.combined <- FindClusters(Tumors.combined, resolution = 0.3) 43 | 44 | 45 | pdf("Houston_WT_Mutant_Tumors_Clusters_With_Labels.pdf", height= 6, width = 7) 46 | DimPlot(Tumors.combined, reduction = "umap", label=TRUE) 47 | dev.off() 48 | 49 | pdf("Houston_WT_Mutant_Tumors_UMAP_Patient_ID.pdf", height= 6, width = 9) 50 | 
DimPlot(Tumors.combined, reduction = "umap", group.by="orig.ident") 51 | dev.off() 52 | 53 | 54 | write.table(Tumors.combined@meta.data, file="Houston_WT_Mutant_Tumors_Integrated_MetaData.txt", sep="\t", col.names=NA, quote=FALSE) 55 | 56 | 57 | saveRDS(Tumors.combined, file="Houston_Brain_Tumors.rds") 58 | -------------------------------------------------------------------------------- /Processing of scRNA-Seq Files (Related to Figure 1)/10- Calculate usage matrix in Houston Cohort for cNMF cell annotation programs identified in MGB cohort: -------------------------------------------------------------------------------- 1 | ##################### Inside R ################################ 2 | library(dplyr) 3 | library(Seurat) 4 | options(bitmapType='cairo') 5 | options(future.globals.maxSize = 8000 * 1024^2) 6 | 7 | Matrix <- GetAssayData(object = Tumors.combined, slot = "counts") 8 | 9 | 10 | ########## This loads the genes that were used in the overall MGB cNMF (Top 3000 variable genes) ########### 11 | Genes <- scan("./cnmf_run.overdispersed_genes.txt", what="") 12 | 13 | Matrix2 <- Matrix[rownames(Matrix) %in% Genes,] 14 | 15 | write.table(t(as.matrix(Matrix2)), file="./Houston_Raw_Counts_Variable_for_cNMF.txt", sep="\t", col.names=NA, quote=FALSE) 16 | 17 | dim(Matrix2) ######### to find out the number for --numgenes below ################# 18 | 19 | q() 20 | 21 | ########################### Exit R ####################################### 22 | 23 | ##### We run cNMF prepare script to normalize the raw matrix counts ############### 24 | 25 | cnmf prepare --output-dir ./Calculate_Usage/ --name Calculate_Usage -c Houston_Raw_Counts_Variable_for_cNMF.txt -k 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 --n-iter 500 --total-workers 1 --seed 14 --numgenes 2990; 26 | 27 | 28 | python; 29 | 30 | ################################## Inside Python ################################# 31 | 32 | import sklearn 33 | import sklearn.decomposition 34 | from 
sklearn.decomposition import non_negative_factorization 35 | import numpy as np 36 | import scanpy as sc 37 | import csv 38 | import scipy 39 | import pandas as pd 40 | 41 | ########### Load the spectra consensus file from the MGB All cells cNMF run ########### 42 | H = np.load("cnmf_run.spectra.k_18.dt_0_015.consensus.df.npz", allow_pickle=True) 43 | 44 | H2 = pd.DataFrame(H['data'], columns = H['columns'], index = H['index']) 45 | 46 | ########## Load the normalized raw counts matrix of the Houston cohort (Normalized by "prepare" script) ###### 47 | X = sc.read_h5ad('Calculate_Usage.norm_counts.h5ad') 48 | 49 | X2 = X.X.toarray() 50 | 51 | X3 = pd.DataFrame(data=X2, columns = X.var_names , index = X.obs.index) 52 | 53 | H3 = H2.filter(items = X3.columns) 54 | 55 | H4 = H3.to_numpy() 56 | 57 | X5 = X2.astype(np.float64) 58 | 59 | ########## Perform the calculation ########### 60 | test = sklearn.decomposition.non_negative_factorization(X5, W=None, H=H4, n_components= 18, init='random', update_H=False, solver='cd', beta_loss='frobenius', tol=0.0001, max_iter=1000, alpha=0.0, alpha_W=0.0, alpha_H='same', l1_ratio=0.0, regularization=None, random_state=None, verbose=0, shuffle=False) 61 | 62 | test2 = list(test) 63 | 64 | pd.DataFrame(test2[0], columns=H['index'], index=X.obs.index).to_csv(path_or_buf="./Houston_Glioma_All_cells_Usage.txt", sep="\t", quoting=csv.QUOTE_NONE) 65 | 66 | 67 | ############ This script outputs a usage matrix which is then normalized per row to percentages (in each cell, the usages of the programs sum to 100). 
This matrix is used for annotating cells ################ 68 | -------------------------------------------------------------------------------- /Processing of scRNA-Seq Files (Related to Figure 1)/11- Seurat for Processing Jackson's Cohort.R: -------------------------------------------------------------------------------- 1 | library(dplyr) 2 | library(Seurat) 3 | options(bitmapType='cairo') 4 | options(future.globals.maxSize = 8000 * 1024^2) 5 | 6 | ################### We converted all the raw STARsolo outputs into a tab-delimited text matrix (genes in rows, cells in columns) and merged all these matrices to form a single matrix. The barcodes for each tumor were prefixed with "TumorID_" ######### 7 | 8 | data <- read.table("All_JAX_220826_Raw_Expression.txt", sep="\t", head=TRUE, row.names=1) 9 | 10 | Tumors.combined <- CreateSeuratObject(counts = data, project = "JAX", min.cells = 3, min.features = 200) 11 | 12 | Tumors.combined[["percent.mt"]] <- PercentageFeatureSet(Tumors.combined, pattern = "^MT.") 13 | 14 | all.genes <- rownames(Tumors.combined) 15 | 16 | 17 | ######## Filtering Low Quality Cells ############## 18 | 19 | Tumors.combined <- subset(Tumors.combined, subset = nFeature_RNA > 500 & nFeature_RNA < 6000 & nCount_RNA > 1000 & percent.mt < 25) 20 | 21 | pdf("JAX_WT_Mutant_Tumors_QC_AF.pdf", height = 6, width = 20) 22 | VlnPlot(Tumors.combined, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3) 23 | dev.off() 24 | 25 | ######### Normalization, Scaling, Identification of Variable Genes and Regression of % of mito genes, PCA ######## 26 | 27 | library(sctransform) 28 | Tumors.combined <- SCTransform(Tumors.combined, vars.to.regress = "percent.mt", verbose = TRUE) 29 | Tumors.combined <- RunPCA(Tumors.combined) 30 | pdf("JAX_WT_Mutant_Tumors_ElbowPlot.pdf", height = 6, width = 6) 31 | ElbowPlot(Tumors.combined, ndims=50) 32 | dev.off() 33 | 34 | write.table(Tumors.combined@meta.data, file="JAX_WT_Mutant_Tumors_Integrated_MetaData.txt", 
sep="\t", col.names=NA, quote=FALSE) 35 | 36 | 37 | ################# Louvain Clustering and UMAP Generation ################### 38 | 39 | 40 | Tumors.combined <- RunUMAP(Tumors.combined, reduction = "pca", dims = 1:16) 41 | Tumors.combined <- FindNeighbors(Tumors.combined, dims = 1:16) 42 | Tumors.combined <- FindClusters(Tumors.combined, resolution = 0.3) 43 | 44 | 45 | pdf("JAX_WT_Mutant_Tumors_Clusters_With_Labels.pdf", height= 6, width = 7) 46 | DimPlot(Tumors.combined, reduction = "umap", label=TRUE) 47 | dev.off() 48 | 49 | pdf("JAX_WT_Mutant_Tumors_UMAP_Patient_ID.pdf", height= 6, width = 9) 50 | DimPlot(Tumors.combined, reduction = "umap", group.by="orig.ident") 51 | dev.off() 52 | 53 | 54 | write.table(Tumors.combined@meta.data, file="JAX_WT_Mutant_Tumors_Integrated_MetaData.txt", sep="\t", col.names=NA, quote=FALSE) 55 | 56 | 57 | saveRDS(Tumors.combined, file="JAX_Brain_Tumors.rds") 58 | -------------------------------------------------------------------------------- /Processing of scRNA-Seq Files (Related to Figure 1)/12- Calculate usage matrix in Jackson's Cohort for cNMF cell annotation programs identified in MGB cohort: -------------------------------------------------------------------------------- 1 | ##################### Inside R ################################ 2 | library(dplyr) 3 | library(Seurat) 4 | options(bitmapType='cairo') 5 | options(future.globals.maxSize = 8000 * 1024^2) 6 | 7 | Matrix <- GetAssayData(object = Tumors.combined, slot = "counts") 8 | 9 | 10 | ########## This loads the genes that were used in the overall MGB cNMF (Top 3000 variable genes) ########### 11 | Genes <- scan("./cnmf_run.overdispersed_genes.txt", what="") 12 | 13 | Matrix2 <- Matrix[rownames(Matrix) %in% Genes,] 14 | 15 | write.table(t(as.matrix(Matrix2)), file="./JAX_Raw_Counts_Variable_for_cNMF.txt", sep="\t", col.names=NA, quote=FALSE) 16 | 17 | dim(Matrix2) ######### to find out the number for --numgenes below ################# 18 | 19 | q() 20 | 21 | 
########################### Exit R ####################################### 22 | 23 | ##### We run cNMF prepare script to normalize the raw matrix counts ############### 24 | 25 | cnmf prepare --output-dir ./Calculate_Usage/ --name Calculate_Usage -c JAX_Raw_Counts_Variable_for_cNMF.txt -k 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 --n-iter 500 --total-workers 1 --seed 14 --numgenes 2896; 26 | 27 | 28 | python; 29 | 30 | ################################## Inside Python ################################# 31 | 32 | import sklearn 33 | import sklearn.decomposition 34 | from sklearn.decomposition import non_negative_factorization 35 | import numpy as np 36 | import scanpy as sc 37 | import csv 38 | import scipy 39 | import pandas as pd 40 | 41 | ########### Load the spectra consensus file from the MGB All cells cNMF run ########### 42 | H = np.load("cnmf_run.spectra.k_18.dt_0_015.consensus.df.npz", allow_pickle=True) 43 | 44 | H2 = pd.DataFrame(H['data'], columns = H['columns'], index = H['index']) 45 | 46 | ########## Load the normalized raw counts matrix of the JAX cohort (Normalized by "prepare" script) ###### 47 | X = sc.read_h5ad('Calculate_Usage.norm_counts.h5ad') 48 | 49 | X2 = X.X.toarray() 50 | 51 | X3 = pd.DataFrame(data=X2, columns = X.var_names , index = X.obs.index) 52 | 53 | H3 = H2.filter(items = X3.columns) 54 | 55 | H4 = H3.to_numpy() 56 | 57 | X5 = X2.astype(np.float64) 58 | 59 | ########## Perform the calculation ########### 60 | test = sklearn.decomposition.non_negative_factorization(X5, W=None, H=H4, n_components= 18, init='random', update_H=False, solver='cd', beta_loss='frobenius', tol=0.0001, max_iter=1000, alpha=0.0, alpha_W=0.0, alpha_H='same', l1_ratio=0.0, regularization=None, random_state=None, verbose=0, shuffle=False) 61 | 62 | test2 = list(test) 63 | 64 | pd.DataFrame(test2[0], columns=H['index'], index=X.obs.index).to_csv(path_or_buf="./JAX_Glioma_All_cells_Usage.txt", sep="\t", quoting=csv.QUOTE_NONE) 65 | 66 | 67 | 
############ This script outputs a usage matrix which is then normalized per row to percentages (in each cell, the usages of the programs sums up to 100). This matrix is used for annotating cells ################ 68 | -------------------------------------------------------------------------------- /Processing of scRNA-Seq Files (Related to Figure 1)/13- Seurat for Processing Mcgill Cohort.R: -------------------------------------------------------------------------------- 1 | library(dplyr) 2 | library(Seurat) 3 | options(bitmapType='cairo') 4 | options(future.globals.maxSize = 8000 * 1024^2) 5 | 6 | ################### We converted all the raw STARsolo outputs into a tab-delimited text matrix (genes in rows, cells in columns) and merged all these matrices to form a single matrix. The barcodes for each tumor were prefixed with "TumorID_" ######### 7 | 8 | data <- read.table("All_Mcgill_220826_Raw_Expression.txt", sep="\t", head=TRUE, row.names=1) 9 | 10 | Tumors.combined <- CreateSeuratObject(counts = data, project = "Mcgill", min.cells = 3, min.features = 200) 11 | 12 | Tumors.combined[["percent.mt"]] <- PercentageFeatureSet(Tumors.combined, pattern = "^MT.") 13 | 14 | all.genes <- rownames(Tumors.combined) 15 | 16 | 17 | ######## Filtering Low Quality Cells ############## 18 | 19 | Tumors.combined <- subset(Tumors.combined, subset = nFeature_RNA > 500 & nFeature_RNA < 6000 & nCount_RNA > 1000 & percent.mt < 25) 20 | 21 | pdf("Mcgill_WT_Mutant_Tumors_QC_AF.pdf", height = 6, width = 20) 22 | VlnPlot(Tumors.combined, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3) 23 | dev.off() 24 | 25 | ######### Normalization, Scaling, Identification of Variable Genes and Regression of % of mito genes, PCA ######## 26 | 27 | library(sctransform) 28 | Tumors.combined <- SCTransform(Tumors.combined, vars.to.regress = "percent.mt", verbose = TRUE) 29 | Tumors.combined <- RunPCA(Tumors.combined) 30 | pdf("Mcgill_WT_Mutant_Tumors_ElbowPlot.pdf", height = 6, width = 
6) 31 | ElbowPlot(Tumors.combined, ndims=50) 32 | dev.off() 33 | 34 | write.table(Tumors.combined@meta.data, file="Mcgill_WT_Mutant_Tumors_Integrated_MetaData.txt", sep="\t", col.names=NA, quote=FALSE) 35 | 36 | 37 | ################# Louvain Clustering and UMAP Generation ################### 38 | 39 | 40 | Tumors.combined <- RunUMAP(Tumors.combined, reduction = "pca", dims = 1:28) 41 | Tumors.combined <- FindNeighbors(Tumors.combined, dims = 1:28) 42 | Tumors.combined <- FindClusters(Tumors.combined, resolution = 0.3) 43 | 44 | 45 | pdf("Mcgill_WT_Mutant_Tumors_Clusters_With_Labels.pdf", height= 6, width = 7) 46 | DimPlot(Tumors.combined, reduction = "umap", label=TRUE) 47 | dev.off() 48 | 49 | pdf("Mcgill_WT_Mutant_Tumors_UMAP_Patient_ID.pdf", height= 6, width = 9) 50 | DimPlot(Tumors.combined, reduction = "umap", group.by="orig.ident") 51 | dev.off() 52 | 53 | 54 | write.table(Tumors.combined@meta.data, file="Mcgill_WT_Mutant_Tumors_Integrated_MetaData.txt", sep="\t", col.names=NA, quote=FALSE) 55 | 56 | 57 | saveRDS(Tumors.combined, file="Mcgill_Brain_Tumors.rds") 58 | -------------------------------------------------------------------------------- /Processing of scRNA-Seq Files (Related to Figure 1)/14- Calculate usage matrix in Mcgill Cohort for cNMF cell annotation programs identified in MGB cohort: -------------------------------------------------------------------------------- 1 | ##################### Inside R ################################ 2 | library(dplyr) 3 | library(Seurat) 4 | options(bitmapType='cairo') 5 | options(future.globals.maxSize = 8000 * 1024^2) 6 | 7 | Matrix <- GetAssayData(object = Tumors.combined, slot = "counts") 8 | 9 | 10 | ########## This loads the genes that were used in the overall MGB cNMF (Top 3000 variable genes) ########### 11 | Genes <- scan("./cnmf_run.overdispersed_genes.txt", what="") 12 | 13 | Matrix2 <- Matrix[rownames(Matrix) %in% Genes,] 14 | 15 | write.table(t(as.matrix(Matrix2)), 
file="./Mcgill_Raw_Counts_Variable_for_cNMF.txt", sep="\t", col.names=NA, quote=FALSE) 16 | 17 | dim(Matrix2) ######### to find out the number for --numgenes below ################# 18 | 19 | q() 20 | 21 | ########################### Exit R ####################################### 22 | 23 | ##### We run cNMF prepare script to normalize the raw matrix counts ############### 24 | 25 | cnmf prepare --output-dir ./Calculate_Usage/ --name Calculate_Usage -c Mcgill_Raw_Counts_Variable_for_cNMF.txt -k 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 --n-iter 500 --total-workers 1 --seed 14 --numgenes 2992; 26 | 27 | 28 | python; 29 | 30 | ################################## Inside Python ################################# 31 | 32 | import sklearn 33 | import sklearn.decomposition 34 | from sklearn.decomposition import non_negative_factorization 35 | import numpy as np 36 | import scanpy as sc 37 | import csv 38 | import scipy 39 | import pandas as pd 40 | 41 | ########### Load the spectra consensus file from the MGB All cells cNMF run ########### 42 | H = np.load("cnmf_run.spectra.k_18.dt_0_015.consensus.df.npz", allow_pickle=True) 43 | 44 | H2 = pd.DataFrame(H['data'], columns = H['columns'], index = H['index']) 45 | 46 | ########## Load the normalized raw counts matrix of the Mcgill cohort (Normalized by "prepare" script) ###### 47 | X = sc.read_h5ad('Calculate_Usage.norm_counts.h5ad') 48 | 49 | X2 = X.X.toarray() 50 | 51 | X3 = pd.DataFrame(data=X2, columns = X.var_names , index = X.obs.index) 52 | 53 | H3 = H2.filter(items = X3.columns) 54 | 55 | H4 = H3.to_numpy() 56 | 57 | X5 = X2.astype(np.float64) 58 | 59 | ########## Perform the calculation ########### 60 | test = sklearn.decomposition.non_negative_factorization(X5, W=None, H=H4, n_components= 18, init='random', update_H=False, solver='cd', beta_loss='frobenius', tol=0.0001, max_iter=1000, alpha=0.0, alpha_W=0.0, alpha_H='same', l1_ratio=0.0, regularization=None, random_state=None, verbose=0, 
shuffle=False) 61 | 62 | test2 = list(test) 63 | 64 | pd.DataFrame(test2[0], columns=H['index'], index=X.obs.index).to_csv(path_or_buf="./Mcgill_Glioma_All_cells_Usage.txt", sep="\t", quoting=csv.QUOTE_NONE) 65 | 66 | 67 | ############ This script outputs a usage matrix which is then normalized per row to percentages (in each cell, the usages of the programs sums up to 100). This matrix is used for annotating cells ################ 68 | -------------------------------------------------------------------------------- /Processing of scRNA-Seq Files (Related to Figure 1)/15- Annotation and Doublet Detection.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BernsteinLab/Myeloid-Glioma/118e4ce8c41c6a1e121ea1f86f8445da8b74b719/Processing of scRNA-Seq Files (Related to Figure 1)/15- Annotation and Doublet Detection.pdf -------------------------------------------------------------------------------- /Spatial_transcriptomics/04-cnmf: -------------------------------------------------------------------------------- 1 | cnmf prepare --output-dir ./results --name spatial -c ./adata.h5ad -k 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 --n-iter 500 --total-workers 1 --seed 14 --numgenes 1500 2 | 3 | cnmf factorize --output-dir ./results --name spatial --worker-index 0; 4 | 5 | cnmf combine --output-dir ./results/ --name spatial; 6 | 7 | cnmf k_selection_plot --output-dir ./results --name spatial; 8 | 9 | 10 | ##### Based on the K plot, we select K=7 ######## 11 | 12 | cnmf consensus --output-dir ./results --name spatial --components 7 --local-density-threshold 0.1 --show-clustering 13 | 14 | 15 | ########## The cnmf consensus script outputs usage matrix which is normalized per row to percentages (in each cell, the usages of the programs sums up to 100) ################ 16 | ########## The gene spectra score matrix is used for the annotation of the 
programs ############# 17 | -------------------------------------------------------------------------------- /Spatial_transcriptomics/05-meta_programs_usage.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "id": "766baa2e", 7 | "metadata": {}, 8 | "outputs": [], 9 | "source": [ 10 | "import os\n", 11 | "import yaml\n", 12 | "import scanpy as sc\n", 13 | "import pandas as pd\n", 14 | "import numpy as np\n", 15 | "from scipy import spatial\n", 16 | "import squidpy as sq\n", 17 | "from sklearn.decomposition import non_negative_factorization" 18 | ] 19 | }, 20 | { 21 | "cell_type": "code", 22 | "execution_count": 2, 23 | "id": "e130256e", 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "selected_K = 7\n", 28 | "density_threshold = 0.1\n", 29 | "\n", 30 | "input_directory = 'results'\n", 31 | "adata_dir = 'adata'\n", 32 | "adata_file = 'adata_full.h5ad'\n", 33 | "adata_output_directory = 'adata_env'\n", 34 | "\n", 35 | "program_names = ['env_gray_matter','env_hypoxic','env_white_matter','env_cellular_cancer','env_vasculature',\\\n", 36 | " 'env_astro_inflammatory','env_MT-RPL']" 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": 3, 42 | "id": "7e3a8780", 43 | "metadata": {}, 44 | "outputs": [], 45 | "source": [ 46 | "# prepare\n", 47 | "patients = [f[5:-5] for f in os.listdir(adata_dir) if '.' 
!= f[0]] \n", 48 | "density_threshold_str = ('%.2f' % density_threshold).replace('.', '_')\n", 49 | "if not os.path.exists(adata_output_directory):\n", 50 | " os.mkdir(adata_output_directory)\n", 51 | " \n", 52 | "k_filename = os.path.join(input_directory,'cnmf_run.k_selection_stats.df.npz')\n", 53 | "with np.load(k_filename, allow_pickle=True) as f:\n", 54 | " k_obj = pd.DataFrame(**f)" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 4, 60 | "id": "e694cbf5", 61 | "metadata": { 62 | "scrolled": false 63 | }, 64 | "outputs": [], 65 | "source": [ 66 | "# get usage scores\n", 67 | "adata = sc.read(adata_file)\n", 68 | " \n", 69 | "rf_usages = pd.read_csv(os.path.join(input_directory, 'cnmf_run.usages.k_%d.dt_%s.consensus.txt'\\\n", 70 | " %(selected_K, density_threshold_str)), sep='\\t', index_col=0)\n", 71 | "rf_usages.columns = program_names\n", 72 | "norm_usages = rf_usages.div(rf_usages.sum(axis=1), axis=0)\n", 73 | " \n", 74 | "for col in norm_usages:\n", 75 | " adata.obs[col] = norm_usages[col]\n", 76 | " \n", 77 | "# save adata for all patients\n", 78 | "saved_adata = os.path.join('adata_env.h5ad')\n", 79 | "adata.write(saved_adata)" 80 | ] 81 | }, 82 | { 83 | "cell_type": "code", 84 | "execution_count": 5, 85 | "id": "bd96bfb6", 86 | "metadata": {}, 87 | "outputs": [], 88 | "source": [ 89 | "# save adata for patients separately\n", 90 | "for patient in patients:\n", 91 | " adata_patient = sc.read(os.path.join(adata_dir,'adata%s.h5ad'%patient))\n", 92 | " \n", 93 | " usages_patient = adata[adata.obs['patient']==patient,:].obs[program_names]\n", 94 | " usages_patient.index = [index.split('-')[0]+'-1' for index in usages_patient.index]\n", 95 | " \n", 96 | " for col in usages_patient:\n", 97 | " adata_patient.obs[col] = usages_patient[col]\n", 98 | " \n", 99 | " # save\n", 100 | " saved_adata = os.path.join(adata_output_directory, 'adata%s.h5ad'%patient)\n", 101 | " adata_patient.write(saved_adata)" 102 | ] 103 | }, 104 | { 105 | 
"cell_type": "code", 106 | "execution_count": null, 107 | "id": "5cd4c3dd", 108 | "metadata": {}, 109 | "outputs": [], 110 | "source": [] 111 | } 112 | ], 113 | "metadata": { 114 | "kernelspec": { 115 | "display_name": "Python 3 (ipykernel)", 116 | "language": "python", 117 | "name": "python3" 118 | }, 119 | "language_info": { 120 | "codemirror_mode": { 121 | "name": "ipython", 122 | "version": 3 123 | }, 124 | "file_extension": ".py", 125 | "mimetype": "text/x-python", 126 | "name": "python", 127 | "nbconvert_exporter": "python", 128 | "pygments_lexer": "ipython3", 129 | "version": "3.8.16" 130 | } 131 | }, 132 | "nbformat": 4, 133 | "nbformat_minor": 5 134 | } 135 | -------------------------------------------------------------------------------- /Spatial_transcriptomics/06-RCTD_sc_reference_v2.R: -------------------------------------------------------------------------------- 1 | library(spacexr) 2 | library(Matrix) 3 | library(data.table) 4 | 5 | # single-cell reference 6 | 7 | workdir <- '/Users/cpc45/Data/GBM/Henrik_spatial' 8 | datadir <- '/Users/cpc45/Data/GBM/Bernstein_SeqWell/Discrete_March2023' 9 | metadata <- read.table(file.path(datadir, "Discrete4_Final_MetaData.txt"), 10 | sep='\t', header = TRUE, row.names = 1) # load in annotation matrix 11 | counts <- fread(file.path(datadir, "discrete4_mgb_raw_counts.txt"), 12 | sep='\t') # load in counts matrix 13 | 14 | genes <- as.matrix(counts[,1]); counts[,1]<-NULL 15 | barcodes<-colnames(counts) 16 | cell_types<-metadata[barcodes,]$Annotation 17 | cell_types <- as.factor(cell_types) # convert to factor data type 18 | 19 | names(cell_types)<- barcodes 20 | 21 | counts <- apply(counts, 2, function(x) as.numeric(as.character(x))) 22 | counts<-as.data.frame(counts); rownames(counts)<-t(genes); colnames(counts)<-barcodes 23 | nUMI <- colSums(counts) 24 | 25 | ### Create the Reference object 26 | reference <- Reference(counts, cell_types, nUMI) 27 | 28 | ## Examine reference object (optional) 29 | 
print(dim(reference@counts)) # observe Digital Gene Expression matrix 30 | 31 | table(reference@cell_types) # number of occurrences for each cell type 32 | 33 | ## Save RDS object (optional) 34 | saveRDS(reference, file.path(workdir,'SCRef.rds')) 35 | -------------------------------------------------------------------------------- /Spatial_transcriptomics/07-RCTD_make_pucks_all.R: -------------------------------------------------------------------------------- 1 | library(spacexr) 2 | library(Matrix) 3 | library(SPATA2) 4 | library(anndata) 5 | 6 | workdir<- '/Users/cpc45/Data/GBM/Henrik_spatial/' 7 | adatadir<- file.path(workdir,'CancerCell','adata') 8 | puckdir <- file.path(workdir,'pucks') 9 | dir.create(puckdir) 10 | 11 | for (f in list.files(adatadir) ) { 12 | adata <- read_h5ad(file.path(adatadir,f)) 13 | file_split <- strsplit(f,"[.]")[[1]] 14 | sample <- substr(file_split[1], 6, nchar(file_split[1])) 15 | 16 | # extract counts matrix 17 | counts <- t(as.matrix(adata$X)) 18 | colnames(counts) <- row.names(adata$obs) 19 | row.names(counts) <- row.names(adata$var) 20 | 21 | # extract coordinates 22 | coords <- as.data.frame(adata$obsm$spatial) 23 | rownames(coords) <- row.names(adata$obs) 24 | nUMI <- colSums(counts) # In this case, total counts per pixel is nUMI 25 | 26 | ### Create SpatialRNA object 27 | puck <- SpatialRNA(coords, counts, nUMI) 28 | 29 | print(head(puck@coords)) # start of coordinate data.frame 30 | 31 | saveRDS(puck, file.path(puckdir,sprintf('puck_%s.rds',sample))) 32 | } 33 | -------------------------------------------------------------------------------- /Spatial_transcriptomics/08-run_RCTD_all.R: -------------------------------------------------------------------------------- 1 | library(spacexr) 2 | library(Matrix) 3 | 4 | workdir<- '/Users/cpc45/Data/GBM/Henrik_spatial' 5 | puckdir <- file.path(workdir,'pucks') 6 | RCTD_dir <- file.path(workdir,'RCTD') 7 | dir.create(RCTD_dir, showWarnings = FALSE) 8 | 9 | reference<-
readRDS(file.path(workdir,'SCRef.rds')) 10 | 11 | for (f in list.files(puckdir) ) { 12 | patient_sample <- substr(f, 6, nchar(f)-4) # get sample name from file name 13 | 14 | puck<-readRDS(file.path(workdir,'pucks',sprintf('puck_%s.rds',patient_sample))) 15 | 16 | myRCTD <- create.RCTD(puck, reference, max_cores = 4) 17 | myRCTD <- run.RCTD(myRCTD, doublet_mode = 'full') 18 | saveRDS(myRCTD, file.path(RCTD_dir,sprintf('RCTD_%s.rds',patient_sample))) 19 | } 20 | -------------------------------------------------------------------------------- /Spatial_transcriptomics/09-RCTD_tocsv_all_patients.R: -------------------------------------------------------------------------------- 1 | library(spacexr) 2 | library(Matrix) 3 | 4 | workdir<- '/Users/cpc45/Data/GBM/Henrik_spatial' 5 | RCTDir<-file.path(workdir,'RCTD') 6 | write_dir<-file.path(workdir,'RCTD_csv') 7 | dir.create(write_dir) 8 | 9 | for (f in list.files(RCTDir) ) { 10 | sample_id <- substr(f, 6, nchar(f)-4) # get sample name from file name 11 | myRCTD<- readRDS(file.path(RCTDir,sprintf('RCTD_%s.rds',sample_id))) 12 | results <- myRCTD@results 13 | 14 | # normalize the cell type proportions to sum to 1. 
15 | norm_weights = normalize_weights(results$weights) 16 | norm_weights = as.matrix(norm_weights) 17 | 18 | write.csv(norm_weights, file.path(write_dir,sprintf('RCTD_%s.csv',sample_id))) 19 | } 20 | -------------------------------------------------------------------------------- /Spatial_transcriptomics/14- ScatterPie Visualization of the niches: -------------------------------------------------------------------------------- 1 | ######## Inside R ######### 2 | 3 | library(ggplot2) 4 | library(scatterpie) 5 | 6 | # example for 1 sample 7 | 8 | sample = 'UKF265_C' 9 | csvdir = file.path(envdir, sprintf('env%s.csv',sample)) 10 | data_ <- read.csv(csvdir) 11 | 12 | pdf(file.path(plotdir, sprintf('%s.sp.pdf',sample) )) 13 | ggplot() + geom_scatterpie(aes(x=x, y=y, r=50), 14 | data=data_, 15 | cols=colnames(data_)[c(1:6)], 16 | color=NA) + 17 | coord_equal() + 18 | scale_fill_manual(values = c('#b3b3b3','#050505','#f5f3ed','#0098d5','#e62c54','#ffe700')) + 19 | theme(panel.grid.major = element_blank(), 20 | panel.grid.minor = element_blank(), 21 | panel.background = element_blank(), 22 | axis.text.x=element_blank(), 23 | axis.ticks.x=element_blank(), 24 | axis.text.y=element_blank(), 25 | axis.ticks.y=element_blank() 26 | ) + 27 | labs(x="", y="") 28 | dev.off() 29 | -------------------------------------------------------------------------------- /scATAC-Seq Analyses/02- Processing GBM C3L_03405 snATAC-Seq library: -------------------------------------------------------------------------------- 1 | library(Signac) 2 | library(Seurat) 3 | library(EnsDb.Hsapiens.v86) 4 | library(BSgenome.Hsapiens.UCSC.hg38) 5 | 6 | options(future.globals.maxSize = 8000 * 1024^2) 7 | 8 | C3L_03405.data=CreateFragmentObject("./C3L-03405_CPT0224600013_snATAC_GBM/outs/fragments.tsv.gz") 9 | 10 | 11 | features <- CallPeaks(C3L_03405.data, format='BED', outdir ='./C3L-03405_CPT0224600013_snATAC_GBM/outs/',name='C3L_03405', cleanup=FALSE) 12 | 13 | C3L_03405_Peaks <- 
keepStandardChromosomes(features, pruning.mode = "coarse") 14 | C3L_03405_Peaks2 <- subsetByOverlaps(x = C3L_03405_Peaks, ranges = blacklist_hg38, invert = TRUE) 15 | 16 | saveRDS(C3L_03405_Peaks2, file="C3L_03405_Peaks2.rds") 17 | 18 | C3L_03405.counts <- FeatureMatrix( 19 | fragments = C3L_03405.data, 20 | features = C3L_03405_Peaks 21 | ) 22 | 23 | 24 | C3L_03405.chrom_assay <- CreateChromatinAssay( 25 | counts = C3L_03405.counts, 26 | sep = c("-", "-"), 27 | fragments = './C3L-03405_CPT0224600013_snATAC_GBM/outs/fragments.tsv.gz', 28 | min.cells = 10, 29 | min.features = 200 30 | ) 31 | 32 | C3L_03405 <- CreateSeuratObject(counts = C3L_03405.chrom_assay, assay = "peaks", project = "C3L_03405") 33 | 34 | annotations.C3L_03405 <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86) 35 | seqlevels(annotations.C3L_03405) <- paste0('chr', seqlevels(annotations.C3L_03405)) 36 | genome(annotations.C3L_03405) <- "hg38" 37 | 38 | 39 | Annotation(C3L_03405) <- annotations.C3L_03405 40 | 41 | C3L_03405 <- NucleosomeSignal(object = C3L_03405) 42 | C3L_03405 <- TSSEnrichment(object = C3L_03405, fast = FALSE, assay='peaks') 43 | 44 | C3L_03405$blacklist_fraction <- FractionCountsInRegion( 45 | object = C3L_03405, 46 | assay = 'peaks', 47 | regions = blacklist_hg38 48 | ) 49 | 50 | 51 | total_fragments <- CountFragments("./C3L-03405_CPT0224600013_snATAC_GBM/outs/fragments.tsv.gz") 52 | rownames(total_fragments) <- total_fragments$CB 53 | C3L_03405 $fragments <- total_fragments[colnames(C3L_03405), "frequency_count"] 54 | 55 | C3L_03405 <- FRiP( 56 | object = C3L_03405, 57 | assay = 'peaks', 58 | total.fragments = 'fragments' 59 | ) 60 | 61 | 62 | pdf("C3L_03405_WT_ATAC_DensityScatter.pdf", height=5, width=9) 63 | DensityScatter(C3L_03405, x = 'nCount_peaks', y = 'TSS.enrichment', log_x = TRUE, quantiles = TRUE) 64 | dev.off() 65 | 66 | 67 | pdf("C3L_03405_WT_ATAC_QC_BF.pdf", height=5, width=12) 68 | VlnPlot( 69 | object = C3L_03405, 70 | features = c('nCount_peaks', 
'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'), 71 | pt.size = 0.1, 72 | ncol = 5 73 | ) 74 | dev.off() 75 | 76 | 77 | pdf("C3L_03405_WT_ATAC_TSS_Enrichment.pdf", height=5, width=6) 78 | C3L_03405$high.tss <- ifelse(C3L_03405$TSS.enrichment > 1.5, 'High', 'Low') 79 | TSSPlot(C3L_03405, group.by = 'high.tss') + NoLegend() 80 | dev.off() 81 | 82 | pdf("C3L_03405_WT_ATAC_Nucleosome_signal.pdf", height=5, width=6) 83 | C3L_03405$nucleosome_group <- ifelse(C3L_03405$nucleosome_signal > 2.5, 'NS > 2.5', 'NS < 2.5') 84 | FragmentHistogram(object = C3L_03405, group.by = 'nucleosome_group') 85 | dev.off() 86 | 87 | 88 | C3L_03405 <- subset( 89 | x = C3L_03405, 90 | subset = nCount_peaks > 350 & 91 | nCount_peaks < 20000 & 92 | FRiP > 0.15 & 93 | blacklist_fraction < 0.05 & 94 | nucleosome_signal < 2.5 & 95 | TSS.enrichment > 1.5 96 | ) 97 | 98 | 99 | pdf("C3L_03405_WT_ATAC_QC_AF.pdf", height=5, width=12) 100 | VlnPlot( 101 | object = C3L_03405, 102 | features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'), 103 | pt.size = 0.1, 104 | ncol = 5 105 | ) 106 | dev.off() 107 | 108 | 109 | 110 | DefaultAssay(C3L_03405) <- "peaks" 111 | 112 | 113 | C3L_03405 <- RunTFIDF(C3L_03405) 114 | C3L_03405 <- FindTopFeatures(C3L_03405, min.cutoff = 'q0') 115 | C3L_03405 <- RunSVD(C3L_03405) 116 | 117 | pdf("C3L_03405_WT_ATAC_DepthCor.pdf", height=5, width=9) 118 | DepthCor(C3L_03405) 119 | dev.off() 120 | 121 | 122 | C3L_03405 <- RunUMAP(object = C3L_03405, reduction = 'lsi', dims = 2:30) 123 | C3L_03405 <- FindNeighbors(object = C3L_03405, reduction = 'lsi', dims = 2:30) 124 | C3L_03405 <- FindClusters(object = C3L_03405, verbose = FALSE, algorithm = 3) 125 | 126 | 127 | pdf("C3L_03405_WT_ATAC_UMAP.pdf", height=5, width=7) 128 | DimPlot(object = C3L_03405, label = TRUE) + NoLegend() 129 | dev.off() 130 | 131 | 132 | gene.activities <- GeneActivity(C3L_03405) 133 | 134 | 135 | C3L_03405[['RNA']] <- CreateAssayObject(counts = 
gene.activities) 136 | C3L_03405 <- NormalizeData( 137 | object = C3L_03405, 138 | assay = 'RNA', 139 | normalization.method = 'RC', 140 | scale.factor = median(C3L_03405$nCount_RNA) 141 | ) 142 | 143 | write.table(gene.activities, file="C3L_03405_WT_ATAC_Gene_Activities.txt", sep="\t", col.names=NA, quote=FALSE) 144 | 145 | saveRDS(C3L_03405, file="C3L_03405_WT_ATAC.rds") 146 | -------------------------------------------------------------------------------- /scATAC-Seq Analyses/03- Processing GBM C3L_03968 snATAC-Seq library: -------------------------------------------------------------------------------- 1 | library(Signac) 2 | library(Seurat) 3 | library(EnsDb.Hsapiens.v86) 4 | library(BSgenome.Hsapiens.UCSC.hg38) 5 | 6 | options(future.globals.maxSize = 8000 * 1024^2) 7 | 8 | C3L_03968.data=CreateFragmentObject("./C3L-03968_CPT0228220004_snATAC_GBM/outs/fragments.tsv.gz") 9 | 10 | 11 | features <- CallPeaks(C3L_03968.data, format='BED', outdir ='./C3L-03968_CPT0228220004_snATAC_GBM/outs/',name='C3L_03968', cleanup=FALSE) 12 | 13 | 14 | C3L_03968_Peaks <- keepStandardChromosomes(features, pruning.mode = "coarse") 15 | C3L_03968_Peaks2 <- subsetByOverlaps(x = C3L_03968_Peaks, ranges = blacklist_hg38, invert = TRUE) 16 | 17 | saveRDS(C3L_03968_Peaks2, file="C3L_03968_Peaks.rds") 18 | 19 | C3L_03968.counts <- FeatureMatrix( 20 | fragments = C3L_03968.data, 21 | features = C3L_03968_Peaks 22 | ) 23 | 24 | 25 | C3L_03968.chrom_assay <- CreateChromatinAssay( 26 | counts = C3L_03968.counts, 27 | sep = c("-", "-"), 28 | fragments = './C3L-03968_CPT0228220004_snATAC_GBM/outs/fragments.tsv.gz', 29 | min.cells = 10, 30 | min.features = 200 31 | ) 32 | 33 | C3L_03968 <- CreateSeuratObject(counts = C3L_03968.chrom_assay, assay = "peaks", project = "C3L_03968") 34 | 35 | annotations.C3L_03968 <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86) 36 | seqlevels(annotations.C3L_03968) <- paste0('chr', seqlevels(annotations.C3L_03968)) 37 | genome(annotations.C3L_03968) <- "hg38"
38 | 39 | 40 | Annotation(C3L_03968) <- annotations.C3L_03968 41 | 42 | C3L_03968 <- NucleosomeSignal(object = C3L_03968) 43 | C3L_03968 <- TSSEnrichment(object = C3L_03968, fast = FALSE, assay='peaks') 44 | 45 | C3L_03968$blacklist_fraction <- FractionCountsInRegion( 46 | object = C3L_03968, 47 | assay = 'peaks', 48 | regions = blacklist_hg38 49 | ) 50 | 51 | 52 | total_fragments <- CountFragments("./C3L-03968_CPT0228220004_snATAC_GBM/outs/fragments.tsv.gz") 53 | rownames(total_fragments) <- total_fragments$CB 54 | C3L_03968 $fragments <- total_fragments[colnames(C3L_03968), "frequency_count"] 55 | 56 | C3L_03968 <- FRiP( 57 | object = C3L_03968, 58 | assay = 'peaks', 59 | total.fragments = 'fragments' 60 | ) 61 | 62 | 63 | pdf("C3L_03968_WT_ATAC_DensityScatter.pdf", height=5, width=9) 64 | DensityScatter(C3L_03968, x = 'nCount_peaks', y = 'TSS.enrichment', log_x = TRUE, quantiles = TRUE) 65 | dev.off() 66 | 67 | 68 | pdf("C3L_03968_WT_ATAC_QC_BF.pdf", height=5, width=12) 69 | VlnPlot( 70 | object = C3L_03968, 71 | features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'), 72 | pt.size = 0.1, 73 | ncol = 5 74 | ) 75 | dev.off() 76 | 77 | 78 | pdf("C3L_03968_WT_ATAC_TSS_Enrichment.pdf", height=5, width=6) 79 | C3L_03968$high.tss <- ifelse(C3L_03968$TSS.enrichment > 1.5, 'High', 'Low') 80 | TSSPlot(C3L_03968, group.by = 'high.tss') + NoLegend() 81 | dev.off() 82 | 83 | pdf("C3L_03968_WT_ATAC_Nucleosome_signal.pdf", height=5, width=6) 84 | C3L_03968$nucleosome_group <- ifelse(C3L_03968$nucleosome_signal > 2.5, 'NS > 2.5', 'NS < 2.5') 85 | FragmentHistogram(object = C3L_03968, group.by = 'nucleosome_group') 86 | dev.off() 87 | 88 | 89 | C3L_03968 <- subset( 90 | x = C3L_03968, 91 | subset = nCount_peaks > 350 & 92 | nCount_peaks < 25000 & 93 | FRiP > 0.15 & 94 | blacklist_fraction < 0.05 & 95 | nucleosome_signal < 2.5 & 96 | TSS.enrichment > 1.5 97 | ) 98 | 99 | 100 | pdf("C3L_03968_WT_ATAC_QC_AF.pdf", height=5, width=12) 101 | 
VlnPlot( 102 | object = C3L_03968, 103 | features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'), 104 | pt.size = 0.1, 105 | ncol = 5 106 | ) 107 | dev.off() 108 | 109 | 110 | DefaultAssay(C3L_03968) <- "peaks" 111 | 112 | 113 | C3L_03968 <- RunTFIDF(C3L_03968) 114 | C3L_03968 <- FindTopFeatures(C3L_03968, min.cutoff = 'q0') 115 | C3L_03968 <- RunSVD(C3L_03968) 116 | 117 | pdf("C3L_03968_WT_ATAC_DepthCor.pdf", height=5, width=9) 118 | DepthCor(C3L_03968) 119 | dev.off() 120 | 121 | 122 | C3L_03968 <- RunUMAP(object = C3L_03968, reduction = 'lsi', dims = 2:30) 123 | C3L_03968 <- FindNeighbors(object = C3L_03968, reduction = 'lsi', dims = 2:30) 124 | C3L_03968 <- FindClusters(object = C3L_03968, verbose = FALSE, algorithm = 3) 125 | 126 | 127 | pdf("C3L_03968_WT_ATAC_UMAP.pdf", height=5, width=7) 128 | DimPlot(object = C3L_03968, label = TRUE) + NoLegend() 129 | dev.off() 130 | 131 | 132 | gene.activities <- GeneActivity(C3L_03968) 133 | 134 | 135 | C3L_03968[['RNA']] <- CreateAssayObject(counts = gene.activities) 136 | C3L_03968 <- NormalizeData( 137 | object = C3L_03968, 138 | assay = 'RNA', 139 | normalization.method = 'RC', 140 | scale.factor = median(C3L_03968$nCount_RNA) 141 | ) 142 | 143 | write.table(gene.activities, file="C3L_03968_WT_ATAC_Gene_Activities.txt", sep="\t", col.names=NA, quote=FALSE) 144 | 145 | saveRDS(C3L_03968, file="C3L_03968_WT_ATAC.rds") 146 | -------------------------------------------------------------------------------- /scATAC-Seq Analyses/04- Processing GBM C3N_00662 snATAC-Seq library: -------------------------------------------------------------------------------- 1 | library(Signac) 2 | library(Seurat) 3 | library(EnsDb.Hsapiens.v86) 4 | library(BSgenome.Hsapiens.UCSC.hg38) 5 | 6 | options(future.globals.maxSize = 8000 * 1024^2) 7 | 8 | C3N_00662.data=CreateFragmentObject("./C3N-00662_CPT0087680014_snATAC_GBM/outs/fragments.tsv.gz") 9 | 10 | 11 | features <- CallPeaks(C3N_00662.data, 
format='BED', outdir ='./C3N-00662_CPT0087680014_snATAC_GBM/outs/',name='C3N_00662', cleanup=FALSE) 12 | 13 | 14 | C3N_00662_Peaks <- keepStandardChromosomes(features, pruning.mode = "coarse") 15 | C3N_00662_Peaks2 <- subsetByOverlaps(x = C3N_00662_Peaks, ranges = blacklist_hg38, invert = TRUE) 16 | 17 | saveRDS(C3N_00662_Peaks2, file="C3N_00662_Peaks.rds") 18 | 19 | C3N_00662.counts <- FeatureMatrix( 20 | fragments = C3N_00662.data, 21 | features = C3N_00662_Peaks 22 | ) 23 | 24 | 25 | C3N_00662.chrom_assay <- CreateChromatinAssay( 26 | counts = C3N_00662.counts, 27 | sep = c("-", "-"), 28 | fragments = './C3N-00662_CPT0087680014_snATAC_GBM/outs/fragments.tsv.gz', 29 | min.cells = 10, 30 | min.features = 200 31 | ) 32 | 33 | C3N_00662 <- CreateSeuratObject(counts = C3N_00662.chrom_assay, assay = "peaks", project = "C3N_00662") 34 | 35 | annotations.C3N_00662 <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86) 36 | seqlevels(annotations.C3N_00662) <- paste0('chr', seqlevels(annotations.C3N_00662)) 37 | genome(annotations.C3N_00662) <- "hg38" 38 | 39 | 40 | Annotation(C3N_00662) <- annotations.C3N_00662 41 | 42 | C3N_00662 <- NucleosomeSignal(object = C3N_00662) 43 | C3N_00662 <- TSSEnrichment(object = C3N_00662, fast = FALSE, assay='peaks') 44 | 45 | C3N_00662$blacklist_fraction <- FractionCountsInRegion( 46 | object = C3N_00662, 47 | assay = 'peaks', 48 | regions = blacklist_hg38 49 | ) 50 | 51 | 52 | total_fragments <- CountFragments("./C3N-00662_CPT0087680014_snATAC_GBM/outs/fragments.tsv.gz") 53 | rownames(total_fragments) <- total_fragments$CB 54 | C3N_00662 $fragments <- total_fragments[colnames(C3N_00662), "frequency_count"] 55 | 56 | C3N_00662 <- FRiP( 57 | object = C3N_00662, 58 | assay = 'peaks', 59 | total.fragments = 'fragments' 60 | ) 61 | 62 | 63 | pdf("C3N_00662_WT_ATAC_DensityScatter.pdf", height=5, width=9) 64 | DensityScatter(C3N_00662, x = 'nCount_peaks', y = 'TSS.enrichment', log_x = TRUE, quantiles = TRUE) 65 | dev.off() 66 | 67 | 68 | 
pdf("C3N_00662_WT_ATAC_QC_BF.pdf", height=5, width=12) 69 | VlnPlot( 70 | object = C3N_00662, 71 | features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'), 72 | pt.size = 0.1, 73 | ncol = 5 74 | ) 75 | dev.off() 76 | 77 | 78 | pdf("C3N_00662_WT_ATAC_TSS_Enrichment.pdf", height=5, width=6) 79 | C3N_00662$high.tss <- ifelse(C3N_00662$TSS.enrichment > 1.5, 'High', 'Low') 80 | TSSPlot(C3N_00662, group.by = 'high.tss') + NoLegend() 81 | dev.off() 82 | 83 | pdf("C3N_00662_WT_ATAC_Nucleosome_signal.pdf", height=5, width=6) 84 | C3N_00662$nucleosome_group <- ifelse(C3N_00662$nucleosome_signal > 2, 'NS > 2', 'NS < 2') 85 | FragmentHistogram(object = C3N_00662, group.by = 'nucleosome_group') 86 | dev.off() 87 | 88 | 89 | C3N_00662 <- subset( 90 | x = C3N_00662, 91 | subset = nCount_peaks > 350 & 92 | nCount_peaks < 25000 & 93 | FRiP > 0.15 & 94 | blacklist_fraction < 0.05 & 95 | nucleosome_signal < 2 & 96 | TSS.enrichment > 1.5 97 | ) 98 | 99 | 100 | pdf("C3N_00662_WT_ATAC_QC_AF.pdf", height=5, width=12) 101 | VlnPlot( 102 | object = C3N_00662, 103 | features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'), 104 | pt.size = 0.1, 105 | ncol = 5 106 | ) 107 | dev.off() 108 | 109 | 110 | DefaultAssay(C3N_00662) <- "peaks" 111 | 112 | 113 | C3N_00662 <- RunTFIDF(C3N_00662) 114 | C3N_00662 <- FindTopFeatures(C3N_00662, min.cutoff = 'q0') 115 | C3N_00662 <- RunSVD(C3N_00662) 116 | 117 | pdf("C3N_00662_WT_ATAC_DepthCor.pdf", height=5, width=9) 118 | DepthCor(C3N_00662) 119 | dev.off() 120 | 121 | 122 | C3N_00662 <- RunUMAP(object = C3N_00662, reduction = 'lsi', dims = 2:30) 123 | C3N_00662 <- FindNeighbors(object = C3N_00662, reduction = 'lsi', dims = 2:30) 124 | C3N_00662 <- FindClusters(object = C3N_00662, verbose = FALSE, algorithm = 3) 125 | 126 | 127 | pdf("C3N_00662_WT_ATAC_UMAP.pdf", height=5, width=7) 128 | DimPlot(object = C3N_00662, label = TRUE) + NoLegend() 129 | dev.off() 130 | 131 | 132 
| gene.activities <- GeneActivity(C3N_00662) 133 | 134 | 135 | C3N_00662[['RNA']] <- CreateAssayObject(counts = gene.activities) 136 | C3N_00662 <- NormalizeData( 137 | object = C3N_00662, 138 | assay = 'RNA', 139 | normalization.method = 'RC', 140 | scale.factor = median(C3N_00662$nCount_RNA) 141 | ) 142 | 143 | write.table(gene.activities, file="C3N_00662_WT_ATAC_Gene_Activities.txt", sep="\t", col.names=NA, quote=FALSE) 144 | 145 | saveRDS(C3N_00662, file="C3N_00662_WT_ATAC.rds") 146 | -------------------------------------------------------------------------------- /scATAC-Seq Analyses/05 - Processing GBM C3N_00663 snATAC-Seq library: -------------------------------------------------------------------------------- 1 | library(Signac) 2 | library(Seurat) 3 | library(EnsDb.Hsapiens.v86) 4 | library(BSgenome.Hsapiens.UCSC.hg38) 5 | 6 | options(future.globals.maxSize = 8000 * 1024^2) 7 | 8 | C3N_00663.data=CreateFragmentObject("./C3N-00663_CPT0087730014_snATAC_GBM/outs/fragments.tsv.gz") 9 | 10 | 11 | features <- CallPeaks(C3N_00663.data, format='BED', outdir ='./C3N-00663_CPT0087730014_snATAC_GBM/outs/',name='C3N_00663', cleanup=FALSE) 12 | 13 | C3N_00663_Peaks <- keepStandardChromosomes(features, pruning.mode = "coarse") 14 | C3N_00663_Peaks2 <- subsetByOverlaps(x = C3N_00663_Peaks, ranges = blacklist_hg38, invert = TRUE) 15 | 16 | saveRDS(C3N_00663_Peaks2, file="C3N_00663_Peaks2.rds") 17 | 18 | C3N_00663.counts <- FeatureMatrix( 19 | fragments = C3N_00663.data, 20 | features = C3N_00663_Peaks 21 | ) 22 | 23 | 24 | C3N_00663.chrom_assay <- CreateChromatinAssay( 25 | counts = C3N_00663.counts, 26 | sep = c("-", "-"), 27 | fragments = './C3N-00663_CPT0087730014_snATAC_GBM/outs/fragments.tsv.gz', 28 | min.cells = 10, 29 | min.features = 200 30 | ) 31 | 32 | C3N_00663 <- CreateSeuratObject(counts = C3N_00663.chrom_assay, assay = "peaks", project = "C3N_00663") 33 | 34 | annotations.C3N_00663 <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86) 35 |
seqlevels(annotations.C3N_00663) <- paste0('chr', seqlevels(annotations.C3N_00663)) 36 | genome(annotations.C3N_00663) <- "hg38" 37 | 38 | 39 | Annotation(C3N_00663) <- annotations.C3N_00663 40 | 41 | C3N_00663 <- NucleosomeSignal(object = C3N_00663) 42 | C3N_00663 <- TSSEnrichment(object = C3N_00663, fast = FALSE, assay='peaks') 43 | 44 | C3N_00663$blacklist_fraction <- FractionCountsInRegion( 45 | object = C3N_00663, 46 | assay = 'peaks', 47 | regions = blacklist_hg38 48 | ) 49 | 50 | 51 | total_fragments <- CountFragments("./C3N-00663_CPT0087730014_snATAC_GBM/outs/fragments.tsv.gz") 52 | rownames(total_fragments) <- total_fragments$CB 53 | C3N_00663 $fragments <- total_fragments[colnames(C3N_00663), "frequency_count"] 54 | 55 | C3N_00663 <- FRiP( 56 | object = C3N_00663, 57 | assay = 'peaks', 58 | total.fragments = 'fragments' 59 | ) 60 | 61 | 62 | pdf("C3N_00663_WT_ATAC_DensityScatter.pdf", height=5, width=9) 63 | DensityScatter(C3N_00663, x = 'nCount_peaks', y = 'TSS.enrichment', log_x = TRUE, quantiles = TRUE) 64 | dev.off() 65 | 66 | 67 | pdf("C3N_00663_WT_ATAC_QC_BF.pdf", height=5, width=12) 68 | VlnPlot( 69 | object = C3N_00663, 70 | features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'), 71 | pt.size = 0.1, 72 | ncol = 5 73 | ) 74 | dev.off() 75 | 76 | 77 | pdf("C3N_00663_WT_ATAC_TSS_Enrichment.pdf", height=5, width=6) 78 | C3N_00663$high.tss <- ifelse(C3N_00663$TSS.enrichment > 1.5, 'High', 'Low') 79 | TSSPlot(C3N_00663, group.by = 'high.tss') + NoLegend() 80 | dev.off() 81 | 82 | pdf("C3N_00663_WT_ATAC_Nucleosome_signal.pdf", height=5, width=6) 83 | C3N_00663$nucleosome_group <- ifelse(C3N_00663$nucleosome_signal > 2.5, 'NS > 2.5', 'NS < 2.5') 84 | FragmentHistogram(object = C3N_00663, group.by = 'nucleosome_group') 85 | dev.off() 86 | 87 | 88 | C3N_00663 <- subset( 89 | x = C3N_00663, 90 | subset = nCount_peaks > 350 & 91 | nCount_peaks < 12500 & 92 | FRiP > 0.15 & 93 | blacklist_fraction < 0.05 & 94 |
nucleosome_signal < 2.5 & 95 | TSS.enrichment > 1.5 96 | ) 97 | 98 | 99 | pdf("C3N_00663_WT_ATAC_QC_AF.pdf", height=5, width=12) 100 | VlnPlot( 101 | object = C3N_00663, 102 | features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'), 103 | pt.size = 0.1, 104 | ncol = 5 105 | ) 106 | dev.off() 107 | 108 | 109 | 110 | DefaultAssay(C3N_00663) <- "peaks" 111 | 112 | 113 | C3N_00663 <- RunTFIDF(C3N_00663) 114 | C3N_00663 <- FindTopFeatures(C3N_00663, min.cutoff = 'q0') 115 | C3N_00663 <- RunSVD(C3N_00663) 116 | 117 | pdf("C3N_00663_WT_ATAC_DepthCor.pdf", height=5, width=9) 118 | DepthCor(C3N_00663) 119 | dev.off() 120 | 121 | 122 | C3N_00663 <- RunUMAP(object = C3N_00663, reduction = 'lsi', dims = 2:30) 123 | C3N_00663 <- FindNeighbors(object = C3N_00663, reduction = 'lsi', dims = 2:30) 124 | C3N_00663 <- FindClusters(object = C3N_00663, verbose = FALSE, algorithm = 3) 125 | 126 | 127 | pdf("C3N_00663_WT_ATAC_UMAP.pdf", height=5, width=7) 128 | DimPlot(object = C3N_00663, label = TRUE) + NoLegend() 129 | dev.off() 130 | 131 | 132 | gene.activities <- GeneActivity(C3N_00663) 133 | 134 | 135 | C3N_00663[['RNA']] <- CreateAssayObject(counts = gene.activities) 136 | C3N_00663 <- NormalizeData( 137 | object = C3N_00663, 138 | assay = 'RNA', 139 | normalization.method = 'RC', 140 | scale.factor = median(C3N_00663$nCount_RNA) 141 | ) 142 | 143 | write.table(gene.activities, file="C3N_00663_WT_ATAC_Gene_Activities.txt", sep="\t", col.names=NA, quote=FALSE) 144 | 145 | saveRDS(C3N_00663, file="C3N_00663_WT_ATAC.rds") 146 | -------------------------------------------------------------------------------- /scATAC-Seq Analyses/07- Processing GBM C3N_01518 snATAC-Seq library: -------------------------------------------------------------------------------- 1 | library(Signac) 2 | library(Seurat) 3 | library(EnsDb.Hsapiens.v86) 4 | library(BSgenome.Hsapiens.UCSC.hg38) 5 | 6 | options(future.globals.maxSize = 8000 * 1024^2) 7 | 8 | 
C3N_01518.data=CreateFragmentObject("./C3N-01518_CPT0167640014_snATAC_GBM/outs/fragments.tsv.gz") 9 | 10 | 11 | features <- CallPeaks(C3N_01518.data, format='BED', outdir ='./C3N-01518_CPT0167640014_snATAC_GBM/outs/',name='C3N_01518', cleanup=FALSE) 12 | 13 | 14 | C3N_01518_Peaks <- keepStandardChromosomes(features, pruning.mode = "coarse") 15 | C3N_01518_Peaks2 <- subsetByOverlaps(x = C3N_01518_Peaks, ranges = blacklist_hg38, invert = TRUE) 16 | 17 | saveRDS(C3N_01518_Peaks2, file="C3N_01518_Peaks.rds") 18 | 19 | C3N_01518.counts <- FeatureMatrix( 20 | fragments = C3N_01518.data, 21 | features = C3N_01518_Peaks 22 | ) 23 | 24 | 25 | C3N_01518.chrom_assay <- CreateChromatinAssay( 26 | counts = C3N_01518.counts, 27 | sep = c("-", "-"), 28 | fragments = './C3N-01518_CPT0167640014_snATAC_GBM/outs/fragments.tsv.gz', 29 | min.cells = 10, 30 | min.features = 200 31 | ) 32 | 33 | C3N_01518 <- CreateSeuratObject(counts = C3N_01518.chrom_assay, assay = "peaks", project = "C3N_01518") 34 | 35 | annotations.C3N_01518 <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86) 36 | seqlevels(annotations.C3N_01518) <- paste0('chr', seqlevels(annotations.C3N_01518)) 37 | genome(annotations.C3N_01518) <- "hg38" 38 | 39 | 40 | Annotation(C3N_01518) <- annotations.C3N_01518 41 | 42 | C3N_01518 <- NucleosomeSignal(object = C3N_01518) 43 | C3N_01518 <- TSSEnrichment(object = C3N_01518, fast = FALSE, assay='peaks') 44 | 45 | C3N_01518$blacklist_fraction <- FractionCountsInRegion( 46 | object = C3N_01518, 47 | assay = 'peaks', 48 | regions = blacklist_hg38 49 | ) 50 | 51 | 52 | total_fragments <- CountFragments("./C3N-01518_CPT0167640014_snATAC_GBM/outs/fragments.tsv.gz") 53 | rownames(total_fragments) <- total_fragments$CB 54 | C3N_01518 $fragments <- total_fragments[colnames(C3N_01518), "frequency_count"] 55 | 56 | C3N_01518 <- FRiP( 57 | object = C3N_01518, 58 | assay = 'peaks', 59 | total.fragments = 'fragments' 60 | ) 61 | 62 | 63 | pdf("C3N_01518_WT_ATAC_DensityScatter.pdf", height=5, 
    width=9)
DensityScatter(C3N_01518, x = 'nCount_peaks', y = 'TSS.enrichment', log_x = TRUE, quantiles = TRUE)
dev.off()

pdf("C3N_01518_WT_ATAC_QC_BF.pdf", height=5, width=12)
VlnPlot(
  object = C3N_01518,
  features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
  pt.size = 0.1,
  ncol = 5
)
dev.off()

pdf("C3N_01518_WT_ATAC_TSS_Enrichment.pdf", height=5, width=6)
C3N_01518$high.tss <- ifelse(C3N_01518$TSS.enrichment > 1.5, 'High', 'Low')
TSSPlot(C3N_01518, group.by = 'high.tss') + NoLegend()
dev.off()

pdf("C3N_01518_WT_ATAC_Nucleosome_signal.pdf", height=5, width=6)
C3N_01518$nucleosome_group <- ifelse(C3N_01518$nucleosome_signal > 2, 'NS > 2', 'NS < 2')
FragmentHistogram(object = C3N_01518, group.by = 'nucleosome_group')
dev.off()

C3N_01518 <- subset(
  x = C3N_01518,
  subset = nCount_peaks > 300 &
    nCount_peaks < 7500 &
    FRiP > 0.15 &
    blacklist_fraction < 0.05 &
    nucleosome_signal < 2 &
    TSS.enrichment > 1.5
)

pdf("C3N_01518_WT_ATAC_QC_AF.pdf", height=5, width=12)
VlnPlot(
  object = C3N_01518,
  features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
  pt.size = 0.1,
  ncol = 5
)
dev.off()

DefaultAssay(C3N_01518) <- "peaks"

C3N_01518 <- RunTFIDF(C3N_01518)
C3N_01518 <- FindTopFeatures(C3N_01518, min.cutoff = 'q0')
C3N_01518 <- RunSVD(C3N_01518)

pdf("C3N_01518_WT_ATAC_DepthCor.pdf", height=5, width=9)
DepthCor(C3N_01518)
dev.off()

C3N_01518 <- RunUMAP(object = C3N_01518, reduction = 'lsi', dims = 2:30)
C3N_01518 <- FindNeighbors(object = C3N_01518, reduction = 'lsi', dims = 2:30)
C3N_01518 <- FindClusters(object = C3N_01518, verbose = FALSE, algorithm = 3)

pdf("C3N_01518_WT_ATAC_UMAP.pdf", height=5, width=7)
DimPlot(object = C3N_01518, label = TRUE) + NoLegend()
dev.off()

gene.activities <- GeneActivity(C3N_01518)

C3N_01518[['RNA']] <- CreateAssayObject(counts = gene.activities)
C3N_01518 <- NormalizeData(
  object = C3N_01518,
  assay = 'RNA',
  normalization.method = 'RC',
  scale.factor = median(C3N_01518$nCount_RNA)
)

write.table(gene.activities, file="C3N_01518_WT_ATAC_Gene_Activities.txt", sep="\t", col.names=NA, quote=FALSE)

saveRDS(C3N_01518, file="C3N_01518_WT_ATAC.rds")

--------------------------------------------------------------------------------
/scATAC-Seq Analyses/08- Processing GBM C3N_01798 snATAC-Seq library:
--------------------------------------------------------------------------------
library(Signac)
library(Seurat)
library(EnsDb.Hsapiens.v86)
library(BSgenome.Hsapiens.UCSC.hg38)

options(future.globals.maxSize = 8000 * 1024^2)

C3N_01798.data <- CreateFragmentObject("./C3N-01798_CPT0167750015_snATAC_GBM/outs/fragments.tsv.gz")

features <- CallPeaks(C3N_01798.data, format='BED', outdir ='./C3N-01798_CPT0167750015_snATAC_GBM/outs/',name='C3N_01798', cleanup=FALSE)

C3N_01798_Peaks <- keepStandardChromosomes(features, pruning.mode = "coarse")
C3N_01798_Peaks2 <- subsetByOverlaps(x = C3N_01798_Peaks, ranges = blacklist_hg38, invert = TRUE)

saveRDS(C3N_01798_Peaks2, file="C3N_01798_Peaks.rds")

C3N_01798.counts <- FeatureMatrix(
  fragments = C3N_01798.data,
  features = C3N_01798_Peaks
)

C3N_01798.chrom_assay <- CreateChromatinAssay(
  counts = C3N_01798.counts,
  sep = c("-", "-"),
  fragments = './C3N-01798_CPT0167750015_snATAC_GBM/outs/fragments.tsv.gz',
  min.cells = 10,
  min.features = 200
)

C3N_01798 <- CreateSeuratObject(counts = C3N_01798.chrom_assay,
    assay = "peaks", project = "C3N_01798")

annotations.C3N_01798 <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
seqlevels(annotations.C3N_01798) <- paste0('chr', seqlevels(annotations.C3N_01798))
genome(annotations.C3N_01798) <- "hg38"

Annotation(C3N_01798) <- annotations.C3N_01798

C3N_01798 <- NucleosomeSignal(object = C3N_01798)
C3N_01798 <- TSSEnrichment(object = C3N_01798, fast = FALSE, assay='peaks')

C3N_01798$blacklist_fraction <- FractionCountsInRegion(
  object = C3N_01798,
  assay = 'peaks',
  regions = blacklist_hg38
)

total_fragments <- CountFragments("./C3N-01798_CPT0167750015_snATAC_GBM/outs/fragments.tsv.gz")
rownames(total_fragments) <- total_fragments$CB
C3N_01798$fragments <- total_fragments[colnames(C3N_01798), "frequency_count"]

C3N_01798 <- FRiP(
  object = C3N_01798,
  assay = 'peaks',
  total.fragments = 'fragments'
)

pdf("C3N_01798_WT_ATAC_DensityScatter.pdf", height=5, width=9)
DensityScatter(C3N_01798, x = 'nCount_peaks', y = 'TSS.enrichment', log_x = TRUE, quantiles = TRUE)
dev.off()

pdf("C3N_01798_WT_ATAC_QC_BF.pdf", height=5, width=12)
VlnPlot(
  object = C3N_01798,
  features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
  pt.size = 0.1,
  ncol = 5
)
dev.off()

pdf("C3N_01798_WT_ATAC_TSS_Enrichment.pdf", height=5, width=6)
C3N_01798$high.tss <- ifelse(C3N_01798$TSS.enrichment > 1.5, 'High', 'Low')
TSSPlot(C3N_01798, group.by = 'high.tss') + NoLegend()
dev.off()

pdf("C3N_01798_WT_ATAC_Nucleosome_signal.pdf", height=5, width=6)
C3N_01798$nucleosome_group <- ifelse(C3N_01798$nucleosome_signal > 2, 'NS > 2', 'NS < 2')
FragmentHistogram(object = C3N_01798, group.by = 'nucleosome_group')
dev.off()

C3N_01798 <- subset(
  x = C3N_01798,
  subset = nCount_peaks > 350 &
    nCount_peaks < 25000 &
    FRiP > 0.15 &
    blacklist_fraction < 0.05 &
    nucleosome_signal < 2 &
    TSS.enrichment > 1.5
)

pdf("C3N_01798_WT_ATAC_QC_AF.pdf", height=5, width=12)
VlnPlot(
  object = C3N_01798,
  features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
  pt.size = 0.1,
  ncol = 5
)
dev.off()

DefaultAssay(C3N_01798) <- "peaks"

C3N_01798 <- RunTFIDF(C3N_01798)
C3N_01798 <- FindTopFeatures(C3N_01798, min.cutoff = 'q0')
C3N_01798 <- RunSVD(C3N_01798)

pdf("C3N_01798_WT_ATAC_DepthCor.pdf", height=5, width=9)
DepthCor(C3N_01798)
dev.off()

C3N_01798 <- RunUMAP(object = C3N_01798, reduction = 'lsi', dims = 2:30)
C3N_01798 <- FindNeighbors(object = C3N_01798, reduction = 'lsi', dims = 2:30)
C3N_01798 <- FindClusters(object = C3N_01798, verbose = FALSE, algorithm = 3)

pdf("C3N_01798_WT_ATAC_UMAP.pdf", height=5, width=7)
DimPlot(object = C3N_01798, label = TRUE) + NoLegend()
dev.off()

gene.activities <- GeneActivity(C3N_01798)

C3N_01798[['RNA']] <- CreateAssayObject(counts = gene.activities)
C3N_01798 <- NormalizeData(
  object = C3N_01798,
  assay = 'RNA',
  normalization.method = 'RC',
  scale.factor = median(C3N_01798$nCount_RNA)
)

write.table(gene.activities, file="C3N_01798_WT_ATAC_Gene_Activities.txt", sep="\t", col.names=NA, quote=FALSE)

saveRDS(C3N_01798, file="C3N_01798_WT_ATAC.rds")

--------------------------------------------------------------------------------
/scATAC-Seq Analyses/11- Processing GBM C3N_01818 snATAC-Seq library:
--------------------------------------------------------------------------------
library(Signac)
library(Seurat)
library(EnsDb.Hsapiens.v86)
library(BSgenome.Hsapiens.UCSC.hg38)

options(future.globals.maxSize = 8000 * 1024^2)

C3N_01818.data <- CreateFragmentObject("./C3N-01818_CPT0168270014_snATAC_GBM/outs/fragments.tsv.gz")

features <- CallPeaks(C3N_01818.data, format='BED', outdir ='./C3N-01818_CPT0168270014_snATAC_GBM/outs/',name='C3N_01818', cleanup=FALSE)

C3N_01818_Peaks <- keepStandardChromosomes(features, pruning.mode = "coarse")
C3N_01818_Peaks2 <- subsetByOverlaps(x = C3N_01818_Peaks, ranges = blacklist_hg38, invert = TRUE)

saveRDS(C3N_01818_Peaks2, file="C3N_01818_Peaks.rds")

C3N_01818.counts <- FeatureMatrix(
  fragments = C3N_01818.data,
  features = C3N_01818_Peaks
)

C3N_01818.chrom_assay <- CreateChromatinAssay(
  counts = C3N_01818.counts,
  sep = c("-", "-"),
  fragments = './C3N-01818_CPT0168270014_snATAC_GBM/outs/fragments.tsv.gz',
  min.cells = 10,
  min.features = 200
)

C3N_01818 <- CreateSeuratObject(counts = C3N_01818.chrom_assay, assay = "peaks", project = "C3N_01818")

annotations.C3N_01818 <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
seqlevels(annotations.C3N_01818) <- paste0('chr', seqlevels(annotations.C3N_01818))
genome(annotations.C3N_01818) <- "hg38"

Annotation(C3N_01818) <- annotations.C3N_01818

C3N_01818 <- NucleosomeSignal(object = C3N_01818)
C3N_01818 <- TSSEnrichment(object = C3N_01818, fast = FALSE, assay='peaks')

C3N_01818$blacklist_fraction <- FractionCountsInRegion(
  object = C3N_01818,
  assay = 'peaks',
  regions = blacklist_hg38
)

total_fragments <- CountFragments("./C3N-01818_CPT0168270014_snATAC_GBM/outs/fragments.tsv.gz")
rownames(total_fragments) <- total_fragments$CB
C3N_01818$fragments <- total_fragments[colnames(C3N_01818), "frequency_count"]

C3N_01818 <- FRiP(
  object = C3N_01818,
  assay = 'peaks',
  total.fragments = 'fragments'
)

pdf("C3N_01818_WT_ATAC_DensityScatter.pdf", height=5, width=9)
DensityScatter(C3N_01818, x = 'nCount_peaks', y = 'TSS.enrichment', log_x = TRUE, quantiles = TRUE)
dev.off()

pdf("C3N_01818_WT_ATAC_QC_BF.pdf", height=5, width=12)
VlnPlot(
  object = C3N_01818,
  features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
  pt.size = 0.1,
  ncol = 5
)
dev.off()

pdf("C3N_01818_WT_ATAC_TSS_Enrichment.pdf", height=5, width=6)
C3N_01818$high.tss <- ifelse(C3N_01818$TSS.enrichment > 1.5, 'High', 'Low')
TSSPlot(C3N_01818, group.by = 'high.tss') + NoLegend()
dev.off()

pdf("C3N_01818_WT_ATAC_Nucleosome_signal.pdf", height=5, width=6)
C3N_01818$nucleosome_group <- ifelse(C3N_01818$nucleosome_signal > 2, 'NS > 2', 'NS < 2')
FragmentHistogram(object = C3N_01818, group.by = 'nucleosome_group')
dev.off()

C3N_01818 <- subset(
  x = C3N_01818,
  subset = nCount_peaks > 350 &
    nCount_peaks < 20000 &
    FRiP > 0.15 &
    blacklist_fraction < 0.05 &
    nucleosome_signal < 2 &
    TSS.enrichment > 1.5
)

pdf("C3N_01818_WT_ATAC_QC_AF.pdf", height=5, width=12)
VlnPlot(
  object = C3N_01818,
  features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
  pt.size = 0.1,
  ncol = 5
)
dev.off()

DefaultAssay(C3N_01818) <- "peaks"

C3N_01818 <- RunTFIDF(C3N_01818)
C3N_01818 <- FindTopFeatures(C3N_01818, min.cutoff = 'q0')
C3N_01818 <- RunSVD(C3N_01818)

pdf("C3N_01818_WT_ATAC_DepthCor.pdf", height=5, width=9)
DepthCor(C3N_01818)
dev.off()

C3N_01818 <- RunUMAP(object = C3N_01818, reduction = 'lsi', dims = 2:30)
C3N_01818 <- FindNeighbors(object = C3N_01818, reduction = 'lsi', dims =
    2:30)
C3N_01818 <- FindClusters(object = C3N_01818, verbose = FALSE, algorithm = 3)

pdf("C3N_01818_WT_ATAC_UMAP.pdf", height=5, width=7)
DimPlot(object = C3N_01818, label = TRUE) + NoLegend()
dev.off()

gene.activities <- GeneActivity(C3N_01818)

C3N_01818[['RNA']] <- CreateAssayObject(counts = gene.activities)
C3N_01818 <- NormalizeData(
  object = C3N_01818,
  assay = 'RNA',
  normalization.method = 'RC',
  scale.factor = median(C3N_01818$nCount_RNA)
)

write.table(gene.activities, file="C3N_01818_WT_ATAC_Gene_Activities.txt", sep="\t", col.names=NA, quote=FALSE)

saveRDS(C3N_01818, file="C3N_01818_WT_ATAC.rds")

--------------------------------------------------------------------------------
/scATAC-Seq Analyses/12- Processing GBM C3N_02181 snATAC-Seq library:
--------------------------------------------------------------------------------
library(Signac)
library(Seurat)
library(EnsDb.Hsapiens.v86)
library(BSgenome.Hsapiens.UCSC.hg38)

options(future.globals.maxSize = 8000 * 1024^2)

C3N_02181.data <- CreateFragmentObject("./C3N-02181_CPT0168380014_snATAC_GBM/outs/fragments.tsv.gz")

features <- CallPeaks(C3N_02181.data, format='BED', outdir ='./C3N-02181_CPT0168380014_snATAC_GBM/outs/',name='C3N_02181', cleanup=FALSE)

C3N_02181_Peaks <- keepStandardChromosomes(features, pruning.mode = "coarse")
C3N_02181_Peaks2 <- subsetByOverlaps(x = C3N_02181_Peaks, ranges = blacklist_hg38, invert = TRUE)

saveRDS(C3N_02181_Peaks2, file="C3N_02181_Peaks.rds")

C3N_02181.counts <- FeatureMatrix(
  fragments = C3N_02181.data,
  features = C3N_02181_Peaks
)

C3N_02181.chrom_assay <- CreateChromatinAssay(
  counts = C3N_02181.counts,
  sep = c("-", "-"),
  fragments = './C3N-02181_CPT0168380014_snATAC_GBM/outs/fragments.tsv.gz',
  min.cells =
    10,
  min.features = 200
)

C3N_02181 <- CreateSeuratObject(counts = C3N_02181.chrom_assay, assay = "peaks", project = "C3N_02181")

annotations.C3N_02181 <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
seqlevels(annotations.C3N_02181) <- paste0('chr', seqlevels(annotations.C3N_02181))
genome(annotations.C3N_02181) <- "hg38"

Annotation(C3N_02181) <- annotations.C3N_02181

C3N_02181 <- NucleosomeSignal(object = C3N_02181)
C3N_02181 <- TSSEnrichment(object = C3N_02181, fast = FALSE, assay='peaks')

C3N_02181$blacklist_fraction <- FractionCountsInRegion(
  object = C3N_02181,
  assay = 'peaks',
  regions = blacklist_hg38
)

total_fragments <- CountFragments("./C3N-02181_CPT0168380014_snATAC_GBM/outs/fragments.tsv.gz")
rownames(total_fragments) <- total_fragments$CB
C3N_02181$fragments <- total_fragments[colnames(C3N_02181), "frequency_count"]

C3N_02181 <- FRiP(
  object = C3N_02181,
  assay = 'peaks',
  total.fragments = 'fragments'
)

pdf("C3N_02181_WT_ATAC_DensityScatter.pdf", height=5, width=9)
DensityScatter(C3N_02181, x = 'nCount_peaks', y = 'TSS.enrichment', log_x = TRUE, quantiles = TRUE)
dev.off()

pdf("C3N_02181_WT_ATAC_QC_BF.pdf", height=5, width=12)
VlnPlot(
  object = C3N_02181,
  features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
  pt.size = 0.1,
  ncol = 5
)
dev.off()

pdf("C3N_02181_WT_ATAC_TSS_Enrichment.pdf", height=5, width=6)
C3N_02181$high.tss <- ifelse(C3N_02181$TSS.enrichment > 1.5, 'High', 'Low')
TSSPlot(C3N_02181, group.by = 'high.tss') + NoLegend()
dev.off()

pdf("C3N_02181_WT_ATAC_Nucleosome_signal.pdf", height=5, width=6)
C3N_02181$nucleosome_group <- ifelse(C3N_02181$nucleosome_signal > 2, 'NS > 2', 'NS < 2')
FragmentHistogram(object =
    C3N_02181, group.by = 'nucleosome_group')
dev.off()

C3N_02181 <- subset(
  x = C3N_02181,
  subset = nCount_peaks > 350 &
    nCount_peaks < 4000 &
    FRiP > 0.15 &
    blacklist_fraction < 0.05 &
    nucleosome_signal < 2 &
    TSS.enrichment > 1.5
)

pdf("C3N_02181_WT_ATAC_QC_AF.pdf", height=5, width=12)
VlnPlot(
  object = C3N_02181,
  features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
  pt.size = 0.1,
  ncol = 5
)
dev.off()

DefaultAssay(C3N_02181) <- "peaks"

C3N_02181 <- RunTFIDF(C3N_02181)
C3N_02181 <- FindTopFeatures(C3N_02181, min.cutoff = 'q0')
C3N_02181 <- RunSVD(C3N_02181)

pdf("C3N_02181_WT_ATAC_DepthCor.pdf", height=5, width=9)
DepthCor(C3N_02181)
dev.off()

C3N_02181 <- RunUMAP(object = C3N_02181, reduction = 'lsi', dims = 2:30)
C3N_02181 <- FindNeighbors(object = C3N_02181, reduction = 'lsi', dims = 2:30)
C3N_02181 <- FindClusters(object = C3N_02181, verbose = FALSE, algorithm = 3)

pdf("C3N_02181_WT_ATAC_UMAP.pdf", height=5, width=7)
DimPlot(object = C3N_02181, label = TRUE) + NoLegend()
dev.off()

gene.activities <- GeneActivity(C3N_02181)

C3N_02181[['RNA']] <- CreateAssayObject(counts = gene.activities)
C3N_02181 <- NormalizeData(
  object = C3N_02181,
  assay = 'RNA',
  normalization.method = 'RC',
  scale.factor = median(C3N_02181$nCount_RNA)
)

write.table(gene.activities, file="C3N_02181_WT_ATAC_Gene_Activities.txt", sep="\t", col.names=NA, quote=FALSE)

saveRDS(C3N_02181, file="C3N_02181_WT_ATAC.rds")

--------------------------------------------------------------------------------
/scATAC-Seq Analyses/13- Processing GBM C3N_02186 snATAC-Seq library:
--------------------------------------------------------------------------------
library(Signac)
library(Seurat)
library(EnsDb.Hsapiens.v86)
library(BSgenome.Hsapiens.UCSC.hg38)

options(future.globals.maxSize = 8000 * 1024^2)

C3N_02186.data <- CreateFragmentObject("./C3N-02186_CPT0168720014_snATAC_GBM/outs/fragments.tsv.gz")

features <- CallPeaks(C3N_02186.data, format='BED', outdir ='./C3N-02186_CPT0168720014_snATAC_GBM/outs/',name='C3N_02186', cleanup=FALSE)

C3N_02186_Peaks <- keepStandardChromosomes(features, pruning.mode = "coarse")
C3N_02186_Peaks2 <- subsetByOverlaps(x = C3N_02186_Peaks, ranges = blacklist_hg38, invert = TRUE)

saveRDS(C3N_02186_Peaks2, file="C3N_02186_Peaks.rds")

C3N_02186.counts <- FeatureMatrix(
  fragments = C3N_02186.data,
  features = C3N_02186_Peaks
)

C3N_02186.chrom_assay <- CreateChromatinAssay(
  counts = C3N_02186.counts,
  sep = c("-", "-"),
  fragments = './C3N-02186_CPT0168720014_snATAC_GBM/outs/fragments.tsv.gz',
  min.cells = 10,
  min.features = 200
)

C3N_02186 <- CreateSeuratObject(counts = C3N_02186.chrom_assay, assay = "peaks", project = "C3N_02186")

annotations.C3N_02186 <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
seqlevels(annotations.C3N_02186) <- paste0('chr', seqlevels(annotations.C3N_02186))
genome(annotations.C3N_02186) <- "hg38"

Annotation(C3N_02186) <- annotations.C3N_02186

C3N_02186 <- NucleosomeSignal(object = C3N_02186)
C3N_02186 <- TSSEnrichment(object = C3N_02186, fast = FALSE, assay='peaks')

C3N_02186$blacklist_fraction <- FractionCountsInRegion(
  object = C3N_02186,
  assay = 'peaks',
  regions = blacklist_hg38
)

total_fragments <- CountFragments("./C3N-02186_CPT0168720014_snATAC_GBM/outs/fragments.tsv.gz")
rownames(total_fragments) <- total_fragments$CB
C3N_02186$fragments <- total_fragments[colnames(C3N_02186), "frequency_count"]

C3N_02186 <- FRiP(
  object = C3N_02186,
  assay = 'peaks',
  total.fragments = 'fragments'
)

pdf("C3N_02186_WT_ATAC_DensityScatter.pdf", height=5, width=9)
DensityScatter(C3N_02186, x = 'nCount_peaks', y = 'TSS.enrichment', log_x = TRUE, quantiles = TRUE)
dev.off()

pdf("C3N_02186_WT_ATAC_QC_BF.pdf", height=5, width=12)
VlnPlot(
  object = C3N_02186,
  features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
  pt.size = 0.1,
  ncol = 5
)
dev.off()

pdf("C3N_02186_WT_ATAC_TSS_Enrichment.pdf", height=5, width=6)
C3N_02186$high.tss <- ifelse(C3N_02186$TSS.enrichment > 1.5, 'High', 'Low')
TSSPlot(C3N_02186, group.by = 'high.tss') + NoLegend()
dev.off()

pdf("C3N_02186_WT_ATAC_Nucleosome_signal.pdf", height=5, width=6)
C3N_02186$nucleosome_group <- ifelse(C3N_02186$nucleosome_signal > 2, 'NS > 2', 'NS < 2')
FragmentHistogram(object = C3N_02186, group.by = 'nucleosome_group')
dev.off()

C3N_02186 <- subset(
  x = C3N_02186,
  subset = nCount_peaks > 350 &
    nCount_peaks < 10000 &
    FRiP > 0.15 &
    blacklist_fraction < 0.05 &
    nucleosome_signal < 2.5 &
    TSS.enrichment > 1.5
)

pdf("C3N_02186_WT_ATAC_QC_AF.pdf", height=5, width=12)
VlnPlot(
  object = C3N_02186,
  features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
  pt.size = 0.1,
  ncol = 5
)
dev.off()

DefaultAssay(C3N_02186) <- "peaks"

C3N_02186 <- RunTFIDF(C3N_02186)
C3N_02186 <- FindTopFeatures(C3N_02186, min.cutoff = 'q0')
C3N_02186 <- RunSVD(C3N_02186)

pdf("C3N_02186_WT_ATAC_DepthCor.pdf", height=5, width=9)
DepthCor(C3N_02186)
dev.off()

C3N_02186 <- RunUMAP(object = C3N_02186, reduction = 'lsi', dims = 2:30)
C3N_02186 <- FindNeighbors(object = C3N_02186, reduction = 'lsi', dims = 2:30)
C3N_02186 <- FindClusters(object = C3N_02186, verbose = FALSE, algorithm = 3)

pdf("C3N_02186_WT_ATAC_UMAP.pdf", height=5, width=7)
DimPlot(object = C3N_02186, label = TRUE) + NoLegend()
dev.off()

gene.activities <- GeneActivity(C3N_02186)

C3N_02186[['RNA']] <- CreateAssayObject(counts = gene.activities)
C3N_02186 <- NormalizeData(
  object = C3N_02186,
  assay = 'RNA',
  normalization.method = 'RC',
  scale.factor = median(C3N_02186$nCount_RNA)
)

write.table(gene.activities, file="C3N_02186_WT_ATAC_Gene_Activities.txt", sep="\t", col.names=NA, quote=FALSE)

saveRDS(C3N_02186, file="C3N_02186_WT_ATAC.rds")

--------------------------------------------------------------------------------
/scATAC-Seq Analyses/14- Processing GBM C3N_02188 snATAC-Seq library:
--------------------------------------------------------------------------------
library(Signac)
library(Seurat)
library(EnsDb.Hsapiens.v86)
library(BSgenome.Hsapiens.UCSC.hg38)

options(future.globals.maxSize = 8000 * 1024^2)

C3N_02188.data <- CreateFragmentObject("./C3N-02188_CPT0168830014_snATAC_GBM/outs/fragments.tsv.gz")

features <- CallPeaks(C3N_02188.data, format='BED', outdir ='./C3N-02188_CPT0168830014_snATAC_GBM/outs/',name='C3N_02188', cleanup=FALSE)

C3N_02188_Peaks <- keepStandardChromosomes(features, pruning.mode = "coarse")
C3N_02188_Peaks2 <- subsetByOverlaps(x = C3N_02188_Peaks, ranges = blacklist_hg38, invert = TRUE)

saveRDS(C3N_02188_Peaks2, file="C3N_02188_Peaks.rds")

C3N_02188.counts <- FeatureMatrix(
  fragments = C3N_02188.data,
  features = C3N_02188_Peaks
)

C3N_02188.chrom_assay <-
  CreateChromatinAssay(
  counts = C3N_02188.counts,
  sep = c("-", "-"),
  fragments = './C3N-02188_CPT0168830014_snATAC_GBM/outs/fragments.tsv.gz',
  min.cells = 10,
  min.features = 200
)

C3N_02188 <- CreateSeuratObject(counts = C3N_02188.chrom_assay, assay = "peaks", project = "C3N_02188")

annotations.C3N_02188 <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
seqlevels(annotations.C3N_02188) <- paste0('chr', seqlevels(annotations.C3N_02188))
genome(annotations.C3N_02188) <- "hg38"

Annotation(C3N_02188) <- annotations.C3N_02188

C3N_02188 <- NucleosomeSignal(object = C3N_02188)
C3N_02188 <- TSSEnrichment(object = C3N_02188, fast = FALSE, assay='peaks')

C3N_02188$blacklist_fraction <- FractionCountsInRegion(
  object = C3N_02188,
  assay = 'peaks',
  regions = blacklist_hg38
)

total_fragments <- CountFragments("./C3N-02188_CPT0168830014_snATAC_GBM/outs/fragments.tsv.gz")
rownames(total_fragments) <- total_fragments$CB
C3N_02188$fragments <- total_fragments[colnames(C3N_02188), "frequency_count"]

C3N_02188 <- FRiP(
  object = C3N_02188,
  assay = 'peaks',
  total.fragments = 'fragments'
)

pdf("C3N_02188_WT_ATAC_DensityScatter.pdf", height=5, width=9)
DensityScatter(C3N_02188, x = 'nCount_peaks', y = 'TSS.enrichment', log_x = TRUE, quantiles = TRUE)
dev.off()

pdf("C3N_02188_WT_ATAC_QC_BF.pdf", height=5, width=12)
VlnPlot(
  object = C3N_02188,
  features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
  pt.size = 0.1,
  ncol = 5
)
dev.off()

pdf("C3N_02188_WT_ATAC_TSS_Enrichment.pdf", height=5, width=6)
C3N_02188$high.tss <- ifelse(C3N_02188$TSS.enrichment > 1.5, 'High', 'Low')
TSSPlot(C3N_02188, group.by = 'high.tss') + NoLegend()
dev.off()

pdf("C3N_02188_WT_ATAC_Nucleosome_signal.pdf", height=5, width=6)
C3N_02188$nucleosome_group <- ifelse(C3N_02188$nucleosome_signal > 5, 'NS > 5', 'NS < 5')
FragmentHistogram(object = C3N_02188, group.by = 'nucleosome_group')
dev.off()

C3N_02188 <- subset(
  x = C3N_02188,
  subset = nCount_peaks > 350 &
    nCount_peaks < 35000 &
    FRiP > 0.15 &
    blacklist_fraction < 0.05 &
    nucleosome_signal < 5 &
    TSS.enrichment > 1.5
)

pdf("C3N_02188_WT_ATAC_QC_AF.pdf", height=5, width=12)
VlnPlot(
  object = C3N_02188,
  features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
  pt.size = 0.1,
  ncol = 5
)
dev.off()

DefaultAssay(C3N_02188) <- "peaks"

C3N_02188 <- RunTFIDF(C3N_02188)
C3N_02188 <- FindTopFeatures(C3N_02188, min.cutoff = 'q0')
C3N_02188 <- RunSVD(C3N_02188)

pdf("C3N_02188_WT_ATAC_DepthCor.pdf", height=5, width=9)
DepthCor(C3N_02188)
dev.off()

C3N_02188 <- RunUMAP(object = C3N_02188, reduction = 'lsi', dims = 2:30)
C3N_02188 <- FindNeighbors(object = C3N_02188, reduction = 'lsi', dims = 2:30)
C3N_02188 <- FindClusters(object = C3N_02188, verbose = FALSE, algorithm = 3)

pdf("C3N_02188_WT_ATAC_UMAP.pdf", height=5, width=7)
DimPlot(object = C3N_02188, label = TRUE) + NoLegend()
dev.off()

gene.activities <- GeneActivity(C3N_02188)

C3N_02188[['RNA']] <- CreateAssayObject(counts = gene.activities)
C3N_02188 <- NormalizeData(
  object = C3N_02188,
  assay = 'RNA',
  normalization.method = 'RC',
  scale.factor = median(C3N_02188$nCount_RNA)
)

write.table(gene.activities, file="C3N_02188_WT_ATAC_Gene_Activities.txt", sep="\t", col.names=NA, quote=FALSE)

saveRDS(C3N_02188, file="C3N_02188_WT_ATAC.rds")

--------------------------------------------------------------------------------
/scATAC-Seq Analyses/16- Processing GBM C3N_02783 snATAC-Seq library:
--------------------------------------------------------------------------------
library(Signac)
library(Seurat)
library(EnsDb.Hsapiens.v86)
library(BSgenome.Hsapiens.UCSC.hg38)

options(future.globals.maxSize = 8000 * 1024^2)

C3N_02783.data <- CreateFragmentObject("./C3N-02783_CPT0205890014_snATAC_GBM/outs/fragments.tsv.gz")

features <- CallPeaks(C3N_02783.data, format='BED', outdir ='./C3N-02783_CPT0205890014_snATAC_GBM/outs/',name='C3N_02783', cleanup=FALSE)

C3N_02783_Peaks <- keepStandardChromosomes(features, pruning.mode = "coarse")
C3N_02783_Peaks2 <- subsetByOverlaps(x = C3N_02783_Peaks, ranges = blacklist_hg38, invert = TRUE)

saveRDS(C3N_02783_Peaks2, file="C3N_02783_Peaks.rds")

C3N_02783.counts <- FeatureMatrix(
  fragments = C3N_02783.data,
  features = C3N_02783_Peaks
)

C3N_02783.chrom_assay <- CreateChromatinAssay(
  counts = C3N_02783.counts,
  sep = c("-", "-"),
  fragments = './C3N-02783_CPT0205890014_snATAC_GBM/outs/fragments.tsv.gz',
  min.cells = 10,
  min.features = 200
)

C3N_02783 <- CreateSeuratObject(counts = C3N_02783.chrom_assay, assay = "peaks", project = "C3N_02783")

annotations.C3N_02783 <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
seqlevels(annotations.C3N_02783) <- paste0('chr', seqlevels(annotations.C3N_02783))
genome(annotations.C3N_02783) <- "hg38"

Annotation(C3N_02783) <- annotations.C3N_02783

C3N_02783 <- NucleosomeSignal(object = C3N_02783)
C3N_02783 <- TSSEnrichment(object = C3N_02783, fast = FALSE, assay='peaks')

C3N_02783$blacklist_fraction <- FractionCountsInRegion(
  object = C3N_02783,
  assay = 'peaks',
  regions = blacklist_hg38
)

total_fragments <- CountFragments("./C3N-02783_CPT0205890014_snATAC_GBM/outs/fragments.tsv.gz")
rownames(total_fragments) <- total_fragments$CB
C3N_02783$fragments <- total_fragments[colnames(C3N_02783), "frequency_count"]

C3N_02783 <- FRiP(
  object = C3N_02783,
  assay = 'peaks',
  total.fragments = 'fragments'
)

pdf("C3N_02783_WT_ATAC_DensityScatter.pdf", height=5, width=9)
DensityScatter(C3N_02783, x = 'nCount_peaks', y = 'TSS.enrichment', log_x = TRUE, quantiles = TRUE)
dev.off()

pdf("C3N_02783_WT_ATAC_QC_BF.pdf", height=5, width=12)
VlnPlot(
  object = C3N_02783,
  features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
  pt.size = 0.1,
  ncol = 5
)
dev.off()

pdf("C3N_02783_WT_ATAC_TSS_Enrichment.pdf", height=5, width=6)
C3N_02783$high.tss <- ifelse(C3N_02783$TSS.enrichment > 1.5, 'High', 'Low')
TSSPlot(C3N_02783, group.by = 'high.tss') + NoLegend()
dev.off()

pdf("C3N_02783_WT_ATAC_Nucleosome_signal.pdf", height=5, width=6)
C3N_02783$nucleosome_group <- ifelse(C3N_02783$nucleosome_signal > 2, 'NS > 2', 'NS < 2')
FragmentHistogram(object = C3N_02783, group.by = 'nucleosome_group')
dev.off()

C3N_02783 <- subset(
  x = C3N_02783,
  subset = nCount_peaks > 350 &
    nCount_peaks < 40000 &
    FRiP > 0.15 &
    blacklist_fraction < 0.05 &
    nucleosome_signal < 2 &
    TSS.enrichment > 1.5
)

pdf("C3N_02783_WT_ATAC_QC_AF.pdf", height=5, width=12)
VlnPlot(
  object = C3N_02783,
  features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
  pt.size = 0.1,
  ncol = 5
)
dev.off()

DefaultAssay(C3N_02783) <- "peaks"

C3N_02783 <- RunTFIDF(C3N_02783)
C3N_02783 <- FindTopFeatures(C3N_02783, min.cutoff = 'q0')
C3N_02783 <-
RunSVD(C3N_02783) 116 | 117 | pdf("C3N_02783_WT_ATAC_DepthCor.pdf", height=5, width=9) 118 | DepthCor(C3N_02783) 119 | dev.off() 120 | 121 | 122 | C3N_02783 <- RunUMAP(object = C3N_02783, reduction = 'lsi', dims = 2:30) 123 | C3N_02783 <- FindNeighbors(object = C3N_02783, reduction = 'lsi', dims = 2:30) 124 | C3N_02783 <- FindClusters(object = C3N_02783, verbose = FALSE, algorithm = 3) 125 | 126 | 127 | pdf("C3N_02783_WT_ATAC_UMAP.pdf", height=5, width=7) 128 | DimPlot(object = C3N_02783, label = TRUE) + NoLegend() 129 | dev.off() 130 | 131 | 132 | gene.activities <- GeneActivity(C3N_02783) 133 | 134 | 135 | C3N_02783[['RNA']] <- CreateAssayObject(counts = gene.activities) 136 | C3N_02783 <- NormalizeData( 137 | object = C3N_02783, 138 | assay = 'RNA', 139 | normalization.method = 'RC', 140 | scale.factor = median(C3N_02783$nCount_RNA) 141 | ) 142 | 143 | write.table(gene.activities, file="C3N_02783_WT_ATAC_Gene_Activities.txt", sep="\t", col.names=NA, quote=FALSE) 144 | 145 | saveRDS(C3N_02783, file="C3N_02783_WT_ATAC.rds") 146 | -------------------------------------------------------------------------------- /scATAC-Seq Analyses/22- Calculating all cell types programs usages in the combined gene activities matrix: -------------------------------------------------------------------------------- 1 | ############### Python Scripts ##################### 2 | 3 | X = pd.read_table("Combined_snATAC_Glioma_Ding_Dataset_Gene_Activities.txt", index_col=0, sep='\t') 4 | 5 | X2 = X.T 6 | 7 | H = np.load("cnmf_run.spectra.k_18.dt_0_015.consensus.df.npz", allow_pickle=True) 8 | 9 | H2 = pd.DataFrame(H['data'], columns = H['columns'], index = H['index']) 10 | 11 | H3 = H2.filter(items = X2.columns) 12 | 13 | X4 = X2.filter(items = H3.columns) 14 | 15 | H4 = H3.to_numpy() 16 | 17 | X5 = X4.values 18 | 19 | X6 = X5.astype(np.float64) 20 | 21 | test = sklearn.decomposition.non_negative_factorization(X6, W=None, H=H4, n_components= 18, init='random', update_H=False, solver='cd', 
    beta_loss='frobenius', tol=0.0001, max_iter=1000, alpha=0.0, alpha_W=0.0, alpha_H='same', l1_ratio=0.0, regularization=None, random_state=None, verbose=0, shuffle=False)
# Note: `alpha` and `regularization` were deprecated in scikit-learn 1.0 and removed in later releases; drop them when running on a newer version.

test2 = list(test)

# test2[0] is the usage matrix W (cells x programs)
processed = pd.DataFrame(test2[0], columns= H['index'], index=X4.index)

# Scale each cell's usages so that they sum to 100
row_sums = processed.sum(axis=1)

processed_data = (processed.div(row_sums, axis=0) * 100)

new_column_names = ['Tcells', 'AC', 'NPC1_OPC', 'Microglia', 'MES2', 'Vascular_MES1', 'Oligodendrocytes', 'MES1', 'CD14_Mono', 'cDC', 'Neutrophils', 'NPC2', 'Giant_Cell_GBM', 'Cycling', 'Pericytes', 'Plasma', 'Endothelial', 'Mast']

processed_data.columns = new_column_names

processed_data.to_csv(path_or_buf="./Combined_snATAC_Glioma_Ding_Dataset_All_CellType_Usages.txt", sep="\t", quoting=csv.QUOTE_NONE)

--------------------------------------------------------------------------------
/scATAC-Seq Analyses/23- Extracting Myeloid Cells from the Combined snATAC Object and Calculating Gene Activities:
--------------------------------------------------------------------------------
library(Signac)
library(Seurat)
library(EnsDb.Hsapiens.v86)
library(BSgenome.Hsapiens.UCSC.hg38)
library(JASPAR2020)
library(TFBSTools)

options(future.globals.maxSize = 8000 * 1024^2)

annotation <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
seqlevels(annotation) <- paste0('chr', seqlevels(annotation))
genome(annotation) <- "hg38"

combined <- readRDS("Combined_snATAC_Glioma_ATAC.rds")

Myeloid <- subset(combined, subset = Annotation == "Myeloid")

# Single grouping label so that peaks are called on all myeloid cells together
Myeloid$allcells <- "all"

Discrete_Peaks <- CallPeaks(Myeloid, group.by = "allcells", format='BED', outdir ='.', fragment.tempdir=".", name='Myeloid_Ding', cleanup=FALSE)

# Remove peaks on non-standard chromosomes and in blacklisted regions
Discrete_Peaks2 <- keepStandardChromosomes(Discrete_Peaks, pruning.mode = "coarse")
Discrete_Peaks2 <- subsetByOverlaps(x = Discrete_Peaks2, ranges
  = blacklist_hg38, invert = TRUE)

# Quantify fragments in the MACS2 peaks for every myeloid cell
macs2_counts <- FeatureMatrix(
  fragments = Fragments(Myeloid),
  features = Discrete_Peaks2,
  cells = colnames(Myeloid)
)

Myeloid[["peaks_macs2"]] <- CreateChromatinAssay(
  counts = macs2_counts,
  fragments = Fragments(Myeloid),
  annotation = annotation
)

DefaultAssay(Myeloid) <- "peaks_macs2"

Myeloid <- RunTFIDF(Myeloid)
Myeloid <- FindTopFeatures(Myeloid, min.cutoff = 'q0')
Myeloid <- RunSVD(Myeloid)

pdf("Myeloid_Myeloid_Ding_ATAC_DepthCor.pdf", height=5, width=9)
DepthCor(Myeloid)
dev.off()

Myeloid <- RunUMAP(object = Myeloid, reduction = 'lsi', dims = 2:30)
Myeloid <- FindNeighbors(object = Myeloid, reduction = 'lsi', dims = 2:30)
Myeloid <- FindClusters(object = Myeloid, verbose = FALSE, algorithm = 3)

pdf("Myeloid_Myeloid_Ding_ATAC_UMAP_Clusters.pdf", height=5, width=7)
DimPlot(object = Myeloid, label = TRUE)
dev.off()

gene.activities <- GeneActivity(Myeloid)

Myeloid[['RNA']] <- CreateAssayObject(counts = gene.activities)
# Normalize the myeloid subset itself (not the full combined object)
Myeloid <- NormalizeData(
  object = Myeloid,
  assay = 'RNA',
  normalization.method = 'RC',
  scale.factor = median(Myeloid$nCount_RNA)
)

write.table(gene.activities, file="Combined_Myeloid_snATAC_Glioma_Gene_Activities.txt", sep="\t", col.names=NA, quote=FALSE)

saveRDS(Myeloid, file="Ding_Dataset_Myeloid.rds")

--------------------------------------------------------------------------------
/scATAC-Seq Analyses/24- Calculating myeloid program usages in myeloid cells of the combined snATAC object:
--------------------------------------------------------------------------------
import csv

import numpy as np
import pandas as pd
import sklearn.decomposition

# Gene activities are written as genes x cells; transpose to cells x genes
X = pd.read_table("Combined_Myeloid_snATAC_Glioma_Gene_Activities.txt", index_col=0, sep='\t')

X2 = X.T

H = pd.read_table("Myeloid_NMF_Average_Gene_Spectra.txt", sep="\t",
    index_col=0)

# Spectra are stored as genes x programs; transpose to programs x genes
H2 = H.T

# Restrict both matrices to the shared genes, in the same column order
X3 = X2.filter(items = H2.columns)

H3 = H2.filter(items = X3.columns)

H4 = H3.to_numpy()

X4 = X3.values

X5 = X4.astype(np.float64)

# Fit the usages W with the spectra H held fixed (update_H=False)
# Note: `alpha` and `regularization` were deprecated in scikit-learn 1.0 and removed in later releases; drop them when running on a newer version.
test = sklearn.decomposition.non_negative_factorization(X5, W=None, H=H4, n_components= 14, init='random', update_H=False, solver='cd', beta_loss='frobenius', tol=0.0001, max_iter=1000, alpha=0.0, alpha_W=0.0, alpha_H='same', l1_ratio=0.0, regularization=None, random_state=None, verbose=0, shuffle=False)

test2 = list(test)

processed = pd.DataFrame(test2[0], columns= H.columns, index=X3.index)

# Scale each cell's usages so that they sum to 100
row_sums = processed.sum(axis=1)

processed_data = (processed.div(row_sums, axis=0) * 100)

processed_data.to_csv(path_or_buf="./Combined_Myeloid_snATAC_Glioma_Ding_Dataset_Myeloid_Usages.txt", sep="\t", quoting=csv.QUOTE_NONE)

--------------------------------------------------------------------------------
/scATAC-Seq Analyses/25- Extracting Discrete Myeloid Cells from the combined GBM snATAC object:
--------------------------------------------------------------------------------
library(Signac)
library(Seurat)
library(EnsDb.Hsapiens.v86)
library(BSgenome.Hsapiens.UCSC.hg38)
library(JASPAR2020)
library(TFBSTools)

options(future.globals.maxSize = 8000 * 1024^2)

# Build the annotation once; repeating the 'chr' prefix step would yield 'chrchr' seqlevels
annotation <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
seqlevels(annotation) <- paste0('chr', seqlevels(annotation))
genome(annotation) <- "hg38"

combined <- readRDS("Combined_snATAC_Glioma_ATAC.rds")

##### File containing the IDs of the discrete myeloid cells #########
Myeloid_Discrete_Status4 <- scan("Myeloid_Discrete_Status_top400_WithIdentity.txt", what="")

# The labels are assigned by position, so the file must hold one entry per cell in the same order as colnames(combined)
combined@meta.data$Myeloid_Discrete_Status4 <- Myeloid_Discrete_Status4

Discrete4 <- subset(combined, subset = Myeloid_Discrete_Status4 == "Systemic" |
  Myeloid_Discrete_Status4 == "Tissue_resident" |
  Myeloid_Discrete_Status4 == "Complement" |
  Myeloid_Discrete_Status4 == "Scavenger" |
  Myeloid_Discrete_Status4 == "Monocyte" |
  Myeloid_Discrete_Status4 == "Microglia")

Discrete4$allcells <- "all"

# Call peaks separately for each discrete annotation
Discrete_Peaks <- CallPeaks(Discrete4, group.by = "Myeloid_Discrete_Status4", format='BED', outdir ='.', fragment.tempdir=".", name='Myeloid_Discrete_Status4', cleanup=FALSE)

# Remove peaks on non-standard chromosomes and in blacklisted regions
Discrete_Peaks2 <- keepStandardChromosomes(Discrete_Peaks, pruning.mode = "coarse")
Discrete_Peaks2 <- subsetByOverlaps(x = Discrete_Peaks2, ranges = blacklist_hg38, invert = TRUE)

macs2_counts <- FeatureMatrix(
  fragments = Fragments(Discrete4),
  features = Discrete_Peaks2,
  cells = colnames(Discrete4)
)

Discrete4[["peaks_macs2"]] <- CreateChromatinAssay(
  counts = macs2_counts,
  fragments = Fragments(Discrete4),
  annotation = annotation
)

DefaultAssay(Discrete4) <- "peaks_macs2"

Discrete4 <- RunTFIDF(Discrete4)
Discrete4 <- FindTopFeatures(Discrete4, min.cutoff = 'q0')
Discrete4 <- RunSVD(Discrete4)

pdf("Discrete4_Myeloid_Ding_ATAC_DepthCor.pdf", height=5, width=9)
DepthCor(Discrete4)
dev.off()

Discrete4 <- RunUMAP(object = Discrete4, reduction = 'lsi', dims = 2:30)
Discrete4 <- FindNeighbors(object = Discrete4, reduction = 'lsi', dims = 2:30)
Discrete4 <- FindClusters(object = Discrete4, verbose = FALSE, algorithm = 3)

pdf("Discrete4_Myeloid_Ding_ATAC_UMAP_Annotation.pdf", height=5, width=7)
DimPlot(object = Discrete4, label = TRUE, group.by="Myeloid_Discrete_Status4")
dev.off()

# Pseudobulk the peak counts per discrete annotation and log-normalize them
Pseudobulk <- AggregateExpression(Discrete4, assays = "peaks_macs2", return.seurat = TRUE, group.by = "Myeloid_Discrete_Status4", normalization.method = "LogNormalize", scale.factor = 10000, margin = 1)

Matrix2 <- LayerData(Pseudobulk, assay = "peaks_macs2", layer = "data")
write.table(as.matrix(Matrix2), file="Discrete4_Pseudobulked_Normalized_Counts.txt", sep="\t", col.names=NA, quote=FALSE)

saveRDS(Discrete4, file="Discrete_Myeloid_GBM_snATAC.rds")

--------------------------------------------------------------------------------
/scATAC-Seq Analyses/27- Generating Deeptools heatmaps for specific peaks for the immunomodulatory discrete myeloid cells:
--------------------------------------------------------------------------------
######### The specific peaks passed to the -R option of computeMatrix are identified from the normalized pseudo-bulked peaks file. We converted the log1p values back to the linear scale and considered a peak specific to a particular discrete annotation if its count was at least 2.5 times higher than the average count of the other annotations.

computeMatrix reference-point \
  -S ../sorted_Discrete_Fragments/Monocyte-TileSize-50-normMethod-nCount_peaks_macs2.bw \
     ../sorted_Discrete_Fragments/Systemic-TileSize-50-normMethod-nCount_peaks_macs2.bw \
     ../sorted_Discrete_Fragments/Scavenger-TileSize-50-normMethod-nCount_peaks_macs2.bw \
     ../sorted_Discrete_Fragments/Complement-TileSize-50-normMethod-nCount_peaks_macs2.bw \
     ../sorted_Discrete_Fragments/Tissue_resident-TileSize-50-normMethod-nCount_peaks_macs2.bw \
     ../sorted_Discrete_Fragments/Microglia-TileSize-50-normMethod-nCount_peaks_macs2.bw \
  -R Monocytes_Specific.bed Systemic_Specific.bed Scavenger_Specific.bed Complement_Specific.bed Tissue_Resident_Specific.bed Microglia_Specific.bed \
  -b 1000 -a 1000 --skipZeros --smartLabels \
  -o Ding_Discrete_Supervised_Specific_Peaks_Centered_Signac_Normalized_Version4_ArchR.gz \
  --referencePoint center;

plotHeatmap -m Ding_Discrete_Supervised_Specific_Peaks_Centered_Signac_Normalized_Version4_ArchR.gz \
  -o Ding_Discrete_Supervised_Specific_Peaks_Centered_Signac_Normalized_Version4_ArchR.pdf \
  --startLabel 5 --endLabel 3 \
  --whatToShow "heatmap and colorbar" --colorMap Blues \
  --outFileSortedRegions Ding_Discrete_Supervised_Specific_Peaks_Centered_Signac_Normalized_Version4_ArchR.bed;

--------------------------------------------------------------------------------
/scATAC-Seq Analyses/28- Identifying enriched motifs in specific immunomodulatory peaks using monaLISA:
--------------------------------------------------------------------------------
######### The specific peaks used as input are identified from the normalized pseudo-bulked peaks file. We converted the log1p values back to the linear scale and considered a peak specific to a particular discrete annotation if its count was at least 2.5 times higher than the average count of the other annotations.
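The specificity rule described in the comment above can be sketched in Python as follows. This is only a minimal illustration, not the authors' script: the function name `specific_peaks`, the toy peak names and values, and the extra requirement that a specific peak has non-zero signal are assumptions. It also assumes the pseudobulk values are natural-log log1p values, as Seurat's LogNormalize produces.

```python
import numpy as np
import pandas as pd


def specific_peaks(log1p_counts: pd.DataFrame, fold: float = 2.5) -> dict:
    """Map each annotation (column) to the peaks (rows) specific to it.

    A peak is called specific when its linear-scale value is at least
    `fold` times the mean value of all other annotations.
    """
    linear = np.expm1(log1p_counts)          # undo the log1p transform
    n_annotations = linear.shape[1]
    row_total = linear.sum(axis=1)
    result = {}
    for name in linear.columns:
        # Mean of the remaining annotations for every peak
        others_mean = (row_total - linear[name]) / (n_annotations - 1)
        # Assumption: peaks with zero signal are never called specific
        keep = (linear[name] >= fold * others_mean) & (linear[name] > 0)
        result[name] = list(linear.index[keep])
    return result


# Toy pseudobulk matrix (hypothetical peaks and values), stored as log1p
toy = pd.DataFrame(
    np.log1p([[10.0, 1.0, 1.0], [2.0, 2.0, 2.0], [0.0, 6.0, 1.0]]),
    index=["chr1-100-600", "chr2-100-600", "chr3-100-600"],
    columns=["Systemic", "Scavenger", "Complement"],
)
hits = specific_peaks(toy)
```

Each list in `hits` could then be written out as a BED file for the matching annotation, for example `Systemic_Specific.bed`.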

R;

############################ Inside R ##############################
library(monaLisa)
library(GenomicRanges)
library(SummarizedExperiment)
library(JASPAR2024)
library(TFBSTools)
library(BSgenome.Hsapiens.UCSC.hg38)
library(ComplexHeatmap)
library(circlize)
library(universalmotif)

Systemic <- rtracklayer::import(con = "Systemic_Specific.bed", format = "bed")

Scavenger <- rtracklayer::import(con = "Scavenger_Specific.bed", format = "bed")

Complement <- rtracklayer::import(con = "Complement_Specific.bed", format = "bed")

Tissue_Resident <- rtracklayer::import(con = "Tissue_Resident_Specific.bed", format = "bed")

Peaks <- c(Systemic, Scavenger, Complement, Tissue_Resident)

# Resize all peaks to the median width, centered on the original peak
Peaks2 <- trim(resize(Peaks, width = median(width(Peaks)), fix = "center"))
summary(width(Peaks2))

# One bin label per peak, in the same order as Peaks
bins2 <- rep(c("Systemic", "Scavenger", "Complement", "Tissue_Resident"), c(length(Systemic), length(Scavenger), length(Complement), length(Tissue_Resident)))

desired_order <- c("Systemic", "Scavenger", "Complement", "Tissue_Resident")

bins2 <- factor(bins2, levels = desired_order)

table(bins2)

Peakseqs <- getSeq(BSgenome.Hsapiens.UCSC.hg38, Peaks2)

# Vertebrate CORE position weight matrices from JASPAR2024
db <- file.path(system.file("extdata", package="JASPAR2024"),
                "JASPAR2024.sqlite")
opts <- list()
opts[["tax_group"]] <- "vertebrates"
opts[["matrixtype"]] <- "PWM"
opts[["collection"]] <- "CORE"
pwms <- getMatrixSet(db, opts)

hg38 <- Hsapiens

# Binned motif enrichment against a genome-sampled background
se2 <- calcBinnedMotifEnrR(seqs = Peakseqs, bins = bins2, pwmL = pwms, background = "genome", genome = hg38, genome.oversample = 500)

background_matrix_pvalue <- assay(se2, "negLog10Padj")
write.table(background_matrix_pvalue, file="LISA_Supervised_Specific_Peaks_Background_Mode_log10pvalues_NoCollapsing_Background500.txt",
  col.names=NA, sep="\t", quote=FALSE)

background_matrix <- assay(se2, "log2enr")
write.table(background_matrix, file="LISA_Supervised_Specific_Peaks_Background_Mode_log2enrichment_No_Collapsing_Background500.txt", col.names=NA, sep="\t", quote=FALSE)

saveRDS(se2, file="LISA_Supervised_Specific_Peaks_Background_Mode_500.rds")

# Restrict the heatmap to the motifs of interest listed in Motifs_OI3.txt
Motifs <- scan("Motifs_OI3.txt", what="")

seSel2 <- se2[Motifs, ]

pdf("LISA_Heatmap_Supervised_Specific_Peaks_Background_Mode_NoCollapsing_Selected_Motifs_Version6_No_Clustering.pdf", height=18, width=40)
plotMotifHeatmaps(x = seSel2, which.plots = c("log2enr", "negLog10Padj"),
                  width = 4, cluster = FALSE, maxEnr = 2, maxSig = 10,
                  show_dendrogram = TRUE, show_seqlogo = TRUE,
                  width.seqlogo = 1.75, show_motif_GC = TRUE)
dev.off()
--------------------------------------------------------------------------------
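Scripts 22 and 24 above estimate program usages by fitting W in X ≈ W·H while keeping the gene spectra H fixed, then scaling each cell's usages to sum to 100. The sketch below illustrates that idea on toy data. It is an assumption-laden stand-in, not the scripts' method: it uses plain NumPy multiplicative updates in place of scikit-learn's coordinate-descent solver, and the function name `usage_percent` and all toy values are hypothetical.

```python
import numpy as np


def usage_percent(X, H, n_iter=500, eps=1e-12):
    """Fit non-negative W in X ~ W @ H with H fixed; return row percentages."""
    X = np.asarray(X, dtype=np.float64)      # cells x genes
    H = np.asarray(H, dtype=np.float64)      # programs x genes
    W = np.ones((X.shape[0], H.shape[0]))    # cells x programs, positive start
    XHt = X @ H.T
    HHt = H @ H.T
    for _ in range(n_iter):
        # Multiplicative update for the Frobenius loss; H is never modified
        W *= XHt / (W @ HHt + eps)
    # Scale each cell's usages so that they sum to 100
    return 100.0 * W / W.sum(axis=1, keepdims=True)


# Toy spectra (2 programs x 4 genes) and known usages for 3 cells
H = np.array([[5.0, 1.0, 0.0, 1.0],
              [0.0, 1.0, 4.0, 2.0]])
W_true = np.array([[3.0, 1.0],
                   [1.0, 4.0],
                   [2.0, 2.0]])
usage = usage_percent(W_true @ H, H)
```

On this noise-free toy input the recovered percentages should approach the proportions implied by `W_true`; real gene-activity matrices are noisy, so the fit there is only approximate.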