├── Analysis of Glioma Immunotherpay scRNA-Seq Libraries
    ├── 01- Extracting Myeloid Cells Matrix from the published Seurat Object
    ├── 02- Calculating usages of the recurrent myeloid programs in the published immunotherapy dataset
    ├── 03- Generating quadrant plot for the myeloid cells in the published glioma dataset to indicate myeloid cells in responders vs non-responders and for SIGLEC9 expression
    ├── 04 - Generating boxplots for the published glioma immunotherapy dataset for percentage of cells positive for SIGLEC9 expression or immunomodulatory programs usage
    └── 05 - Generating boxplot for percentage of cells double positive for SIGLEC9 expression and immunomodulatory program usage in responders vs non-responders
├── Bulk ATAC-Seq Analysis
    ├── 01- Trimming fastq files to remove nextera transposase adaptors
    ├── 02- Mapping Trimmed Fastqs using STAR
    ├── 03- Removing Duplicate Reads from mapped libraries
    ├── 04- Remove reads mapping to chrM
    ├── 05- Final sorting of processed mapped files
    ├── 06- Creating normalized bigwigs for visualization
    ├── 07- Shifting loci to correct for tn5 bias
    ├── 08- Calling peaks using the bed files with corrected loci
    ├── 09- Determining Differential Accessible Sites using HOMER
    ├── 10- Identifying motifs enriched in differential accessible sites
    └── 11- Creating deeptools heatmap for the differential accessible sites between DMSO and p300i
├── Creation of discretized scRNA-Seq Expression matrix
    ├── 01- Align 10X V3 Publiashed normal brain scRNA-Seq Libraries
    ├── 02- Seurat for Processing Adult Normal Brain scRNA-Seq libraries
    ├── 03 - Seurat for combining Discrete cells from MGB cohort with normal brain cells
    ├── 04- Extracting Discrete Myeloid Cells for Subsequent Marker Identification using COMET and SCENIC
    ├── 05- COMET for Marker Identification
    └── 06- SCENIC Analysis for identification of Regulons governing molecular circuitry in myeloid immunomodulatory programs
├── Deconvolution of Bulk Datasets
    ├── 1- Creating Gene Sets
    ├── 2- Calculating Module scores using Seurat in TCGA Glioma Cohorts
    ├── 3- Calculating Module scores using Seurat in GLASS Glioma Cohorts
    ├── 4- Calculating Module scores using Seurat in G-SAM Glioblastoma Cohorts
    ├── 5- Preparing CIBERSORTx single cell reference matrix
    ├── 6- Estimate cell types fractions in TCGA Matrix
    ├── 7- Estimate cell types fractions in GLASS Matrix
    ├── 8- Estimate cell types fractions in G-SAM Matrix
    ├── 9- Normalization of the Module Scores
    ├── CIBERSORTx_Input
    │   ├── Discrete_LowR_Cells.txt
    │   └── Readme
    ├── Gene Sets
    │   ├── Complement_Immunosuppressive.txt
    │   ├── Endothelial.txt
    │   ├── IL1B_Inflamm.txt
    │   ├── Inflamm_Microglia.txt
    │   ├── Instructions
    │   ├── Macrophage.txt
    │   ├── Malignant2.txt
    │   ├── Malignant3.txt
    │   ├── Malignant4.txt
    │   ├── Malignant6.txt
    │   ├── Malignant7.txt
    │   ├── Memory_Like_Tcells.txt
    │   ├── Microglia.txt
    │   ├── Monocyte.txt
    │   ├── Neutrophils.txt
    │   ├── Oligo.txt
    │   ├── Pericytes.txt
    │   ├── Scavenger.txt
    │   ├── Terminal_Effector_Tcells.txt
    │   ├── Treg.txt
    │   └── cDC.txt
    └── x10- Survival Analysis
├── Figure 1 Visualizations
    ├── MGB Cohort Heatmap myeloid programs Gene Expression
    ├── Mcgill Cohort Heatmap myeloid programs Gene Expression
    ├── Quadrant plot generation with dots
    └── Quadrant plot generation with piecharts
├── Identifying recurrent programs in Myeloid Cells in Gliomas (Related to Figure 1)
    ├── 01- Identifying Variable Genes in MGB Cohort (Round 1)
    ├── 02- Round 1 cNMF for myeloid cells in MGB cohort
    ├── 03- Identifying Variable Genes in Houston Cohort (Round 1)
    ├── 04- Round 1 cNMF for myeloid cells in Houston cohort
    ├── 05- Identifying Variable Genes in Jackson's Cohort (Round 1)
    ├── 06- Round 1 cNMF for myeloid cells in Jackson's cohort
    ├── 07- Identifying union gene lists suitable for Round 2 NMF from all the cohorts
    ├── 08- Round 2 cNMF for myeloid cells in MGB cohort
    ├── 09- Round 2 cNMF for myeloid cells in Houston cohort
    ├── 10- Round 2 cNMF for myeloid cells in Jackson's cohort
    ├── 11- Identifying consensus programs
    ├── 12- Averaging the spectra and calculating usages of the consensus programs in all myeloid cells of all cohorts and
    └── 13 - Calculating the usages of the consensus myeloid programs in the validation Mcgill cohort
├── LICENSE
├── MAESTER
    ├── 01- Trimming R2 reads to include high quality bases only
    ├── 02- Extracting barcode and UMI sequences from R1 and transferring them to read names in processed R2 fastqs
    ├── 03- Mapping Maester libraries
    ├── 04- Make the STAR aligner bam file compatible with MAEGATK
    ├── 05- Run MAEGATK to obtain single cell level values of counts and coverages
    ├── 06 - Obtain Counts Matrix and Coverage Matrix at single cell level from MAEGATK output
    ├── 07- Low Resolution Pesduobulking of count matrices
    ├── 08- Low Resolution Pesduobulking of Coverage matrices
    ├── 09- Calculating VAFs from pseudobulked counts and coverage matrices
    ├── 10- Selection of Variants of Interest
    ├── 11- High-resolution Pseudobulking of Myeloid cell population in the Primary Tumor
    ├── 12 - Calculating GSVA enrichment for variants categories in myeloid cell identities in tumor microenvironment
    ├── 13- Visualizations (Dotplot)
    └── 14- Visualizations (Stacked Columns)
├── Processing of GBO scRNA-Seq libraries (Related to Figure 7)
    ├── 01- Aligning all GBO Seq-Well scRNA-Seq libraries
    ├── 02- Seurat Processing for BWH911 to obtain Raw Counts Matrix
    ├── 03- Calculating the usage of all cell types NMF programs in BWH911 GBO to extract non-doublet myeloid myeloid cells
    ├── 04- Calculating the usage of myeloid NMF programs in Myeloid cells of BWH911 GBO
    ├── 05- Seurat Processing for GBOs treated with DMSO and GNE to obtain Raw Counts Matrix
    ├── 06- Calculating the usage of all cell types NMF programs in GBOs treated with DMSO and GNE to extract non-doublet myeloid cells
    └── 07- Calculating the usage of myeloid NMF programs in Myeloid cells of GBOs treated with DMSO or GNE
├── Processing of scRNA-Seq Files (Related to Figure 1)
    ├── 01- Align SeqWell scRNA-Seq Libraries
    ├── 02- Align 10X V3 scRNA-Seq Libraries
    ├── 03- Align 10X V2 scRNA-Seq Libraries
    ├── 04- Seurat for Processing MGB Cohort.R
    ├── 05- Identifying Variable Genes for NMF in MGB Cohort
    ├── 06- cNMF for annotating cells in MGB cohort
    ├── 07- Copy number Variation Analysis
    ├── 08- Seurat with Batch correction for Myeloid Cells in MGB cohort
    ├── 09- Seurat for Processing Houston Cohort.R
    ├── 10- Calculate usage matrix in Houston Cohort for cNMF cell annotation programs identified in MGB cohort
    ├── 11- Seurat for Processing Jackson's Cohort.R
    ├── 12- Calculate usage matrix in Jackson's Cohort for cNMF cell annotation programs identified in MGB cohort
    ├── 13- Seurat for Processing Mcgill Cohort.R
    ├── 14- Calculate usage matrix in Mcgill Cohort for cNMF cell annotation programs identified in MGB cohort
    ├── 15- Annotation and Doublet Detection.pdf
    └── Reference Cells
        ├── Jackson's reference Cells
        ├── MGB Cohort Reference Cells
        ├── McGill Cohort Reference Cells
        └── Methodist Cohort Reference Cells
├── README.md
├── Spatial_transcriptomics
    ├── 01-make_adata.ipynb
    ├── 02-select_genes.ipynb
    ├── 03-merge_adata.ipynb
    ├── 04-cnmf
    ├── 05-meta_programs_usage.ipynb
    ├── 06-RCTD_sc_reference_v2.R
    ├── 07-RCTD_make_pucks_all.R
    ├── 08-run_RCTD_all.R
    ├── 09-RCTD_tocsv_all_patients.R
    ├── 10-integrate_external_and_distances_all_samples.ipynb
    ├── 11-corr_env_rctd.ipynb
    ├── 12-spatial_enrichment_regression_env.ipynb
    ├── 13-spatial_enrichment_regression_rctd_no_lim_network_plot.ipynb
    └── 14- ScatterPie Visualization of the niches
└── scATAC-Seq Analyses
    ├── 01- Processing  GBM C3L_02705  snATAC-Seq library
    ├── 02- Processing GBM C3L_03405  snATAC-Seq library
    ├── 03- Processing GBM C3L_03968  snATAC-Seq library
    ├── 04- Processing GBM C3N_00662  snATAC-Seq library
    ├── 05 - Processing GBM C3N_00663  snATAC-Seq library
    ├── 06- Processing GBM C3N_01334  snATAC-Seq library
    ├── 07- Processing GBM C3N_01518  snATAC-Seq library
    ├── 08- Processing GBM C3N_01798  snATAC-Seq library
    ├── 09- Processing GBM C3N_01814 snATAC-Seq library
    ├── 10- Processing GBM C3N_01816  snATAC-Seq library
    ├── 11- Processing GBM C3N_01818  snATAC-Seq library
    ├── 12- Processing GBM C3N_02181  snATAC-Seq library
    ├── 13- Processing GBM C3N_02186  snATAC-Seq library
    ├── 14- Processing GBM C3N_02188  snATAC-Seq library
    ├── 15- Processing GBM C3N_02769  snATAC-Seq library
    ├── 16- Processing GBM C3N_02783  snATAC-Seq library
    ├── 17- Processing GBM C3N_02784  snATAC-Seq library
    ├── 18- Processing GBM C3N_03186 snATAC-Seq library
    ├── 19- Processing GBML018G1 snATAC-Seq library
    ├── 20- Processing GBML019G1 snATAC-Seq library
    ├── 21- Merging all processed snATAC-Seq libraries into one object
    ├── 22- Calculating all cell types programs usages in the combined gene activities matrix
    ├── 23- Extracting Myeloid Cells from the Combined snATAC Object and Calculating Gene Activities
    ├── 24- Calculating myeloid program usages in myeloid cells of the combined snATAC object
    ├── 25- Extracting Discrete Myeloid Cells from the combined GBM snATAC object
    ├── 26- Generating Normalized bigwigs for the pseudobulked discrete myeloid cells
    ├── 27- Generating Deeptools heatmaps for specific peaks for the immunomodulatory discrete myeloid cells
    └── 28- Identifying enriched motifs in specific immunomodulatory peaks using monaLISA


/Analysis of Glioma Immunotherpay scRNA-Seq Libraries/01- Extracting Myeloid Cells Matrix from the published Seurat Object:
--------------------------------------------------------------------------------
 1 | ####### Seurat Object obtained from: Mei, Y., Wang, X., Zhang, J. et al. Siglec-9 acts as an immune-checkpoint molecule on macrophages in glioblastoma, restricting T-cell priming and immunotherapy response. Nat Cancer 4, 1273–1291 (2023). https://doi.org/10.1038/s43018-023-00598-9 #######
 2 | 
 3 | ####### Seurat object for the study was downloaded from https://figshare.com/articles/dataset/Single-cell_and_spatial_transcriptomic_profiling_of_human_glioblastomas/22434341. The name of the file is "GBM.RNA.integrated.24.rds" #######
 4 | 
 5 | ############Inside R####################
 6 | 
 7 | library(dplyr)
 8 | library(Seurat)
 9 | 
10 | Immunotherapy <- readRDS("GBM.RNA.integrated.24.rds")
11 | 
12 | Immunotherapy <- UpdateSeuratObject(Immunotherapy)
13 | 
14 | write.table(Immunotherapy@meta.data, file="./GBM_Immunotherapy_MetaData.txt", sep="\t", col.names=NA, quote=FALSE)
15 | 
16 | 
17 | Myeloid <- subset(x = Immunotherapy, subset = anno_ident %in% c("Macrophages","Microglial", "Monocytes", "cDCs"))
18 | 
19 | Myeloid_Matrix <- GetAssayData(Myeloid, slot = "counts")
20 | 
21 | Myeloid_Genes <- read.table("/seq/epiprod02/Chadi/Glioblastoma/Myeloid_NMF_Average_Gene_Spectra.txt", sep="\t", head=TRUE, row.names=1)
22 | 
23 | Myeloid_Genes2 <- rownames(Myeloid_Genes)
24 | 
25 | Myeloid_Matrix2 <- Myeloid_Matrix[rownames(Myeloid_Matrix) %in% Myeloid_Genes2,]
26 | 
27 | write.table(t(as.matrix(Myeloid_Matrix2)), file="./GBM_Immunotherapy_Myeloid_cNMF.txt", sep="\t", col.names=NA, quote=FALSE)
28 | 
29 | dim(Myeloid_Matrix2) ######### genes x cells; the number of genes (2225) is the value passed to --numgenes in step 02 #################
30 | 
31 | 

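Subsetting on a set of labels needs `%in%`. The expression `anno_ident == c(...)` compares element-wise and recycles the four labels, so each cell is checked against only one of them (the label at its recycled position). The Python sketch below mimics both R operators on a hypothetical label vector to show how many cells each one keeps; the labels are made up for illustration.

```python
# Hypothetical annotations standing in for the anno_ident column.
labels = ["Macrophages", "Microglial", "Tcells", "Monocytes",
          "cDCs", "Macrophages", "Microglial", "Monocytes"]
myeloid = ["Macrophages", "Microglial", "Monocytes", "cDCs"]


def r_recycled_equal(values, pattern):
    """Mimic R's `values == pattern`: the shorter vector is recycled, so
    element i is compared only with pattern[i % len(pattern)]."""
    return [v == pattern[i % len(pattern)] for i, v in enumerate(values)]


def r_in(values, allowed):
    """Mimic R's `values %in% allowed`: every element is tested against
    the whole set of allowed labels."""
    allowed = set(allowed)
    return [v in allowed for v in values]


kept_by_equal = sum(r_recycled_equal(labels, myeloid))
kept_by_in = sum(r_in(labels, myeloid))
print(kept_by_equal, kept_by_in)  # 2 7
```

Seven of the eight hypothetical cells are myeloid, but the recycled `==` keeps only two. Because 8 is a multiple of 4, R would not even print a recycling warning here.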

--------------------------------------------------------------------------------
/Analysis of Glioma Immunotherpay scRNA-Seq Libraries/02- Calculating usages of the recurrent myeloid programs in the published immunotherapy dataset:
--------------------------------------------------------------------------------
 1 | cnmf prepare --output-dir ./Calculate_Usage_Myeloid/ --name Calculate_Usage_Myeloid -c GBM_Immunotherapy_Myeloid_cNMF.txt -k 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 --n-iter 500 --total-workers 1 --seed 14 --numgenes 2225;
 2 | 
 3 | python;
 4 | 
 5 | ################################## Inside Python #################################
 6 | 
 7 | import sklearn
 8 | import sklearn.decomposition
 9 | from sklearn.decomposition import non_negative_factorization
10 | import numpy as np
11 | import scanpy as sc
12 | import csv
13 | import scipy
14 | import pandas as pd
15 |  
16 | 
17 | H = pd.read_table("Myeloid_NMF_Average_Gene_Spectra.txt", sep="\t", index_col=0)
18 | 
19 | H2 = H.T
20 | 
21 | X = sc.read_h5ad('Calculate_Usage_Myeloid/Calculate_Usage_Myeloid/cnmf_tmp/Calculate_Usage_Myeloid.norm_counts.h5ad')
22 | 
23 | X2 = X.X.toarray()
24 | 
25 | X3 = pd.DataFrame(data=X2, columns = X.var_names , index = X.obs.index)
26 | 
27 | H3 = H2.filter(items = X3.columns)
28 | 
29 | H4 = H3.to_numpy()
30 | 
31 | X5 = X2.astype(np.float64)
32 | 
33 | test = sklearn.decomposition.non_negative_factorization(X5, W=None, H=H4, n_components=H4.shape[0], init='random', update_H=False, solver='cd', beta_loss='frobenius', tol=0.0001, max_iter=1000, alpha_W=0.0, alpha_H='same', l1_ratio=0.0, random_state=None, verbose=0, shuffle=False)  # alpha and regularization were removed in scikit-learn 1.2; n_components follows the number of programs in H4
34 | 
35 | test2 = list(test)
36 | 
37 | pd.DataFrame(test2[0], columns= H.columns, index=X.obs.index).to_csv(path_or_buf="./GBM_ImmunotherapyCN_Myeloid_cNMF_Usages.txt", sep="\t", quoting=csv.QUOTE_NONE)
38 | 

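With `update_H=False`, `non_negative_factorization` keeps the averaged spectra fixed and solves only for the per-cell usages, i.e. a non-negative least-squares fit of every cell onto the programs. The pure-Python sketch below illustrates that idea with Lee and Seung multiplicative updates on a toy matrix; it is not scikit-learn's coordinate-descent solver, and the toy spectra and usages are made up.

```python
def matmul(a, b):
    """Plain-list matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def usages_with_fixed_spectra(x, h, n_iter=1000, eps=1e-12):
    """Non-negative W (cells x programs) minimising ||X - W H||_F while the
    spectra H (programs x genes) stay fixed.

    Multiplicative update: W <- W * (X H^T) / (W H H^T)."""
    ht = transpose(h)
    xht = matmul(x, ht)   # cells x programs, constant because H is fixed
    hht = matmul(h, ht)   # programs x programs, also constant
    w = [[1.0] * len(h) for _ in x]
    for _ in range(n_iter):
        whht = matmul(w, hht)
        w = [[wij * num / (den + eps) for wij, num, den in zip(wr, nr, dr)]
             for wr, nr, dr in zip(w, xht, whht)]
    return w


# Toy example: 2 programs over 3 genes, 2 cells with known usages.
H = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
W_true = [[2.0, 1.0],
          [0.5, 3.0]]
X = matmul(W_true, H)
W_est = usages_with_fixed_spectra(X, H)
```

In the real step `X5` (cells x genes) plays the role of `X` and `H4` (programs x genes) the role of `H`; `H2.filter(items = X3.columns)` aligns the gene columns of the spectra to those of the normalized counts so the two matrices line up.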

--------------------------------------------------------------------------------
/Analysis of Glioma Immunotherpay scRNA-Seq Libraries/03- Generating quadrant plot for the myeloid cells in the published glioma dataset to indicate myeloid cells in responders vs non-responders and for SIGLEC9 expression:
--------------------------------------------------------------------------------
 1 | ######## "Quadrant_Plot_Immunotherapy.txt" contains all non-doublet myeloid cells in the published dataset, with the usage values of the immunomodulatory programs and SIGLEC9 expression for each cell.
 2 | ########  Xaxis is calculated by subtracting the usage of Complement Immunosuppression from the usage of IL1B pro-inflammatory ( Usage of IL1B Inflam - Usage of Complement )
 3 | ########  Yaxis is calculated by subtracting the usage of Scavenger Immunosuppression from the usage of RHOB pro-inflammatory ( Usage of RHOB Inflammatory - Usage of Scavenger )
 4 | 
 5 | library(ggplot2)
 6 | library(dplyr)
 7 | library(scatterpie)
 8 | 
 9 | data4 <- read.table("Quadrant_Plot_Immunotherapy.txt", sep="\t", head=TRUE, row.names=1)
10 | 
11 | data5 <- data4[sample(nrow(data4)), ]
12 | 
13 | 
14 | pdf("Immunotherapy_NC_responders.pdf", height = 6, width = 7.5)
15 | 
16 | ggplot(data5, aes(Xaxis, Yaxis)) + geom_point(aes(colour = Treatment), size=0.05) + scale_color_manual(values=c("nonresponder"="gray90", "responder"="black")) + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank()) + labs(color=NULL) + scale_x_continuous(limits = c(-100, 100)) + scale_y_continuous(limits = c(-100, 100))
17 | 
18 | dev.off()
19 | 
20 | 
21 | 
22 | pdf("Immunotherapy_NC_non_responders.pdf", height = 6, width = 7.5)
23 | 
24 | ggplot(data5, aes(Xaxis, Yaxis)) + geom_point(aes(colour = Treatment), size=0.05) + scale_color_manual(values=c("responder"="gray90", "nonresponder"="black")) + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank()) + labs(color=NULL) + scale_x_continuous(limits = c(-100, 100)) + scale_y_continuous(limits = c(-100, 100))
25 | 
26 | dev.off()
27 | 
28 | 
29 | 
30 | pdf("Immunotherapy_NC_SIGLEC9_Expression_Quadrant_V7_Blackgrey.pdf", height = 6, width = 7.5)
31 | 
32 | ggplot(data4 %>% arrange(SIGLEC9), aes(Xaxis, Yaxis)) + geom_point(aes(colour = SIGLEC9), size=0.05) + scale_color_gradient2(low="grey90", mid="grey90", high="black", midpoint = 0.5, space="Lab", limit=c(0,2), na.value="black") + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank()) + labs(color=NULL) + scale_x_continuous(limits = c(-100, 100)) + scale_y_continuous(limits = c(-100, 100))
33 | 
34 | dev.off()
35 | 
36 | 
37 | 

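The axis definitions in the header comments can be written out per cell as below. This is a sketch only: the dictionary keys and the quadrant names are hypothetical, and usages are assumed to be expressed as percentages, which is consistent with the -100 to 100 axis limits used in the plots.

```python
def quadrant_coordinates(usage):
    """Map one cell's program usages to quadrant-plot coordinates.

    x = IL1B pro-inflammatory usage - Complement immunosuppressive usage
    y = RHOB pro-inflammatory usage - Scavenger immunosuppressive usage
    """
    x = usage["IL1B"] - usage["Complement"]
    y = usage["RHOB"] - usage["Scavenger"]
    return x, y


def quadrant_label(x, y):
    """Name the quadrant by the dominant program on each axis
    (ties are assigned to the immunosuppressive side)."""
    horizontal = "IL1B" if x > 0 else "Complement"
    vertical = "RHOB" if y > 0 else "Scavenger"
    return vertical + "/" + horizontal


# Hypothetical cell with usages expressed as percentages.
cell = {"IL1B": 60.0, "Complement": 10.0, "RHOB": 5.0, "Scavenger": 25.0}
x, y = quadrant_coordinates(cell)
print(x, y, quadrant_label(x, y))  # 50.0 -20.0 Scavenger/IL1B
```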

--------------------------------------------------------------------------------
/Analysis of Glioma Immunotherpay scRNA-Seq Libraries/04 - Generating boxplots for the published glioma immunotherapy dataset for percentage of cells positive for SIGLEC9 expression or immunomodulatory programs usage:
--------------------------------------------------------------------------------
 1 | ######## Immunotherapy_SIGLEC9_Boxplot.txt contains, for each responder and non-responder tumor, the percentage of myeloid cells that are positive for SIGLEC9 expression or for usage of each of the four immunomodulatory programs
 2 | 
 3 | library(tidyr)
 4 | library(ggplot2)
 5 | 
 6 | 
 7 | data <- read.table("Immunotherapy_SIGLEC9_Boxplot.txt", head=TRUE, row.names=1, sep="\t")
 8 | 
 9 | data$RowName <- rownames(data)
10 | 
11 | 
12 | long_df <- gather(data, key = "NMF", value = "Value", -RowName, -Treatment)
13 | 
14 | data2 <- long_df[long_df$NMF %in% c("Percentage_SIGLEC9","Scavenger","Complement","Rhob","IL1B"),]
15 | 
16 | 
17 | 
18 | data2$NMF <- factor(data2$NMF, levels = c("Percentage_SIGLEC9","Scavenger","Complement","Rhob","IL1B"))
19 | 
20 | data2$Treatment <- factor(data2$Treatment, levels = c("responder", "nonresponder"))
21 | 
22 | 
23 | 
24 | ggplot(data2, aes(x = factor(NMF), y = Value, fill = Treatment)) +
25 |     geom_boxplot(width = 0.7, position = position_dodge(width = 0.7)) + geom_point(position = position_dodge(width = 0.7), aes(y = Value), color = "black", size = 1) + stat_boxplot(geom ='errorbar', width = 0.35, position = position_dodge(width = 0.7)) + theme(panel.background = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.border = element_rect(color = "black", fill = NA)) + scale_fill_manual(values = c("responder" = "white", "nonresponder" = "#808080"))
26 | 

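The boxplot inputs are per-tumor percentages of positive myeloid cells. The thresholds used to call a cell positive are not given in this section, so the sketch below uses placeholder cut-offs (SIGLEC9 > 0, Scavenger usage > 20) on a hypothetical per-cell table; the same helper covers the double-positive percentages used in step 05.

```python
from collections import defaultdict


def percent_positive(cells, is_positive):
    """Percentage of cells per tumor for which is_positive(cell) is True.

    cells: iterable of dicts that carry at least a 'tumor' key.
    Returns {tumor: percentage}."""
    total = defaultdict(int)
    positive = defaultdict(int)
    for cell in cells:
        total[cell["tumor"]] += 1
        if is_positive(cell):
            positive[cell["tumor"]] += 1
    return {tumor: 100.0 * positive[tumor] / n for tumor, n in total.items()}


# Hypothetical per-cell table; the thresholds below are illustrative only.
cells = [
    {"tumor": "T1", "SIGLEC9": 0.0, "Scavenger": 40.0},
    {"tumor": "T1", "SIGLEC9": 0.8, "Scavenger": 5.0},
    {"tumor": "T1", "SIGLEC9": 1.2, "Scavenger": 35.0},
    {"tumor": "T1", "SIGLEC9": 0.0, "Scavenger": 2.0},
    {"tumor": "T2", "SIGLEC9": 0.0, "Scavenger": 50.0},
    {"tumor": "T2", "SIGLEC9": 0.0, "Scavenger": 1.0},
]

siglec9_pos = percent_positive(cells, lambda c: c["SIGLEC9"] > 0)
double_pos = percent_positive(
    cells, lambda c: c["SIGLEC9"] > 0 and c["Scavenger"] > 20)
print(siglec9_pos, double_pos)
```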

--------------------------------------------------------------------------------
/Analysis of Glioma Immunotherpay scRNA-Seq Libraries/05 - Generating boxplot for percentage of cells double positive for SIGLEC9 expression and immunomodulatory program usage in responders vs non-responders:
--------------------------------------------------------------------------------
 1 | ######## Double_Positive_SIGLEC9_NMF_Boxplot.txt contains, for each responder and non-responder tumor, the percentage of myeloid cells that are double positive for SIGLEC9 expression and usage of each of the four immunomodulatory programs
 2 | 
 3 | 
 4 | library(tidyr)
 5 | library(ggplot2)
 6 | 
 7 | 
 8 | data <- read.table("Double_Positive_SIGLEC9_NMF_Boxplot.txt", head=TRUE, row.names=1, sep="\t")
 9 | 
10 | data$RowName <- rownames(data)
11 | 
12 | 
13 | long_df <- gather(data, key = "NMF", value = "Value", -RowName, -Treatment)
14 | 
15 | data2 <- long_df[long_df$NMF %in% c("Scavenger","Complement","RHOB","IL1B"),]
16 | 
17 | 
18 | 
19 | data2$NMF <- factor(data2$NMF, levels = c("Scavenger","Complement","RHOB","IL1B"))
20 | 
21 | data2$Treatment <- factor(data2$Treatment, levels = c("responder", "nonresponder"))
22 | 
23 | 
24 | 
25 | ggplot(data2, aes(x = factor(NMF), y = Value, fill = Treatment)) +
26 |     geom_boxplot(width = 0.7, position = position_dodge(width = 0.7)) + geom_point(position = position_dodge(width = 0.7), aes(y = Value), color = "black", size = 1) + stat_boxplot(geom ='errorbar', width = 0.35, position = position_dodge(width = 0.7)) + theme(panel.background = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.border = element_rect(color = "black", fill = NA)) + scale_fill_manual(values = c("responder" = "white", "nonresponder" = "#808080"))
27 | 
28 | 
29 | 
30 | 
31 | 
32 | group1 <- data2[data2$Treatment == "responder" & data2$NMF == "Scavenger",]
33 | group2 <- data2[data2$Treatment == "nonresponder" & data2$NMF == "Scavenger",]
34 | 
35 | result <- wilcox.test(group1$Value, group2$Value)
36 | 
37 | p_value <- result$p.value
38 | 
39 | print(p_value) ######## 0.007455462
40 | 
41 | 
42 | group1 <- data2[data2$Treatment == "responder" & data2$NMF == "Complement",]
43 | group2 <- data2[data2$Treatment == "nonresponder" & data2$NMF == "Complement",]
44 | 
45 | result <- wilcox.test(group1$Value, group2$Value)
46 | 
47 | p_value <- result$p.value
48 | 
49 | print(p_value) ####### 0.2941863
50 | 
51 | 
52 | 
53 | 
54 | group1 <- data2[data2$Treatment == "responder" & data2$NMF == "IL1B",]
55 | group2 <- data2[data2$Treatment == "nonresponder" & data2$NMF == "IL1B",]
56 | 
57 | result <- wilcox.test(group1$Value, group2$Value)
58 | 
59 | p_value <- result$p.value
60 | 
61 | print(p_value) ####### 0.7878788
62 | 
63 | 
64 | 
65 | 
66 | 
67 | group1 <- data2[data2$Treatment == "responder" & data2$NMF == "RHOB",]
68 | group2 <- data2[data2$Treatment == "nonresponder" & data2$NMF == "RHOB",]
69 | 
70 | result <- wilcox.test(group1$Value, group2$Value)
71 | 
72 | p_value <- result$p.value
73 | 
74 | print(p_value) #### 0.1217582
75 | 
76 | 
77 | 
78 | pvalues <- c(Scavenger = 0.007455462,
79 | Complement = 0.2941863,
80 | RHOB = 0.1217582,
81 | IL1B = 0.7878788)
82 | 
83 | 
84 | adjusted_p_value <- p.adjust(pvalues, method = "fdr")
85 | 

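The Wilcoxon p-values are corrected with `p.adjust(method = "fdr")`, which is the Benjamini-Hochberg procedure. The sketch below reimplements that procedure in Python and applies it to the four p-values recorded in the script, as a way to check the adjusted values by hand.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg (FDR) adjusted p-values, returned in input order."""
    n = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum of p * n / rank.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, i in enumerate(order):
        rank = n - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


programs = ["Scavenger", "Complement", "RHOB", "IL1B"]
raw = [0.007455462, 0.2941863, 0.1217582, 0.7878788]
adjusted = dict(zip(programs, bh_adjust(raw)))
for name in programs:
    print(name, round(adjusted[name], 6))
```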

--------------------------------------------------------------------------------
/Bulk ATAC-Seq Analysis/01- Trimming fastq files to remove nextera transposase adaptors:
--------------------------------------------------------------------------------
1 | trim_galore --phred33 --paired --fastqc --output_dir Trimmed/  DMSO_S1_R1_001.fastq.gz DMSO_S1_R2_001.fastq.gz;
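2 | # No adapter is specified, so Trim Galore auto-detects the adapter type (Illumina, Nextera or small RNA) from the first reads; add --nextera to force the Nextera adapter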
3 | trim_galore --phred33 --paired --fastqc --output_dir Trimmed/ P300I_S2_R1_001.fastq.gz P300I_S2_R2_001.fastq.gz;
4 | 

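Trim Galore's `--nextera` option trims the first 12 bp of the Nextera adapter, `CTGTCTCTTATA`, and by default (stringency 1) it removes even a single adapter base at the 3' end. The sketch below shows the idea with exact matching only; Cutadapt, which Trim Galore wraps, also tolerates mismatches, so this is an illustration rather than a replacement.

```python
NEXTERA = "CTGTCTCTTATA"  # Nextera adapter stub trimmed by Trim Galore --nextera


def trim_3prime_adapter(read, adapter=NEXTERA, min_overlap=1):
    """Cut the read at the first full adapter match; otherwise remove the
    longest adapter prefix (>= min_overlap bases) sitting at the read's 3' end.
    Exact matching only, unlike Cutadapt."""
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


print(trim_3prime_adapter("ACGTACGT" + NEXTERA + "GGGG"))
print(trim_3prime_adapter("ACGTACGTCTGTC"))
```

Adapter read-through is common in ATAC-seq because many Tn5 fragments are shorter than the read length, which is why this step precedes alignment.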

--------------------------------------------------------------------------------
/Bulk ATAC-Seq Analysis/02- Mapping Trimmed Fastqs using STAR:
--------------------------------------------------------------------------------
1 | STAR --genomeDir /seq/epiprod02/Chadi/Genomes/STAR/2.7.10b/hg38 --readFilesIn DMSO_S1_R1_001_val_1.fq.gz DMSO_S1_R2_001_val_2.fq.gz --readFilesCommand zcat --outFileNamePrefix ./BAMS/DMSO --outSAMtype BAM SortedByCoordinate --outFilterMultimapNmax 1000 --outFilterScoreMinOverLread 0.25 --alignIntronMax 1 --alignEndsType EndToEnd;
2 | 
3 | 
4 | STAR --genomeDir /seq/epiprod02/Chadi/Genomes/STAR/2.7.10b/hg38 --readFilesIn P300I_S2_R1_001_val_1.fq.gz P300I_S2_R2_001_val_2.fq.gz --readFilesCommand zcat --outFileNamePrefix ./BAMS/p300i --outSAMtype BAM SortedByCoordinate --outFilterMultimapNmax 1000 --outFilterScoreMinOverLread 0.25 --alignIntronMax 1 --alignEndsType EndToEnd;
5 | 

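These STAR settings adapt an RNA-seq aligner to ATAC-seq: `--alignIntronMax 1` prevents spliced alignments and `--alignEndsType EndToEnd` disables soft-clipping. `--outFilterScoreMinOverLread` sets the minimum alignment score as a fraction of the read length (the sum of both mates for paired-end data), and 0.25 is much more permissive than STAR's default of 0.66. A rough sketch of the resulting threshold, assuming it combines with the absolute `--outFilterScoreMin` (default 0) by taking the stricter value; STAR's internal integer rounding is ignored.

```python
def star_min_alignment_score(mate_lengths, score_min_over_lread=0.25, score_min=0):
    """Approximate lowest alignment score that passes both STAR score filters.

    --outFilterScoreMinOverLread is a fraction of the summed mate lengths;
    --outFilterScoreMin is an absolute score. Both must be met, so the
    stricter one wins."""
    return max(score_min, score_min_over_lread * sum(mate_lengths))


# Hypothetical 2 x 50 bp pair with the setting used above.
print(star_min_alignment_score([50, 50]))  # 25.0
```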

--------------------------------------------------------------------------------
/Bulk ATAC-Seq Analysis/03- Removing Duplicate Reads from mapped libraries:
--------------------------------------------------------------------------------
1 | java -Xmx50g -jar /path/to/picard.jar MarkDuplicates I=DMSOAligned.sortedByCoord.out.bam O=./dup/DMSOAligned.sortedByCoord.out.bam M=./dup/DMSOAligned.sortedByCoord.txt REMOVE_DUPLICATES=TRUE;
2 | 
3 | java -Xmx50g -jar /path/to/picard.jar MarkDuplicates I=p300iAligned.sortedByCoord.out.bam O=./dup/p300iAligned.sortedByCoord.out.bam M=./dup/p300iAligned.sortedByCoord.txt REMOVE_DUPLICATES=TRUE;
4 | 

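MarkDuplicates groups read pairs whose mates share the same chromosome, unclipped 5' positions and strands, keeps the best pair in each group (by default the one with the highest summed base quality) and, with `REMOVE_DUPLICATES=TRUE`, drops the rest. The sketch below captures only that core idea on hypothetical tuples; Picard additionally handles libraries, optical duplicates and unpaired reads.

```python
def find_duplicate_pairs(pairs):
    """Return names of read pairs flagged as duplicates.

    pairs: tuples (name, chrom, pos1, strand1, pos2, strand2, qual_sum), where
    pos1/pos2 are the unclipped 5' ends of the two mates. Pairs with the same
    chrom, 5' ends and strands form one set; the highest qual_sum is kept."""
    best = {}
    for pair in pairs:
        name, chrom, pos1, strand1, pos2, strand2, qual_sum = pair
        # Order the mates so the key does not depend on which mate is read 1.
        ends = tuple(sorted([(pos1, strand1), (pos2, strand2)]))
        key = (chrom, ends)
        if key not in best or qual_sum > best[key][6]:
            best[key] = pair
    kept = {pair[0] for pair in best.values()}
    return {pair[0] for pair in pairs if pair[0] not in kept}


reads = [
    ("r1", "chr1", 1000, "+", 1180, "-", 1200),
    ("r2", "chr1", 1000, "+", 1180, "-", 1350),  # same fragment, better qualities
    ("r3", "chr1", 1000, "+", 1250, "-", 1300),  # different mate position
]
print(sorted(find_duplicate_pairs(reads)))  # ['r1']
```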

--------------------------------------------------------------------------------
/Bulk ATAC-Seq Analysis/04- Remove reads mapping to chrM:
--------------------------------------------------------------------------------
1 | for file in *.bam;do samtools index $file;done
2 | 
3 | mkdir filter;
4 | 
5 | samtools view -b DMSOAligned.sortedByCoord.out.bam chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY > ./filter/DMSOAligned.sortedByCoord.out.bam;
6 | 
7 | samtools view -b p300iAligned.sortedByCoord.out.bam chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY > ./filter/p300iAligned.sortedByCoord.out.bam;
8 | 

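Listing chr1 to chr22, chrX and chrY as regions (which requires the BAM index built in the first line) keeps only those contigs, so reads on chrM and on unplaced or alternative contigs are all dropped. The sketch below mirrors that keep-list on a hypothetical list of read chromosomes and also reports the chrM fraction, a common ATAC-seq QC value worth recording before filtering.

```python
from collections import Counter

KEEP = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]


def filter_by_chromosome(chroms, keep=KEEP):
    """Return (chromosomes of reads that are kept, fraction of reads on chrM)."""
    keep_set = set(keep)
    counts = Counter(chroms)
    total = sum(counts.values())
    chrm_fraction = counts["chrM"] / total if total else 0.0
    kept = [c for c in chroms if c in keep_set]
    return kept, chrm_fraction


reads = ["chr1", "chrM", "chrM", "chr2", "chrUn_KI270302v1", "chrX"]
kept, chrm_fraction = filter_by_chromosome(reads)
print(kept, round(chrm_fraction, 3))
```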

--------------------------------------------------------------------------------
/Bulk ATAC-Seq Analysis/05- Final sorting of processed mapped files:
--------------------------------------------------------------------------------
1 | samtools sort -o sorted_filter/DMSOAligned.sortedByCoord.out.bam DMSOAligned.sortedByCoord.out.bam;
2 | 
3 | samtools sort -o sorted_filter/p300iAligned.sortedByCoord.out.bam p300iAligned.sortedByCoord.out.bam;
4 | 


--------------------------------------------------------------------------------
/Bulk ATAC-Seq Analysis/06- Creating normalized bigwigs for visualization:
--------------------------------------------------------------------------------
1 | for file in *.bam;do samtools index $file;done
2 | 
3 | bamCoverage -b DMSOAligned.sortedByCoord.out.bam --extendReads --normalizeUsing RPGC --effectiveGenomeSize 2747877702 -o ./BW/DMSO_GBO.bw;
4 | 
5 | bamCoverage -b p300iAligned.sortedByCoord.out.bam --extendReads --normalizeUsing RPGC --effectiveGenomeSize 2747877702 -o ./BW/p300i_GBO.bw;
6 | 

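`--normalizeUsing RPGC` scales each track to 1x average genome coverage. Per the deepTools documentation, the expected coverage is mapped reads × fragment length / effective genome size, and coverage is divided by that value, which puts the DMSO and p300i tracks on a comparable scale. The library size and fragment length below are hypothetical; deepTools estimates the fragment length itself when `--extendReads` is used on paired-end data.

```python
def rpgc_scale_factor(mapped_reads, fragment_length, effective_genome_size):
    """Scale factor for 1x (RPGC) normalisation.

    Expected coverage if reads were spread evenly:
        mapped_reads * fragment_length / effective_genome_size
    Multiplying raw coverage by 1 / expected_coverage gives 1x depth."""
    expected_coverage = mapped_reads * fragment_length / effective_genome_size
    return 1.0 / expected_coverage


EFFECTIVE_GENOME_SIZE = 2747877702  # value passed to --effectiveGenomeSize above
# Hypothetical library: 40 million mapped reads, 200 bp fragments.
factor = rpgc_scale_factor(40_000_000, 200, EFFECTIVE_GENOME_SIZE)
print(round(factor, 4))
```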

--------------------------------------------------------------------------------
/Bulk ATAC-Seq Analysis/07- Shifting loci to correct for tn5 bias:
--------------------------------------------------------------------------------
 1 | bedtools bamtobed -i DMSOAligned.sortedByCoord.out.bam > ./BEDs/DMSO_GBO.bed;
 2 | 
 3 | bedtools bamtobed -i p300iAligned.sortedByCoord.out.bam > ./BEDs/p300i_GBO.bed;
 4 | 
 5 | cd BEDs;
 6 | 
 7 | cat DMSO_GBO.bed | awk -F $'\t' 'BEGIN {OFS = FS}{ if ($6 == "+") {$2 = $2 + 4} else if ($6 == "-") {$3 = $3 - 5} print $0}' >| DMSO_GBO_tn5_pe.bed;
 8 | 
 9 | cat p300i_GBO.bed | awk -F $'\t' 'BEGIN {OFS = FS}{ if ($6 == "+") {$2 = $2 + 4} else if ($6 == "-") {$3 = $3 - 5} print $0}' >| p300i_GBO_tn5_pe.bed;
10 | 

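The awk commands apply the standard Tn5 offset: plus-strand reads move their start 4 bp to the right and minus-strand reads move their end 5 bp to the left, so the 5' end of each read points at the Tn5 insertion site. The same rule for a single BED6 record, written as a Python sketch:

```python
def tn5_shift(bed_fields):
    """Apply the +4/-5 Tn5 offset to one BED6 record.

    bed_fields: [chrom, start, end, name, score, strand] with integer start/end.
    Records with any other strand value are returned unchanged, as in the awk."""
    chrom, start, end, name, score, strand = bed_fields
    if strand == "+":
        start += 4
    elif strand == "-":
        end -= 5
    return [chrom, start, end, name, score, strand]


plus = tn5_shift(["chr1", 100, 150, "r1", "60", "+"])
minus = tn5_shift(["chr1", 100, 150, "r2", "60", "-"])
print(plus, minus)
```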

--------------------------------------------------------------------------------
/Bulk ATAC-Seq Analysis/08- Calling peaks using the bed files with corrected loci:
--------------------------------------------------------------------------------
1 | macs2 callpeak -t DMSO_GBO_tn5_pe.bed -g hs -f BED -q 0.01 --nomodel --shift -75 --extsize 150 --keep-dup all -B --SPMR --outdir macs2_pk -n DMSO_GBO;
2 | 
3 | macs2 callpeak -t p300i_GBO_tn5_pe.bed -g hs -f BED -q 0.01 --nomodel --shift -75 --extsize 150 --keep-dup all -B --SPMR --outdir macs2_pk -n p300i_GBO;
4 | 

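With `--nomodel --shift -75 --extsize 150`, MACS2 moves each tag's 5' end 75 bp upstream (a negative shift points toward the 5' direction) and then extends it 150 bp downstream, so every tag becomes a 150-bp window centred on its Tn5 cut site. A sketch of that geometry on 0-based, half-open BED coordinates:

```python
def macs2_tag_window(start, end, strand, shift=-75, extsize=150):
    """Interval MACS2 piles up for one BED tag with --nomodel --shift/--extsize.

    The tag's 5' end (start on +, end on -) is moved by `shift` in the
    5'->3' direction, then extended `extsize` bases downstream."""
    if strand == "+":
        left = start + shift
        return left, left + extsize
    right = end - shift
    return right - extsize, right


print(macs2_tag_window(1000, 1050, "+"))  # (925, 1075)
print(macs2_tag_window(950, 1000, "-"))   # (925, 1075)
```

Both tags have their 5' end at position 1000, and both windows are centred there regardless of strand.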

--------------------------------------------------------------------------------
/Bulk ATAC-Seq Analysis/09- Determining Differential Accessible Sites using HOMER:
--------------------------------------------------------------------------------
 1 | ###### The MACS2 peak output (*_peaks.xls) was first converted into HOMER-compatible BED files (DMSO_GBO_Ready_peaks.bed and p300i_GBO_Ready_peaks.bed) ###########
 2 | 
 3 | ###### Merging peaks of DMSO and p300i into one file ###########
 4 | 
 5 | mergePeaks -d given DMSO_GBO_Ready_peaks.bed p300i_GBO_Ready_peaks.bed > GBO_Myeloid_Peaks_Merged.bed
 6 | 
 7 | 
 8 | ####### Creating Tag Directories using the bed files with corrected loci ########
 9 | 
10 | makeTagDirectory ./TagDirectories/DMSO_GBO/ -format bed DMSO_GBO_tn5_pe.bed
11 | 
12 | makeTagDirectory ./TagDirectories/p300i_GBO/ -format bed p300i_GBO_tn5_pe.bed
13 | 
14 | 
15 | ####### Differential peak analysis ##########
16 | 
17 | getDifferentialPeaks GBO_Myeloid_Peaks_Merged.bed ./TagDirectories/p300i_GBO/ ./TagDirectories/DMSO_GBO/ -F 2 > ./Upregulated_in_p300i.txt
18 | 
19 | getDifferentialPeaks GBO_Myeloid_Peaks_Merged.bed ./TagDirectories/DMSO_GBO/ ./TagDirectories/p300i_GBO/ -F 2 > ./Upregulated_in_DMSO.txt
20 | 

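`getDifferentialPeaks` compares tag counts in each merged peak between a target and a background tag directory: `-F` is the required fold enrichment over background (2 here) and `-P` is a Poisson p-value cut-off (default 0.0001). The sketch below is a simplified illustration of that idea, not HOMER's code; the library-size scaling and the floor of one background tag are assumptions.

```python
import math


def poisson_upper_tail(k, lam, terms=2000):
    """P(X >= k) for X ~ Poisson(lam), lam > 0, summed from k upward."""
    pmf = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    for i in range(k, k + terms):
        total += pmf
        pmf *= lam / (i + 1)
    return min(total, 1.0)


def is_differential(target_tags, background_tags, target_total, background_total,
                    fold=2.0, pvalue=1e-4):
    """Simplified stand-in for the -F / -P idea.

    Background tags are scaled to the target library size (floor of 1 tag);
    a peak passes when the target count is at least `fold` times the scaled
    background and its Poisson upper-tail p-value is at most `pvalue`."""
    expected = max(background_tags * target_total / background_total, 1.0)
    return (target_tags >= fold * expected
            and poisson_upper_tail(target_tags, expected) <= pvalue)


print(is_differential(40, 10, 1e6, 1e6), is_differential(15, 10, 1e6, 1e6))
```

Swapping the tag directories, as the two commands do, turns the same test into "up in p300i" versus "up in DMSO".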

--------------------------------------------------------------------------------
/Bulk ATAC-Seq Analysis/10- Identifying motifs enriched in differential accessible sites:
--------------------------------------------------------------------------------
 1 | ####### The getDifferentialPeaks outputs were converted to BED files (Upregulated_in_DMSO.bed and Upregulated_in_p300i.bed) that rtracklayer::import can read #######
 2 | 
 3 | R;
 4 | 
 5 | ############################ Inside R ##############################
 6 | library(monaLisa)
 7 | library(GenomicRanges)
 8 | library(SummarizedExperiment)
 9 | library(JASPAR2024)
10 | library(TFBSTools)
11 | library(BSgenome.Hsapiens.UCSC.hg38)
12 | library(ComplexHeatmap)
13 | library(circlize)
14 | 
15 | 
16 | 
17 | DMSO <- rtracklayer::import(con = "Upregulated_in_DMSO.bed", format = "bed")
18 | 
19 | p300i <- rtracklayer::import(con = "Upregulated_in_p300i.bed", format = "bed")
20 | 
21 | 
22 | 
23 | Peaks <- c(DMSO, p300i)
24 | 
25 | Peaks2 <- trim(resize(Peaks, width = median(width(Peaks)), fix = "center"))
26 | summary(width(Peaks2))
27 | 
28 | 
29 | bins2 <- rep(c("DMSO", "p300i"), c(length(DMSO), length(p300i)))
30 | 
31 | desired_order <- c("DMSO", "p300i")
32 | 
33 | bins2 <- factor(bins2, levels = desired_order)
34 | 
35 | table(bins2)
36 | 
37 | Peakseqs <- getSeq(BSgenome.Hsapiens.UCSC.hg38, Peaks2)
38 | 
39 | 
40 | 
41 | db <- file.path(system.file("extdata", package="JASPAR2024"), 
42 |                     "JASPAR2024.sqlite")
43 | opts <- list()
44 | opts[["tax_group"]] <- "vertebrates"
45 | opts[["matrixtype"]] <- "PWM"
46 | opts[["collection"]] <- "CORE"
47 | pwms <- getMatrixSet(db, opts)
48 | 
49 | 
50 | hg38 <- Hsapiens
51 | 
52 | 
53 | se2 <- calcBinnedMotifEnrR(seqs = Peakseqs, bins = bins2, pwmL = pwms, background = "genome", genome = hg38, genome.oversample = 50)
54 | 
55 | 
56 | Motifs <- scan("Motif_OIbulk.txt", what="")
57 | 
58 | seSel2 <- se2[Motifs, ]
59 | 
60 | pdf("Bulk_ATAC_DMSO_p300i_LISA_Heatmap.pdf", height=18, width=40)
61 | plotMotifHeatmaps(x = seSel2, which.plots = c("log2enr", "negLog10Padj"), 
62 |                   width = 4, cluster = FALSE, maxEnr = 2, maxSig = 300,
63 |                   show_dendrogram = TRUE, show_seqlogo = TRUE,
64 |                   width.seqlogo = 1.75, show_motif_GC = TRUE)
65 | dev.off()
66 | 
67 | 
68 | matrix_pvalue <- assay(se2, "negLog10Padj")
69 | write.table(matrix_pvalue, file="Bulk_ATAC_GBO_LISA_Motifs_log10pvalues.txt", col.names=NA, sep="\t", quote=FALSE)
70 | 
71 | 
72 | 
73 | background_matrix <- assay(se2, "log2enr")
74 | write.table(background_matrix, file="Bulk_ATAC_GBO_LISA_Motifs_Enrichment.txt", col.names=NA, sep="\t", quote=FALSE)
75 | 

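`Peaks2` resizes every differential peak to the median peak width around its centre before the sequences are extracted, so both bins are scanned over equally long sequences and motif hit rates stay comparable. A Python sketch of that step on 0-based, half-open coordinates; GenomicRanges uses 1-based closed ranges, so exact boundaries may differ by one base, and clipping at 0 stands in for `trim()`.

```python
from statistics import median


def resize_centered(start, end, width):
    """Resize [start, end) to `width` bp around its midpoint, clipping at 0."""
    center = (start + end) // 2
    new_start = center - width // 2
    return max(new_start, 0), new_start + width


def resize_to_median(peaks):
    """Resize every (start, end) peak to the median peak width."""
    width = int(median(end - start for start, end in peaks))
    return [resize_centered(start, end, width) for start, end in peaks]


peaks = [(100, 300), (1000, 1100), (5000, 5300)]
print(resize_to_median(peaks))  # [(100, 300), (950, 1150), (5050, 5250)]
```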

--------------------------------------------------------------------------------
/Bulk ATAC-Seq Analysis/11- Creating deeptools heatmap for the differential accessible sites between DMSO and p300i:
--------------------------------------------------------------------------------
1 | 
2 | computeMatrix reference-point -S DMSO_GBO.bw p300i_GBO.bw -R Upregulated_in_DMSO.bed Upregulated_in_p300i.bed -b 1000 -a 1000 --skipZeros --smartLabels -o Bulk_ATAC_GBO_Differential_Peaks_Center.gz --referencePoint center;
3 | 
4 | 
5 | plotHeatmap -m Bulk_ATAC_GBO_Differential_Peaks_Center.gz -o Bulk_ATAC_GBO_Differential_Peaks_Center_Better_Pipeline.pdf --startLabel 5 --endLabel 3 --whatToShow "heatmap and colorbar" --colorMap Blues --outFileSortedRegions Bulk_ATAC_GBO_Differential_Peaks_Center.bed;
6 | 

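`computeMatrix reference-point --referencePoint center -b 1000 -a 1000` collects the bigwig signal in fixed-size bins (10 bp and averaged by default) from 1 kb upstream to 1 kb downstream of each region centre; `plotHeatmap` then draws one row per region. The sketch below reproduces that binning for one chromosome's per-base signal; the inputs are hypothetical and `--skipZeros` is not implemented.

```python
def reference_point_matrix(signal, regions, before=1000, after=1000, bin_size=10):
    """Mean signal per bin around each region centre.

    signal: per-base values for one chromosome (list indexed by position).
    regions: 0-based half-open (start, end) tuples on that chromosome.
    Bins that fall entirely outside the chromosome are reported as NaN."""
    rows = []
    for start, end in regions:
        center = (start + end) // 2
        row = []
        for bin_start in range(center - before, center + after, bin_size):
            window = [signal[p] for p in range(bin_start, bin_start + bin_size)
                      if 0 <= p < len(signal)]
            row.append(sum(window) / len(window) if window else float("nan"))
        rows.append(row)
    return rows


signal = list(range(100))
print(reference_point_matrix(signal, [(40, 60)], before=10, after=10, bin_size=5))
```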

--------------------------------------------------------------------------------
/Creation of discretized scRNA-Seq Expression matrix/01- Align 10X V3 Publiashed normal brain scRNA-Seq Libraries:
--------------------------------------------------------------------------------
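1 | /path/to/STAR --genomeDir /path/to/genome/dir/ --readFilesIn Read2_Lane1.fastq.gz,Read2_Lane2.fastq.gz,Read2_Lane3.fastq.gz Read1_Lane1.fastq.gz,Read1_Lane2.fastq.gz,Read1_Lane3.fastq.gz --readFilesCommand zcat --soloType CB_UMI_Simple --soloCBwhitelist /path/to/3M-february-2018.txt --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 12 --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts --soloUMIfiltering MultiGeneUMI_CR --soloUMIdedup 1MM_CR --clipAdapterType CellRanger4 --outFilterScoreMin 30 --outSAMtype BAM SortedByCoordinate --outSAMattributes CR UR CY UY CB UB
2 | # Read2 (cDNA) files are listed before Read1 (cell barcode + UMI); --readFilesCommand zcat is needed for gzipped FASTQs; 3M-february-2018.txt is the (uncompressed) 10x v3 barcode whitelist, path is a placeholder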


--------------------------------------------------------------------------------
/Creation of discretized scRNA-Seq Expression matrix/04- Extracting Discrete Myeloid Cells for Subsequent Marker Identification using COMET and SCENIC:
--------------------------------------------------------------------------------
  1 | ####### Discrete Scavenger, Complement, Tissue Resident and Systemic myeloid cells were extracted and their IDs placed in the Cells text files shown below
  2 | 
  3 | ############################ Inside R ############################
  4 | 
  5 | library(dplyr)
  6 | library(Seurat)
  7 | options(bitmapType='cairo')
  8 | options(future.globals.maxSize = 8000 * 1024^2)
  9 | 
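 10 | # Discrete4 must already be loaded in this session (presumably the combined Seurat object built in step 03)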
 11 | 
 12 | DefaultAssay(Discrete4) <- "RNA"
 13 | 
 14 | 
 15 | Matrix <- GetAssayData(Discrete4, slot = "counts")
 16 | 
 17 | 
 18 | Cells2 <- scan("Cells_For_Scenic.txt", what="")
 19 | 
 20 | SCENIC <- Matrix[,colnames(Matrix) %in% Cells2]
 21 | 
 22 | write.table(as.matrix(SCENIC), file="Discrete_Suppressive_Inflammatory_Myeloid_For_SCENIC.txt", sep="\t", quote=FALSE, col.names=NA)
 23 | 
 24 | 
 25 | DefaultAssay(Discrete4) <- "RNA"
 26 | 
 27 | 
 28 | Matrix <- GetAssayData(Discrete4, slot = "counts")
 29 | 
 30 | Cells <- scan("Cells_For_COMET.txt", what="")
 31 | 
 32 | Myeloid <- subset(Discrete4, cells= Cells)
 33 | 
 34 | all.genes <- rownames(Myeloid)
 35 | 
 36 | 
 37 | pdf("Discrete4_Myeloid_Suppressive_QC_AF.pdf", height = 6, width = 20)
 38 | VlnPlot(Myeloid, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
 39 | dev.off()
 40 | 
 41 | library(sctransform)
 42 | Myeloid <- SCTransform(Myeloid, vars.to.regress = "percent.mt", verbose = TRUE)
 43 | Myeloid <- RunPCA(Myeloid)
 44 | pdf("Discrete4_Myeloid_Suppressive_ElbowPlot.pdf", height = 6, width = 6)
 45 | ElbowPlot(Myeloid, ndims=50)
 46 | dev.off()
 47 | 
 48 | write.table(Myeloid@meta.data, file="Discrete4_Myeloid_Suppressive_Integrated_MetaData.txt", sep="\t", col.names=NA, quote=FALSE)
 49 | 
 50 | 
 51 | 
 52 | Myeloid <- RunUMAP(Myeloid, reduction = "pca", dims = 1:16)
 53 | Myeloid <- FindNeighbors(Myeloid, dims = 1:16)
 54 | Myeloid <- FindClusters(Myeloid, resolution = 0.3)
 55 | pdf("Discrete4_Myeloid_Suppressive_UMAP_Clusters.pdf", height= 6, width = 7)
 56 | DimPlot(Myeloid, reduction = "umap")
 57 | dev.off()
 58 | 
 59 | pdf("Discrete4_Myeloid_Suppressive_Clusters_With_Labels.pdf", height= 6, width = 7)
 60 | DimPlot(Myeloid, reduction = "umap", label=TRUE)
 61 | dev.off()
 62 | 
 63 | pdf("Discrete4_Myeloid_Suppressive_UMAP_Patient_ID.pdf", height= 6, width = 9)
 64 | DimPlot(Myeloid, reduction = "umap", group.by="orig.ident")
 65 | dev.off()
 66 | 
 67 | 
 68 | pdf("Discrete4_Myeloid_Suppressive_UMAP_IDH_Status.pdf", height= 6, width = 7)
 69 | DimPlot(Myeloid, reduction = "umap", group.by="IDH_Status")
 70 | dev.off()
 71 | 
 72 | 
 73 | pdf("Discrete4_Myeloid_Suppressive_UMAP_Annotation.pdf", height= 6, width = 7)
 74 | DimPlot(Myeloid, reduction = "umap", group.by="Annotation")
 75 | dev.off()
 76 | 
 77 | 
 78 | pdf("Discrete4_Myeloid_Suppressive_UMAP_Annotation_Labelled.pdf", height= 6, width = 7)
 79 | DimPlot(Myeloid, reduction = "umap", group.by="Annotation", label=TRUE)
 80 | dev.off()
 81 | 
 82 | 
 83 | 
 84 | DefaultAssay(Myeloid) <- "RNA"
 85 | Myeloid <- NormalizeData(Myeloid)
 86 | Myeloid <- ScaleData(Myeloid, features = all.genes)
 87 | 
 88 | 
 89 | write.table(Myeloid@meta.data, file="Discrete4_Myeloid_Suppressive_All_MetaData.txt", sep="\t", col.names=NA, quote=FALSE)
 90 | 
 91 | 
 92 | 
 93 | library(SCOPfunctions)
 94 | 
 95 | Matrix <- GetAssayData(Myeloid, slot = "data")
 96 | 
 97 | Matrix2 <- as.matrix(Matrix)
 98 | 
 99 | write.table(Matrix2, file="./Discrete4_Myeloid_Suppressive_Normalized_Matrix.txt", sep="\t", col.names=NA, quote=FALSE)
100 | 
101 | 
102 | umap2 <- Embeddings(object = Myeloid, reduction = "umap")
103 | 
104 | write.table(umap2, file="Discrete4_Myeloid_Suppressive_UMAP.txt", sep="\t", col.names=NA, quote=FALSE)
105 | 
106 | 
107 | Matrix <- GetAssayData(Myeloid, slot = "data")
108 | 
109 | Genes <- scan("Human_Surface_Markers.txt", what="")
110 | 
111 | data2 <- Matrix[rownames(Matrix) %in% Genes,]
112 | 
113 | 
114 | data2 <- as.matrix(data2)
115 | 
116 | write.table(data2, file="./Discrete4_Myeloid_Suppressive_Normalized_Matrix_Human_Surface_Markers.txt", sep="\t", col.names=NA, quote=FALSE)
117 | 


--------------------------------------------------------------------------------
/Creation of discretized scRNA-Seq Expression matrix/05- COMET for Marker Identification:
--------------------------------------------------------------------------------
1 | Comet -Abbrev 2 Discrete4_Myeloid_Suppressive_Normalized_Matrix_Human_Surface_Markers.txt Discrete4_Myeloid_Suppressive_UMAP.txt Cells_For_COMET_Supp.txt Discrete4_Myeloid_Suppressive_Surface_Markers_Output/;
2 | 
3 | 
4 | Comet -Abbrev 2 Discrete4_Myeloid_Suppressive_Normalized_Matrix.txt Discrete4_Myeloid_Suppressive_UMAP.txt Cells_For_COMET_Supp.txt Discrete4_Myeloid_Suppressive_Output/;
5 | 


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/1- Creating Gene Sets:
--------------------------------------------------------------------------------
1 | ###### The top 50 genes for each myeloid program were obtained by ranking genes in the merged gene spectra output.
2 | 
3 | ###### For T cells and malignant cells, we ranked the gene spectra from their respective cNMF outputs to obtain the top 50 genes for each program.
4 | 
5 | ###### For the other cell types, we used the gene spectra output of the cNMF run on all cell types, ranking genes to obtain the top 50 for the Pericytes, Endothelial and Oligo programs.
6 | 
7 | ###### To obtain cleaner signals, we removed from each list any gene that appeared among the top 100 genes of any other program.
8 | 
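The ranking-and-filtering procedure above can be sketched in R as follows. This is an illustrative sketch, not the code used to build the published gene sets: it assumes a numeric gene-spectra matrix with programs as rows and genes as columns (an assumption about the cNMF output layout), and `build_gene_sets()` is a hypothetical helper name.

```r
# Hypothetical sketch of the gene-set construction described above.
# spectra: numeric matrix, programs in rows, genes in columns (assumed layout).
build_gene_sets <- function(spectra, top_n = 50, exclude_n = 100) {
  # Rank genes within each program by decreasing spectra score
  ranked <- lapply(seq_len(nrow(spectra)), function(i) {
    colnames(spectra)[order(spectra[i, ], decreasing = TRUE)]
  })
  names(ranked) <- rownames(spectra)

  sets <- lapply(names(ranked), function(p) {
    top <- head(ranked[[p]], top_n)
    # Genes ranked in the top `exclude_n` of any other program
    others <- unique(unlist(lapply(ranked[setdiff(names(ranked), p)],
                                   head, n = exclude_n)))
    setdiff(top, others)  # keep only genes specific to program p
  })
  names(sets) <- names(ranked)
  sets
}

# Toy example: two programs, five genes
spectra <- matrix(c(5, 4, 3, 2, 1,
                    1, 2, 3, 4, 5),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("P1", "P2"), c("A", "B", "C", "D", "E")))
sets <- build_gene_sets(spectra, top_n = 2, exclude_n = 4)
print(sets)
```

Each resulting character vector could then be written one gene per line (e.g. with `writeLines()`), matching the format of the files in "Gene Sets" that the scoring scripts read with `scan()`.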


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/2- Calculating Module scores using Seurat in TCGA Glioma Cohorts:
--------------------------------------------------------------------------------
 1 | library("Seurat")
 2 | library("dplyr")
 3 | 
 4 | ###### Load the TCGA glioma gene expression matrix (already normalized and log2(x+1)-transformed) ######
 5 | 
 6 | data <- read.table("../TCGA.GBMLGG.sampleMap_HiSeqV2", sep="\t", head=TRUE, row.names=1)
 7 | 
 8 | TCGA <- CreateSeuratObject(counts = data, project = "TCGA", min.cells = 1, min.features = 1)
 9 | 
10 | DefaultAssay(TCGA) <- "RNA"
11 | 
12 | all.genes <- rownames(TCGA)
13 | TCGA <- ScaleData(TCGA, features = all.genes)
14 | 
15 | 
16 | TCGA <- FindVariableFeatures(TCGA, selection.method = "vst", nfeatures = 2000)
17 | 
18 | ######## Load gene sets (Obtained as described in Step 1) ###########
19 | 
20 | Microglia <- scan("Microglia.txt", what="")
21 | Macrophage <- scan("Macrophage.txt", what="")
22 | Monocyte <- scan("Monocyte.txt", what="")
23 | cDC <- scan("cDC.txt", what="")
24 | Neutrophils <- scan("Neutrophils.txt", what="")
25 | 
26 | 
27 | IL1B_Inflamm <- scan("IL1B_Inflamm.txt", what="")
28 | Inflamm_Microglia <- scan("Inflamm_Microglia.txt", what="")
29 | Complement_Immunosuppressive <- scan("Complement_Immunosuppressive.txt", what="")
30 | Scavenger <- scan("Scavenger.txt", what="")
31 | 
32 | 
33 | Memory_Like_Tcells <- scan("Memory_Like_Tcells.txt", what="")
34 | Terminal_Effector_Tcells <- scan("Terminal_Effector_Tcells.txt", what="")
35 | Treg <- scan("Treg.txt", what="")
36 | 
37 | Oligo <- scan("Oligo.txt", what="")
38 | Pericytes <- scan("Pericytes.txt", what="")
39 | Endothelial <- scan("Endothelial.txt", what="")
40 | 
41 | Malignant2 <- scan("Malignant2.txt", what="")
42 | Malignant3 <- scan("Malignant3.txt", what="")
43 | Malignant4 <- scan("Malignant4.txt", what="")
44 | Malignant6 <- scan("Malignant6.txt", what="")
45 | Malignant7 <- scan("Malignant7.txt", what="")
46 | 
47 | ############ Calculate the Module Scores and output the results ##########
48 | 
49 | Features <- list(Microglia, Macrophage, Monocyte, cDC, Neutrophils, IL1B_Inflamm, Inflamm_Microglia, Complement_Immunosuppressive, Scavenger, Memory_Like_Tcells, Terminal_Effector_Tcells, Treg, Oligo, Pericytes, Endothelial, Malignant2, Malignant3, Malignant4, Malignant6, Malignant7)
50 | 
51 | TCGA <-  AddModuleScore(object = TCGA, features = Features, name = c("Microglia", "Macrophage", "Monocyte", "cDC", "Neutrophils", "IL1B_Inflamm", "Inflamm_Microglia", "Complement_Immunosuppressive", "Scavenger", "Memory_Like_Tcells", "Terminal_Effector_Tcells", "Treg", "Oligo", "Pericytes", "Endothelial", "Malignant2", "Malignant3", "Malignant4", "Malignant6", "Malignant7"))
52 | 
53 | 
54 | 
55 | write.table(TCGA@meta.data, file="TCGA_Glioma_MetaData.txt", sep="\t", col.names=NA, quote=FALSE)
56 | 
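Seurat's `AddModuleScore()` appends the index of each gene set to the supplied name (it builds column names with `paste0(name, 1:length(features))`), so the call above stores columns named `Microglia1`, `Macrophage2`, ..., `Malignant720` rather than the bare program names. The sketch below renames them back before downstream use such as step 9. `rename_module_scores()` is a hypothetical helper, and it is worth confirming the naming against `colnames(TCGA@meta.data)` for your Seurat version; the same applies to the GLASS and G-SAM scripts.

```r
# Hypothetical helper: AddModuleScore() names score columns
# paste0(name, seq_along(features)), e.g. "Microglia1", "Macrophage2".
# This renames those columns back to the bare program names.
rename_module_scores <- function(meta, programs) {
  scored <- paste0(programs, seq_along(programs))
  hit <- match(scored, colnames(meta))
  if (anyNA(hit)) {
    stop("Missing module score columns: ",
         paste(scored[is.na(hit)], collapse = ", "))
  }
  colnames(meta)[hit] <- programs
  meta
}

# Toy metadata mimicking the output of the AddModuleScore() call above
meta <- data.frame(orig.ident = "TCGA", Microglia1 = 0.2, Macrophage2 = -0.1)
meta <- rename_module_scores(meta, c("Microglia", "Macrophage"))
print(colnames(meta))
```

In the script itself this would be applied as `rename_module_scores(TCGA@meta.data, <the name vector passed to AddModuleScore>)` before `write.table()`.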


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/3- Calculating Module scores using Seurat in GLASS Glioma Cohorts:
--------------------------------------------------------------------------------
 1 | setwd("/path/to/working/dir")  # directory containing the GLASS matrix and the gene set files
 2 | 
 3 | library("Seurat")
 4 | library("dplyr")
 5 | 
 6 | ###### Load the GLASS glioma gene expression matrix (normalized but not log-transformed) ######
 7 | 
 8 | data <- read.table("GLASS_Normalized_All_Genes.txt", sep="\t", head=TRUE, row.names=1)
 9 | 
10 | ##### Log-transform the data ######
11 | 
12 | data <- log1p(data) 
13 | 
14 | GLASS <- CreateSeuratObject(counts = data, project = "GLASS", min.cells = 1, min.features = 1)
15 | 
16 | DefaultAssay(GLASS) <- "RNA"
17 | 
18 | all.genes <- rownames(GLASS)
19 | GLASS <- ScaleData(GLASS, features = all.genes)
20 | 
21 | 
22 | GLASS <- FindVariableFeatures(GLASS, selection.method = "vst", nfeatures = 2000)
23 | 
24 | ######## Load gene sets (Obtained as described in Step 1) ###########
25 | 
26 | 
27 | Microglia <- scan("Microglia.txt", what="")
28 | Macrophage <- scan("Macrophage.txt", what="")
29 | Monocyte <- scan("Monocyte.txt", what="")
30 | cDC <- scan("cDC.txt", what="")
31 | Neutrophils <- scan("Neutrophils.txt", what="")
32 | 
33 | 
34 | IL1B_Inflamm <- scan("IL1B_Inflamm.txt", what="")
35 | Inflamm_Microglia <- scan("Inflamm_Microglia.txt", what="")
36 | Complement_Immunosuppressive <- scan("Complement_Immunosuppressive.txt", what="")
37 | Scavenger <- scan("Scavenger.txt", what="")
38 | 
39 | 
40 | Memory_Like_Tcells <- scan("Memory_Like_Tcells.txt", what="")
41 | Terminal_Effector_Tcells <- scan("Terminal_Effector_Tcells.txt", what="")
42 | Treg <- scan("Treg.txt", what="")
43 | 
44 | Oligo <- scan("Oligo.txt", what="")
45 | Pericytes <- scan("Pericytes.txt", what="")
46 | Endothelial <- scan("Endothelial.txt", what="")
47 | 
48 | Malignant2 <- scan("Malignant2.txt", what="")
49 | Malignant3 <- scan("Malignant3.txt", what="")
50 | Malignant4 <- scan("Malignant4.txt", what="")
51 | Malignant6 <- scan("Malignant6.txt", what="")
52 | Malignant7 <- scan("Malignant7.txt", what="")
53 | 
54 | 
55 | ############ Calculate the Module Scores and output the results ##########
56 | 
57 | Features <- list(Microglia, Macrophage, Monocyte, cDC, Neutrophils, IL1B_Inflamm, Inflamm_Microglia, Complement_Immunosuppressive, Scavenger, Memory_Like_Tcells, Terminal_Effector_Tcells, Treg, Oligo, Pericytes, Endothelial, Malignant2, Malignant3, Malignant4, Malignant6, Malignant7)
58 | 
59 | GLASS <-  AddModuleScore(object = GLASS, features = Features, name = c("Microglia", "Macrophage", "Monocyte", "cDC", "Neutrophils", "IL1B_Inflamm", "Inflamm_Microglia", "Complement_Immunosuppressive", "Scavenger", "Memory_Like_Tcells", "Terminal_Effector_Tcells", "Treg", "Oligo", "Pericytes", "Endothelial", "Malignant2", "Malignant3", "Malignant4", "Malignant6", "Malignant7"))
60 | 
61 | 
62 | write.table(GLASS@meta.data, file="GLASS_Glioma_MetaData.txt", sep="\t", col.names=NA, quote=FALSE)
63 | 


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/4- Calculating Module scores using Seurat in G-SAM Glioblastoma Cohorts:
--------------------------------------------------------------------------------
 1 | library(edgeR)
 2 | library("Seurat")
 3 | library("dplyr")
 4 | 
 5 | ###### Load the G-SAM glioma gene expression matrix (raw counts) ######
 6 | 
 7 | raw_counts <- read.table("gsam.rnaseq.expression-322.txt", head=TRUE, sep="\t", row.names=1)
 8 | 
 9 | 
10 | ####### CPM-normalization and log-transformation ###########
11 | 
12 | d <- DGEList(counts = raw_counts)
13 | 
14 | d <- calcNormFactors(d)
15 | 
16 | cpm_matrix <- cpm(d)
17 | 
18 | write.table(cpm_matrix, file="GSAM_Normalized_Full_Matrix.txt", sep="\t", col.names=NA, quote=FALSE)
19 | 
20 | 
21 | data2 <- log1p(cpm_matrix) 
22 | 
23 | GSAM <- CreateSeuratObject(counts = data2, project = "GSAM", min.cells = 1, min.features = 1)
24 | 
25 | 
26 | 
27 | all.genes <- rownames(GSAM)
28 | GSAM <- ScaleData(GSAM, features = all.genes)
29 | 
30 | 
31 | GSAM <- FindVariableFeatures(GSAM, selection.method = "vst", nfeatures = 2000)
32 | 
33 | 
34 | 
35 | DefaultAssay(GSAM) <- "RNA"
36 | 
37 | 
38 | ######## Load gene sets (Obtained as described in Step 1) ###########
39 | 
40 | Microglia <- scan("Microglia.txt", what="")
41 | Macrophage <- scan("Macrophage.txt", what="")
42 | Monocyte <- scan("Monocyte.txt", what="")
43 | cDC <- scan("cDC.txt", what="")
44 | Neutrophils <- scan("Neutrophils.txt", what="")
45 | 
46 | 
47 | IL1B_Inflamm <- scan("IL1B_Inflamm.txt", what="")
48 | Inflamm_Microglia <- scan("Inflamm_Microglia.txt", what="")
49 | Complement_Immunosuppressive <- scan("Complement_Immunosuppressive.txt", what="")
50 | Scavenger <- scan("Scavenger.txt", what="")
51 | 
52 | 
53 | Memory_Like_Tcells <- scan("Memory_Like_Tcells.txt", what="")
54 | Terminal_Effector_Tcells <- scan("Terminal_Effector_Tcells.txt", what="")
55 | Treg <- scan("Treg.txt", what="")
56 | 
57 | Oligo <- scan("Oligo.txt", what="")
58 | Pericytes <- scan("Pericytes.txt", what="")
59 | Endothelial <- scan("Endothelial.txt", what="")
60 | 
61 | Malignant2 <- scan("Malignant2.txt", what="")
62 | Malignant3 <- scan("Malignant3.txt", what="")
63 | Malignant4 <- scan("Malignant4.txt", what="")
64 | Malignant6 <- scan("Malignant6.txt", what="")
65 | Malignant7 <- scan("Malignant7.txt", what="")
66 | 
67 | 
68 | ############ Calculate the Module Scores and output the results ##########
69 | 
70 | Features <- list(Microglia, Macrophage, Monocyte, cDC, Neutrophils, IL1B_Inflamm, Inflamm_Microglia, Complement_Immunosuppressive, Scavenger, Memory_Like_Tcells, Terminal_Effector_Tcells, Treg, Oligo, Pericytes, Endothelial, Malignant2, Malignant3, Malignant4, Malignant6, Malignant7)
71 | 
72 | GSAM <-  AddModuleScore(object = GSAM, features = Features, name = c("Microglia", "Macrophage", "Monocyte", "cDC", "Neutrophils", "IL1B_Inflamm", "Inflamm_Microglia", "Complement_Immunosuppressive", "Scavenger", "Memory_Like_Tcells", "Terminal_Effector_Tcells", "Treg", "Oligo", "Pericytes", "Endothelial", "Malignant2", "Malignant3", "Malignant4", "Malignant6", "Malignant7"))
73 | 
74 | 
75 | 
76 | write.table(GSAM@meta.data, file="GSAM_Glioma_MetaData.txt", sep="\t", col.names=NA, quote=FALSE)
77 | 


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/5- Preparing CIBERSORTx single cell reference matrix:
--------------------------------------------------------------------------------
 1 | ############Inside R####################
 2 | 
 3 | ########### Discrete_LowR_Cells.txt is a tab-separated file with cell barcodes in the first column and the discrete annotation of each cell in the second ######
 4 | 
 5 | ########### Cells were annotated as Malignant, Oligo, Vasculature, Myeloid, Tcells, or other immune cells
 6 | 
 7 | ########## We used the usage output of the cNMF run on all cell types
 8 | 
 9 | ########### To be included in Discrete_LowR_Cells.txt, a cell had to (a) have less than 10% usage for every broad category other than its annotation and (b) have a usage for its annotated category more than 2.5-fold higher than its second-highest usage.
10 | 
11 | library(dplyr)
12 | library(Seurat)
13 | library(SCOPfunctions)
14 | options(bitmapType='cairo')
15 | options(future.globals.maxSize = 8000 * 1024^2)
16 | 
17 | 
18 | ###### Load the discrete cells that passed the criteria
19 | 
20 | Discrete_LowR_Cells <- read.table("Discrete_LowR_Cells.txt", sep="\t", row.names=1)
21 | 
22 | 
23 | ##### Tumors.combined: Seurat object of all cells from the MGB cohort (must already be loaded in this session) ######
24 | 
25 | ID <- subset(x = Tumors.combined, cells = rownames(Discrete_LowR_Cells))
26 | 
27 | ID.data <- GetAssayData(object = ID, slot="counts")
28 | 
29 | ######### Order both the data frames in the same way
30 | 
31 | order <- match(rownames(Discrete_LowR_Cells), colnames(ID.data))
32 | 
33 | ID.data2  <- ID.data[ , order]
34 | 
35 | ######### Change the cell name (colnames) to the annotation as per the requirement of CIBERSORTx and save the matrix ######## 
36 | 
37 | colnames(ID.data2) <- Discrete_LowR_Cells$V2
38 | 
39 | 
40 | Matrix2 <- utils_big_as.matrix(ID.data2, n_slices_init = 18, verbose = TRUE)
41 | 
42 | 
43 | write.table(Matrix2, file="./MGB_LowR_Cibersort_Ready_Full_Raw_Expression.txt", sep="\t", col.names=NA, quote=FALSE)
44 | 
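The two inclusion criteria listed in the comments above could be applied along these lines. This is a hedged sketch, not the published code: it assumes `usage` is a cells x broad-category matrix (for example cNMF usages aggregated per broad category, which is an assumption), that each cell's annotation is its highest-usage category, and `select_discrete_cells()` is a hypothetical name.

```r
# Hypothetical sketch of the discrete-cell criteria described above.
# usage: numeric matrix, cells in rows (barcodes as row names),
#        broad categories in columns (e.g. Malignant, Myeloid, Tcells, ...).
select_discrete_cells <- function(usage, max_other = 0.10, min_fold = 2.5) {
  usage <- usage / rowSums(usage)                    # usage fractions per cell
  top_idx <- max.col(usage, ties.method = "first")   # annotation = top category
  keep <- vapply(seq_len(nrow(usage)), function(i) {
    top <- usage[i, top_idx[i]]
    others <- usage[i, -top_idx[i]]
    # (a) every other category < 10%, (b) top usage > 2.5x the second-highest
    all(others < max_other) && top > min_fold * max(others)
  }, logical(1))
  data.frame(cell = rownames(usage)[keep],
             annotation = colnames(usage)[top_idx[keep]],
             stringsAsFactors = FALSE)
}

usage <- matrix(c(0.85, 0.09, 0.06,
                  0.50, 0.30, 0.20,
                  0.05, 0.92, 0.03),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("c1", "c2", "c3"),
                                c("Myeloid", "Tcells", "Malignant")))
discrete <- select_discrete_cells(usage)
print(discrete)
# write.table(discrete, "Discrete_LowR_Cells.txt", sep = "\t",
#             quote = FALSE, row.names = FALSE, col.names = FALSE)
```

Writing the result without a header (as in the commented line) means `read.table(..., row.names=1)` in the script above recovers barcodes as row names and the annotation as column `V2`.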


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/6- Estimate cell types fractions in TCGA Matrix:
--------------------------------------------------------------------------------
 1 | #### Inside R #######
 2 | 
 3 | ###### The mixture matrix must be normalized but NOT log-transformed ######
 4 | 
 5 | 
 6 | #### Load the normalized, log2(x+1)-transformed TCGA expression table and undo the log-transformation (2^x - 1) #####
 7 | TCGA <- read.table("TCGA.GBMLGG.sampleMap_HiSeqV2", sep="\t", head=TRUE, row.names=1)
 8 | 
 9 | TCGA3 <- 2^TCGA
10 | 
11 | TCGA4 <- TCGA3-1 
12 | 
13 | write.table(TCGA4, file="TCGA_Normalized_Full_GBMLGG.txt", sep="\t", quote=FALSE, col.names=NA)
14 | 
15 | q()
16 | 
17 | ############ Exit R ##############
18 | 
19 | 
20 |         cibersortx/fractions \
21 |         --username ~{username} \
22 |         --token ~{token} \
23 |         --mixture ./TCGA_Normalized_Full_GBMLGG.txt \
24 |         --single_cell TRUE \
25 |         --refsample ./MGB_LowR_Cibersort_Ready_Full_Raw_Expression.txt \
26 |         --outdir .
27 | 
28 | 
29 |         ###### Replace ~{username} and ~{token} with the credentials obtained from the CIBERSORTx website (register an account, then request a token from the authors) ######
30 | 
31 |         ###### Obtain the CIBERSORTx Docker image through the CIBERSORTx website; alternatively, once you have a username and token, use the Terra image for more RAM ########
32 | 


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/7- Estimate cell types fractions in GLASS Matrix:
--------------------------------------------------------------------------------
 1 | ###### The mixture matrix must be normalized but NOT log-transformed (the GLASS matrix already is) ######
 2 | 
 3 | 
 4 | 
 5 | 
 6 | 
 7 | 
 8 |         cibersortx/fractions \
 9 |         --username ~{username} \
10 |         --token ~{token} \
11 |         --mixture ./GLASS_Normalized_All_Genes.txt \
13 |         --single_cell TRUE \
14 |         --refsample ./MGB_LowR_Cibersort_Ready_Full_Raw_Expression.txt \
15 |         --outdir .
16 | 
17 | 
18 |         ###### Replace ~{username} and ~{token} with the credentials obtained from the CIBERSORTx website (register an account, then request a token from the authors) ######
19 | 
20 |         ###### Obtain the CIBERSORTx Docker image through the CIBERSORTx website; alternatively, once you have a username and token, use the Terra image for more RAM ########
21 | 


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/8- Estimate cell types fractions in G-SAM Matrix:
--------------------------------------------------------------------------------
 1 | #### Inside R #######
 2 | 
 3 | ###### The mixture matrix must be normalized but NOT log-transformed ######
 4 | 
 5 | 
 6 | library(edgeR)
 7 | library("Seurat")
 8 | library("dplyr")
 9 | 
10 | ###### Load G-SAM Glioma gene expression matrix (It is raw counts) ######
11 | 
12 | raw_counts <- read.table("gsam.rnaseq.expression-322.txt", head=TRUE, sep="\t", row.names=1)
13 | 
14 | 
15 | ####### CPM normalization only (no log-transformation, since CIBERSORTx requires linear-space input) ###########
16 | 
17 | d <- DGEList(counts = raw_counts)
18 | 
19 | d <- calcNormFactors(d)
20 | 
21 | cpm_matrix <- cpm(d)
22 | 
23 | write.table(cpm_matrix, file="GSAM_Normalized_Full_Matrix.txt", sep="\t", col.names=NA, quote=FALSE)
24 | 
25 | q()
26 | 
27 | ############ Exit R ##############
28 | 
29 |         cibersortx/fractions \
30 |         --username ~{username} \
31 |         --token ~{token} \
32 |         --mixture ./GSAM_Normalized_Full_Matrix.txt \
33 |         --single_cell TRUE \
34 |         --refsample ./MGB_LowR_Cibersort_Ready_Full_Raw_Expression.txt \
35 |         --outdir .
36 | 
37 | 
38 |         ###### Replace ~{username} and ~{token} with the credentials obtained from the CIBERSORTx website (register an account, then request a token from the authors) ######
39 | 
40 |         ###### Obtain the CIBERSORTx Docker image through the CIBERSORTx website; alternatively, once you have a username and token, use the Terra image for more RAM ########
41 | 


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/9- Normalization of the Module Scores:
--------------------------------------------------------------------------------
 1 | To normalize the module scores in each bulk library, each program's module score is divided by the CIBERSORTx-imputed fraction of its corresponding cell type.
 2 | 
 3 | Normalized score = Module score / CIBERSORTx-imputed fraction of the corresponding cell type
 4 | 
 5 | (e.g., Scavenger module score / Myeloid imputed fraction)
 6 | 
 7 | 
 8 | The normalized module scores were then correlated with published clinical data available online, including IDH mutation status and molecular grade.
 9 | 
10 | 
11 | For correlation with clinical data, the normalized module scores were log10-transformed with the following sign-preserving formula:
12 | 
13 | if x >= 0:   log10(|x| + 1)
14 | if x < 0:   -log10(|x| + 1)
15 | 
16 | where x is the normalized module score value (equivalently, sign(x) * log10(|x| + 1)).
17 | 
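A minimal R sketch of the normalization and sign-preserving log10 transform described above. It assumes `scores` (module scores, e.g. from the metadata written in steps 2-4) and `fractions` (CIBERSORTx imputed fractions) are data frames with sample IDs as row names; `program_to_celltype` and both function names are hypothetical. Samples with a zero imputed fraction produce `Inf`/`NaN`, and the text above does not say how such cases were handled.

```r
# Hypothetical sketch of step 9.
# scores:    data frame of module scores, samples in rows (row names = sample IDs)
# fractions: data frame of CIBERSORTx imputed fractions, samples in rows
# program_to_celltype: named character vector, program -> CIBERSORTx category
normalize_module_scores <- function(scores, fractions, program_to_celltype) {
  samples <- intersect(rownames(scores), rownames(fractions))
  cols <- lapply(names(program_to_celltype), function(p) {
    # zero fractions give Inf/NaN; handling of those cases is not specified
    scores[samples, p] / fractions[samples, program_to_celltype[[p]]]
  })
  out <- do.call(cbind, cols)
  dimnames(out) <- list(samples, names(program_to_celltype))
  out
}

# Sign-preserving log10: log10(|x| + 1) for x >= 0, -log10(|x| + 1) for x < 0
signed_log10 <- function(x) sign(x) * log10(abs(x) + 1)

scores <- data.frame(Scavenger = c(0.5, -0.2), Treg = c(0.1, 0.3),
                     row.names = c("S1", "S2"))
fractions <- data.frame(Myeloid = c(0.25, 0.1), Tcells = c(0.05, 0.5),
                        row.names = c("S1", "S2"))
norm <- normalize_module_scores(scores, fractions,
                                c(Scavenger = "Myeloid", Treg = "Tcells"))
logged <- signed_log10(norm)
print(round(logged, 3))
```

Using `sign(x) * log10(abs(x) + 1)` avoids an explicit if/else and maps 0 to 0, matching both branches of the formula above.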


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/CIBERSORTx_Input/Readme:
--------------------------------------------------------------------------------
1 | Inputs for CIBERSORTx:
2 | 
3 | "Discrete_LowR_Cells.txt" is required for step 5 "5- Preparing CIBERSORTx single cell reference matrix".
4 | 


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/Gene Sets/Complement_Immunosuppressive.txt:
--------------------------------------------------------------------------------
 1 | C1QA
 2 | C1QB
 3 | MS4A6A
 4 | PLTP
 5 | TMEM176B
 6 | RASSF4
 7 | TMEM176A
 8 | TNFSF13
 9 | GAA
10 | GNG10
11 | MS4A4A
12 | NPC2
13 | SIGLEC10
14 | CD14
15 | HLA-DOA
16 | HLA-DRB5
17 | EMB
18 | MARCH1
19 | DSE
20 | GOLIM4
21 | AKR1B1
22 | TMEM163
23 | FAM20A
24 | MSR1
25 | TMEM70
26 | C2
27 | GAL3ST4
28 | GPR155


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/Gene Sets/Endothelial.txt:
--------------------------------------------------------------------------------
 1 | ADGRL4
 2 | VWF
 3 | CLDN5
 4 | CLEC14A
 5 | ECSCR
 6 | TM4SF18
 7 | ADGRF5
 8 | ROBO4
 9 | PTPRB
10 | EMCN
11 | MMRN2
12 | CYYR1
13 | ABCG2
14 | ESAM
15 | FLT1
16 | ACVRL1
17 | CD34
18 | CDH5
19 | ERG
20 | TIE1
21 | KDR
22 | DIPK2B
23 | CAVIN2
24 | MECOM
25 | EGFL7
26 | NOSTRIN
27 | SOX18
28 | ABCB1
29 | APOLD1
30 | ANGPT2
31 | TM4SF1
32 | HSPG2
33 | SLCO2A1
34 | PALMD
35 | MPZL2
36 | TMEM204
37 | EDN1
38 | ACE
39 | TEK
40 | BTNL9
41 | SLC38A5
42 | SEMA3G
43 | ITM2A
44 | GNG11
45 | KANK3
46 | PLVAP
47 | ESM1
48 | MFSD2A


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/Gene Sets/IL1B_Inflamm.txt:
--------------------------------------------------------------------------------
 1 | CD83
 2 | IL1B
 3 | NR4A3
 4 | NR4A1
 5 | ABL2
 6 | OLR1
 7 | NR4A2
 8 | BCL2A1
 9 | KDM6B
10 | IER3
11 | RASGEF1B
12 | DUSP2
13 | PLEK
14 | MAFF
15 | CXCL8
16 | ATF3
17 | EGR3
18 | IL1A
19 | NFKBIZ
20 | NFKBID
21 | EGR2
22 | ICAM1
23 | PLAUR
24 | STX11
25 | SELENOK
26 | TNF
27 | SERPINB9
28 | ELL2
29 | CSRNP1
30 | PTGS2


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/Gene Sets/Inflamm_Microglia.txt:
--------------------------------------------------------------------------------
 1 | PDK4
 2 | P2RY13
 3 | USP53
 4 | RHOB
 5 | CTTNBP2
 6 | GSTM3
 7 | CH25H
 8 | SIGLEC8
 9 | AC253572.2
10 | CXCR4
11 | PIK3IP1
12 | LINC01480
13 | CSGALNACT1
14 | SPRY1
15 | SRSF7
16 | IGF1
17 | MTUS1
18 | MTSS1
19 | KHDRBS3
20 | PDGFB
21 | PAPOLG
22 | GPM6A
23 | TNFRSF13C
24 | CXCL12
25 | ADRB2
26 | AP005530.1
27 | TAL1


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/Gene Sets/Instructions:
--------------------------------------------------------------------------------
1 | These are the gene sets used to calculate the module scores (see Step 1, "1- Creating Gene Sets").
2 | 


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/Gene Sets/Macrophage.txt:
--------------------------------------------------------------------------------
 1 | ACP5
 2 | PLA2G7
 3 | LGMN
 4 | VAT1
 5 | GCHFR
 6 | FUCA1
 7 | LIPA
 8 | APOC1
 9 | CYP27A1
10 | IFI30
11 | GRN
12 | CAPG
13 | CD68
14 | CD63
15 | PLD3
16 | ALDH1A1
17 | CTSZ
18 | BLVRB
19 | TSPAN4
20 | PRDX1
21 | CD9
22 | ATP6V1F
23 | HS3ST2
24 | OTOA
25 | CTSA
26 | CFD
27 | NCEH1
28 | RMDN3
29 | KLHDC8B
30 | IQGAP2
31 | RRAGD
32 | HTRA4
33 | NPL
34 | PLPP3
35 | PKD2L1
36 | CD59
37 | MGST3


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/Gene Sets/Malignant2.txt:
--------------------------------------------------------------------------------
 1 | PTMS
 2 | SOX4
 3 | BASP1
 4 | PXDN
 5 | EIF2B4
 6 | HIST1H4C
 7 | ELAVL4
 8 | ANK3
 9 | CNTN1
10 | ANKS1B
11 | NSG2
12 | NDUFA9
13 | SALL3
14 | ING4
15 | KLRC2
16 | DCX
17 | IGKC
18 | CD27-AS1
19 | EYA1
20 | KCND2
21 | CACNA1E
22 | MAP2
23 | C12orf57
24 | ATN1
25 | PKIA
26 | CELF5
27 | C12orf4
28 | OPCML
29 | GPC2
30 | REV3L
31 | IGHG3
32 | NETO1
33 | GRM5
34 | SNAP25
35 | CDK5R1
36 | PDGFRA
37 | MYCN
38 | SYT16
39 | MYCNOS
40 | ELAVL3
41 | RAB3C
42 | NRXN1
43 | SYT13
44 | EPB41
45 | PHB2
46 | PCDH15
47 | KLRC4
48 | RTN1


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/Gene Sets/Malignant3.txt:
--------------------------------------------------------------------------------
 1 | MAOB
 2 | AQP4
 3 | APLNR
 4 | NTRK2
 5 | ID4
 6 | LTF
 7 | HSPB8
 8 | ID3
 9 | HLA-DRA
10 | RNF19A
11 | GJA1
12 | FAM107A
13 | CSRP1
14 | GFAP
15 | KCNN3
16 | LRRC2
17 | SERPINA3
18 | SYNM
19 | TJP2
20 | CP
21 | ACTN1
22 | ANOS1
23 | ATP1B1
24 | ECM2
25 | MAN1C1
26 | SLC1A4
27 | EFEMP1
28 | C1S
29 | EZR
30 | CRB2
31 | GALNT15
32 | LRRC55
33 | CRISPLD1
34 | LINC01094
35 | CTSH
36 | SERPING1
37 | ADCYAP1R1
38 | PLAAT4
39 | MGST1
40 | DCLK1
41 | RAMP3
42 | VCL


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/Gene Sets/Malignant4.txt:
--------------------------------------------------------------------------------
 1 | EGLN3
 2 | CA9
 3 | VEGFA
 4 | LOX
 5 | PFKFB4
 6 | IGFBP2
 7 | STC1
 8 | NRN1
 9 | AKAP12
10 | TMEM158
11 | SESN2
12 | CA12
13 | GPRC5A
14 | TRIB3
15 | IGFBP5
16 | PLP2
17 | STC2
18 | TFRC
19 | ARL4C
20 | SLC2A3
21 | ITPR1
22 | SLC7A5
23 | SLITRK4
24 | TNFRSF12A
25 | PYGL
26 | IGFBP3
27 | PPP1R14C
28 | SLC39A14


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/Gene Sets/Malignant6.txt:
--------------------------------------------------------------------------------
 1 | SNX10
 2 | CSRP2
 3 | CHI3L1
 4 | CNN3
 5 | C11orf96
 6 | F3
 7 | CCL2
 8 | MAP1B
 9 | RELL1
10 | CHST2
11 | FLNC
12 | SPOCK2
13 | PCSK1
14 | LAMA2
15 | CD99
16 | SRPX2
17 | CCN1
18 | TNXB
19 | FAM20C
20 | NRCAM
21 | BICD1
22 | RASSF8
23 | MMP19
24 | LZTS1
25 | FOSL2
26 | SOCS3
27 | FNDC4
28 | MDK
29 | ABCA1
30 | LAMB1
31 | MT2A
32 | SPRY2
33 | CSF1
34 | PTGFRN
35 | BAALC
36 | BHLHE40
37 | CLU
38 | TM7SF3
39 | GAP43
40 | PALM2-AKAP2
41 | GADD45A
42 | LIF


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/Gene Sets/Malignant7.txt:
--------------------------------------------------------------------------------
 1 | SMOC1
 2 | BCAN
 3 | FERMT1
 4 | PHYHIPL
 5 | SOX8
 6 | FXYD6
 7 | HES6
 8 | OLIG1
 9 | LINC00632
10 | SHD
11 | TNR
12 | ATCAY
13 | GRIA2
14 | OLIG2
15 | RPS12
16 | ALCAM
17 | HIP1R
18 | NEU4
19 | ANGPTL2
20 | GALNT13
21 | LRRN1
22 | BMP2
23 | MEGF11
24 | MARCKS
25 | ZCCHC24
26 | LRP4
27 | AC009041.2
28 | GRIA4
29 | RPS24
30 | RAP2A
31 | OMG
32 | LRATD2
33 | ZDHHC22
34 | SAPCD2
35 | DNER
36 | IDI1
37 | COL20A1
38 | TNK2
39 | SCN3A
40 | GABRB3
41 | ARFGEF3
42 | ASIC1
43 | SOX6
44 | ATOH8
45 | DLGAP1
46 | PLPPR1
47 | FAM110B


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/Gene Sets/Memory_Like_Tcells.txt:
--------------------------------------------------------------------------------
 1 | IL7R
 2 | MALAT1
 3 | AHNAK
 4 | ANXA1
 5 | TCF7
 6 | FOS
 7 | RPS29
 8 | MTRNR2L12
 9 | SYNE2
10 | CD69
11 | CCR7
12 | ITGA6
13 | TOB1
14 | MT-ATP6
15 | RPLP0
16 | PDE3B
17 | MT-CO3
18 | MT-CO2
19 | MTRNR2L8
20 | PGGHG
21 | LEF1
22 | DDX3Y
23 | MT-CYB
24 | AL627171.2
25 | NKTR
26 | MT-ND1
27 | TMSB10
28 | GIMAP7
29 | ATM
30 | TRABD2A
31 | SERINC5
32 | BCL2
33 | SATB1
34 | DPP4
35 | MT-ND4
36 | MT-ND5
37 | GIMAP4


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/Gene Sets/Microglia.txt:
--------------------------------------------------------------------------------
 1 | TMIGD3
 2 | CYFIP1
 3 | APOC2
 4 | SYNDIG1
 5 | TMEM119
 6 | PIH1D1
 7 | WIPF3
 8 | MCF2L
 9 | TGFBR1
10 | NAV3
11 | BIN1
12 | ZMAT3
13 | LINC01736
14 | GLDN
15 | RTTN
16 | AL078590.2
17 | PDPN
18 | FSCN1
19 | CACNB4
20 | PTCRA
21 | ARMH4
22 | HPGDS
23 | CEBPA
24 | GYPC
25 | RAMP1
26 | IPCEF1
27 | LINC01235


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/Gene Sets/Monocyte.txt:
--------------------------------------------------------------------------------
 1 | FCN1
 2 | LYZ
 3 | CD300E
 4 | CD44
 5 | MCEMP1
 6 | TIMP1
 7 | S100A6
 8 | S100A8
 9 | SH3BGRL3
10 | AC020656.1
11 | FLNA
12 | CFP
13 | ANPEP
14 | S100A12
15 | CCR2
16 | LTA4H
17 | SMIM25
18 | THBS1
19 | STXBP2
20 | UPP1
21 | LILRA5
22 | LILRB2
23 | MYO1G
24 | SGMS2
25 | EREG
26 | CSTA
27 | MPEG1
28 | S100A4
29 | DMXL2
30 | AP1S2
31 | MAP3K20
32 | GLIPR2
33 | CYP1B1
34 | CDA
35 | CLEC12A
36 | NRG1
37 | JARID2


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/Gene Sets/Neutrophils.txt:
--------------------------------------------------------------------------------
 1 | IFITM2
 2 | FCGR3B
 3 | SMCHD1
 4 | VNN2
 5 | CXCR2
 6 | MMP25
 7 | MEGF9
 8 | MME
 9 | MGAM
10 | CXCR1
11 | IGF2R
12 | CLEC4E
13 | IL18RAP
14 | LRRK2
15 | IL1R2
16 | MXD1
17 | ICAM3
18 | TNFRSF10C
19 | RESF1
20 | TMEM154
21 | CPD
22 | S100P
23 | IVNS1ABP
24 | CMTM2
25 | ACSL1
26 | FPR2
27 | NCF1
28 | TUBA4A
29 | IL18R1
30 | CRISPLD2
31 | MSRB1
32 | LITAF
33 | CYP4F3
34 | ADGRE5
35 | QPCT
36 | LILRB3
37 | DGAT2


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/Gene Sets/Oligo.txt:
--------------------------------------------------------------------------------
 1 | MAG
 2 | ERMN
 3 | MOG
 4 | SPOCK3
 5 | MOBP
 6 | CNDP1
 7 | TF
 8 | TMEM144
 9 | CARNS1
10 | KLK6
11 | PEX5L
12 | CLDN11
13 | MYRF
14 | ENPP2
15 | GJB1
16 | FOLH1
17 | PLP1
18 | TPPP
19 | SLC24A2
20 | RAPGEF5
21 | CNTNAP4
22 | EDIL3
23 | AATK
24 | PLEKHH1
25 | PRR18
26 | CAPN3
27 | SOX10
28 | NECAB1
29 | SEPTIN4
30 | HHIP
31 | CNTN2
32 | MAP7
33 | NKAIN2
34 | PPP1R14A
35 | PCSK6
36 | FGFR2
37 | BCAS1
38 | NKX6-2
39 | DBNDD2
40 | GPR37
41 | TMEM125
42 | LGI3
43 | AK5
44 | TUBB4A
45 | PTGDS
46 | BOK
47 | EFHD1


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/Gene Sets/Pericytes.txt:
--------------------------------------------------------------------------------
 1 | MYH11
 2 | FHL5
 3 | LMOD1
 4 | ACTA2
 5 | ADIRF
 6 | RGS5
 7 | SLIT3
 8 | CASQ2
 9 | TINAGL1
10 | MUSTN1
11 | SYNPO2
12 | AOC3
13 | SLC38A11
14 | FRZB
15 | PDE5A
16 | ACTG2
17 | RERGL
18 | MYOCD
19 | PDE3A
20 | MYL9
21 | PLN
22 | TBX18
23 | COL14A1
24 | GJA4
25 | HIGD1B
26 | CNN1
27 | CCDC3
28 | MRVI1
29 | ITGA8
30 | HEYL
31 | NOTCH3
32 | NDUFA4L2
33 | NR2F2
34 | MSRB3
35 | HRC
36 | SMOC2
37 | EDNRA
38 | GUCY1A1
39 | BGN
40 | LRRC32
41 | MCAM
42 | MYLK
43 | ITGA1
44 | AVPR1A
45 | TPM2
46 | PDGFRB


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/Gene Sets/Scavenger.txt:
--------------------------------------------------------------------------------
 1 | TGFBI
 2 | ASPH
 3 | SLC16A10
 4 | NRP1
 5 | ANKH
 6 | IGFBP4
 7 | RNASE1
 8 | ENG
 9 | ECM1
10 | MYO5A
11 | FNDC3B
12 | PLXND1
13 | MRC1
14 | LYVE1
15 | ME2
16 | MMP14
17 | PMP22
18 | TNFRSF11A
19 | CLEC5A
20 | ITGB5
21 | GPRIN3
22 | CTSL
23 | AGFG1
24 | SLC39A8
25 | SH3PXD2B
26 | NIBAN2
27 | CEMIP2
28 | GNG12
29 | SPARC
30 | TTYH3
31 | FPR3
32 | ARHGAP18
33 | HIF1A
34 | LTBP2
35 | OLFML2B
36 | LGI2
37 | SEPTIN11
38 | SASH1
39 | ANXA2
40 | EMILIN2


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/Gene Sets/Terminal_Effector_Tcells.txt:
--------------------------------------------------------------------------------
 1 | TYROBP
 2 | KLRD1
 3 | NCAM1
 4 | IL2RB
 5 | SYK
 6 | SH2D1B
 7 | B3GNT7
 8 | PRF1
 9 | CTSW
10 | ITGAX
11 | HOPX
12 | TMIGD2
13 | TRDC
14 | NKG7
15 | KLRC1
16 | GZMB
17 | GNLY
18 | KRT81
19 | KIR2DL4
20 | PTPN12
21 | GSTP1
22 | MATK
23 | GNPTAB
24 | PIK3AP1
25 | IRF8
26 | KLRK1
27 | LAT2
28 | KRT86
29 | NCR1
30 | CD38
31 | GFOD1
32 | XCL1
33 | KLRF1
34 | MAPK1
35 | MAP3K8
36 | CXXC5
37 | CTBP2
38 | LYN
39 | AOAH
40 | ZFHX3
41 | XCL2
42 | GAS7
43 | KLRC3


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/Gene Sets/Treg.txt:
--------------------------------------------------------------------------------
 1 | FOXP3
 2 | ENTPD1
 3 | IL2RA
 4 | CTLA4
 5 | TNFRSF1B
 6 | IKZF2
 7 | CCR8
 8 | BATF
 9 | TNFRSF18
10 | LAYN
11 | TNFRSF4
12 | SLAMF1
13 | ACSL4
14 | TBC1D4
15 | ICOS
16 | IL1R1
17 | CD4
18 | VDR
19 | TNFRSF9
20 | UGP2
21 | GLRX
22 | LAPTM4B
23 | RTKN2
24 | CCNG2
25 | TNFRSF8
26 | CXCR6
27 | STAM
28 | F5
29 | TYMP
30 | UCP2
31 | ICA1
32 | IL32
33 | MYL6
34 | ZC2HC1A
35 | SAT1
36 | FANK1
37 | CD177
38 | PHTF2
39 | RNF213
40 | SPATS2L
41 | CRLF2
42 | DUSP16


--------------------------------------------------------------------------------
/Deconvolution of Bulk Datasets/Gene Sets/cDC.txt:
--------------------------------------------------------------------------------
 1 | CLNK
 2 | HLA-DQA1
 3 | LGALS2
 4 | SLAMF7
 5 | IDO1
 6 | XCR1
 7 | P2RY14
 8 | HLA-DOB
 9 | P2RY10
10 | FCER1A
11 | FLT3
12 | PARM1
13 | CPNE3
14 | WDFY4
15 | PPA1
16 | CST7
17 | CD1C
18 | SLC38A1
19 | DAPP1
20 | CD1E
21 | TACSTD2
22 | CNN2
23 | MCOLN2
24 | CLEC10A
25 | JAML
26 | RAB11FIP1
27 | CCR6
28 | ZNF366
29 | CCSER1
30 | NAAA
31 | LSP1
32 | GPR157
33 | CRIP1
34 | CBFA2T3
35 | DUSP5


--------------------------------------------------------------------------------
/Figure 1 Visualizations/Quadrant plot generation with piecharts:
--------------------------------------------------------------------------------
 1 | ######## "matrix_piedot.txt" contains all non-doublet myeloid cells.
 2 | ######## Xaxis = usage of the IL1B pro-inflammatory program - usage of the Complement immunosuppressive program
 3 | ######## Yaxis = usage of the RHOB pro-inflammatory program - usage of the Scavenger immunosuppressive program
 4 | ######## The matrix also contains the usage of each immunomodulatory program, plus an "Others" column holding the summed usage of all remaining programs (including cell-identity programs)
 5 | 
 6 | library(ggplot2)
 7 | library(dplyr)
 8 | library(scatterpie)
 9 | library(Cairo)
10 | 
11 | 
12 | data <- read.table("matrix_piedot.txt", sep="\t", head=TRUE, row.names=1)
13 | 
14 | 
15 | my_plot <- ggplot() + geom_scatterpie(aes(x=Xaxis, y=Yaxis, r=0.5), data=data, cols=c("Others","Scavenger_Immunosuppressive","IL1B_Inflammatory","Inflammatory_microglia", "Complement_Immunosuppressive"), color=NA) + coord_equal() + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank()) + labs(color=NULL) + scale_fill_manual(values = c("Others" = "gray90", "IL1B_Inflammatory" = "#AB0800", "Complement_Immunosuppressive"="#007AFF", "Inflammatory_microglia"="#FF6961", "Scavenger_Immunosuppressive" = "#0700C4"))
16 | 
17 | 
18 | ggsave(file="Myeloid_Activities_Scatterpie_Full_Version_V2_OtherInident_05.png", plot = my_plot, height = 8, width = 10, dpi=400)
19 | 
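A minimal sketch, in Python, of how the quadrant coordinates and the "Others" slice described in the header comments could be computed for one cell. It assumes the four usages are already per-cell percentages (all programs summing to 100); the function name and its arguments are hypothetical and not part of the original pipeline.

```python
def quadrant_coordinates(il1b, complement, rhob, scavenger):
    """Return (x, y, others) for a single cell.

    All inputs are program usages in percent for that cell, assuming the
    cell's usages across all programs sum to 100 (hypothetical helper).
    """
    x = il1b - complement   # IL1B pro-inflammatory minus Complement immunosuppressive
    y = rhob - scavenger    # RHOB pro-inflammatory minus Scavenger immunosuppressive
    # Usage not captured by the four immunomodulatory programs; clamp tiny
    # negative values that can arise from floating-point rounding
    others = max(0.0, 100.0 - (il1b + complement + rhob + scavenger))
    return x, y, others
```

For example, a cell with 40% IL1B, 10% Complement, 5% RHOB and 15% Scavenger usage would sit at (30, -10), with the remaining 30% drawn as the gray "Others" slice.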


--------------------------------------------------------------------------------
/Identifying recurrent programs in Myeloid Cells in Gliomas (Related to Figure 1)/01- Identifying Variable Genes in MGB Cohort (Round 1):
--------------------------------------------------------------------------------
 1 | library(dplyr)
 2 | library(Seurat)
 3 | options(bitmapType='cairo')
 4 | options(future.globals.maxSize = 8000 * 1024^2)
 5 | 
 6 | ########## High-quality, non-doublet myeloid cells in the MGB cohort were identified as described in the "Processing of scRNA-Seq Files" section, and their barcodes were saved to "HQ_Myeloid.txt" (one barcode per line) ############
 7 | 
 8 | Myeloid_Cells <- scan("HQ_Myeloid.txt", what="")
 9 | 
10 | ######## Tumors.combined is the Seurat object generated for all cells ###########
11 | 
12 | Myeloid <- subset(x = Tumors.combined, cells = Myeloid_Cells)
13 | 
14 | Myeloid.data <- GetAssayData(object = Myeloid, slot="counts")
15 | 
16 | 
17 | All <- CreateSeuratObject(counts = Myeloid.data, project = "All", min.cells = 3, min.features = 200)
18 | All[["percent.mt"]] <- PercentageFeatureSet(All, pattern = "^MT-")
19 | 
20 | 
21 | All <- NormalizeData(All)
22 | 
23 | All_Matrix2 <- GetAssayData(object = All)
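24 | All_Matrix <- utils_big_as.matrix(All_Matrix2, n_slices_init = 10, verbose = TRUE)   ######## utils_big_as.matrix is a user-defined helper (not part of Seurat) that converts a large sparse matrix to a dense one in slices; it must be defined before this line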
25 | 
26 | 
27 | ########## Keep genes with normalized expression >= 0.1 in at least 20 cells ###############
28 | All_Matrix3 <- All_Matrix[apply(All_Matrix, 1, function(x) sum(x >= 0.1, na.rm=TRUE) > 19),]
29 | All.data2 <- Myeloid.data[rownames(Myeloid.data) %in% rownames(All_Matrix3),]
30 | All <- CreateSeuratObject(counts = All.data2, project = "All", min.cells = 3, min.features = 200)
31 | All[["percent.mt"]] <- PercentageFeatureSet(All, pattern = "^MT-")
32 | 
33 | ############ Identifying Variable Genes #################
34 | 
35 | All <- NormalizeData(All)
36 | 
37 | 
38 | All <- FindVariableFeatures(All, selection.method="vst", nfeatures = 2000)
39 | 
40 | Var <- HVFInfo(object = All, selection.method="vst", assay = "RNA")
41 | 
42 | write.table(Var, file="MGB_Myeloid_Full_Gene_List_Variable_Score_230206.txt", sep="\t", quote=FALSE, col.names=NA)
43 | 
44 | 
45 | #### From Var, keep genes with mean expression >= 0.001, take the top 2000 by standardized variance, and save them (one gene per line) as "MGB_Variable_Round1.txt" ########
46 | 
47 | 
48 | Var2 <- scan("MGB_Variable_Round1.txt", what="")
49 | 
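50 | All_Matrix4 <- Myeloid.data[rownames(Myeloid.data) %in% Var2, ]   ######## raw counts (genes x cells) restricted to the selected variable genes
51 | 
52 | All_Matrix5 <- t(as.matrix(All_Matrix4))   ######## transpose to cells x genes, the layout expected by cnmf prepare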
53 | 
54 | write.table(All_Matrix5, file="MGB_Myeloid_Matrix_Filtered_For_NMF_Round1.txt", sep="\t", quote=FALSE, col.names=NA)
55 | 
56 | 
57 | 
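The gene filter in this script (`sum(x >= 0.1) > 19`) keeps a gene when at least 20 cells have a normalized expression of 0.1 or more. A minimal Python sketch of the same rule, assuming a dense genes x cells array; the function name is hypothetical.

```python
import numpy as np

def genes_passing_filter(norm_expr, min_value=0.1, min_cells=20):
    """Boolean mask over the genes (rows) of a genes x cells matrix.

    A gene passes when at least `min_cells` cells have normalized
    expression >= `min_value`, i.e. the same rule as sum(x >= 0.1) > 19.
    """
    norm_expr = np.asarray(norm_expr)
    n_expressing = (norm_expr >= min_value).sum(axis=1)  # cells per gene above threshold
    return n_expressing >= min_cells
```

Because the rule only counts qualifying entries per gene, it may also be computed on the sparse matrix without densifying it first (for example with `Matrix::rowSums(All_Matrix2 >= 0.1)` in R); that is an alternative, not what the original script does.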


--------------------------------------------------------------------------------
/Identifying recurrent programs in Myeloid Cells in Gliomas (Related to Figure 1)/02- Round 1 cNMF for myeloid cells in MGB cohort:
--------------------------------------------------------------------------------
 1 | cnmf prepare --output-dir ./All/ --name All -c MGB_Myeloid_Matrix_Filtered_For_NMF_Round1.txt -k 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 --n-iter 500 --total-workers 1 --seed 14 --numgenes 2000
 2 | 
 3 | cnmf factorize --output-dir ./All/ --name All --worker-index 0;
 4 | 
 5 | cnmf combine --output-dir ./All/ --name All;
 6 | 
 7 | rm ./All/All/cnmf_tmp/All.spectra.k_*.iter_*.df.npz;
 8 | 
 9 | cnmf k_selection_plot --output-dir All --name All;
10 | 
11 | 
12 | ##### Based on the K plot, we select K=22 ########
13 | 
14 | cnmf consensus --output-dir All --name All --components 22 --local-density-threshold 0.02 --show-clustering
15 | 
16 | 
17 | ########## The usage matrix from cnmf consensus is normalized per row to percentages (in each cell, the program usages sum to 100) ################
18 | ########## The gene spectra score matrix is used for the annotation of the programs #############
19 | ########## Round 1 was used to flag programs that are not myeloid in nature (i.e., they reflect a different cell-type identity) or that are used by fewer than 100 myeloid cells; cells with > 20% usage of any such program were removed
20 | 
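A short Python sketch of the per-row normalization to percentages, assuming the consensus usage matrix has been loaded as a cells x programs array. Cells whose usages are all zero are left as zeros instead of triggering a division by zero; that handling is an assumption, not something the source specifies.

```python
import numpy as np

def usage_to_percent(usage):
    """Scale each row (cell) of a cells x programs usage matrix to sum to 100."""
    usage = np.asarray(usage, dtype=float)
    totals = usage.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # all-zero cells stay zero rather than dividing by 0
    return 100.0 * usage / totals
```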


--------------------------------------------------------------------------------
/Identifying recurrent programs in Myeloid Cells in Gliomas (Related to Figure 1)/03- Identifying Variable Genes in Houston Cohort (Round 1):
--------------------------------------------------------------------------------
 1 | library(dplyr)
 2 | library(Seurat)
 3 | options(bitmapType='cairo')
 4 | options(future.globals.maxSize = 8000 * 1024^2)
 5 | 
 6 | ########## High-quality, non-doublet myeloid cells in the Houston cohort were identified as described in the "Processing of scRNA-Seq Files" section, and their barcodes were saved to "HQ_Myeloid.txt" (one barcode per line) ############
 7 | 
 8 | Myeloid_Cells <- scan("HQ_Myeloid.txt", what="")
 9 | 
10 | ######## Tumors.combined is the Seurat object generated for all cells ###########
11 | 
12 | Myeloid <- subset(x = Tumors.combined, cells = Myeloid_Cells)
13 | 
14 | Myeloid.data <- GetAssayData(object = Myeloid, slot="counts")
15 | 
16 | 
17 | All <- CreateSeuratObject(counts = Myeloid.data, project = "All", min.cells = 3, min.features = 200)
18 | All[["percent.mt"]] <- PercentageFeatureSet(All, pattern = "^MT-")
19 | 
20 | 
21 | All <- NormalizeData(All)
22 | 
23 | All_Matrix2 <- GetAssayData(object = All)
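24 | All_Matrix <- utils_big_as.matrix(All_Matrix2, n_slices_init = 10, verbose = TRUE)   ######## utils_big_as.matrix is a user-defined helper (not part of Seurat) that converts a large sparse matrix to a dense one in slices; it must be defined before this line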
25 | 
26 | 
27 | ########## Keep genes with normalized expression >= 0.1 in at least 20 cells ###############
28 | All_Matrix3 <- All_Matrix[apply(All_Matrix, 1, function(x) sum(x >= 0.1, na.rm=TRUE) > 19),]
29 | All.data2 <- Myeloid.data[rownames(Myeloid.data) %in% rownames(All_Matrix3),]
30 | All <- CreateSeuratObject(counts = All.data2, project = "All", min.cells = 3, min.features = 200)
31 | All[["percent.mt"]] <- PercentageFeatureSet(All, pattern = "^MT-")
32 | 
33 | ############ Identifying Variable Genes #################
34 | 
35 | All <- NormalizeData(All)
36 | 
37 | 
38 | All <- FindVariableFeatures(All, selection.method="vst", nfeatures = 2000)
39 | 
40 | Var <- HVFInfo(object = All, selection.method="vst", assay = "RNA")
41 | 
42 | write.table(Var, file="Houston_Myeloid_Full_Gene_List_Variable_Score_230206.txt", sep="\t", quote=FALSE, col.names=NA)
43 | 
44 | 
45 | #### From Var, keep genes with mean expression >= 0.001, take the top 2000 by standardized variance, and save them (one gene per line) as "Houston_Variable_Round1.txt" ########
46 | 
47 | 
48 | Var2 <- scan("Houston_Variable_Round1.txt", what="")
49 | 
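50 | All_Matrix4 <- Myeloid.data[rownames(Myeloid.data) %in% Var2, ]   ######## raw counts (genes x cells) restricted to the selected variable genes
51 | 
52 | All_Matrix5 <- t(as.matrix(All_Matrix4))   ######## transpose to cells x genes, the layout expected by cnmf prepare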
53 | 
54 | write.table(All_Matrix5, file="Houston_Myeloid_Matrix_Filtered_For_NMF_Round1.txt", sep="\t", quote=FALSE, col.names=NA)
55 | 
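A minimal Python sketch of the variable-gene selection described in this script's comments (mean expression of at least 0.001, then the top 2000 genes by standardized variance). It assumes the tab-delimited file written by `write.table(Var, ..., col.names=NA)`, whose first header field is empty and whose vst columns include `mean` and `variance.standardized`; the function name is hypothetical.

```python
import csv

def select_variable_genes(lines, min_mean=0.001, top_n=2000):
    """Pick genes from a tab-delimited HVFInfo export.

    `lines` is an open file or any iterable of text lines. Genes with
    mean >= min_mean are ranked by variance.standardized (highest first)
    and the first top_n gene names are returned.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    gene_col = reader.fieldnames[0]  # empty header field left by col.names=NA
    rows = [r for r in reader if float(r["mean"]) >= min_mean]
    rows.sort(key=lambda r: float(r["variance.standardized"]), reverse=True)
    return [r[gene_col] for r in rows[:top_n]]
```

For instance, reading `Houston_Myeloid_Full_Gene_List_Variable_Score_230206.txt` and writing the returned genes one per line would give a list in the format that `scan()` reads back as `Houston_Variable_Round1.txt`.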


--------------------------------------------------------------------------------
/Identifying recurrent programs in Myeloid Cells in Gliomas (Related to Figure 1)/04- Round 1 cNMF for myeloid cells in Houston cohort:
--------------------------------------------------------------------------------
 1 | cnmf prepare --output-dir ./All/ --name All -c Houston_Myeloid_Matrix_Filtered_For_NMF_Round1.txt -k 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 --n-iter 500 --total-workers 1 --seed 14 --numgenes 2000
 2 | 
 3 | cnmf factorize --output-dir ./All/ --name All --worker-index 0;
 4 | 
 5 | cnmf combine --output-dir ./All/ --name All;
 6 | 
 7 | rm ./All/All/cnmf_tmp/All.spectra.k_*.iter_*.df.npz;
 8 | 
 9 | cnmf k_selection_plot --output-dir All --name All;
10 | 
11 | 
12 | ##### Based on the K plot, we select K=23 ########
13 | 
14 | cnmf consensus --output-dir All --name All --components 23 --local-density-threshold 0.02 --show-clustering
15 | 
16 | 
17 | ########## The usage matrix from cnmf consensus is normalized per row to percentages (in each cell, the program usages sum to 100) ################
18 | ########## The gene spectra score matrix is used for the annotation of the programs #############
19 | ########## Round 1 was used to flag programs that are not myeloid in nature (i.e., they reflect a different cell-type identity) or that are used by fewer than 100 myeloid cells; cells with > 20% usage of any such program were removed
20 | 
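The Round 1 clean-up (removing cells with more than 20% usage of any flagged program) can be sketched in Python as follows. It assumes usages are already per-cell percentages and that the flagged programs were chosen beforehand by annotating the gene spectra; the data layout and function name are hypothetical.

```python
def cells_to_keep(usage_percent, flagged_programs, max_usage=20.0):
    """Return the cells whose usage of every flagged program is <= max_usage.

    usage_percent maps cell barcode -> {program name: usage in percent}.
    flagged_programs lists the non-myeloid or rarely used Round 1 programs.
    """
    flagged = set(flagged_programs)
    return [
        cell
        for cell, programs in usage_percent.items()
        # a cell is dropped as soon as one flagged program exceeds the cut-off
        if all(pct <= max_usage for prog, pct in programs.items() if prog in flagged)
    ]
```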


--------------------------------------------------------------------------------
/Identifying recurrent programs in Myeloid Cells in Gliomas (Related to Figure 1)/05- Identifying Variable Genes in Jackson's Cohort (Round 1):
--------------------------------------------------------------------------------
 1 | library(dplyr)
 2 | library(Seurat)
 3 | options(bitmapType='cairo')
 4 | options(future.globals.maxSize = 8000 * 1024^2)
 5 | 
 6 | ########## High-quality, non-doublet myeloid cells in Jackson's cohort were identified as described in the "Processing of scRNA-Seq Files" section, and their barcodes were saved to "HQ_Myeloid.txt" (one barcode per line) ############
 7 | 
 8 | Myeloid_Cells <- scan("HQ_Myeloid.txt", what="")
 9 | 
10 | ######## Tumors.combined is the Seurat object generated for all cells ###########
11 | 
12 | Myeloid <- subset(x = Tumors.combined, cells = Myeloid_Cells)
13 | 
14 | Myeloid.data <- GetAssayData(object = Myeloid, slot="counts")
15 | 
16 | 
17 | All <- CreateSeuratObject(counts = Myeloid.data, project = "All", min.cells = 3, min.features = 200)
18 | All[["percent.mt"]] <- PercentageFeatureSet(All, pattern = "^MT-")
19 | 
20 | 
21 | All <- NormalizeData(All)
22 | 
23 | All_Matrix2 <- GetAssayData(object = All)
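24 | All_Matrix <- utils_big_as.matrix(All_Matrix2, n_slices_init = 10, verbose = TRUE)   ######## utils_big_as.matrix is a user-defined helper (not part of Seurat) that converts a large sparse matrix to a dense one in slices; it must be defined before this line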
25 | 
26 | 
27 | ########## Keep genes with normalized expression >= 0.1 in at least 20 cells ###############
28 | All_Matrix3 <- All_Matrix[apply(All_Matrix, 1, function(x) sum(x >= 0.1, na.rm=TRUE) > 19),]
29 | All.data2 <- Myeloid.data[rownames(Myeloid.data) %in% rownames(All_Matrix3),]
30 | All <- CreateSeuratObject(counts = All.data2, project = "All", min.cells = 3, min.features = 200)
31 | All[["percent.mt"]] <- PercentageFeatureSet(All, pattern = "^MT-")
32 | 
33 | ############ Identifying Variable Genes #################
34 | 
35 | All <- NormalizeData(All)
36 | 
37 | 
38 | All <- FindVariableFeatures(All, selection.method="vst", nfeatures = 2000)
39 | 
40 | Var <- HVFInfo(object = All, selection.method="vst", assay = "RNA")
41 | 
42 | write.table(Var, file="Jackson_Myeloid_Full_Gene_List_Variable_Score_230206.txt", sep="\t", quote=FALSE, col.names=NA)
43 | 
44 | 
45 | #### From Var, keep genes with mean expression >= 0.001, take the top 2000 by standardized variance, and save them (one gene per line) as "Jackson_Variable_Round1.txt" ########
46 | 
47 | 
48 | Var2 <- scan("Jackson_Variable_Round1.txt", what="")
49 | 
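50 | All_Matrix4 <- Myeloid.data[rownames(Myeloid.data) %in% Var2, ]   ######## raw counts (genes x cells) restricted to the selected variable genes
51 | 
52 | All_Matrix5 <- t(as.matrix(All_Matrix4))   ######## transpose to cells x genes, the layout expected by cnmf prepare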
53 | 
54 | write.table(All_Matrix5, file="Jackson_Myeloid_Matrix_Filtered_For_NMF_Round1.txt", sep="\t", quote=FALSE, col.names=NA)
55 | 


--------------------------------------------------------------------------------
/Identifying recurrent programs in Myeloid Cells in Gliomas (Related to Figure 1)/06- Round 1 cNMF for myeloid cells in Jackson's cohort:
--------------------------------------------------------------------------------
 1 | cnmf prepare --output-dir ./All/ --name All -c Jackson_Myeloid_Matrix_Filtered_For_NMF_Round1.txt -k 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 --n-iter 500 --total-workers 1 --seed 14 --numgenes 2000
 2 | 
 3 | cnmf factorize --output-dir ./All/ --name All --worker-index 0;
 4 | 
 5 | cnmf combine --output-dir ./All/ --name All;
 6 | 
 7 | rm ./All/All/cnmf_tmp/All.spectra.k_*.iter_*.df.npz;
 8 | 
 9 | cnmf k_selection_plot --output-dir All --name All;
10 | 
11 | 
12 | ##### Based on the K plot, we select K=14 ########
13 | 
14 | cnmf consensus --output-dir All --name All --components 14 --local-density-threshold 0.02 --show-clustering
15 | 
16 | 
17 | ########## The usage matrix from cnmf consensus is normalized per row to percentages (in each cell, the program usages sum to 100) ################
18 | ########## The gene spectra score matrix is used for the annotation of the programs #############
19 | ########## Round 1 was used to flag programs that are not myeloid in nature (i.e., they reflect a different cell-type identity) or that are used by fewer than 100 myeloid cells; cells with > 20% usage of any such program were removed
20 | 


--------------------------------------------------------------------------------
/Identifying recurrent programs in Myeloid Cells in Gliomas (Related to Figure 1)/07- Identifying union gene lists suitable for Round 2 NMF from all the cohorts:
--------------------------------------------------------------------------------
  1 | library(dplyr)
  2 | library(Seurat)
  3 | options(bitmapType='cairo')
  4 | options(future.globals.maxSize = 8000 * 1024^2)
  5 | 
  6 | ##### Filtering MGB cohort matrix to include myeloid cells cleaned using Round 1 cNMF ##################
  7 | 
  8 | MGB <- read.table("All_MGB_220818_Raw_Expression.txt", sep="\t", head=TRUE, row.names=1)
  9 | 
 10 | Myeloid4 <- scan("HQ_Myeloid_MGB.txt", what="")
 11 | 
 12 | MGB2 <- MGB[,colnames(MGB) %in% Myeloid4]
 13 | 
 14 | 
 15 | ##### Filtering Jackson's cohort matrix to include myeloid cells cleaned using Round 1 cNMF ##################
 16 | 
 17 | Jackson <- read.table("Jackson_All_Tumors_Raw_Expression.txt", sep="\t", head=TRUE, row.names=1)
 18 | 
 19 | Myeloid2 <- scan("HQ_Myeloid_Jackson.txt", what="")
 20 | 
 21 | Jackson2 <- Jackson[,colnames(Jackson) %in% Myeloid2]
 22 | 
 23 | 
 24 | ##### Filtering Houston's cohort matrix to include myeloid cells cleaned using Round 1 cNMF ##################
 25 | 
 26 | Houston <- read.table("All_Houston_220826_Raw_Expression.txt", sep="\t", head=TRUE, row.names=1)
 27 | 
 28 | Myeloid3 <- scan("HQ_Myeloid_Houston.txt", what="")
 29 | 
 30 | Houston2 <- Houston[,colnames(Houston) %in% Myeloid3]
 31 | 
 32 | 
 33 | ######### Creating Seurat object for each cohort and merging the objects to create one ##########
 34 | 
 35 | MGB_Object <- CreateSeuratObject(counts = MGB2, project = "MGB", min.cells = 3, min.features = 200)
 36 | 
 37 | MGB_Object[["percent.mt"]] <- PercentageFeatureSet(MGB_Object, pattern = "^MT-")
 38 | 
 39 | 
 40 | Jackson_Object <- CreateSeuratObject(counts = Jackson2, project = "Jackson", min.cells = 3, min.features = 200)
 41 | 
 42 | Jackson_Object[["percent.mt"]] <- PercentageFeatureSet(Jackson_Object, pattern = "^MT-")
 43 | 
 44 | 
 45 | Houston_Object <- CreateSeuratObject(counts = Houston2, project = "Houston", min.cells = 3, min.features = 200)
 46 | 
 47 | Houston_Object[["percent.mt"]] <- PercentageFeatureSet(Houston_Object, pattern = "^MT-")
 48 | 
 49 | 
 50 | Myeloid <-  merge(MGB_Object, y = c(Jackson_Object, Houston_Object), add.cell.ids = c("MGB", "Jackson", "Houston"))
 51 | 
 52 | all.genes <- rownames(Myeloid)
 53 | 
 54 | 
 55 | ######### Normalizing the merged object and scoring gene variability to select the genes used for Round 2 cNMF in each cohort #############
 56 | 
 57 | 
 58 | DefaultAssay(Myeloid) <- "RNA"
 59 | Myeloid <- NormalizeData(Myeloid)
 60 | 
 61 | Myeloid <- FindVariableFeatures(Myeloid, selection.method="vst", nfeatures = 2000)
 62 | 
 63 | Var <- HVFInfo(object = Myeloid, selection.method="vst", assay = "RNA")
 64 | 
 65 | write.table(Var, file="Combined_Myeloid_Full_Gene_List_Variable_Score_Union_Based.txt", sep="\t", quote=FALSE, col.names=NA)
 66 | 
 67 | all.genes <- rownames(Myeloid)
 68 | Myeloid <- ScaleData(Myeloid, features = all.genes)
 69 | 
 70 | 
 71 | write.table(Myeloid@meta.data, file="Glioma_Combined_Myeloid_TAM_All_MetaData_Union_Based.txt", sep="\t", col.names=NA, quote=FALSE)
 72 | 
 73 | DefaultAssay(Myeloid) <- "RNA"
 74 | 
 75 | Myeloid.data <- GetAssayData(object = Myeloid, slot="counts")
 76 | 
 77 | 
 78 | #################### "union_filtered_Variable.txt" (one gene per line) was obtained from Combined_Myeloid_Full_Gene_List_Variable_Score_Union_Based.txt by keeping genes with mean expression >= 0.01 and standardized variance >= 1 ################################
 79 | 
 80 | Variable <- scan("union_filtered_Variable.txt", what="")
 81 | 
 82 | 
 83 | 
 84 | 
 85 | #################### Filter the cleaned myeloid matrix of each cohort to include only these genes of interest for Round 2 cNMF for each cohort #############
 86 | 
 87 | MGB3 <- MGB[rownames(MGB) %in% Variable,colnames(MGB) %in% Myeloid4]
 88 | 
 89 | write.table(t(MGB3), file="MGB_Myeloid_Matrix_Filtered_For_NMF_Round2.txt", sep="\t", quote=FALSE, col.names=NA)
 90 | 
 91 | 
 92 | Houston3 <- Houston[rownames(Houston) %in% Variable,colnames(Houston) %in% Myeloid3]
 93 | 
 94 | write.table(t(Houston3), file="Houston_Myeloid_Matrix_Filtered_For_NMF_Round2.txt", sep="\t", quote=FALSE, col.names=NA)
 95 | 
 96 | 
 97 | Jackson3 <- Jackson[rownames(Jackson) %in% Variable,colnames(Jackson) %in% Myeloid2]
 98 | 
 99 | write.table(t(Jackson3), file="Jackson_Myeloid_Matrix_Filtered_For_NMF_Round2.txt", sep="\t", quote=FALSE, col.names=NA)
100 | 
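Each cohort matrix is subset to the union variable genes and the cleaned myeloid barcodes, then transposed so that cells are rows and genes are columns, the layout passed to `cnmf prepare -c`. A dependency-free Python sketch of that reshaping; the function name and the list-of-lists layout are illustrative assumptions.

```python
def subset_for_cnmf(matrix, genes, cells, keep_genes, keep_cells):
    """Subset a genes x cells count matrix and transpose it to cells x genes.

    matrix is a list of rows (one per gene, one value per cell). The original
    row and column order is preserved, as with %in% indexing in R.
    """
    keep_genes, keep_cells = set(keep_genes), set(keep_cells)
    gene_idx = [i for i, g in enumerate(genes) if g in keep_genes]
    cell_idx = [j for j, c in enumerate(cells) if c in keep_cells]
    out_genes = [genes[i] for i in gene_idx]
    out_cells = [cells[j] for j in cell_idx]
    # Transpose: one output row per kept cell, one column per kept gene
    out_matrix = [[matrix[i][j] for i in gene_idx] for j in cell_idx]
    return out_cells, out_genes, out_matrix
```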


--------------------------------------------------------------------------------
/Identifying recurrent programs in Myeloid Cells in Gliomas (Related to Figure 1)/08- Round 2 cNMF for myeloid cells in MGB cohort:
--------------------------------------------------------------------------------
 1 | ######### The input matrix includes only the genes identified in Step 7 ########
 2 | 
 3 | cnmf prepare --output-dir ./All/ --name All -c MGB_Myeloid_Matrix_Filtered_For_NMF_Round2.txt -k 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 --n-iter 500 --total-workers 1 --seed 14 --numgenes 2276
 4 | 
 5 | cnmf factorize --output-dir ./All/ --name All --worker-index 0;
 6 | 
 7 | cnmf combine --output-dir ./All/ --name All;
 8 | 
 9 | rm ./All/All/cnmf_tmp/All.spectra.k_*.iter_*.df.npz;
10 | 
11 | cnmf k_selection_plot --output-dir All --name All;
12 | 
13 | 
14 | ##### Based on the K plot, we select K=18 ########
15 | 
16 | cnmf consensus --output-dir All --name All --components 18 --local-density-threshold 0.02 --show-clustering
17 | 
18 | 
19 | ########## The usage matrix from cnmf consensus is normalized per row to percentages (in each cell, the program usages sum to 100) ################
20 | ########## The gene spectra score matrix is used for the annotation of the programs #############
21 | 


--------------------------------------------------------------------------------
/Identifying recurrent programs in Myeloid Cells in Gliomas (Related to Figure 1)/09- Round 2 cNMF for myeloid cells in Houston cohort:
--------------------------------------------------------------------------------
 1 | ######### The input matrix includes only the genes identified in Step 7 ########
 2 | 
 3 | cnmf prepare --output-dir ./All/ --name All -c Houston_Myeloid_Matrix_Filtered_For_NMF_Round2.txt -k 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 --n-iter 500 --total-workers 1 --seed 14 --numgenes 2276
 4 | 
 5 | cnmf factorize --output-dir ./All/ --name All --worker-index 0;
 6 | 
 7 | cnmf combine --output-dir ./All/ --name All;
 8 | 
 9 | rm ./All/All/cnmf_tmp/All.spectra.k_*.iter_*.df.npz;
10 | 
11 | cnmf k_selection_plot --output-dir All --name All;
12 | 
13 | 
14 | ##### Based on the K plot, we select K=19 ########
15 | 
16 | cnmf consensus --output-dir All --name All --components 19 --local-density-threshold 0.02 --show-clustering
17 | 
18 | 
19 | ########## The usage matrix from cnmf consensus is normalized per row to percentages (in each cell, the program usages sum to 100) ################
20 | ########## The gene spectra score matrix is used for the annotation of the programs #############
21 | 


--------------------------------------------------------------------------------
/Identifying recurrent programs in Myeloid Cells in Gliomas (Related to Figure 1)/10- Round 2 cNMF for myeloid cells in Jackson's cohort:
--------------------------------------------------------------------------------
 1 | ######### The input matrix includes only the genes identified in Step 7 ########
 2 | 
 3 | cnmf prepare --output-dir ./All/ --name All -c Jackson_Myeloid_Matrix_Filtered_For_NMF_Round2.txt -k 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 --n-iter 500 --total-workers 1 --seed 14 --numgenes 2276
 4 | 
 5 | cnmf factorize --output-dir ./All/ --name All --worker-index 0;
 6 | 
 7 | cnmf combine --output-dir ./All/ --name All;
 8 | 
 9 | rm ./All/All/cnmf_tmp/All.spectra.k_*.iter_*.df.npz;
10 | 
11 | cnmf k_selection_plot --output-dir All --name All;
12 | 
13 | 
14 | ##### Based on the K plot, we select K=18 ########
15 | 
16 | cnmf consensus --output-dir All --name All --components 18 --local-density-threshold 0.02 --show-clustering
17 | 
18 | 
19 | ########## The usage matrix from cnmf consensus is normalized per row to percentages (in each cell, the program usages sum to 100) ################
20 | ########## The gene spectra score matrix is used for the annotation of the programs #############
21 | 


--------------------------------------------------------------------------------
/Identifying recurrent programs in Myeloid Cells in Gliomas (Related to Figure 1)/11- Identifying consensus programs:
--------------------------------------------------------------------------------
1 | ####### E3 is the matrix obtained by merging the gene spectra outputs of the three cohorts (genes as rows, programs as columns, since lsa::cosine compares columns) #######
2 | 
3 | library(jacpop)
4 | library(lsa)
5 | 
6 | Cos <- cosine(E3)
7 | 
8 | Cosine_dist <- as.dist(1-Cos)
9 | 
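`lsa::cosine` compares the columns of a matrix, so `1 - cosine(E3)` is a program x program cosine distance when programs are the columns of E3. A NumPy sketch of the same quantity, assuming no program has an all-zero spectrum (which would make the normalization divide by zero); the function name is hypothetical.

```python
import numpy as np

def cosine_distance(spectra):
    """Pairwise cosine distance between the columns (programs) of a genes x programs matrix."""
    spectra = np.asarray(spectra, dtype=float)
    # Scale every program column to unit length
    unit = spectra / np.linalg.norm(spectra, axis=0, keepdims=True)
    similarity = unit.T @ unit  # programs x programs cosine similarity
    return 1.0 - similarity
```

`scipy.spatial.distance.pdist(spectra.T, metric="cosine")` returns the same distances in condensed form, which is the input format SciPy's hierarchical clustering routines accept.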


--------------------------------------------------------------------------------
/Identifying recurrent programs in Myeloid Cells in Gliomas (Related to Figure 1)/13 - Calculating the usages of the consensus myeloid programs in the validation Mcgill cohort:
--------------------------------------------------------------------------------
 1 | ####### After we obtained the spectra for all consensus programs in Step 12, we placed the values in a data frame "consensus_spectra.txt" ######
 2 | 
 3 | ############## Filter the Mcgill Cohort Matrix to include only non-doublet myeloid cells and genes that were used for Round 2 cNMF in the discovery cohorts #######
 4 | 
 5 | ############## Inside R ################################
 6 | library(dplyr)
 7 | library(Seurat)
 8 | options(bitmapType='cairo')
 9 | options(future.globals.maxSize = 8000 * 1024^2)
10 | 
11 | Matrix <- GetAssayData(Tumors.combined, slot = "counts")
12 | 
13 | Genes <- read.table("union_filtered_Variable.txt", sep="\t", head=TRUE, row.names=1)
14 | 
15 | Genes2 <- rownames(Genes)
16 | 
17 | Matrix2 <- Matrix[rownames(Matrix) %in% Genes2,]
18 | 
19 | write.table(t(as.matrix(Matrix2)), file="./OPK_10X_Raw_Counts_Variable_for_Myeloid_cNMF.txt", sep="\t", col.names=NA, quote=FALSE)
20 | 
21 | 
22 | dim(Matrix2) ######### to find out the number for --numgenes below #################
23 | 
24 | q()
25 | 
26 | ##################################### Exit R ###########################################
27 | conda activate cnmf_env
28 | 
29 | cnmf prepare --output-dir ./Calculate_Usage_Myeloid/ --name Calculate_Usage_Myeloid -c OPK_10X_Raw_Counts_Variable_for_Myeloid_cNMF.txt -k 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 --n-iter 500 --total-workers 1 --seed 14 --numgenes 2278;
30 | 
31 | python;
32 | 
33 | ################################## Inside Python #################################
34 | 
35 | import sklearn
36 | import sklearn.decomposition
37 | from sklearn.decomposition import non_negative_factorization
38 | import numpy as np
39 | import scanpy as sc
40 | import csv
41 | import scipy
42 | import pandas as pd
43 |  
44 | 
45 | H = pd.read_table("consensus_spectra.txt", sep="\t", index_col=0)
46 | 
47 | H2 = H.T
48 | 
49 | X = sc.read_h5ad('Calculate_Usage_Myeloid/Calculate_Usage_Myeloid/cnmf_tmp/Calculate_Usage_Myeloid.norm_counts.h5ad')
50 | 
51 | X2 = X.X.toarray()
52 | 
53 | X3 = pd.DataFrame(data=X2, columns = X.var_names , index = X.obs.index)
54 | 
55 | H3 = H2.filter(items = X3.columns)
56 | 
57 | H4 = H3.to_numpy()
58 | 
59 | X5 = X2.astype(np.float64)
60 | 
61 | test = sklearn.decomposition.non_negative_factorization(X5, W=None, H=H4, n_components=14, init='random', update_H=False, solver='cd', beta_loss='frobenius', tol=0.0001, max_iter=1000, alpha_W=0.0, alpha_H='same', l1_ratio=0.0, random_state=None, verbose=0, shuffle=False)   # 'alpha' and 'regularization' dropped: deprecated in scikit-learn 1.0 and removed in 1.2; alpha_W=0.0 already disables regularization
62 | 
63 | test2 = list(test)
64 | 
65 | pd.DataFrame(test2[0], columns= H.columns, index=X.obs.index).to_csv(path_or_buf="./OPK_10X_Myeloid_Programs_Usage.txt", sep="\t", quoting=csv.QUOTE_NONE)
66 | 
67 | ############ The output "OPK_10X_Myeloid_Programs_Usage.txt" contains the raw usage of every program in every cell. Normalize each row (cell) to percentages so that the program usages of every cell sum to 100 ##############
68 | 
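A minimal Python sketch of the row normalization described above. It assumes the usage table has cells as rows and programs as columns, as written by the `to_csv` call. The function name `usage_to_percent` is illustrative. Returning all-zero cells as NaN is an assumption, not part of the original workflow.

```python
import pandas as pd


def usage_to_percent(usage: pd.DataFrame) -> pd.DataFrame:
    """Rescale each row (cell) so that its program usages sum to 100.

    Cells whose usages are all zero are returned as NaN instead of
    triggering a division by zero (an assumption; they could also be dropped).
    """
    row_sums = usage.sum(axis=1)
    row_sums = row_sums.where(row_sums != 0)  # turn zero sums into NaN
    return usage.div(row_sums, axis=0) * 100


# Tiny illustrative table: two cells x two programs.
demo = pd.DataFrame({"P1": [1.0, 0.0], "P2": [3.0, 0.0]}, index=["cell1", "cell2"])
print(usage_to_percent(demo))
```

On the real output this would be applied as `usage_to_percent(pd.read_table("OPK_10X_Myeloid_Programs_Usage.txt", index_col=0))`.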


--------------------------------------------------------------------------------
/MAESTER/01- Trimming R2 reads to include high quality bases only:
--------------------------------------------------------------------------------
 1 | mkdir Trimmed;
 2 | 
 3 | cp *_R2_*.fastq.gz ./Trimmed/;
 4 | 
 5 | cd ./Trimmed/;
 6 | 
 7 | for file in *;do gunzip $file;done 
 8 | 
 9 | for file in *.fastq;do homerTools trim -5 24 $file;done;
10 | 
11 | 
12 | ###### This copies the R2 files to another folder, unzips them, and trims the first 24 bases of each read
13 | ###### The trimmed outputs can be renamed, gzipped, and placed in the same folder as the R1 reads for the subsequent processing and analysis steps
14 | 
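The 5' trim done by `homerTools trim -5 24` can be sketched in Python as follows. This illustrates the operation only and is not HOMER's implementation. It assumes standard four-line FASTQ records and applies no minimum-length filter.

```python
from typing import Iterable, Iterator, List


def trim_fastq_5prime(lines: Iterable[str], n_bases: int = 24) -> Iterator[str]:
    """Yield FASTQ lines with the first n_bases removed from each sequence and quality string."""
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            # Sequence and quality are trimmed identically so they stay in register.
            yield header
            yield seq[n_bases:]
            yield plus
            yield qual[n_bases:]
            record = []
    if record:
        raise ValueError("truncated FASTQ: last record has fewer than 4 lines")
```

Because it works on any iterable of lines, it can be fed directly from an open FASTQ file handle.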


--------------------------------------------------------------------------------
/MAESTER/02- Extracting barcode and UMI sequences from R1 and transferring them to read names in processed R2 fastqs:
--------------------------------------------------------------------------------
 1 | Assemble_fastq.R /Path/To/Fastqs Maester barcodes.txt 12 8
 2 | 
 3 | 
 4 | #### Assemble_fastq.R is obtained from https://github.com/petervangalen/MAESTER-2021/blob/main/Pre-processing/Assemble_fastq.R #####
 5 | 
 6 | #### /Path/To/Fastqs is the folder which includes R1 and R2 fastqs
 7 | 
 8 | ##### Maester is the name of the sample (located inside the folder) that will be processed
 9 | 
10 | ###### barcodes.txt is the list of cell barcodes from the scRNA-Seq part that passed the QC thresholds
11 | ##### 12 and 8 are the lengths (in bases) of the cell barcode and the UMI in Read 1
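In essence, this step copies the cell barcode and UMI from the start of each R1 read into the name of the matching R2 read. The sketch below illustrates that idea under several assumptions:
- R1 begins with a 12-base cell barcode followed by an 8-base UMI.
- Records are (header, sequence, plus, quality) tuples.
- The `<read id>_<CB>_<UMI>` naming scheme is hypothetical and may not match the exact format written by `Assemble_fastq.R`.

```python
from typing import Iterable, Iterator, Set, Tuple

Record = Tuple[str, str, str, str]  # (header, sequence, plus line, quality)


def assemble_records(r1: Iterable[Record], r2: Iterable[Record], whitelist: Set[str],
                     cb_len: int = 12, umi_len: int = 8) -> Iterator[Record]:
    """Yield R2 records renamed as <@read id>_<CB>_<UMI> (hypothetical format).

    Reads whose cell barcode is not in the QC-passing whitelist are skipped.
    """
    for (h1, s1, _, _), (h2, s2, p2, q2) in zip(r1, r2):
        read_id = h1.split()[0]  # '@' + read id, without the comment field
        if read_id != h2.split()[0]:
            raise ValueError(f"R1/R2 out of sync at {read_id}")
        if len(s1) < cb_len + umi_len:
            continue  # barcode read too short to parse
        cb = s1[:cb_len]
        umi = s1[cb_len:cb_len + umi_len]
        if cb in whitelist:
            yield (f"{read_id}_{cb}_{umi}", s2, p2, q2)
```

The whitelist plays the role of `barcodes.txt`: reads from barcodes that did not pass scRNA-Seq QC are dropped at this stage.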


--------------------------------------------------------------------------------
/MAESTER/03- Mapping Maester libraries:
--------------------------------------------------------------------------------
1 | STAR --genomeDir /seq/epiprod02/Chadi/Genomes/STAR/hg38_MitoMasked/ --readFilesIn Maester_Processed_R2.fastq.gz --outFileNamePrefix ./Mapped/Maester --outSAMtype BAM SortedByCoordinate --outFilterMultimapNmax 2 --outFilterScoreMinOverLread 0.25 --readFilesCommand zcat;
2 | 


--------------------------------------------------------------------------------
/MAESTER/04- Make the STAR aligner bam file compatible with MAEGATK:
--------------------------------------------------------------------------------
1 | Tag_CB_UMI.sh Maester.bam
2 | 
3 | 
4 | #### Tag_CB_UMI.sh is obtained from https://github.com/petervangalen/MAESTER-2021/blob/main/Pre-processing/Tag_CB_UMI.sh ###
5 | 
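This step stores each read's cell barcode and UMI as BAM tags, which is what the step title's "compatible with MAEGATK" refers to. STARsolo, used elsewhere in this document, writes the corrected barcode and UMI as the `CB` and `UB` attributes. A rough Python sketch of the idea follows. It works on SAM text lines and assumes, hypothetically, that each read name ends in `_<CB>_<UMI>` and that maegatk reads the `CB`/`UB` tags. Check the maegatk options before relying on those tag names. The actual behavior of `Tag_CB_UMI.sh` may differ.

```python
def tag_sam_line(line: str) -> str:
    """Append CB:Z and UB:Z tags parsed from a read name ending in _<CB>_<UMI>.

    Header lines (starting with '@') are returned unchanged. The read-name
    layout is an assumption; a name without two '_' separators raises ValueError.
    """
    line = line.rstrip("\n")
    if line.startswith("@"):
        return line
    fields = line.split("\t")
    # rsplit keeps any underscores inside the original read id intact.
    _read_id, cb, umi = fields[0].rsplit("_", 2)
    fields.append(f"CB:Z:{cb}")
    fields.append(f"UB:Z:{umi}")
    return "\t".join(fields)
```

Wrapped in a loop over `sys.stdin`, a function like this could sit between `samtools view -h` and `samtools view -b` in a pipe.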


--------------------------------------------------------------------------------
/MAESTER/05- Run MAEGATK to obtain single cell level values of counts and coverages:
--------------------------------------------------------------------------------
1 | maegatk bcall --input=${bam_input} --output=${maegatk_outputs} --mito-genome=chrM.fa --min-reads=3 --ncores=20  --snake-stdout
2 | 
3 | 
4 | ##### bam_input is the output from Tag_CB_UMI.sh script ####
5 | ##### Make sure that chrM.fa is indexed using BWA and that all files can be located by maegatk 
6 | ##### ${maegatk_outputs} is the folder where the maegatk.rds R object will be generated which is crucial for the next steps of the analysis
7 | 


--------------------------------------------------------------------------------
/MAESTER/06 - Obtain Counts Matrix and Coverage Matrix at single cell level from MAEGATK output:
--------------------------------------------------------------------------------
 1 | #################inside R###########
 2 | 
 3 | #~~~~~~~~~~~~~~~~~~~~~#
 4 | #### Load Prerequisites ####
 5 | #~~~~~~~~~~~~~~~~~~~~~#
 6 | 
 7 | options(stringsAsFactors = FALSE)
 8 | options(scipen = 999)
 9 | 
10 | library(tidyverse)
11 | library(SummarizedExperiment)
12 | library(Seurat)
13 | library(data.table)
14 | library(Matrix)
15 | library(ComplexHeatmap)
16 | library(gdata)
17 | library(stringr)
18 | library(ggforce)
19 | 
20 | ###### Read MAEGATK Output #############
21 | 
22 | maegatk.rse <- readRDS("maegatk.rds")
23 | 
24 | 
25 | message("computeAFMutMatrix()")
26 | computeAFMutMatrix <- function(SE){
27 |   ref_allele <- as.character(rowRanges(SE)$refAllele)
28 |   
29 |   getMutMatrix <- function(letter){
30 |     mat <- (assays(SE)[[paste0(letter, "_counts_fw")]] + assays(SE)[[paste0(letter, "_counts_rev")]]) 
31 |     rownames(mat) <- paste0(as.character(1:dim(mat)[1]), "_", toupper(ref_allele), ">", letter)
32 |     return(mat[toupper(ref_allele) != letter,])
33 |   }
34 |   
35 |   rbind(getMutMatrix("A"), getMutMatrix("C"), getMutMatrix("G"), getMutMatrix("T"))
36 | }
37 | 
38 | af.dm2 <- data.matrix(computeAFMutMatrix(maegatk.rse))
39 | 
40 | 
41 | ############# Filter The matrix to include annotated barcodes from the respective scRNA-Seq library ##########
42 | 
43 | seu <- read.table("CellTypes_Coarse_V10.txt", sep="\t", head=TRUE, row.names=1)
44 | 
45 | common.cells <- intersect(rownames(seu), colnames(af.dm2))
46 | 
47 | seu <- seu[common.cells,]
48 | 
49 | af.dm3 <- af.dm2[,common.cells]
50 | 
51 | write.table(af.dm3, file="Maester_Counts.txt", sep="\t", col.names=NA, quote=FALSE)
52 | 
53 | 
54 | ############# Obtain Coverage Matrix ##########
55 | 
56 | 
57 | cov <- assays(maegatk.rse)[["coverage"]]
58 | 
59 | cov2 <- as.matrix(cov)
60 | 
61 | seu <- read.table("CellTypes_Coarse_V10.txt", sep="\t", head=TRUE, row.names=1)
62 | 
63 | common.cells <- intersect(rownames(seu), colnames(cov2))
64 | 
65 | seu <- seu[common.cells,]
66 | 
67 | cov3 <- cov2[,common.cells]
68 | 
69 | write.table(cov3,file="Coverage.txt", sep="\t", quote=FALSE, col.names=NA)
70 | 
71 | 
72 | ######## To be performed similarly for Primary tumor and PBMC libraries ###########
73 | 
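For readers working outside R, the logic of `computeAFMutMatrix()` can be mirrored in Python. The sketch assumes the strand-specific count matrices for each base (the `<letter>_counts_fw` / `<letter>_counts_rev` assays) have already been exported as positions x cells arrays. The function name and the dictionary inputs are illustrative. As in the R function:
- Both strands are summed.
- Rows are named `<position>_<REF>><ALT>`, with 1-based positions.
- The row where the base equals the reference allele is dropped, so each position contributes three alternative-allele rows.

```python
from typing import Dict, Sequence

import numpy as np
import pandas as pd


def compute_alt_count_matrix(counts_fw: Dict[str, np.ndarray],
                             counts_rev: Dict[str, np.ndarray],
                             ref_allele: Sequence[str],
                             cells: Sequence[str]) -> pd.DataFrame:
    """Python analogue of computeAFMutMatrix(): alt-allele read counts per cell.

    counts_fw / counts_rev map each base 'A', 'C', 'G', 'T' to a
    positions x cells array of strand-specific read counts.
    """
    ref = [r.upper() for r in ref_allele]
    blocks = []
    for letter in "ACGT":
        # Combine forward and reverse strands, as in the R function.
        mat = np.asarray(counts_fw[letter]) + np.asarray(counts_rev[letter])
        names = [f"{i + 1}_{r}>{letter}" for i, r in enumerate(ref)]
        df = pd.DataFrame(mat, index=names, columns=list(cells))
        # Drop rows where the "alternative" base equals the reference base.
        keep = np.array([r != letter for r in ref])
        blocks.append(df.loc[keep])
    # Same row order as rbind(A, C, G, T) in the R code.
    return pd.concat(blocks)
```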


--------------------------------------------------------------------------------
/MAESTER/07- Low Resolution Pseudobulking of count matrices:
--------------------------------------------------------------------------------
  1 | ######## We combine the single-cell count matrix for primary tumor and PBMC libraries #################
  2 | 
  3 | ############################ Inside R ##########################################
  4 | 
  5 | data_A <- read.table("PT_Maester_Counts.txt",head=TRUE, sep="\t", row.names=1)
  6 | 
  7 | data_PBMC <- read.table("PBMC_Maester_Counts.txt",head=TRUE, sep="\t", row.names=1)
  8 | 
  9 | 
 10 | data_Final <- merge(data_A, data_PBMC, by="row.names")
 11 | 
 12 | rownames(data_Final) <- data_Final$Row.names
 13 | 
 14 | data_Final <- data_Final[,-1]
 15 | 
 16 | library(dplyr)
 17 | 
 18 | 
 19 | ######## The annotation file includes the filtered barcodes and respective low-resolution annotation for both PT and PBMC libraries ##########
 20 | ######## Filter the merged matrix to include annotated barcodes only (cells that pass RNA-Seq QC) ########
 21 | 
 22 | annotation <- read.table("CellTypes_Coarse_V10.txt", head=TRUE, row.names = 1, sep="\t")
 23 | 
 24 | common.cells <- intersect(rownames(annotation), colnames(data_Final))
 25 | 
 26 | annotation <- annotation[common.cells,]
 27 | 
 28 | data_Final <- data_Final[,common.cells]
 29 | 
 30 | 
 31 | ######### Subset the matrix to generate a matrix for each annotation #########
 32 | 
 33 | cells.tib <- tibble(cell = common.cells,
 34 |                     orig.ident = annotation$orig.ident,
 35 |                     CellType_RNA = annotation$CellType)
 36 | 
 37 | CellSubsets.ls <- list(unionCells = cells.tib$cell,
 38 |                        TAM = filter(cells.tib, CellType_RNA == "Myeloid")$cell,
 39 |                        Malignant = filter(cells.tib, CellType_RNA == "Malignant")$cell,
 40 | 	               Stromal = filter(cells.tib, CellType_RNA == "Stromal")$cell,
 41 | 	               Oligo = filter(cells.tib, CellType_RNA == "Oligo")$cell,
 42 | 	               Tcells = filter(cells.tib, CellType_RNA == "Tcells")$cell,
 43 |                        Myeloid_PBMC = filter(cells.tib, CellType_RNA == "Myeloid_PBMC")$cell
 44 | )
 45 | 
 46 | 
 47 | data_Malignant <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Malignant]
 48 | data_Tcells <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Tcells]
 49 | data_TAM <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$TAM]
 50 | data_Oligo <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Oligo]
 51 | data_Stromal <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Stromal]
 52 | data_Myeloid_PBMC <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Myeloid_PBMC]
 53 | 
 54 | 
 55 | 
 56 | ############# Perform pseudobulking by summing the values for each variant in each category and then output the results in a pseudobulked matrix #########
 57 | 
 58 | Group_Names <- c("Tcells", "TAM", "Oligo", "Stromal", "Myeloid_PBMC", "Malignant", "All_Cells")
 59 | 
 60 | 
 61 | E <- matrix(data=0, nrow = nrow(data_Final), ncol=length(Group_Names));
 62 | 
 63 | for (i in 1:nrow(data_Final)){
 64 |     
 65 |         
 66 |         
 67 |         data_Tcells_G1 <- data_Tcells[i,,drop=FALSE]
 68 |         
 69 |         Test1 <- sum(as.numeric(data_Tcells_G1))
 70 |         
 71 |         
 72 |         data_TAM_G1 <- data_TAM[i,,drop=FALSE]
 73 |         
 74 |         Test2 <- sum(as.numeric(data_TAM_G1))
 75 | 
 76 | 
 77 |         data_Oligo_G1 <- data_Oligo[i,,drop=FALSE]
 78 |         
 79 |         Test3 <- sum(as.numeric(data_Oligo_G1))
 80 | 
 81 | 
 82 |         data_Stromal_G1 <- data_Stromal[i,,drop=FALSE]
 83 |         
 84 |         Test4 <- sum(as.numeric(data_Stromal_G1))
 85 |         
 86 |         
 87 |         
 88 |         data_Myeloid_PBMC_G1 <- data_Myeloid_PBMC[i,,drop=FALSE]
 89 |         
 90 |         Test6 <- sum(as.numeric(data_Myeloid_PBMC_G1))
 91 |         
 92 |         
 93 |      
 94 | 
 95 |         data_Malignant_G1 <- data_Malignant[i,,drop=FALSE]
 96 |         
 97 |         Test7 <- sum(as.numeric(data_Malignant_G1))
 98 |         
 99 |         
100 |     
101 |         data_All_G1 <- data_Final[i,,drop=FALSE]
102 |         
103 |         Test8 <- sum(as.numeric(data_All_G1))
104 | 
105 |         
106 |         E[i,] <- c(Test1, Test2, Test3, Test4, Test6, Test7, Test8)
107 |         
108 | }
109 | 
110 | 
111 | rownames(E) <- rownames(data_Final)
112 | 
113 | colnames(E) <- Group_Names
114 | 
115 | write.table(E, file="Sums_Counts_Per_CellType_V10.txt", sep="\t", col.names=NA, quote=FALSE)
116 | 
117 | colnames(E)
118 | 
119 | 
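The loop above computes, for each variant, the total alt-allele read count across the cells of each category. The same pseudobulk can be written with column selection and row sums. The pandas sketch below assumes a variants x cells table and a barcode-to-label series. The function name and the `groups` mapping are illustrative; a mapping such as `{"TAM": "Myeloid", ...}` reproduces the renaming of the Myeloid label to TAM.

```python
from typing import Mapping

import pandas as pd


def pseudobulk_sum(counts: pd.DataFrame, labels: pd.Series,
                   groups: Mapping[str, str]) -> pd.DataFrame:
    """Sum a variants x cells matrix over the cells of each annotation.

    labels maps cell barcode -> annotation; groups maps output column ->
    annotation label. Unannotated barcodes are dropped first.
    """
    common = [c for c in counts.columns if c in labels.index]
    counts = counts[common]
    cell_labels = labels.loc[common].to_numpy()
    out = pd.DataFrame(index=counts.index)
    for column, label in groups.items():
        out[column] = counts.loc[:, cell_labels == label].sum(axis=1)
    out["All_Cells"] = counts.sum(axis=1)
    return out
```

Only annotated barcodes contribute to `All_Cells`, matching the `common.cells` filter in the R code. The same function applies unchanged to the coverage matrices.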


--------------------------------------------------------------------------------
/MAESTER/08- Low Resolution Pseudobulking of Coverage matrices:
--------------------------------------------------------------------------------
  1 | ######## We combine the single-cell coverage matrix for primary tumor and PBMC libraries #################
  2 | 
  3 | ############################ Inside R ##########################################
  4 | 
  5 | data_A <- read.table("MGH915_PT_Coverage.txt",head=TRUE, sep="\t", row.names=1)
  6 | 
  7 | 
  8 | data_PBMC <- read.table("MGH915_PBMC_Coverage.txt",head=TRUE, sep="\t", row.names=1)
  9 | 
 10 | 
 11 | data3 <- merge(data_A, data_PBMC, by="row.names")
 12 | 
 13 | data_Final <- data3
 14 | 
 15 | rownames(data_Final) <- data_Final$Row.names
 16 | 
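 17 | data_Final <- data_Final[,-1]   # drop only the Row.names column; any other non-barcode column is removed by the common.cells filter below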
 18 | 
 19 | library(dplyr)
 20 | 
 21 | ######## The annotation file includes the filtered barcodes and respective low-resolution annotation for both PT and PBMC libraries ##########
 22 | ######## Filter the merged matrix to include annotated barcodes only (cells that pass RNA-Seq QC) ########
 23 | 
 24 | annotation <- read.table("CellTypes_Coarse_V10.txt", head=TRUE, row.names = 1, sep="\t")
 25 | 
 26 | common.cells <- intersect(rownames(annotation), colnames(data_Final))
 27 | 
 28 | annotation <- annotation[common.cells,]
 29 | 
 30 | data_Final <- data_Final[,common.cells]
 31 | 
 32 | 
 33 | ######### Subset the matrix to generate a matrix for each annotation #########
 34 | 
 35 | 
 36 | cells.tib <- tibble(cell = common.cells,
 37 |                     orig.ident = annotation$orig.ident,
 38 |                     CellType_RNA = annotation$CellType)
 39 | 
 40 | CellSubsets.ls <- list(unionCells = cells.tib$cell,
 41 |                        TAM = filter(cells.tib, CellType_RNA == "Myeloid")$cell,
 42 |                        Malignant = filter(cells.tib, CellType_RNA == "Malignant")$cell,
 43 | 	               Stromal = filter(cells.tib, CellType_RNA == "Stromal")$cell,
 44 | 	               Oligo = filter(cells.tib, CellType_RNA == "Oligo")$cell,
 45 | 	               Tcells = filter(cells.tib, CellType_RNA == "Tcells")$cell,
 46 |                        Myeloid_PBMC = filter(cells.tib, CellType_RNA == "Myeloid_PBMC")$cell
 47 | )
 48 | 
 49 | 
 50 | data_Malignant <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Malignant]
 51 | data_Tcells <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Tcells]
 52 | data_TAM <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$TAM]
 53 | data_Oligo <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Oligo]
 54 | data_Stromal <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Stromal]
 55 | data_Myeloid_PBMC <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Myeloid_PBMC]
 56 | 
 57 | 
 58 | ############# Perform pseudobulking by summing the values for each position in each annotation and then output the results in a pseudobulked matrix #########
 59 | 
 60 | Group_Names <- c("Tcells", "TAM", "Oligo", "Stromal", "Myeloid_PBMC", "Malignant", "All_Cells")
 61 | 
 62 | 
 63 | E <- matrix(data=0, nrow = nrow(data_Final), ncol=length(Group_Names));
 64 | 
 65 | for (i in 1:nrow(data_Final)){
 66 |     
 67 |         
 68 |         
 69 |         data_Tcells_G1 <- data_Tcells[i,,drop=FALSE]
 70 |         
 71 |         Test1 <- sum(as.numeric(data_Tcells_G1))
 72 |         
 73 |         
 74 |         data_TAM_G1 <- data_TAM[i,,drop=FALSE]
 75 |         
 76 |         Test2 <- sum(as.numeric(data_TAM_G1))
 77 | 
 78 | 
 79 |         data_Oligo_G1 <- data_Oligo[i,,drop=FALSE]
 80 |         
 81 |         Test3 <- sum(as.numeric(data_Oligo_G1))
 82 | 
 83 | 
 84 |         data_Stromal_G1 <- data_Stromal[i,,drop=FALSE]
 85 |         
 86 |         Test4 <- sum(as.numeric(data_Stromal_G1))
 87 |         
 88 |         
 89 |         
 90 |         data_Myeloid_PBMC_G1 <- data_Myeloid_PBMC[i,,drop=FALSE]
 91 |         
 92 |         Test6 <- sum(as.numeric(data_Myeloid_PBMC_G1))
 93 |         
 94 |         
 95 |      
 96 | 
 97 |         data_Malignant_G1 <- data_Malignant[i,,drop=FALSE]
 98 |         
 99 |         Test7 <- sum(as.numeric(data_Malignant_G1))
100 |         
101 |         
102 |     
103 |         data_All_G1 <- data_Final[i,,drop=FALSE]
104 |         
105 |         Test8 <- sum(as.numeric(data_All_G1))
106 | 
107 |         
108 |         E[i,] <- c(Test1, Test2, Test3, Test4, Test6, Test7, Test8)
109 |         
110 | }
111 | 
112 | 
113 | rownames(E) <- rownames(data_Final)
114 | 
115 | colnames(E) <- Group_Names
116 | 
117 | write.table(E, file="Sums_Coverage_Per_CellType_V10.txt", sep="\t", col.names=NA, quote=FALSE)
118 | 
119 | colnames(E)
120 | 


--------------------------------------------------------------------------------
/MAESTER/09- Calculating VAFs from pseudobulked counts and coverage matrices:
--------------------------------------------------------------------------------
 1 | ############################ Inside R ##########################################
 2 | 
 3 | data <- read.table("Sums_Counts_Per_CellType_V10.txt", head=TRUE, row.names=1, sep="\t")
 4 | 
 5 | data2 <- read.table("Sums_Coverage_Per_CellType_V10.txt", head=TRUE, row.names=1, sep="\t")
 6 | 
 7 | ####### Add a small pseudo-count to the coverage matrix to avoid division by zero #############
 8 | 
 9 | data2 <- data2 + 0.000001
10 | 
11 | 
12 | ####### Add Position and Variant columns so that each variant can be matched to the coverage at its mtDNA position ########
13 | 
14 | data$Position <- rownames(data)
15 | 
16 | data$Position <- gsub("_...","",data$Position)
17 | 
18 | data2$Position <- rownames(data2)
19 | 
20 | data$Variant <- rownames(data)
21 | 
22 | d = matrix(ncol=7)
23 | 
24 | ######### Calculate the VAFs ########
25 | 
26 | for (n in rownames(data2)){a<-(data[data$Position==n,c(1:7)]);b<-as.numeric(data2[n,c(1:7)]); c <- t(a)/b;d<-rbind(d,t(c))}
27 | 
28 | d2 <- na.omit(d)
29 | 
30 | 
31 | d3 <- as.data.frame(d2)
32 | 
33 | d4 <- d3*100
34 | 
35 | write.table(d4, file="Pseudobulked_VAFs_V10.txt", col.names=NA, quote=FALSE, sep="\t")
36 | 
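A vectorized pandas sketch of the same VAF calculation. It assumes count rows named `<position>_<REF>><ALT>`, coverage rows named by position, and identical category columns in both tables. As in the R loop, each variant's pseudobulked count is divided by the pseudo-count-adjusted coverage at its position and multiplied by 100. The function name is illustrative.

```python
import pandas as pd


def pseudobulk_vaf(counts: pd.DataFrame, coverage: pd.DataFrame,
                   pseudo: float = 1e-6) -> pd.DataFrame:
    """Return VAFs (in percent) for pseudobulked counts and coverage.

    counts rows are '<position>_<REF>><ALT>'; coverage rows are positions.
    Both tables must share the same category columns.
    """
    cov = coverage.copy()
    cov.index = cov.index.astype(str)  # positions may have been read as integers
    # Strip the "_<REF>><ALT>" suffix, like gsub("_...", "", ...) in the R code.
    positions = [name.split("_")[0] for name in counts.index]
    cov_per_variant = cov.loc[positions, list(counts.columns)].to_numpy() + pseudo
    vaf = counts.to_numpy(dtype=float) / cov_per_variant * 100
    return pd.DataFrame(vaf, index=counts.index, columns=counts.columns)
```

A position missing from the coverage table raises a `KeyError` here. The R loop, by contrast, simply never visits it.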


--------------------------------------------------------------------------------
/MAESTER/10- Selection of Variants of Interest:
--------------------------------------------------------------------------------
 1 | #### To identify variants specific to the myeloid cells in the tumor microenvironment, a variant has to meet the following criteria: 
 2 | (a) Meet the minimum VAF requirement for TAMs, given the coverages in the TAM and Myeloid_PBMC categories.
 3 | (b) VAF = 0 in the Myeloid_PBMC category.
 4 | (c) VAF in TAMs > the minimum required for TAMs.
 5 | 
 6 | ##### To identify variants specific to the myeloid cells in PBMC, a variant has to meet the following criteria: 
 7 | (a) Meet the minimum VAF requirement for Myeloid_PBMC, given the coverages in the Malignant, TAM, and Myeloid_PBMC categories.
 8 | (b) VAF = 0 in the Malignant category. (If the tumor library is enriched for malignant cells, this criterion can be replaced with a VAF in Myeloid_PBMC that is 20 times higher than in the Malignant category.)
 9 | (c) VAF > 0 in the TAM category.
10 | (d) VAF in Myeloid_PBMC > the minimum required for Myeloid_PBMC.
11 | 
12 | 
13 | The binomial test used to determine the minimum VAF required at a given coverage was computed in Python as follows:
14 | 
15 | 
16 | 
17 | 
18 | 
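The binomial test script itself is not reproduced in this section. One common way to derive a coverage-dependent minimum VAF is a one-sided binomial test against a background error rate. Find the smallest number of alt reads k such that P(X >= k) < alpha for X ~ Binomial(coverage, error rate), and report k / coverage. The sketch below implements that formulation, which is itself an assumption about how the test was set up. The default `error_rate` and `alpha` values are placeholders, not the thresholds used in the analysis.

```python
from math import exp, lgamma, log, log1p


def _log_binom_pmf(i: int, n: int, p: float) -> float:
    """log P(X = i) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    return lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1) + i * log(p) + (n - i) * log1p(-p)


def min_alt_reads(coverage: int, error_rate: float = 0.01, alpha: float = 0.05) -> int:
    """Smallest k with P(X >= k) < alpha, where X ~ Binomial(coverage, error_rate).

    Returns coverage + 1 if no attainable count is significant.
    """
    if not 0 < error_rate < 1:
        raise ValueError("error_rate must be strictly between 0 and 1")
    cdf = 0.0  # running P(X <= k - 1)
    for k in range(coverage + 1):
        if 1.0 - cdf < alpha:  # upper tail P(X >= k)
            return k
        cdf += exp(_log_binom_pmf(k, coverage, error_rate))
    return coverage + 1


def min_vaf(coverage: int, error_rate: float = 0.01, alpha: float = 0.05) -> float:
    """Minimum VAF (as a fraction) distinguishable from background at this coverage."""
    if coverage <= 0:
        return float("inf")
    return min_alt_reads(coverage, error_rate, alpha) / coverage
```

Multiply the result by 100 to compare it with the percentage VAFs written in step 09. A value above 1 means that no alt-read count is significant at that coverage.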


--------------------------------------------------------------------------------
/MAESTER/11- High-resolution Pseudobulking of Myeloid cell population in the Primary Tumor:
--------------------------------------------------------------------------------
  1 | #################inside R###########
  2 | 
  3 | data_A <- read.table("PT_Maester_Counts.txt",head=TRUE, sep="\t", row.names=1)
  4 | 
  5 | 
  6 | data_Final <- data_A
  7 | 
  8 | 
  9 | library(dplyr)
 10 | 
 11 | ######## The annotation file includes the filtered barcodes and respective high-resolution annotation for myeloid cells in Primary Tumor libraries ##########
 12 | ######## Annotation criteria shown in the methods section ########
 13 | 
 14 | annotation <- read.table("CellTypes_Fine_V10.txt", head=TRUE, row.names = 1, sep="\t")
 15 | 
 16 | common.cells <- intersect(rownames(annotation), colnames(data_Final))
 17 | 
 18 | annotation <- annotation[common.cells,]
 19 | 
 20 | data_Final <- data_Final[,common.cells]
 21 | 
 22 | 
 23 | ######### Subset the matrix to generate a matrix for each high-resolution annotation #########
 24 | 
 25 | 
 26 | cells.tib <- tibble(cell = common.cells,
 27 |                     CellType_RNA = annotation$Identity)
 28 | 
 29 | CellSubsets.ls <- list(unionCells = cells.tib$cell,
 30 |                        Macrophage = filter(cells.tib, CellType_RNA == "Macrophages")$cell,
 31 |                        Monocyte = filter(cells.tib, CellType_RNA == "Monocytes")$cell,
 32 |                        Mono_Macro = filter(cells.tib, CellType_RNA == "Mono_Macro")$cell,
 33 |                        Microglia_Like = filter(cells.tib, CellType_RNA == "Microglia_Like")$cell,
 34 |                        cDC = filter(cells.tib, CellType_RNA == "cDC")$cell,
 35 |                        Microglia = filter(cells.tib, CellType_RNA == "Microglia")$cell,
 36 |                        Neutrophil = filter(cells.tib, CellType_RNA == "Neutrophils")$cell
 37 | )
 38 | 
 39 | 
 40 | 
 41 | data_Macrophage <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Macrophage]
 42 | data_Monocyte <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Monocyte]
 43 | data_Mono_Macro <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Mono_Macro]
 44 | data_Microglia_Like <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Microglia_Like]
 45 | data_cDC <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$cDC]
 46 | data_Microglia <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Microglia]
 47 | data_Neutrophil <- data_Final[,colnames(data_Final) %in% CellSubsets.ls$Neutrophil]
 48 | 
 49 | ############# Perform pseudobulking by counting, for each high-resolution identity, the number of cells that carry each variant, and then output the results in a pseudobulked matrix #########
 50 | 
 51 | 
 52 | Group_Names <- c("Macrophage", "Monocyte", "Mono_Macro", "Microglia_Like", "cDC", "Microglia", "Neutrophil")
 53 | 
 54 | E <- matrix(data=0, nrow = nrow(data_Final), ncol=length(Group_Names));
 55 | 
 56 | for (i in 1:nrow(data_Final)){
 57 | 
 58 |         ## For each identity, count the cells (columns) with at least one read supporting variant i
 59 |         data_Macrophage_G1 <- data_Macrophage[i,,drop=FALSE]
 60 |         Test1 <- sum(data_Macrophage_G1 > 0)
 61 | 
 62 |         data_Monocyte_G1 <- data_Monocyte[i,,drop=FALSE]
 63 |         Test2 <- sum(data_Monocyte_G1 > 0)
 64 | 
 65 | 
 66 | 
 67 |         data_Mono_Macro_G1 <- data_Mono_Macro[i,,drop=FALSE]
 68 |         Test3 <- sum(data_Mono_Macro_G1 > 0)
 69 | 
 70 | 
 71 | 
 72 | 
 73 |         data_Microglia_Like_G1 <- data_Microglia_Like[i,,drop=FALSE]
 74 |         Test4 <- sum(data_Microglia_Like_G1 > 0)
 75 | 
 76 | 
 77 | 
 78 |         data_cDC_G1 <- data_cDC[i,,drop=FALSE]
 79 |         Test5 <- sum(data_cDC_G1 > 0)
 80 | 
 81 | 
 82 | 
 83 |         data_Microglia_G1 <- data_Microglia[i,,drop=FALSE]
 84 |         Test6 <- sum(data_Microglia_G1 > 0)
 85 | 
 86 | 
 87 | 
 88 |         data_Neutrophil_G1 <- data_Neutrophil[i,,drop=FALSE]
 89 |         Test7 <- sum(data_Neutrophil_G1 > 0)
 90 | 
 91 | 
 92 | 
 93 |         
 94 |         E[i,] <- c(Test1, Test2, Test3, Test4, Test5, Test6, Test7)
 95 |         
 96 | }
 97 | 
 98 | 
 99 | rownames(E) <- rownames(data_Final)
100 | 
101 | colnames(E) <- Group_Names
102 | 
103 | 
104 | write.table(E, file="Cell_Counts_Per_CellType_Fine_V10.txt", sep="\t", col.names=NA, quote=FALSE)
105 | 
106 | colnames(E)
107 | 
108 | 
109 | 
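Unlike the low-resolution pseudobulk, this step counts cells rather than reads. For each variant and identity, it records the number of cells with at least one supporting read. A pandas sketch of the same computation follows. The function name is illustrative, and the output-column-to-label mapping (e.g. `{"Macrophage": "Macrophages", ...}`) matches the labels used above.

```python
from typing import Mapping

import pandas as pd


def cells_with_variant(counts: pd.DataFrame, identity: pd.Series,
                       groups: Mapping[str, str]) -> pd.DataFrame:
    """For each variant (row), count the cells of each identity with >= 1 alt read.

    identity maps cell barcode -> label; groups maps output column -> label.
    """
    common = [c for c in counts.columns if c in identity.index]
    positive = counts[common] > 0  # True where a cell carries the variant
    labels = identity.loc[common].to_numpy()
    return pd.DataFrame(
        {column: positive.loc[:, labels == label].sum(axis=1) for column, label in groups.items()},
        index=counts.index,
    )
```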


--------------------------------------------------------------------------------
/MAESTER/12 - Calculating GSVA enrichment for variant categories in myeloid cell identities in the tumor microenvironment:
--------------------------------------------------------------------------------
 1 | ###################### Inside R #######################
 2 | 
 3 | ##### read the high-resolution pseudobulked table #######
 4 | 
 5 | E <- read.table("Cell_Counts_Per_CellType_Fine_V10.txt", sep="\t", head=TRUE, row.names=1)
 6 | 
 7 | ##### Read PBMC-specific variants for myeloid cells ######
 8 | 
 9 | Myeloid_PBMC_Not_Malignant <- scan("PBMC_Not_Malignant2.txt", what="")
10 | 
11 | 
12 | ##### Read Tumor microenvironment-specific variants of myeloid cells as a list ######
13 | 
14 | 
15 | TAM_Not_PBMC <- scan("TAM_Not_PBMC.txt", what="")
16 | 
17 | 
18 | ##### Place all the variants of interest in a list ######
19 | Combined_Groups <- scan("Combined_Groups6.txt", what="")
20 | 
21 | ######## Filter the pseudo bulked data frame to include only variants of interest
22 | 
23 | E2 <- E[rownames(E) %in% Combined_Groups,]
24 | 
25 | 
26 | ####### Remove variants not detected in any category #####
27 | 
28 | E5 <- E2[rowSums(E2) > 0,]
29 | 
30 | 
31 | library(ComplexHeatmap)
32 | 
33 | library("GSVA")
34 | 
35 | Lists <- list(Myeloid_PBMC_Not_Malignant, TAM_Not_PBMC)
36 | 
37 | Test <- gsva(as.matrix(E5), Lists, kcdf="Poisson")   # GSVA < 1.50 interface; newer releases use gsva(gsvaParam(as.matrix(E5), Lists, kcdf="Poisson"))
38 | 
39 | rownames(Test) <- c("Myeloid_PBMC_Not_Malignant", "TAM_Not_PBMC")
40 | 
41 | 
42 | write.table(Test, file="GSVA_Scores_Identities.txt", col.names=NA, quote=FALSE, sep="\t")
43 | 
44 | ###### Manually add the cell number and the fraction of TAMs for each identity to the table, and remove any identity with fewer than 10 cells contributing to the GSVA score, to obtain reliable enrichments #########
45 | 


--------------------------------------------------------------------------------
/MAESTER/13- Visualizations (Dotplot):
--------------------------------------------------------------------------------
 1 | #### Generate the dotplot #####
 2 | 
 3 | ##### Inside R ########
 4 | 
 5 | library(ggplot2)
 6 | 
 7 | ######## The GSVA column is the difference (PBMC-specific GSVA enrichment - TME-specific GSVA enrichment); see step 12 ######### 
 8 | ######## The data frame includes the fraction of cells annotated as each identity in the scRNA-Seq libraries ###
 9 | data <- read.table("Identities_Dotplot_V10.txt", head=TRUE, sep="\t")
10 | 
11 | 
12 | scaled_data <- (data$GSVA - min(data$GSVA)) / (max(data$GSVA) - min(data$GSVA))
13 | 
14 | scaled_data <- scaled_data * 2 - 1
15 | 
16 | 
17 | data2 <- data
18 | 
19 | data2$GSVA <- scaled_data
20 | 
21 | 
22 | factor_order <- c("Neutrophils", "cDCs", "Monocytes", "Mono_Macro", "Macrophage", "Microglia_Like", "Microglia")
23 | 
24 | 
25 | data2$TAM <- factor(data2$TAM, levels = factor_order)
26 | 
27 | 
28 | ggplot(data2, aes(x=GSVA, y=TAM, size=Fraction2*10)) + geom_point(data = subset(data2, Fraction2 != 0)) + labs(y="Identity", x="Enrichment of PBMC Variants - Enrichment of TME Variants") +  scale_size_area(breaks = c(0.1,0.5, 1), max_size = 18) + scale_x_continuous(limits = c(-1.1,1.1)) + labs(col="TAM") + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line.x = element_line(colour = "black"),  axis.text.y = element_text(face="bold", size=14), axis.text.x = element_text(face="bold", size=14), legend.text=element_text(size=12)) + geom_vline(xintercept = 0, linetype = "dashed", color = "black")
29 | 
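The two scaling lines in this script perform a min-max rescaling of the GSVA differences onto [-1, 1], x' = 2 (x - min) / (max - min) - 1. A small Python sketch of the same formula (the function name is illustrative):

```python
import numpy as np


def rescale_minus1_to_1(x) -> np.ndarray:
    """Min-max rescale values onto [-1, 1]: 2 * (x - min) / (max - min) - 1."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        raise ValueError("all values are identical; the rescaling is undefined")
    return 2 * (x - x.min()) / span - 1
```

The rescaling depends on the observed minimum and maximum. A raw difference of zero therefore maps to 0 only when those extremes are symmetric around zero. Otherwise the dashed line at x = 0 marks the midpoint of the observed range rather than equal enrichment. For example, raw values of -0.2, 0 and 0.6 map to -1, -0.5 and 1.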


--------------------------------------------------------------------------------
/MAESTER/14- Visualizations (Stacked Columns):
--------------------------------------------------------------------------------
 1 | ####### Inside R ##############
 2 | ####### The value is the average usage of the four immunomodulatory programs in the four tumors (Maester libraries - scRNA-Seq part) ######
 3 | ######## Others represent the average usage of all other programs (including identities) #########
 4 | 
 5 | library(ggplot2)
 6 | 
 7 | data <- read.table("Maester_Stack_V12.txt", head=TRUE, sep="\t")
 8 | 
 9 | 
10 | 
11 | factor_order <- c("Neutrophil", "cDC", "Monocyte", "Mono_Macro", "Macrophage", "Microglia_Like", "Microglia")
12 | 
13 | data$Identity <- factor(data$Identity, levels = factor_order)
14 | 
15 | data$Program <- factor(data$Program, levels = c("Others", "Scavenger", "Complement", "RHOB", "IL1B"))
16 | 
17 | 
18 | ggplot(data, aes(x = Value, y = Identity, fill = Program)) + geom_bar(position="fill", stat="identity", width=0.45) + scale_fill_manual(values = c("Others" = "gray90", "IL1B" = "#AB0800", "Complement"="#007AFF", "RHOB"="#FF6961", "Scavenger" = "#0700C4")) + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank())
19 | 


--------------------------------------------------------------------------------
/Processing of GBO scRNA-Seq libraries (Related to Figure 7)/01- Aligning all GBO Seq-Well scRNA-Seq libraries:
--------------------------------------------------------------------------------
1 | /path/to/STAR --genomeDir /path/to/genome/dir/ --readFilesIn Read2_Lane1.fastq.gz,Read2_Lane2.fastq.gz,Read2_Lane3.fastq.gz Read1_Lane1.fastq.gz,Read1_Lane2.fastq.gz,Read1_Lane3.fastq.gz --readFilesCommand zcat --soloType CB_UMI_Simple --soloCBwhitelist None --soloCBstart 1 --soloCBlen 12 --soloUMIstart 13 --soloUMIlen 8 --outSAMtype BAM SortedByCoordinate --outSAMattributes CR UR CY UY CB UB 
2 | 


--------------------------------------------------------------------------------
/Processing of GBO scRNA-Seq libraries (Related to Figure 7)/02- Seurat Processing for BWH911 to obtain Raw Counts Matrix:
--------------------------------------------------------------------------------
  1 | library(dplyr)
  2 | library(Seurat)
  3 | options(bitmapType='cairo')
  4 | options(future.globals.maxSize = 8000 * 1024^2)
  5 | 
  6 | 
  7 | 
  8 | DMSO1.data <- Read10X("DMSO1/")
  9 | 
 10 | DMSO1 <- CreateSeuratObject(counts = DMSO1.data, project = "DMSO1", min.cells = 3, min.features = 200)
 11 | 
 12 | DMSO1[["percent.mt"]] <- PercentageFeatureSet(DMSO1, pattern = "^MT-")
 13 | 
 14 | 
 15 | DMSO2.data <- Read10X("DMSO2/")
 16 | 
 17 | DMSO2 <- CreateSeuratObject(counts = DMSO2.data, project = "DMSO2", min.cells = 3, min.features = 200)
 18 | 
 19 | DMSO2[["percent.mt"]] <- PercentageFeatureSet(DMSO2, pattern = "^MT-")
 20 | 
 21 | 
 22 | 
 23 | DMSO3.data <- Read10X("DMSO3/")
 24 | 
 25 | DMSO3 <- CreateSeuratObject(counts = DMSO3.data, project = "DMSO3", min.cells = 3, min.features = 200)
 26 | 
 27 | DMSO3[["percent.mt"]] <- PercentageFeatureSet(DMSO3, pattern = "^MT-")
 28 | 
 29 | 
 30 | 
 31 | Dex1.data <- Read10X("Dex1/")
 32 | 
 33 | Dex1 <- CreateSeuratObject(counts = Dex1.data, project = "Dex1", min.cells = 3, min.features = 200)
 34 | 
 35 | Dex1[["percent.mt"]] <- PercentageFeatureSet(Dex1, pattern = "^MT-")
 36 | 
 37 | 
 38 | Dex2.data <- Read10X("Dex2/")
 39 | 
 40 | Dex2 <- CreateSeuratObject(counts = Dex2.data, project = "Dex2", min.cells = 3, min.features = 200)
 41 | 
 42 | Dex2[["percent.mt"]] <- PercentageFeatureSet(Dex2, pattern = "^MT-")
 43 | 
 44 | 
 45 | Dex3.data <- Read10X("Dex3/")
 46 | 
 47 | Dex3 <- CreateSeuratObject(counts = Dex3.data, project = "Dex3", min.cells = 3, min.features = 200)
 48 | 
 49 | Dex3[["percent.mt"]] <- PercentageFeatureSet(Dex3, pattern = "^MT-")
 50 | 
 51 | 
 52 | 
 53 | BWH911_GBO <- merge(DMSO1, y = c(DMSO2, DMSO3, Dex1, Dex2, Dex3), add.cell.ids = c("BWH911_GBO_DMSO1", "BWH911_GBO_DMSO2", "BWH911_GBO_DMSO3", "BWH911_GBO_Dex1", "BWH911_GBO_Dex2", "BWH911_GBO_Dex3"), project = "BWH911_GBO")
 54 | 
 55 | 
 56 | pdf("BWH911_GBO_QC_BF.pdf", height = 6, width = 20)
 57 | VlnPlot(BWH911_GBO, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
 58 | dev.off()
 59 | 
 60 | BWH911_GBO <- subset(BWH911_GBO, subset = nFeature_RNA > 500 & nFeature_RNA < 6000 & nCount_RNA > 1000 & percent.mt < 15)
 61 | 
 62 | 
 63 | pdf("BWH911_GBO_QC_AF.pdf", height = 6, width = 20)
 64 | VlnPlot(BWH911_GBO, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
 65 | dev.off()
 66 | 
 67 | library(sctransform)
 68 | BWH911_GBO <- SCTransform(BWH911_GBO, vars.to.regress = "percent.mt", verbose = TRUE)
 69 | BWH911_GBO <- RunPCA(BWH911_GBO)
 70 | pdf("BWH911_GBO_ElbowPlot.pdf", height = 6, width = 6)
 71 | ElbowPlot(BWH911_GBO, ndims=50)
 72 | dev.off()
 73 | 
 74 | write.table(BWH911_GBO@meta.data, file="BWH911_GBO_Integrated_MetaData.txt", sep="\t", col.names=NA, quote=FALSE)
 75 | 
 76 | 
 77 | Treatment <- scan("Treatment.txt", what="")
 78 | 
 79 | BWH911_GBO@meta.data$Treatment <- Treatment
 80 | 
 81 | 
 82 | 
 83 | BWH911_GBO <- RunUMAP(BWH911_GBO, reduction = "pca", dims = 1:15)
 84 | BWH911_GBO <- FindNeighbors(BWH911_GBO, dims = 1:15)
 85 | BWH911_GBO <- FindClusters(BWH911_GBO, resolution = 0.3)
 86 | 
 87 | pdf("BWH911_GBO_UMAP_Clusters.pdf", height= 6, width = 7)
 88 | DimPlot(BWH911_GBO, reduction = "umap")
 89 | dev.off()
 90 | 
 91 | pdf("BWH911_GBO_Clusters_With_Labels.pdf", height= 6, width = 7)
 92 | DimPlot(BWH911_GBO, reduction = "umap", label=TRUE)
 93 | dev.off()
 94 | 
 95 | pdf("BWH911_GBO_UMAP_PatientID.pdf", height= 6, width = 9)
 96 | DimPlot(BWH911_GBO, reduction = "umap", group.by="orig.ident")
 97 | dev.off()
 98 | 
 99 | pdf("BWH911_GBO_UMAP_Treatment.pdf", height= 6, width = 9)
100 | DimPlot(BWH911_GBO, reduction = "umap", group.by="Treatment")
101 | dev.off()
102 | 
103 | 
104 | write.table(BWH911_GBO@meta.data, file="BWH911_GBO_Integrated_MetaData.txt", sep="\t", col.names=NA, quote=FALSE)
105 | 
106 | 
107 | s.genes <- cc.genes$s.genes
108 | g2m.genes <- cc.genes$g2m.genes
109 | 
110 | BWH911_GBO <- CellCycleScoring(BWH911_GBO, s.features = s.genes, g2m.features = g2m.genes, set.ident = FALSE)
111 | 
112 | 
113 | write.table(BWH911_GBO@meta.data, file="BWH911_GBO_All_MetaData.txt", sep="\t", col.names=NA, quote=FALSE)
114 | 
115 | Matrix <- GetAssayData(object = BWH911_GBO, assay = "RNA", slot = "counts")   # raw UMI counts; the default assay after SCTransform is "SCT"
116 | 
117 | write.table(as.matrix(Matrix), file="./BWH911_GBO_Raw_Counts.txt", sep="\t", col.names=NA, quote=FALSE)
118 | 
119 | 
120 | 
121 | saveRDS(BWH911_GBO, file="BWH911_GBO_Glioma.rds")
122 | 
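`scan("Treatment.txt")` assigns labels by position only. It therefore relies on `Treatment.txt` listing one label per cell in exactly the order of `colnames(BWH911_GBO)` after QC filtering. Because `merge(..., add.cell.ids = ...)` prefixes every barcode with its sample ID and an underscore, the label could instead be recovered from the cell name itself. The following is a minimal Python sketch of that idea. The prefix-to-treatment mapping and the function name are hypothetical and would need to match the real sample sheet.

```python
# Hypothetical mapping from the add.cell.ids prefixes used in merge() to treatment labels.
SAMPLE_TO_TREATMENT = {
    f"BWH911_GBO_{treatment}{replicate}": treatment
    for treatment in ("DMSO", "Dex")
    for replicate in (1, 2, 3)
}


def treatment_from_cell_name(cell_name: str) -> str:
    """Map a merged cell name of the form <sample prefix>_<barcode> to its treatment label."""
    # 10x barcodes (e.g. AAACCTGAGAAACCAT-1) contain no underscore, so the last "_"
    # separates the sample prefix from the barcode.
    prefix, sep, _barcode = cell_name.rpartition("_")
    if not sep or prefix not in SAMPLE_TO_TREATMENT:
        raise ValueError(f"unrecognized sample prefix in cell name: {cell_name!r}")
    return SAMPLE_TO_TREATMENT[prefix]


cells = ["BWH911_GBO_DMSO1_AAACCTGAGAAACCAT-1", "BWH911_GBO_Dex3_TTTGTCATCTTGCATT-1"]
print([treatment_from_cell_name(c) for c in cells])  # ['DMSO', 'Dex']
```

In R, the same prefix is given by `sub("_[^_]*$", "", colnames(BWH911_GBO))`, which could be matched against a lookup table instead of reading labels from a separate file.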


--------------------------------------------------------------------------------
/Processing of GBO scRNA-Seq libraries (Related to Figure 7)/03- Calculating the usage of all cell types NMF programs in BWH911 GBO to extract non-doublet myeloid cells:
--------------------------------------------------------------------------------
 1 | ############### Python Scripts #####################
 2 | import sklearn
 3 | import sklearn.decomposition
 4 | from sklearn.decomposition import non_negative_factorization
 5 | import numpy as np
 6 | import scanpy as sc
 7 | import csv
 8 | import scipy
 9 | import pandas as pd
10 | 
11 | X = pd.read_table("BWH911_GBO_Raw_Counts.txt", index_col=0, sep='\t')
12 | 
13 | X2 = X.T
14 | 
15 | H = np.load("cnmf_run.spectra.k_18.dt_0_015.consensus.df.npz", allow_pickle=True)
16 | 
17 | H2 = pd.DataFrame(H['data'], columns = H['columns'], index = H['index'])
18 | 
19 | H3 = H2.filter(items = X2.columns)
20 |       
21 | X4 = X2.filter(items = H3.columns)
22 | 
23 | H4 = H3.to_numpy()
24 | 
25 | X5 = X4.values
26 |       
27 | X6 = X5.astype(np.float64)
28 | 
29 | test = sklearn.decomposition.non_negative_factorization(X6, W=None, H=H4, n_components=18, init='random', update_H=False, solver='cd', beta_loss='frobenius', tol=0.0001, max_iter=1000, alpha_W=0.0, alpha_H='same', l1_ratio=0.0, random_state=None, verbose=0, shuffle=False)   # 'alpha' and 'regularization' were removed in scikit-learn 1.2
30 | 
31 | test2 = list(test)
32 | 
33 | processed = pd.DataFrame(test2[0], columns= H['index'], index=X4.index)
34 | 
35 | row_sums = processed.sum(axis=1)
36 | 
37 | processed_data = (processed.div(row_sums, axis=0) * 100)
38 |       
39 | new_column_names = ['Tcells', 'AC', 'NPC1_OPC', 'Microglia', 'MES2', 'Vascular_MES1', 'Oligodendrocytes', 'MES1', 'CD14_Mono', 'cDC', 'Neutrophils', 'NPC2', 'Giant_Cell_GBM', 'Cycling', 'Pericytes', 'Plasma', 'Endothelial', 'Mast']
40 | 
41 | processed_data.columns = new_column_names
42 | 
43 | processed_data.to_csv(path_or_buf="./BWH911_GBO_All_CellType_Usages.txt", sep="\t", quoting=csv.QUOTE_NONE)
44 | 
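With `update_H=False`, `non_negative_factorization` keeps the spectra `H4` fixed and fits only the usage matrix W (cells x programs). With the Frobenius loss and no regularization, that problem splits into one independent non-negative least-squares (NNLS) problem per cell: minimize ||x - H^T w||^2 subject to w >= 0. The sketch below illustrates this on synthetic data with `scipy.optimize.nnls`. This is a different solver from the coordinate descent used above, so results should agree only up to the solver tolerance. It then applies the same per-row scaling to percentages as `processed_data`. The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls


def refit_usages(X: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Non-negative W (cells x programs) minimizing ||X - W @ H||_F with H (programs x genes) fixed.

    The Frobenius objective decouples over cells, so each row of W is an NNLS fit of
    that cell's expression vector against the program spectra.
    """
    if X.shape[1] != H.shape[1]:
        raise ValueError("X and H must have the same genes (columns) in the same order")
    W = np.zeros((X.shape[0], H.shape[0]))
    for i, x in enumerate(X):
        W[i], _residual = nnls(H.T, x)
    return W


def to_percent(W: np.ndarray) -> np.ndarray:
    """Scale each row to sum to 100; all-zero rows become NaN, as with pandas .div in the script."""
    row_sums = W.sum(axis=1, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        return W / row_sums * 100


# Synthetic check: 3 programs over 50 genes, 5 cells, noiseless data.
rng = np.random.default_rng(0)
H_true = rng.random((3, 50))
W_true = rng.random((5, 3))
W_hat = refit_usages(W_true @ H_true, H_true)
print(np.allclose(W_hat, W_true, atol=1e-6))
print(to_percent(W_hat).sum(axis=1))
```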


--------------------------------------------------------------------------------
/Processing of GBO scRNA-Seq libraries (Related to Figure 7)/04- Calculating the usage of myeloid NMF programs in Myeloid cells of BWH911 GBO:
--------------------------------------------------------------------------------
 1 | ########## Myeloid cells were extracted from the output of the all-cell-type usage step as follows:
 2 | ########## The usages of the 4 myeloid programs ('Microglia', 'CD14_Mono', 'cDC', 'Neutrophils') were summed into a per-cell "myeloid usage". Other categories were summed in the same way (e.g. 'AC', 'NPC1_OPC', 'MES2', 'MES1', 'NPC2' and 'Giant_Cell_GBM' as "Malignant").
 3 | ########## Each cell was then assigned to the category with the highest summed usage; myeloid cells are those whose highest-scoring category is the myeloid usage.
 4 | 
 5 | ############### Python Scripts #####################
 6 | 
 7 | import sklearn
 8 | import sklearn.decomposition
 9 | from sklearn.decomposition import non_negative_factorization
10 | import numpy as np
11 | import scanpy as sc
12 | import csv
13 | import scipy
14 | import pandas as pd
15 | 
16 | X = pd.read_table("BWH911_GBO_Raw_Counts.txt", index_col=0, sep='\t')
17 | 
18 | X2 = X.T
19 | 
20 | H = pd.read_table("Myeloid_NMF_Average_Gene_Spectra.txt", sep="\t", index_col=0)
21 | 
22 | H2 = H.T
23 | 
24 | X3 = X2.filter(items = H2.columns)
25 | 
26 | H3 = H2.filter(items = X3.columns)
27 | 
28 | H4 = H3.to_numpy()
29 |       
30 | X4 = X3.values
31 | 
32 | X5 = X4.astype(np.float64)
33 | 
34 | test = sklearn.decomposition.non_negative_factorization(X5, W=None, H=H4, n_components=14, init='random', update_H=False, solver='cd', beta_loss='frobenius', tol=0.0001, max_iter=1000, alpha_W=0.0, alpha_H='same', l1_ratio=0.0, random_state=None, verbose=0, shuffle=False)   # 'alpha' and 'regularization' were removed in scikit-learn 1.2
35 | 
36 | test2 = list(test)
37 | 
38 | processed = pd.DataFrame(test2[0], columns= H.columns, index=X3.index)
39 | 
40 | row_sums = processed.sum(axis=1)
41 | 
42 | processed_data = (processed.div(row_sums, axis=0) * 100)
43 | 
44 | processed_data.to_csv(path_or_buf="./BWH911_GBO_Myeloid_Programs_Usages.txt", sep="\t", quoting=csv.QUOTE_NONE)
45 | 
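The annotation rule described in the comments at the top of this script can be sketched in pandas as follows. The rule sums program usages into categories, then assigns each cell to the category with the highest summed usage. The Myeloid and Malignant groupings come from those comments. Keeping every remaining program as its own category is an assumption made here for illustration, as is the helper name `annotate_cells`. Because the projection above runs on all cells, the resulting myeloid barcodes are what would restrict `BWH911_GBO_Myeloid_Programs_Usages.txt` to myeloid cells.

```python
import pandas as pd

# Category groupings from the script comments. Programs not listed here are kept as
# their own category (an assumption made for this sketch).
CATEGORY_PROGRAMS = {
    "Myeloid": ["Microglia", "CD14_Mono", "cDC", "Neutrophils"],
    "Malignant": ["AC", "NPC1_OPC", "MES2", "MES1", "NPC2", "Giant_Cell_GBM"],
}


def annotate_cells(usages: pd.DataFrame) -> pd.Series:
    """Return the highest-scoring category for each cell of a cells x programs usage table."""
    grouped = {name: usages[programs].sum(axis=1) for name, programs in CATEGORY_PROGRAMS.items()}
    grouped_programs = {p for programs in CATEGORY_PROGRAMS.values() for p in programs}
    for program in usages.columns:
        if program not in grouped_programs:
            grouped[program] = usages[program]
    return pd.DataFrame(grouped).idxmax(axis=1)


# Example use with the files written by scripts 03 and 04:
# labels = annotate_cells(pd.read_table("BWH911_GBO_All_CellType_Usages.txt", index_col=0))
# myeloid_cells = labels[labels == "Myeloid"].index
# myeloid = pd.read_table("BWH911_GBO_Myeloid_Programs_Usages.txt", index_col=0).loc[myeloid_cells]
```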


--------------------------------------------------------------------------------
/Processing of GBO scRNA-Seq libraries (Related to Figure 7)/05- Seurat Processing for GBOs treated with DMSO and GNE to obtain Raw Counts Matrix:
--------------------------------------------------------------------------------
 1 | library(dplyr)
 2 | library(Seurat)
 3 | options(bitmapType='cairo')
 4 | options(future.globals.maxSize = 8000 * 1024^2)
 5 | 
 6 | 
 7 | MGH253_GBO.data <- read.table("MGH253_GBO_Cleaned_Raw_Expression.txt", head=TRUE, row.names=1, sep="\t")
 8 | MGH253_GBO <- CreateSeuratObject(counts = MGH253_GBO.data, project = "MGH253_GBO", min.cells = 3, min.features = 200)
 9 | MGH253_GBO[["percent.mt"]] <- PercentageFeatureSet(MGH253_GBO, pattern = "^MT.")
10 | 
11 | MGH314_GBO.data <- read.table("MGH314_GBO_Cleaned_Raw_Expression.txt", head=TRUE, row.names=1, sep="\t")
12 | MGH314_GBO <- CreateSeuratObject(counts = MGH314_GBO.data, project = "MGH314_GBO", min.cells = 3, min.features = 200)
13 | MGH314_GBO[["percent.mt"]] <- PercentageFeatureSet(MGH314_GBO, pattern = "^MT.")
14 | 
15 | MGH630_GBO.data <- read.table("MGH630_GBO_Cleaned_Raw_Expression.txt", head=TRUE, row.names=1, sep="\t")
16 | MGH630_GBO <- CreateSeuratObject(counts = MGH630_GBO.data, project = "MGH630_GBO", min.cells = 3, min.features = 200)
17 | MGH630_GBO[["percent.mt"]] <- PercentageFeatureSet(MGH630_GBO, pattern = "^MT.")
18 | 
19 | var3 <- intersect(rownames(MGH253_GBO), rownames(MGH314_GBO))
20 | 
21 | all.genes <- intersect(var3, rownames(MGH630_GBO))
22 | 
23 | 
24 | GBO.combined <- merge(MGH253_GBO, y = c(MGH314_GBO, MGH630_GBO), add.cell.ids = c("MGH253_GBO", "MGH314_GBO", "MGH630_GBO"), project = "GBO")
25 | 
26 | 
27 | pdf("SeqWell_GBO_QC_AF.pdf", height = 6, width = 20)
28 | VlnPlot(GBO.combined, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
29 | dev.off()
30 | 
31 | library(sctransform)
32 | GBO.combined <- SCTransform(GBO.combined, vars.to.regress = "percent.mt", verbose = TRUE)
33 | GBO.combined <- RunPCA(GBO.combined)
34 | pdf("SeqWell_GBO_ElbowPlot.pdf", height = 6, width = 6)
35 | ElbowPlot(GBO.combined, ndims=50)
36 | dev.off()
37 | 
38 | write.table(GBO.combined@meta.data, file="SeqWell_GBO_Integrated_MetaData.txt", sep="\t", col.names=NA, quote=FALSE)
39 | 
40 | 
41 | GBO.combined <- RunUMAP(GBO.combined, reduction = "pca", dims = 1:20)
42 | GBO.combined <- FindNeighbors(GBO.combined, dims = 1:20)
43 | GBO.combined <- FindClusters(GBO.combined, resolution = 0.3)
44 | pdf("SeqWell_GBO_UMAP_Clusters.pdf", height= 6, width = 7)
45 | DimPlot(GBO.combined, reduction = "umap")
46 | dev.off()
47 | 
48 | pdf("SeqWell_GBO_Clusters_With_Labels.pdf", height= 6, width = 7)
49 | DimPlot(GBO.combined, reduction = "umap", label=TRUE)
50 | dev.off()
51 | 
52 | pdf("SeqWell_GBO_UMAP_Patient_ID.pdf", height= 6, width = 9)
53 | DimPlot(GBO.combined, reduction = "umap", group.by="orig.ident")
54 | dev.off()
 55 | # Treatment.txt must contain one label per cell, in the same order as colnames(GBO.combined)
56 | Treatment <- scan("Treatment.txt", what="")
57 | 
58 | GBO.combined@meta.data$Treatment <- Treatment
59 | 
60 | pdf("SeqWell_GBO_UMAP_Treatment.pdf", height= 6, width = 9)
61 | DimPlot(GBO.combined, reduction = "umap", group.by="Treatment")
62 | dev.off()
63 | 
64 | write.table(GBO.combined@meta.data, file="SeqWell_GBO_Integrated_MetaData.txt", sep="\t", col.names=NA, quote=FALSE)
65 | 
66 | s.genes <- cc.genes$s.genes
67 | g2m.genes <- cc.genes$g2m.genes
68 | 
69 | GBO.combined <- CellCycleScoring(GBO.combined, s.features = s.genes, g2m.features = g2m.genes, set.ident = FALSE)
70 | 
 71 | Matrix <- GetAssayData(object = GBO.combined, assay = "RNA", slot = "counts")   # raw UMI counts; the default assay after SCTransform is "SCT"
72 | 
73 | write.table(as.matrix(Matrix), file="./GBO_DMSO_GNE_Raw_Counts.txt", sep="\t", col.names=NA, quote=FALSE)
74 | 
75 | saveRDS(GBO.combined, file="GBO_DMSO_GNE_Glioma.rds")
76 | 
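`var3` and `all.genes` hold the genes shared by all three organoids, but `merge()` does not restrict the object to them. `min.cells = 3` is applied per sample in `CreateSeuratObject`, so the three gene sets can differ. Seurat's merge keeps the union of genes, with zero counts where a gene is absent from a sample, and prefixes each barcode with its `add.cell.ids` entry. The pandas sketch below mimics that behaviour on genes x cells count tables, with an optional switch to keep only the shared genes. The function name and the `shared_only` switch are hypothetical.

```python
import pandas as pd


def merge_counts(samples: dict[str, pd.DataFrame], shared_only: bool = False) -> pd.DataFrame:
    """Column-bind genes x cells count tables, prefixing each cell name with its sample ID.

    shared_only=False keeps the union of genes and fills genes missing from a sample with 0;
    shared_only=True keeps only the genes present in every sample.
    """
    renamed = [counts.add_prefix(f"{sample}_") for sample, counts in samples.items()]
    merged = pd.concat(renamed, axis=1, join="inner" if shared_only else "outer")
    return merged.fillna(0).astype(int)


a = pd.DataFrame({"c1": [1, 2]}, index=["G1", "G2"])
b = pd.DataFrame({"c1": [3, 4]}, index=["G2", "G3"])
print(merge_counts({"MGH253_GBO": a, "MGH314_GBO": b}))
```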


--------------------------------------------------------------------------------
/Processing of GBO scRNA-Seq libraries (Related to Figure 7)/06- Calculating the usage of all cell types NMF programs in GBOs treated with DMSO and GNE to extract non-doublet myeloid cells:
--------------------------------------------------------------------------------
 1 | ############### Python Scripts #####################
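 2 | import csv, numpy as np, pandas as pd, sklearn.decomposition   # imports needed when this script is run on its own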
 3 | X = pd.read_table("GBO_DMSO_GNE_Raw_Counts.txt", index_col=0, sep='\t')
 4 | 
 5 | X2 = X.T
 6 | 
 7 | H = np.load("cnmf_run.spectra.k_18.dt_0_015.consensus.df.npz", allow_pickle=True)
 8 | 
 9 | H2 = pd.DataFrame(H['data'], columns = H['columns'], index = H['index'])
10 | 
11 | H3 = H2.filter(items = X2.columns)
12 |       
13 | X4 = X2.filter(items = H3.columns)
14 | 
15 | H4 = H3.to_numpy()
16 | 
17 | X5 = X4.values
18 |       
19 | X6 = X5.astype(np.float64)
20 | 
21 | test = sklearn.decomposition.non_negative_factorization(X6, W=None, H=H4, n_components=18, init='random', update_H=False, solver='cd', beta_loss='frobenius', tol=0.0001, max_iter=1000, alpha_W=0.0, alpha_H='same', l1_ratio=0.0, random_state=None, verbose=0, shuffle=False)   # 'alpha' and 'regularization' were removed in scikit-learn 1.2
22 | 
23 | test2 = list(test)
24 | 
25 | processed = pd.DataFrame(test2[0], columns= H['index'], index=X4.index)
26 | 
27 | row_sums = processed.sum(axis=1)
28 | 
29 | processed_data = (processed.div(row_sums, axis=0) * 100)
30 |       
31 | new_column_names = ['Tcells', 'AC', 'NPC1_OPC', 'Microglia', 'MES2', 'Vascular_MES1', 'Oligodendrocytes', 'MES1', 'CD14_Mono', 'cDC', 'Neutrophils', 'NPC2', 'Giant_Cell_GBM', 'Cycling', 'Pericytes', 'Plasma', 'Endothelial', 'Mast']
32 | 
33 | processed_data.columns = new_column_names
34 | 
35 | processed_data.to_csv(path_or_buf="./GBO_DMSO_GNE_All_CellType_Usages.txt", sep="\t", quoting=csv.QUOTE_NONE)
36 | 
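The two `filter(items=...)` calls do more than drop genes. `DataFrame.filter(items=...)` returns the requested labels in the order of `items`, so `X4` ends up with its columns in the same gene order as `H3`. `non_negative_factorization` requires this because it works on positional arrays. Below is an explicit version of that alignment step, with a hypothetical helper name and a check that at least one gene is shared.

```python
import numpy as np
import pandas as pd


def align_to_spectra(counts: pd.DataFrame, spectra: pd.DataFrame) -> tuple[np.ndarray, np.ndarray, list[str]]:
    """Restrict counts (cells x genes) and spectra (programs x genes) to shared genes in one order.

    Genes are taken in the order of the spectra columns, so column j of both arrays is the same gene.
    """
    count_genes = set(counts.columns)
    shared = [g for g in spectra.columns if g in count_genes]
    if not shared:
        raise ValueError("no genes shared between the counts matrix and the spectra")
    X = counts.loc[:, shared].to_numpy(dtype=np.float64)
    H = spectra.loc[:, shared].to_numpy(dtype=np.float64)
    return X, H, shared


counts = pd.DataFrame([[1, 2, 3]], index=["cell1"], columns=["G3", "G1", "G9"])
spectra = pd.DataFrame([[0.1, 0.2, 0.3]], index=["P1"], columns=["G1", "G2", "G3"])
X, H, genes = align_to_spectra(counts, spectra)
print(genes, X, H)
```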


--------------------------------------------------------------------------------
/Processing of GBO scRNA-Seq libraries (Related to Figure 7)/07- Calculating the usage of myeloid NMF programs in Myeloid cells of GBOs treated with DMSO or GNE:
--------------------------------------------------------------------------------
 1 | ########## Myeloid cells were extracted from the output of the all-cell-type usage step as follows:
 2 | ########## The usages of the 4 myeloid programs ('Microglia', 'CD14_Mono', 'cDC', 'Neutrophils') were summed into a per-cell "myeloid usage". Other categories were summed in the same way (e.g. 'AC', 'NPC1_OPC', 'MES2', 'MES1', 'NPC2' and 'Giant_Cell_GBM' as "Malignant").
 3 | ########## Each cell was then assigned to the category with the highest summed usage; myeloid cells are those whose highest-scoring category is the myeloid usage.
 4 | 
 5 | ############### Python Scripts #####################
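 6 | import csv, numpy as np, pandas as pd, sklearn.decomposition   # imports needed when this script is run on its own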
 7 | X = pd.read_table("GBO_DMSO_GNE_Raw_Counts.txt", index_col=0, sep='\t')
 8 | 
 9 | X2 = X.T
10 | 
11 | H = pd.read_table("Myeloid_NMF_Average_Gene_Spectra.txt", sep="\t", index_col=0)
12 | 
13 | H2 = H.T
14 | 
15 | X3 = X2.filter(items = H2.columns)
16 | 
17 | H3 = H2.filter(items = X3.columns)
18 | 
19 | H4 = H3.to_numpy()
20 |       
21 | X4 = X3.values
22 | 
23 | X5 = X4.astype(np.float64)
24 | 
25 | test = sklearn.decomposition.non_negative_factorization(X5, W=None, H=H4, n_components=14, init='random', update_H=False, solver='cd', beta_loss='frobenius', tol=0.0001, max_iter=1000, alpha_W=0.0, alpha_H='same', l1_ratio=0.0, random_state=None, verbose=0, shuffle=False)   # 'alpha' and 'regularization' were removed in scikit-learn 1.2
26 | 
27 | test2 = list(test)
28 | 
29 | processed = pd.DataFrame(test2[0], columns= H.columns, index=X3.index)
30 | 
31 | row_sums = processed.sum(axis=1)
32 | 
33 | processed_data = (processed.div(row_sums, axis=0) * 100)
34 | 
35 | processed_data.to_csv(path_or_buf="./GBO_DMSO_GNE_Myeloid_Programs_Usages.txt", sep="\t", quoting=csv.QUOTE_NONE)
36 | 


--------------------------------------------------------------------------------
/Processing of scRNA-Seq Files (Related to Figure 1)/01- Align SeqWell scRNA-Seq Libraries:
--------------------------------------------------------------------------------
1 | /path/to/STAR --genomeDir /path/to/genome/dir/ --readFilesIn Read2_Lane1.fastq.gz,Read2_Lane2.fastq.gz,Read2_Lane3.fastq.gz Read1_Lane1.fastq.gz,Read1_Lane2.fastq.gz,Read1_Lane3.fastq.gz --readFilesCommand zcat --soloType CB_UMI_Simple --soloCBstart 1 --soloCBlen 12 --soloUMIstart 13 --soloUMIlen 8 --soloCBwhitelist None --outSAMtype BAM SortedByCoordinate --outSAMattributes CR UR CY UY CB UB
2 | ###### Seq-Well barcodes have no whitelist (--soloCBwhitelist None); --readFilesCommand zcat is needed to read gzipped FASTQs


--------------------------------------------------------------------------------
/Processing of scRNA-Seq Files (Related to Figure 1)/02- Align 10X V3 scRNA-Seq Libraries:
--------------------------------------------------------------------------------
1 | /path/to/STAR --genomeDir /path/to/genome/dir/ --readFilesIn Read2_Lane1.fastq.gz,Read2_Lane2.fastq.gz,Read2_Lane3.fastq.gz Read1_Lane1.fastq.gz,Read1_Lane2.fastq.gz,Read1_Lane3.fastq.gz --soloType CB_UMI_Simple --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 12 --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts --soloUMIfiltering MultiGeneUMI_CR --soloUMIdedup 1MM_CR --clipAdapterType CellRanger4 --outFilterScoreMin 30 --outSAMtype BAM SortedByCoordinate --outSAMattributes CR UR CY UY CB UB --soloCBwhitelist 3M-february-2018_TRU.txt --readFilesCommand zcat --outFileNamePrefix /Path/To/Output/{sampleName}
2 | 
3 | 
4 | ###### "3M-february-2018_TRU.txt" (10x v3 barcode whitelist) can be obtained from the barcodes folder of the Cell Ranger installation
5 | 


--------------------------------------------------------------------------------
/Processing of scRNA-Seq Files (Related to Figure 1)/03- Align 10X V2 scRNA-Seq Libraries:
--------------------------------------------------------------------------------
1 | /path/to/STAR --genomeDir /path/to/genome/dir/ --readFilesIn Read2_Lane1.fastq.gz,Read2_Lane2.fastq.gz,Read2_Lane3.fastq.gz Read1_Lane1.fastq.gz,Read1_Lane2.fastq.gz,Read1_Lane3.fastq.gz --soloType CB_UMI_Simple --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 10 --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts --soloUMIfiltering MultiGeneUMI_CR --soloUMIdedup 1MM_CR --clipAdapterType CellRanger4 --outFilterScoreMin 30 --outSAMtype BAM SortedByCoordinate --outSAMattributes CR UR CY UY CB UB --soloCBwhitelist 737k-august-2016.txt --readFilesCommand zcat --outFileNamePrefix /Path/To/Output/{sampleName}
2 | 
3 | ###### "737k-august-2016.txt" (10x v2 barcode whitelist) can be obtained from the barcodes folder of the Cell Ranger installation
4 | 
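The `--soloCBstart/--soloCBlen/--soloUMIstart/--soloUMIlen` options describe where the cell barcode (CB) and UMI sit in the barcode read (Read 1), which STARsolo expects as the last file in `--readFilesIn`. Start positions are 1-based. As an illustration only (this is not STARsolo code), the Python sketch below slices a barcode read with the three layouts used in the commands above: Seq-Well 12 + 8, 10x v3 16 + 12 and 10x v2 16 + 10.

```python
from typing import NamedTuple


class Layout(NamedTuple):
    cb_start: int   # 1-based, as in --soloCBstart
    cb_len: int     # --soloCBlen
    umi_start: int  # 1-based, as in --soloUMIstart
    umi_len: int    # --soloUMIlen


LAYOUTS = {
    "SeqWell": Layout(1, 12, 13, 8),
    "10x_v3": Layout(1, 16, 17, 12),
    "10x_v2": Layout(1, 16, 17, 10),
}


def split_barcode_read(seq: str, layout: Layout) -> tuple[str, str]:
    """Return (cell barcode, UMI) from a barcode-read sequence using 1-based start positions."""
    needed = max(layout.cb_start + layout.cb_len, layout.umi_start + layout.umi_len) - 1
    if len(seq) < needed:
        raise ValueError(f"read of length {len(seq)} is shorter than the {needed} bases required")
    cb = seq[layout.cb_start - 1 : layout.cb_start - 1 + layout.cb_len]
    umi = seq[layout.umi_start - 1 : layout.umi_start - 1 + layout.umi_len]
    return cb, umi


read1 = "AAAACCCCGGGGTTTT" + "ACGTACGTACGT"  # a 28-nt 10x v3 barcode read
print(split_barcode_read(read1, LAYOUTS["10x_v3"]))  # ('AAAACCCCGGGGTTTT', 'ACGTACGTACGT')
```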


--------------------------------------------------------------------------------
/Processing of scRNA-Seq Files (Related to Figure 1)/04- Seurat for Processing MGB Cohort.R:
--------------------------------------------------------------------------------
 1 | library(dplyr)
 2 | library(Seurat)
 3 | options(bitmapType='cairo')
 4 | options(future.globals.maxSize = 8000 * 1024^2)
 5 | 
 6 | ################### We converted all the raw STARsolo outputs into a tab-delimited text matrix (genes in rows, cells in columns) and merged all these matrices to form a single matrix. The barcodes for each tumor were prefixed with "TumorID_" #########
 7 | 
 8 | data <- read.table("All_SeqWell_220818_Raw_Expression.txt", sep="\t", head=TRUE, row.names=1)
 9 | 
10 | Tumors.combined <- CreateSeuratObject(counts = data, project = "SeqWell", min.cells = 3, min.features = 200)
11 | 
12 | Tumors.combined[["percent.mt"]] <- PercentageFeatureSet(Tumors.combined, pattern = "^MT.")
13 | 
14 | all.genes <- rownames(Tumors.combined)
15 | 
16 | 
17 | ######## Filtering Low Quality Cells ##############
18 | 
19 | Tumors.combined <- subset(Tumors.combined, subset = nFeature_RNA > 500 & nFeature_RNA < 6000 & nCount_RNA > 1000 & percent.mt < 25)
20 | 
21 | pdf("SeqWell_WT_Mutant_Tumors_QC_AF.pdf", height = 6, width = 20)
22 | VlnPlot(Tumors.combined, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
23 | dev.off()
24 | 
25 | ######### Normalization, Scaling, Identification of Variable Genes and Regression of % of mito genes, PCA ########
26 | 
27 | library(sctransform)
28 | Tumors.combined <- SCTransform(Tumors.combined, vars.to.regress = "percent.mt", verbose = TRUE)
29 | Tumors.combined <- RunPCA(Tumors.combined)
30 | pdf("SeqWell_WT_Mutant_Tumors_ElbowPlot.pdf", height = 6, width = 6)
31 | ElbowPlot(Tumors.combined, ndims=50)
32 | dev.off()
33 | 
34 | write.table(Tumors.combined@meta.data, file="SeqWell_WT_Mutant_Tumors_Integrated_MetaData.txt", sep="\t", col.names=NA, quote=FALSE)
35 | 
36 | 
37 | ################# Louvain Clustering and UMAP Generation ###################
38 | 
39 | 
40 | Tumors.combined <- RunUMAP(Tumors.combined, reduction = "pca", dims = 1:24)
41 | Tumors.combined <- FindNeighbors(Tumors.combined, dims = 1:24)
42 | Tumors.combined <- FindClusters(Tumors.combined, resolution = 0.3)
43 | 
44 | 
45 | pdf("SeqWell_WT_Mutant_Tumors_Clusters_With_Labels.pdf", height= 6, width = 7)
46 | DimPlot(Tumors.combined, reduction = "umap", label=TRUE)
47 | dev.off()
48 | 
49 | pdf("SeqWell_WT_Mutant_Tumors_UMAP_Patient_ID.pdf", height= 6, width = 9)
50 | DimPlot(Tumors.combined, reduction = "umap", group.by="orig.ident")
51 | dev.off()
52 | 
53 | 
54 | write.table(Tumors.combined@meta.data, file="SeqWell_WT_Mutant_Tumors_Integrated_MetaData.txt", sep="\t", col.names=NA, quote=FALSE)
55 | 
56 | 
57 | saveRDS(Tumors.combined, file="SeqWell_Brain_Tumors.rds")
58 | 
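The `subset()` call keeps cells with 500 < nFeature_RNA < 6000, nCount_RNA > 1000 and percent.mt < 25. `PercentageFeatureSet()` interprets `pattern` as a regular expression, and in `^MT.` the dot matches any character. Besides mitochondrial `MT-` genes, the pattern therefore also matches nuclear genes whose symbols start with MT, such as `MTOR` or the metallothionein `MT2A`. If the dot was meant to cover gene names in which `-` had been replaced by `.`, a character class such as `^MT[-.]` accepts both separators without the extra matches. The Python sketch below computes the three QC metrics and applies the same filter to a genes x cells count table. The helper names are hypothetical and the default thresholds are those of this script.

```python
import pandas as pd


def qc_metrics(counts: pd.DataFrame, mito_pattern: str = r"^MT[-.]") -> pd.DataFrame:
    """Per-cell nFeature_RNA, nCount_RNA and percent.mt for a genes x cells count table."""
    is_mito = counts.index.str.contains(mito_pattern, regex=True)
    n_count = counts.sum(axis=0)
    return pd.DataFrame({
        "nFeature_RNA": (counts > 0).sum(axis=0),
        "nCount_RNA": n_count,
        "percent.mt": 100 * counts.loc[is_mito].sum(axis=0) / n_count,
    })


def passes_qc(metrics: pd.DataFrame, min_features: int = 500, max_features: int = 6000,
              min_counts: int = 1000, max_mt: float = 25) -> pd.Series:
    """Boolean mask using the same thresholds as the subset() call in this script."""
    return ((metrics["nFeature_RNA"] > min_features) & (metrics["nFeature_RNA"] < max_features)
            & (metrics["nCount_RNA"] > min_counts) & (metrics["percent.mt"] < max_mt))


counts = pd.DataFrame({"c1": [10, 30, 60]}, index=["MT-ND1", "MTOR", "ACTB"])
print(qc_metrics(counts))            # MTOR is not counted as mitochondrial
print(qc_metrics(counts, r"^MT."))   # MTOR is counted by the looser pattern
```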


--------------------------------------------------------------------------------
/Processing of scRNA-Seq Files (Related to Figure 1)/05- Identifying Variable Genes for NMF in MGB Cohort:
--------------------------------------------------------------------------------
 1 | library(dplyr)
 2 | library(Seurat)
 3 | options(bitmapType='cairo')
 4 | options(future.globals.maxSize = 8000 * 1024^2)
 5 | 
 6 | ################### We converted all the raw STARsolo outputs into a tab-delimited text matrix (genes in rows, cells in columns) and merged all these matrices to form a single matrix. The barcodes for each tumor were prefixed with "TumorID_" #########
 7 | 
 8 | data <- read.table("All_SeqWell_220818_Raw_Expression.txt", sep="\t", head=TRUE, row.names=1)
 9 | 
10 | Tumors.combined <- CreateSeuratObject(counts = data, project = "SeqWell", min.cells = 3, min.features = 200)
11 | 
12 | Tumors.combined[["percent.mt"]] <- PercentageFeatureSet(Tumors.combined, pattern = "^MT.")
13 | 
14 | all.genes <- rownames(Tumors.combined)
15 | 
16 | 
17 | ######## Filtering Low Quality Cells ##############
18 | 
19 | Tumors.combined <- subset(Tumors.combined, subset = nFeature_RNA > 500 & nFeature_RNA < 6000 & nCount_RNA > 1000 & percent.mt < 25)
20 | 
21 | pdf("SeqWell_WT_Mutant_Tumors_QC_AF.pdf", height = 6, width = 20)
22 | VlnPlot(Tumors.combined, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
23 | dev.off()
24 | 
25 | ############### Normalization ##################
26 | 
27 | Tumors.combined <- NormalizeData(Tumors.combined)
28 | all.genes <- rownames(Tumors.combined)
29 | Tumors.combined <- ScaleData(Tumors.combined, features = all.genes)
30 | 
31 | 
32 | ######## Calculating Variable Scores for each gene in the matrix and outputting the results ##############
33 | 
34 | Tumors.combined <- FindVariableFeatures(Tumors.combined, selection.method="vst", nfeatures = 2000)
35 | 
36 | Var <- HVFInfo(object = Tumors.combined, selection.method="vst", assay = "RNA")
37 | 
38 | write.table(Var, file="SeqWell_Full_Gene_List_Variable_Score.txt", sep="\t", quote=FALSE, col.names=NA)
39 | 
40 | ########## Select the top 3000 genes ranked by standardized variance #################
41 | 
42 | Var <- HVFInfo(object = Tumors.combined, selection.method="vst", assay = "RNA")
43 | 
44 | Var3 <- Var[order(Var$variance.standardized, decreasing = TRUE),]
45 | 
46 | Var4 <- Var3[c(1:3000),]
47 | 
48 | 
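49 | Matrix <- GetAssayData(Tumors.combined, slot = "counts")
50 | 
51 | ######## Subset the sparse counts to the selected genes before converting to a dense matrix, to limit memory use
52 | SeqWell3 <- as.matrix(Matrix[rownames(Matrix) %in% rownames(Var4),])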
53 | 
54 | write.table(t(SeqWell3), file="SeqWell_Matrix_Filtered_For_NMF.txt", sep="\t", quote=FALSE, col.names=NA)
55 | 
56 | 
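The last part of this script ranks every gene by `variance.standardized` from `HVFInfo()` and keeps the top 3000. It then writes a cells x genes raw-count table, which is the orientation `cnmf prepare -c` reads (cells as rows). The equivalent selection in pandas is sketched below. It assumes a genes x cells count table and a per-gene HVF table as inputs; the function name is hypothetical.

```python
import pandas as pd


def top_variable_counts(counts: pd.DataFrame, hvf: pd.DataFrame, n_top: int = 3000) -> pd.DataFrame:
    """Return a cells x genes table restricted to the n_top genes with the highest variance.standardized.

    counts: genes x cells raw counts; hvf: per-gene table with a 'variance.standardized' column.
    """
    top = hvf["variance.standardized"].nlargest(n_top).index
    keep = counts.index.intersection(top)
    return counts.loc[keep].T


counts = pd.DataFrame({"c1": [1, 2, 3], "c2": [4, 5, 6]}, index=["G1", "G2", "G3"])
hvf = pd.DataFrame({"variance.standardized": [0.5, 3.0, 1.0]}, index=["G1", "G2", "G3"])
print(top_variable_counts(counts, hvf, n_top=2))
# top_variable_counts(...).to_csv("SeqWell_Matrix_Filtered_For_NMF.txt", sep="\t") would write the cNMF input
```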


--------------------------------------------------------------------------------
/Processing of scRNA-Seq Files (Related to Figure 1)/06- cNMF for annotating cells in MGB cohort:
--------------------------------------------------------------------------------
 1 | cnmf prepare --output-dir ./All/ --name All -c SeqWell_Matrix_Filtered_For_NMF.txt -k 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 --n-iter 500 --total-workers 1 --seed 14 --numgenes 3000
 2 | 
 3 | cnmf factorize --output-dir ./All/ --name All --worker-index 0;
 4 | 
 5 | cnmf combine --output-dir ./All/ --name All;
 6 | 
 7 | rm ./All/All/cnmf_tmp/All.spectra.k_*.iter_*.df.npz;
 8 | 
 9 | cnmf k_selection_plot --output-dir All --name All;
10 | 
11 | 
12 | ##### Based on the stability and error curves in the K selection plot, we selected K=18 ########
13 | 
14 | cnmf consensus --output-dir All --name All --components 18 --local-density-threshold 0.015 --show-clustering
15 | 
16 | 
17 | ########## The usage matrix output by cnmf consensus is normalized per row to percentages (in each cell, the program usages sum to 100) ################
18 | ########## The gene spectra score matrix is used to annotate the programs #############
19 | 


--------------------------------------------------------------------------------
/Processing of scRNA-Seq Files (Related to Figure 1)/07- Copy number Variation Analysis:
--------------------------------------------------------------------------------
 1 |         inferCNV.R \
 2 |         --raw_counts_matrix Input_Matrix.txt \
 3 |         --annotations_file Annotation_File.txt \
 4 |         --gene_order_file GRCh38-2020-A_gen_pos.txt \
 5 |         --num_threads 16 \
 6 |         --out_dir infercnv_Output \
 7 |         --denoise --HMM --cluster_by_groups --cutoff 0.1
 8 | 
 9 | 
10 |       ######## Reference cells: a group of cells from various tumors that were not annotated as any of the malignant programs (i.e. a mix of myeloid cells, T cells, oligodendrocytes and vascular cells).
11 |       ######## The raw counts of these reference cells were extracted and merged into a single matrix.
12 |       ######## The raw matrix of each tumor was merged with the raw matrix of the reference cells.
13 |       ######## The annotation file lists every cell of the merged matrix with its group (the reference cells and the cells of each tumor). For inferCNV.R to use the reference cells as the reference, their group label(s) must be given with --ref_group_names.
14 |       ######## The gene order file was constructed with the "gtf_to_position_file.py" script provided with the infercnv package:
15 | 
16 |       ######## gtf_to_position_file.py genes.gtf GRCh38-2020-A_gen_pos.txt
17 | 
18 |       ######## To obtain CNV values for each cell, use add_to_seurat(seurat_obj = NULL, infercnv_output_path = "./infercnv_Output/") ######
19 | 
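The inferCNV annotation file is a headerless, tab-delimited table with one row per cell. Each row gives the cell name (matching a column of the raw-counts matrix) and its group label. The comments above describe merging each tumor's raw matrix with the reference cells and labelling both. A minimal pandas sketch of that preparation follows. Keeping only genes present in both tables, the default label `"Reference"` and the function name are all assumptions. Whatever label is used for the reference cells is the value `--ref_group_names` should receive.

```python
import pandas as pd


def build_infercnv_inputs(tumor_counts: pd.DataFrame, reference_counts: pd.DataFrame,
                          tumor_label: str, reference_label: str = "Reference"):
    """Merge genes x cells raw counts of one tumor with the reference cells and build the annotation table.

    Only genes present in both tables are kept (an assumption of this sketch). The annotation
    has one row per cell: cell name, group label.
    """
    merged = pd.concat([tumor_counts, reference_counts], axis=1, join="inner")
    groups = [tumor_label] * tumor_counts.shape[1] + [reference_label] * reference_counts.shape[1]
    annotation = pd.DataFrame({"cell": merged.columns, "group": groups})
    return merged, annotation


# Writing the inputs used by the inferCNV.R command above:
# merged, annotation = build_infercnv_inputs(tumor, reference, "TumorID")
# merged.to_csv("Input_Matrix.txt", sep="\t")
# annotation.to_csv("Annotation_File.txt", sep="\t", header=False, index=False)
```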


--------------------------------------------------------------------------------
/Processing of scRNA-Seq Files (Related to Figure 1)/09- Seurat for Processing Houston Cohort.R:
--------------------------------------------------------------------------------
 1 | library(dplyr)
 2 | library(Seurat)
 3 | options(bitmapType='cairo')
 4 | options(future.globals.maxSize = 8000 * 1024^2)
 5 | 
 6 | ################### We converted all the raw STARsolo outputs into a tab-delimited text matrix (genes in rows, cells in columns) and merged all these matrices to form a single matrix. The barcodes for each tumor were prefixed with "TumorID_" #########
 7 | 
 8 | data <- read.table("All_Houston_220826_Raw_Expression.txt", sep="\t", head=TRUE, row.names=1)
 9 | 
10 | Tumors.combined <- CreateSeuratObject(counts = data, project = "Houston", min.cells = 3, min.features = 200)
11 | 
12 | Tumors.combined[["percent.mt"]] <- PercentageFeatureSet(Tumors.combined, pattern = "^MT.")
13 | 
14 | all.genes <- rownames(Tumors.combined)
15 | 
16 | 
17 | ######## Filtering Low Quality Cells ##############
18 | 
19 | Tumors.combined <- subset(Tumors.combined, subset = nFeature_RNA > 500 & nFeature_RNA < 6000 & nCount_RNA > 1000 & percent.mt < 25)
20 | 
21 | pdf("Houston_WT_Mutant_Tumors_QC_AF.pdf", height = 6, width = 20)
22 | VlnPlot(Tumors.combined, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
23 | dev.off()
24 | 
25 | ######### Normalization, Scaling, Identification of Variable Genes and Regression of % of mito genes, PCA ########
26 | 
27 | library(sctransform)
28 | Tumors.combined <- SCTransform(Tumors.combined, vars.to.regress = "percent.mt", verbose = TRUE)
29 | Tumors.combined <- RunPCA(Tumors.combined)
30 | pdf("Houston_WT_Mutant_Tumors_ElbowPlot.pdf", height = 6, width = 6)
31 | ElbowPlot(Tumors.combined, ndims=50)
32 | dev.off()
33 | 
34 | write.table(Tumors.combined@meta.data, file="Houston_WT_Mutant_Tumors_Integrated_MetaData.txt", sep="\t", col.names=NA, quote=FALSE)
35 | 
36 | 
37 | ################# Louvain Clustering and UMAP Generation ###################
38 | 
39 | 
40 | Tumors.combined <- RunUMAP(Tumors.combined, reduction = "pca", dims = 1:19)
41 | Tumors.combined <- FindNeighbors(Tumors.combined, dims = 1:19)
42 | Tumors.combined <- FindClusters(Tumors.combined, resolution = 0.3)
43 | 
44 | 
45 | pdf("Houston_WT_Mutant_Tumors_Clusters_With_Labels.pdf", height= 6, width = 7)
46 | DimPlot(Tumors.combined, reduction = "umap", label=TRUE)
47 | dev.off()
48 | 
49 | pdf("Houston_WT_Mutant_Tumors_UMAP_Patient_ID.pdf", height= 6, width = 9)
50 | DimPlot(Tumors.combined, reduction = "umap", group.by="orig.ident")
51 | dev.off()
52 | 
53 | 
54 | write.table(Tumors.combined@meta.data, file="Houston_WT_Mutant_Tumors_Integrated_MetaData.txt", sep="\t", col.names=NA, quote=FALSE)
55 | 
56 | 
57 | saveRDS(Tumors.combined, file="Houston_Brain_Tumors.rds")
58 | 


--------------------------------------------------------------------------------
/Processing of scRNA-Seq Files (Related to Figure 1)/10- Calculate usage matrix in Houston Cohort for cNMF cell annotation programs identified in MGB cohort:
--------------------------------------------------------------------------------
 1 | #####################  Inside R ################################
 2 | library(dplyr)
 3 | library(Seurat)
 4 | options(bitmapType='cairo')
 5 | options(future.globals.maxSize = 8000 * 1024^2)
 6 | 
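 7 | Tumors.combined <- readRDS("Houston_Brain_Tumors.rds")   # object saved by the Houston Seurat script
 8 | Matrix <- GetAssayData(object = Tumors.combined, assay = "RNA", slot = "counts")   # raw UMI counts; the default assay after SCTransform is "SCT"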
 9 | 
10 | ########## This loads the genes that were used in the overall MGB cNMF (Top 3000 variable genes) ###########
11 | Genes <- scan("./cnmf_run.overdispersed_genes.txt", what="")
12 | 
13 | Matrix2 <- Matrix[rownames(Matrix) %in% Genes,]
14 | 
15 | write.table(t(as.matrix(Matrix2)), file="./Houston_Raw_Counts_Variable_for_cNMF.txt", sep="\t", col.names=NA, quote=FALSE)
16 | 
17 | dim(Matrix2) ######### to find out the number for --numgenes below #################
18 | 
19 | q()
20 | 
21 | ########################### Exit R #######################################
22 | 
23 | ##### We run cNMF prepare script to normalize the raw matrix counts ###############
24 | 
25 | cnmf prepare --output-dir ./Calculate_Usage/ --name Calculate_Usage -c Houston_Raw_Counts_Variable_for_cNMF.txt -k 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 --n-iter 500 --total-workers 1 --seed 14 --numgenes 2990;
26 | 
27 | 
28 | python;
29 | 
30 | ################################## Inside Python #################################
31 | 
32 | import sklearn
33 | import sklearn.decomposition
34 | from sklearn.decomposition import non_negative_factorization
35 | import numpy as np
36 | import scanpy as sc
37 | import csv
38 | import scipy
39 | import pandas as pd
40 |  
41 | ########### Load the spectra consensus file from the MGB All cells cNMF run ###########
42 | H = np.load("cnmf_run.spectra.k_18.dt_0_015.consensus.df.npz", allow_pickle=True)
43 | 
44 | H2 = pd.DataFrame(H['data'], columns = H['columns'], index = H['index']) 
45 | 
46 | ########## Load the normalized raw counts matrix of the Houston cohort (normalized by the "prepare" script) ######
47 | X = sc.read_h5ad('Calculate_Usage.norm_counts.h5ad')
48 | 
49 | X2 = X.X.toarray()
50 | 
51 | X3 = pd.DataFrame(data=X2, columns = X.var_names , index = X.obs.index)
52 | 
53 | H3 = H2.filter(items = X3.columns)
54 | 
55 | H4 = H3.to_numpy()
56 | 
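57 | X5 = X3.filter(items = H3.columns).to_numpy(dtype=np.float64)   # keep only genes present in the spectra, in the same column order as H4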
58 | 
59 | ########## Perform the calculation ###########
60 | test = sklearn.decomposition.non_negative_factorization(X5, W=None, H=H4, n_components=18, init='random', update_H=False, solver='cd', beta_loss='frobenius', tol=0.0001, max_iter=1000, alpha_W=0.0, alpha_H='same', l1_ratio=0.0, random_state=None, verbose=0, shuffle=False)   # 'alpha' and 'regularization' were removed in scikit-learn 1.2
61 | 
62 | test2 = list(test)
63 | 
64 | pd.DataFrame(test2[0], columns=H['index'], index=X.obs.index).to_csv(path_or_buf="./Houston_Glioma_All_cells_Usage.txt", sep="\t", quoting=csv.QUOTE_NONE)
65 | 
66 | 
67 | ############ This script outputs a usage matrix which is then normalized per row to percentages (in each cell, the usages of the programs sums up to 100). This matrix is used for annotating cells ################
68 | 
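The final comment of this script mentions a per-row normalization that the Python code above does not itself perform. `Houston_Glioma_All_cells_Usage.txt` therefore holds unscaled usages. The sketch below performs that step with pandas, mirroring `processed.div(row_sums, axis=0) * 100` from the GBO scripts. As a choice made for this sketch, it drops cells whose usages are all zero, which would otherwise become NaN. The output file name is hypothetical.

```python
import pandas as pd


def usages_to_percent(usages: pd.DataFrame) -> pd.DataFrame:
    """Scale each cell's program usages to sum to 100, dropping cells whose usages are all zero."""
    row_sums = usages.sum(axis=1)
    keep = row_sums > 0
    return usages.loc[keep].div(row_sums[keep], axis=0) * 100


# usages = pd.read_table("Houston_Glioma_All_cells_Usage.txt", index_col=0)
# usages_to_percent(usages).to_csv("Houston_Glioma_All_cells_Usage_Percent.txt", sep="\t")  # hypothetical name
u = pd.DataFrame({"P1": [1.0, 0.0, 3.0], "P2": [3.0, 0.0, 1.0]}, index=["a", "b", "c"])
print(usages_to_percent(u))
```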


--------------------------------------------------------------------------------
/Processing of scRNA-Seq Files (Related to Figure 1)/11- Seurat for Processing Jackson's Cohort.R:
--------------------------------------------------------------------------------
 1 | library(dplyr)
 2 | library(Seurat)
 3 | options(bitmapType='cairo')
 4 | options(future.globals.maxSize = 8000 * 1024^2)
 5 | 
 6 | ################### We converted all the raw STARsolo outputs into a tab-delimited text matrix (genes in rows, cells in columns) and merged all these matrices to form a single matrix. The barcodes for each tumor were prefixed with "TumorID_" #########
 7 | 
 8 | data <- read.table("All_JAX_220826_Raw_Expression.txt", sep="\t", head=TRUE, row.names=1)
 9 | 
10 | Tumors.combined <- CreateSeuratObject(counts = data, project = "JAX", min.cells = 3, min.features = 200)
11 | 
12 | Tumors.combined[["percent.mt"]] <- PercentageFeatureSet(Tumors.combined, pattern = "^MT.")
13 | 
14 | all.genes <- rownames(Tumors.combined)
15 | 
16 | 
17 | ######## Filtering Low Quality Cells ##############
18 | 
19 | Tumors.combined <- subset(Tumors.combined, subset = nFeature_RNA > 500 & nFeature_RNA < 6000 & nCount_RNA > 1000 & percent.mt < 25)
20 | 
21 | pdf("JAX_WT_Mutant_Tumors_QC_AF.pdf", height = 6, width = 20)
22 | VlnPlot(Tumors.combined, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
23 | dev.off()
24 | 
25 | ######### Normalization, Scaling, Identification of Variable Genes and Regression of % of mito genes, PCA ########
26 | 
27 | library(sctransform)
28 | Tumors.combined <- SCTransform(Tumors.combined, vars.to.regress = "percent.mt", verbose = TRUE)
29 | Tumors.combined <- RunPCA(Tumors.combined)
30 | pdf("JAX_WT_Mutant_Tumors_ElbowPlot.pdf", height = 6, width = 6)
31 | ElbowPlot(Tumors.combined, ndims=50)
32 | dev.off()
33 | 
34 | write.table(Tumors.combined@meta.data, file="JAX_WT_Mutant_Tumors_Integrated_MetaData.txt", sep="\t", col.names=NA, quote=FALSE)
35 | 
36 | 
37 | ################# Louvain Clustering and UMAP Generation ###################
38 | 
39 | 
40 | Tumors.combined <- RunUMAP(Tumors.combined, reduction = "pca", dims = 1:16)
41 | Tumors.combined <- FindNeighbors(Tumors.combined, dims = 1:16)
42 | Tumors.combined <- FindClusters(Tumors.combined, resolution = 0.3)
43 | 
44 | 
45 | pdf("JAX_WT_Mutant_Tumors_Clusters_With_Labels.pdf", height= 6, width = 7)
46 | DimPlot(Tumors.combined, reduction = "umap", label=TRUE)
47 | dev.off()
48 | 
49 | pdf("JAX_WT_Mutant_Tumors_UMAP_Patient_ID.pdf", height= 6, width = 9)
50 | DimPlot(Tumors.combined, reduction = "umap", group.by="orig.ident")
51 | dev.off()
52 | 
53 | 
54 | write.table(Tumors.combined@meta.data, file="JAX_WT_Mutant_Tumors_Integrated_MetaData.txt", sep="\t", col.names=NA, quote=FALSE)
55 | 
56 | 
57 | saveRDS(Tumors.combined, file="JAX_Brain_Tumors.rds")
58 | 
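The merging step described in the header comment (per-tumor STARsolo matrices with genes in rows and cells in columns, barcodes prefixed with "TumorID_") could look like the pandas sketch below. It assumes each tumor's matrix has already been converted to a genes x cells DataFrame; the helper name `merge_tumor_matrices`, the toy data, and filling genes absent from a tumor with 0 are assumptions, not details stated in the original.

```python
import pandas as pd


def merge_tumor_matrices(matrices: dict) -> pd.DataFrame:
    """Merge per-tumor genes x cells count matrices into one genes x cells matrix.

    matrices maps a tumor ID to a DataFrame (genes in rows, cells in columns).
    Each barcode gets a "<TumorID>_" prefix so identical barcodes from different
    tumors stay distinct; genes absent from a tumor are filled with 0.
    """
    prefixed = [mat.add_prefix(f"{tumor_id}_") for tumor_id, mat in matrices.items()]
    merged = pd.concat(prefixed, axis=1, join="outer").fillna(0)
    return merged.astype(int)


# Hypothetical toy input: two tumors that share the barcode AAAC and the gene CD74
t1 = pd.DataFrame({"AAAC": [1, 0], "AAAG": [2, 3]}, index=["GFAP", "CD74"])
t2 = pd.DataFrame({"AAAC": [4, 5]}, index=["CD74", "SIGLEC9"])
merged = merge_tumor_matrices({"T1": t1, "T2": t2})
print(merged)
# merged.to_csv("All_Raw_Expression.txt", sep="\t") would write a tab-delimited
# file of the kind read.table() loads above (hypothetical file name)
```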


--------------------------------------------------------------------------------
/Processing of scRNA-Seq Files (Related to Figure 1)/12- Calculate usage matrix in Jackson's Cohort for cNMF cell annotation programs identified in MGB cohort:
--------------------------------------------------------------------------------
 1 | #####################  Inside R ################################
 2 | library(dplyr)
 3 | library(Seurat)
 4 | options(bitmapType='cairo')
 5 | options(future.globals.maxSize = 8000 * 1024^2)
 6 | 
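 7 | Matrix <- GetAssayData(object = Tumors.combined, slot = "counts")  # Tumors.combined is the object from script 11 (JAX_Brain_Tumors.rds); after SCTransform its default assay is "SCT", so add assay = "RNA" to extract the uncorrected raw counts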
 8 | 
 9 | 
10 | ########## This loads the genes that were used in the overall MGB cNMF (Top 4000 variable genes) ###########
11 | Genes <- scan("./cnmf_run.overdispersed_genes.txt", what="")
12 | 
13 | Matrix2 <- Matrix[rownames(Matrix) %in% Genes,]
14 | 
15 | write.table(t(as.matrix(Matrix2)), file="./JAX_Raw_Counts_Variable_for_cNMF.txt", sep="\t", col.names=NA, quote=FALSE)
16 | 
17 | dim(Matrix2) ######### to find out the number for --numgenes below #################
18 | 
19 | q()
20 | 
21 | ########################### Exit R #######################################
22 | 
23 | ##### We run the cNMF prepare step to normalize the raw count matrix ###############
24 | 
25 | cnmf prepare --output-dir ./Calculate_Usage/ --name Calculate_Usage -c JAX_Raw_Counts_Variable_for_cNMF.txt -k 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 --n-iter 500 --total-workers 1 --seed 14 --numgenes 2896;
26 | 
27 | 
28 | python;
29 | 
30 | ################################## Inside Python #################################
31 | 
32 | import sklearn
33 | import sklearn.decomposition
34 | from sklearn.decomposition import non_negative_factorization
35 | import numpy as np
36 | import scanpy as sc
37 | import csv
38 | import scipy
39 | import pandas as pd
40 |  
41 | ########### Load the spectra consensus file from the MGB All cells cNMF run ###########
42 | H = np.load("cnmf_run.spectra.k_18.dt_0_015.consensus.df.npz", allow_pickle=True)
43 | 
44 | H2 = pd.DataFrame(H['data'], columns = H['columns'], index = H['index']) 
45 | 
46 | ########## Load the JAX cohort count matrix normalized by the cNMF "prepare" step ######
47 | X = sc.read_h5ad('Calculate_Usage.norm_counts.h5ad')
48 | 
49 | X2 = X.X.toarray()
50 | 
51 | X3 = pd.DataFrame(data=X2, columns = X.var_names , index = X.obs.index)
52 | 
53 | H3 = H2.filter(items = X3.columns)
54 | 
55 | H4 = H3.to_numpy()
56 | 
57 | X5 = X2.astype(np.float64)
58 | 
59 | ########## Perform the calculation ###########
60 | test = sklearn.decomposition.non_negative_factorization(X5, W=None, H=H4, n_components=18, init='random', update_H=False, solver='cd', beta_loss='frobenius', tol=0.0001, max_iter=1000, alpha_W=0.0, alpha_H='same', l1_ratio=0.0, random_state=None, verbose=0, shuffle=False)  # alpha and regularization were removed in scikit-learn 1.2; alpha_W=0.0 keeps the fit unregularized
61 | 
62 | test2 = list(test)
63 | 
64 | pd.DataFrame(test2[0], columns=H['index'], index=X.obs.index).to_csv(path_or_buf="./JAX_Glioma_All_cells_Usage.txt", sep="\t", quoting=csv.QUOTE_NONE)
65 | 
66 | 
67 | ############ This script outputs a usage matrix, which is then normalized per row to percentages (in each cell, the program usages sum to 100). This matrix is used for annotating cells ################
68 | 
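The calculation above keeps the MGB spectra H fixed (`update_H=False`) and solves only for the usage matrix W, i.e. a non-negative least-squares fit of each cell's normalized expression onto the 18 programs. The sketch below reproduces that idea on simulated data, including the gene alignment done by `H2.filter(items = X3.columns)`. The toy sizes, the random data and the helper name `project_onto_spectra` are hypothetical, and the call only uses arguments accepted by current scikit-learn releases.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import non_negative_factorization


def project_onto_spectra(counts: pd.DataFrame, spectra: pd.DataFrame) -> pd.DataFrame:
    """Estimate per-cell usages W given fixed program spectra H.

    counts:  cells x genes (normalized counts)
    spectra: programs x genes (consensus spectra)
    Returns a cells x programs usage DataFrame.
    """
    # Align genes like H2.filter(items=X3.columns): keep genes present in both
    # tables, in the column order of the count matrix.
    genes = [g for g in counts.columns if g in spectra.columns]
    X = counts[genes].to_numpy(dtype=np.float64)
    H = spectra[genes].to_numpy(dtype=np.float64)
    # update_H=False keeps H fixed, so only W is optimized
    W, _, _ = non_negative_factorization(
        X, H=H, n_components=H.shape[0], update_H=False,
        solver="cd", beta_loss="frobenius", tol=1e-8, max_iter=5000,
    )
    return pd.DataFrame(W, index=counts.index, columns=spectra.index)


rng = np.random.default_rng(0)
genes = [f"gene{i}" for i in range(40)]
spectra = pd.DataFrame(rng.random((3, 40)), index=["P1", "P2", "P3"], columns=genes)
true_W = rng.random((50, 3))
counts = pd.DataFrame(
    true_W @ spectra.to_numpy(), index=[f"cell{i}" for i in range(50)], columns=genes
)
# Reverse the gene order and add a gene missing from the spectra to exercise the alignment
counts = counts[genes[::-1]].assign(extra_gene=1.0)
usage = project_onto_spectra(counts, spectra)
print(usage.round(3).head())
```

With the real inputs, `counts` corresponds to the `X3` DataFrame and `spectra` to `H2`; the function then performs the same alignment and fixed-H fit as the H3/H4/X5 steps above.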


--------------------------------------------------------------------------------
/Processing of scRNA-Seq Files (Related to Figure 1)/13- Seurat for Processing Mcgill Cohort.R:
--------------------------------------------------------------------------------
 1 | library(dplyr)
 2 | library(Seurat)
 3 | options(bitmapType='cairo')
 4 | options(future.globals.maxSize = 8000 * 1024^2)
 5 | 
 6 | ################### We converted each raw STARsolo output into a tab-delimited text matrix (genes in rows, cells in columns) and merged these matrices into a single matrix. The barcodes of each tumor were prefixed with "TumorID_" #########
 7 | 
 8 | data <- read.table("All_Mcgill_220826_Raw_Expression.txt", sep="\t", header=TRUE, row.names=1)
 9 | 
10 | Tumors.combined <- CreateSeuratObject(counts = data, project = "Mcgill", min.cells = 3, min.features = 200)
11 | 
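12 | Tumors.combined[["percent.mt"]] <- PercentageFeatureSet(Tumors.combined, pattern = "^MT.")  # as a regex, "^MT." also matches non-mitochondrial genes such as MTOR or MT2A; "^MT-" (or "^MT[-.]" if dashes became dots) restricts it to mitochondrial genes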
13 | 
14 | all.genes <- rownames(Tumors.combined)
15 | 
16 | 
17 | ######## Filtering Low Quality Cells ##############
18 | 
19 | Tumors.combined <- subset(Tumors.combined, subset = nFeature_RNA > 500 & nFeature_RNA < 6000 & nCount_RNA > 1000 & percent.mt < 25)
20 | 
21 | pdf("Mcgill_WT_Mutant_Tumors_QC_AF.pdf", height = 6, width = 20)
22 | VlnPlot(Tumors.combined, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
23 | dev.off()
24 | 
25 | ######### Normalization, Scaling, Identification of Variable Genes and Regression of % of mito genes, PCA ########
26 | 
27 | library(sctransform)
28 | Tumors.combined <- SCTransform(Tumors.combined, vars.to.regress = "percent.mt", verbose = TRUE)
29 | Tumors.combined <- RunPCA(Tumors.combined)
30 | pdf("Mcgill_WT_Mutant_Tumors_ElbowPlot.pdf", height = 6, width = 6)
31 | ElbowPlot(Tumors.combined, ndims=50)
32 | dev.off()
33 | 
34 | write.table(Tumors.combined@meta.data, file="Mcgill_WT_Mutant_Tumors_Integrated_MetaData.txt", sep="\t", col.names=NA, quote=FALSE)
35 | 
36 | 
37 | ################# Louvain Clustering and UMAP Generation ###################
38 | 
39 | 
40 | Tumors.combined <- RunUMAP(Tumors.combined, reduction = "pca", dims = 1:28)
41 | Tumors.combined <- FindNeighbors(Tumors.combined, dims = 1:28)
42 | Tumors.combined <- FindClusters(Tumors.combined, resolution = 0.3)
43 | 
44 | 
45 | pdf("Mcgill_WT_Mutant_Tumors_Clusters_With_Labels.pdf", height= 6, width = 7)
46 | DimPlot(Tumors.combined, reduction = "umap", label=TRUE)
47 | dev.off()
48 | 
49 | pdf("Mcgill_WT_Mutant_Tumors_UMAP_Patient_ID.pdf", height= 6, width = 9)
50 | DimPlot(Tumors.combined, reduction = "umap", group.by="orig.ident")
51 | dev.off()
52 | 
53 | 
54 | write.table(Tumors.combined@meta.data, file="Mcgill_WT_Mutant_Tumors_Integrated_MetaData.txt", sep="\t", col.names=NA, quote=FALSE)
55 | 
56 | 
57 | saveRDS(Tumors.combined, file="Mcgill_Brain_Tumors.rds")
58 | 
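`PercentageFeatureSet(pattern = "^MT.")` treats the pattern as a regular expression, in which `.` matches any character. The Python sketch below (Python's `re` interprets this pattern the same way R does) shows which gene symbols each pattern captures and how a per-cell percentage is derived from them. The gene list and counts are made up, `percent_matching` is a hypothetical helper that only mirrors the idea of `PercentageFeatureSet`, and whether the merged matrix here writes mitochondrial genes as `MT-` or `MT.` is not stated in the original.

```python
import re

import pandas as pd


def percent_matching(counts: pd.DataFrame, pattern: str) -> pd.Series:
    """Percentage of each cell's counts coming from genes whose name matches pattern.

    counts: genes x cells. Sum of matching genes / total counts * 100 per cell.
    """
    regex = re.compile(pattern)
    matched = [g for g in counts.index if regex.search(g)]
    return counts.loc[matched].sum(axis=0) / counts.sum(axis=0) * 100


# Hypothetical genes x cells count matrix (made-up values)
counts = pd.DataFrame(
    {"cell1": [10, 5, 5, 80], "cell2": [0, 20, 0, 80]},
    index=["MT-CO1", "MTOR", "MT2A", "GAPDH"],
)
loose = percent_matching(counts, "^MT.")   # also picks up MTOR and MT2A
strict = percent_matching(counts, "^MT-")  # mitochondrially encoded genes only
print(pd.DataFrame({"^MT.": loose, "^MT-": strict}))
```

A cell with no mitochondrial reads at all (cell2) still receives 20% under the loose pattern, which can shift cells across the `percent.mt < 25` cutoff.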


--------------------------------------------------------------------------------
/Processing of scRNA-Seq Files (Related to Figure 1)/14- Calculate usage matrix in Mcgill Cohort for cNMF cell annotation programs identified in MGB cohort:
--------------------------------------------------------------------------------
 1 | #####################  Inside R ################################
 2 | library(dplyr)
 3 | library(Seurat)
 4 | options(bitmapType='cairo')
 5 | options(future.globals.maxSize = 8000 * 1024^2)
 6 | 
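 7 | Matrix <- GetAssayData(object = Tumors.combined, slot = "counts")  # Tumors.combined is the object from script 13 (Mcgill_Brain_Tumors.rds); after SCTransform its default assay is "SCT", so add assay = "RNA" to extract the uncorrected raw counts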
 8 | 
 9 | 
10 | ########## This loads the genes that were used in the overall MGB cNMF (Top 4000 variable genes) ###########
11 | Genes <- scan("./cnmf_run.overdispersed_genes.txt", what="")
12 | 
13 | Matrix2 <- Matrix[rownames(Matrix) %in% Genes,]
14 | 
15 | write.table(t(as.matrix(Matrix2)), file="./Mcgill_Raw_Counts_Variable_for_cNMF.txt", sep="\t", col.names=NA, quote=FALSE)
16 | 
17 | dim(Matrix2) ######### to find out the number for --numgenes below #################
18 | 
19 | q()
20 | 
21 | ########################### Exit R #######################################
22 | 
23 | ##### We run the cNMF prepare step to normalize the raw count matrix ###############
24 | 
25 | cnmf prepare --output-dir ./Calculate_Usage/ --name Calculate_Usage -c Mcgill_Raw_Counts_Variable_for_cNMF.txt -k 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 --n-iter 500 --total-workers 1 --seed 14 --numgenes 2992;
26 | 
27 | 
28 | python;
29 | 
30 | ################################## Inside Python #################################
31 | 
32 | import sklearn
33 | import sklearn.decomposition
34 | from sklearn.decomposition import non_negative_factorization
35 | import numpy as np
36 | import scanpy as sc
37 | import csv
38 | import scipy
39 | import pandas as pd
40 |  
41 | ########### Load the spectra consensus file from the MGB All cells cNMF run ###########
42 | H = np.load("cnmf_run.spectra.k_18.dt_0_015.consensus.df.npz", allow_pickle=True)
43 | 
44 | H2 = pd.DataFrame(H['data'], columns = H['columns'], index = H['index']) 
45 | 
46 | ########## Load the McGill cohort count matrix normalized by the cNMF "prepare" step ######
47 | X = sc.read_h5ad('Calculate_Usage.norm_counts.h5ad')
48 | 
49 | X2 = X.X.toarray()
50 | 
51 | X3 = pd.DataFrame(data=X2, columns = X.var_names , index = X.obs.index)
52 | 
53 | H3 = H2.filter(items = X3.columns)
54 | 
55 | H4 = H3.to_numpy()
56 | 
57 | X5 = X2.astype(np.float64)
58 | 
59 | ########## Perform the calculation ###########
60 | test = sklearn.decomposition.non_negative_factorization(X5, W=None, H=H4, n_components=18, init='random', update_H=False, solver='cd', beta_loss='frobenius', tol=0.0001, max_iter=1000, alpha_W=0.0, alpha_H='same', l1_ratio=0.0, random_state=None, verbose=0, shuffle=False)  # alpha and regularization were removed in scikit-learn 1.2; alpha_W=0.0 keeps the fit unregularized
61 | 
62 | test2 = list(test)
63 | 
64 | pd.DataFrame(test2[0], columns=H['index'], index=X.obs.index).to_csv(path_or_buf="./Mcgill_Glioma_All_cells_Usage.txt", sep="\t", quoting=csv.QUOTE_NONE)
65 | 
66 | 
67 | ############ This script outputs a usage matrix, which is then normalized per row to percentages (in each cell, the program usages sum to 100). This matrix is used for annotating cells ################
68 | 
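The R block above keeps only the MGB cNMF overdispersed genes, transposes the matrix to cells x genes for `cnmf prepare`, and reads the value for `--numgenes` off `dim(Matrix2)`. A pandas equivalent is sketched below; the helper name and toy data are hypothetical, and it assumes the gene list has already been read into a Python list (as `scan()` does in R).

```python
import pandas as pd


def subset_for_cnmf(counts: pd.DataFrame, genes: list) -> tuple:
    """Keep the listed genes and return (cells x genes matrix, number of genes kept).

    counts: genes x cells raw counts. Genes absent from the cohort are skipped,
    like Matrix[rownames(Matrix) %in% Genes, ] in R, so the returned count
    (the value passed to --numgenes) can be smaller than len(genes).
    """
    kept = counts.loc[counts.index.isin(genes)]
    cells_by_genes = kept.T  # cnmf prepare expects cells in rows, genes in columns
    return cells_by_genes, cells_by_genes.shape[1]


# Hypothetical toy cohort: 3 genes x 2 cells
counts = pd.DataFrame({"c1": [1, 2, 3], "c2": [0, 4, 1]}, index=["CD74", "GFAP", "SIGLEC9"])
overdispersed = ["SIGLEC9", "CD74", "P2RY12"]  # P2RY12 is absent from this toy cohort
matrix, numgenes = subset_for_cnmf(counts, overdispersed)
print(numgenes)
# matrix.to_csv("Mcgill_Raw_Counts_Variable_for_cNMF.txt", sep="\t") would write the cnmf input
```

This is why `--numgenes` differs between cohorts (2896 for JAX, 2992 for McGill): only the overdispersed MGB genes detected in each cohort survive the subset.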


--------------------------------------------------------------------------------
/Processing of scRNA-Seq Files (Related to Figure 1)/15- Annotation and Doublet Detection.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BernsteinLab/Myeloid-Glioma/118e4ce8c41c6a1e121ea1f86f8445da8b74b719/Processing of scRNA-Seq Files (Related to Figure 1)/15- Annotation and Doublet Detection.pdf


--------------------------------------------------------------------------------
/Spatial_transcriptomics/04-cnmf:
--------------------------------------------------------------------------------
 1 | cnmf prepare --output-dir ./results --name spatial -c ./adata.h5ad -k 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 --n-iter 500 --total-workers 1 --seed 14 --numgenes 1500
 2 | 
 3 | cnmf factorize --output-dir ./results --name spatial --worker-index 0;
 4 | 
 5 | cnmf combine --output-dir ./results/ --name spatial;
 6 | 
 7 | cnmf k_selection_plot --output-dir ./results --name spatial;
 8 | 
 9 | 
10 | ##### Based on the K-selection plot, we select K=7 ########
11 | 
12 | cnmf consensus --output-dir ./results --name spatial --components 7 --local-density-threshold 0.1 --show-clustering
13 | 
14 | 
15 | ########## The cnmf consensus step outputs a usage matrix, which is then normalized per row (in 05-meta_programs_usage.ipynb, so that the program usages of each spot sum to 1) ################
16 | ########## The gene spectra score matrix is used for the annotation of the programs #############
17 | 
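`cnmf consensus --local-density-threshold 0.1` discards outlier spectra before the replicate spectra from all runs are grouped into the K consensus programs. The sketch below illustrates that idea on simulated replicates: each L2-normalized spectrum's mean Euclidean distance to its nearest neighbors serves as its local density score, spectra above the threshold are dropped, and the rest are clustered with k-means and summarized by their median. It is a simplified approximation, not cNMF's code; the neighbor fraction of 0.30 per component (assumed to mirror cNMF's default local neighborhood size), the toy data and the helper name are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances


def consensus_spectra(spectra, k, density_threshold=0.1, neighborhood=0.30, seed=0):
    """Drop low-density (outlier) replicate spectra, then cluster the rest.

    spectra: (n_runs * k) x genes array pooling the spectra of all NMF runs.
    Returns (k x genes median spectra, boolean mask of spectra kept).
    """
    l2 = spectra / np.linalg.norm(spectra, axis=1, keepdims=True)
    n_neighbors = max(1, int(neighborhood * spectra.shape[0] / k))
    dist = pairwise_distances(l2)  # Euclidean distances between spectra
    # After sorting each row, column 0 is the zero self-distance, so skip it
    nearest = np.sort(dist, axis=1)[:, 1:n_neighbors + 1]
    local_density = nearest.mean(axis=1)
    kept = local_density < density_threshold
    filtered = l2[kept]
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(filtered)
    consensus = np.vstack([np.median(filtered[labels == c], axis=0) for c in range(k)])
    return consensus, kept


# Hypothetical data: 20 noisy replicates of 2 programs over 30 genes, plus 1 outlier
rng = np.random.default_rng(1)
true = rng.random((2, 30))
replicates = np.vstack([true[i] + rng.normal(0, 0.01, 30) for _ in range(20) for i in range(2)])
spectra = np.vstack([np.clip(replicates, 0, None), rng.random((1, 30))])
consensus, kept = consensus_spectra(spectra, k=2)
print(kept.sum(), "of", len(kept), "spectra kept")
```

Lowering the threshold removes more spectra; the `--show-clustering` figure produced above is the place to check that the chosen value only trims outliers.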


--------------------------------------------------------------------------------
/Spatial_transcriptomics/05-meta_programs_usage.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "code",
  5 |    "execution_count": 1,
  6 |    "id": "766baa2e",
  7 |    "metadata": {},
  8 |    "outputs": [],
  9 |    "source": [
 10 |     "import os\n",
 11 |     "import yaml\n",
 12 |     "import scanpy as sc\n",
 13 |     "import pandas as pd\n",
 14 |     "import numpy as np\n",
 15 |     "from scipy import spatial\n",
 16 |     "import squidpy as sq\n",
 17 |     "from sklearn.decomposition import non_negative_factorization"
 18 |    ]
 19 |   },
 20 |   {
 21 |    "cell_type": "code",
 22 |    "execution_count": 2,
 23 |    "id": "e130256e",
 24 |    "metadata": {},
 25 |    "outputs": [],
 26 |    "source": [
 27 |     "selected_K = 7\n",
 28 |     "density_threshold = 0.1\n",
 29 |     "\n",
 30 |     "input_directory = 'results'\n",
 31 |     "adata_dir = 'adata'\n",
 32 |     "adata_file = 'adata_full.h5ad'\n",
 33 |     "adata_output_directory = 'adata_env'\n",
 34 |     "\n",
 35 |     "program_names = ['env_gray_matter','env_hypoxic','env_white_matter','env_cellular_cancer','env_vasculature',\\\n",
 36 |     "                 'env_astro_inflammatory','env_MT-RPL']"
 37 |    ]
 38 |   },
 39 |   {
 40 |    "cell_type": "code",
 41 |    "execution_count": 3,
 42 |    "id": "7e3a8780",
 43 |    "metadata": {},
 44 |    "outputs": [],
 45 |    "source": [
 46 |     "# prepare\n",
 47 |     "patients = [f[5:-5] for f in os.listdir(adata_dir)  if '.' != f[0]] \n",
 48 |     "density_threshold_str = ('%.2f' % density_threshold).replace('.', '_')\n",
 49 |     "if not os.path.exists(adata_output_directory):\n",
 50 |     "    os.mkdir(adata_output_directory)\n",
 51 |     "    \n",
 52 |     "k_filename = os.path.join(input_directory,'cnmf_run.k_selection_stats.df.npz')\n",
 53 |     "with np.load(k_filename, allow_pickle=True) as f:\n",
 54 |     "    k_obj = pd.DataFrame(**f)"
 55 |    ]
 56 |   },
 57 |   {
 58 |    "cell_type": "code",
 59 |    "execution_count": 4,
 60 |    "id": "e694cbf5",
 61 |    "metadata": {
 62 |     "scrolled": false
 63 |    },
 64 |    "outputs": [],
 65 |    "source": [
 66 |     "# get usage scores\n",
 67 |     "adata = sc.read(adata_file)\n",
 68 |     "    \n",
 69 |     "rf_usages = pd.read_csv(os.path.join(input_directory, 'cnmf_run.usages.k_%d.dt_%s.consensus.txt'\\\n",
 70 |     "                                       %(selected_K, density_threshold_str)), sep='\\t', index_col=0)\n",
 71 |     "rf_usages.columns = program_names\n",
 72 |     "norm_usages = rf_usages.div(rf_usages.sum(axis=1), axis=0)\n",
 73 |     "    \n",
 74 |     "for col in norm_usages:\n",
 75 |     "    adata.obs[col] = norm_usages[col]\n",
 76 |     "                                 \n",
 77 |     "# save adata for all patients\n",
 78 |     "saved_adata = os.path.join('adata_env.h5ad')\n",
 79 |     "adata.write(saved_adata)"
 80 |    ]
 81 |   },
 82 |   {
 83 |    "cell_type": "code",
 84 |    "execution_count": 5,
 85 |    "id": "bd96bfb6",
 86 |    "metadata": {},
 87 |    "outputs": [],
 88 |    "source": [
 89 |     "# save adata for patients separately\n",
 90 |     "for patient in patients:\n",
 91 |     "    adata_patient = sc.read(os.path.join(adata_dir,'adata%s.h5ad'%patient))\n",
 92 |     "    \n",
 93 |     "    usages_patient = adata[adata.obs['patient']==patient,:].obs[program_names]\n",
 94 |     "    usages_patient.index = [index.split('-')[0]+'-1' for index in usages_patient.index]\n",
 95 |     "    \n",
 96 |     "    for col in usages_patient:\n",
 97 |     "        adata_patient.obs[col] = usages_patient[col]\n",
 98 |     "        \n",
 99 |     "    # save\n",
100 |     "    saved_adata = os.path.join(adata_output_directory, 'adata%s.h5ad'%patient)\n",
101 |     "    adata_patient.write(saved_adata)"
102 |    ]
103 |   },
104 |   {
105 |    "cell_type": "code",
106 |    "execution_count": null,
107 |    "id": "5cd4c3dd",
108 |    "metadata": {},
109 |    "outputs": [],
110 |    "source": []
111 |   }
112 |  ],
113 |  "metadata": {
114 |   "kernelspec": {
115 |    "display_name": "Python 3 (ipykernel)",
116 |    "language": "python",
117 |    "name": "python3"
118 |   },
119 |   "language_info": {
120 |    "codemirror_mode": {
121 |     "name": "ipython",
122 |     "version": 3
123 |    },
124 |    "file_extension": ".py",
125 |    "mimetype": "text/x-python",
126 |    "name": "python",
127 |    "nbconvert_exporter": "python",
128 |    "pygments_lexer": "ipython3",
129 |    "version": "3.8.16"
130 |   }
131 |  },
132 |  "nbformat": 4,
133 |  "nbformat_minor": 5
134 | }
135 | 


--------------------------------------------------------------------------------
/Spatial_transcriptomics/06-RCTD_sc_reference_v2.R:
--------------------------------------------------------------------------------
 1 | library(spacexr)
 2 | library(Matrix)
 3 | library(data.table)
 4 | 
 5 | # single-cell reference
 6 | 
 7 | workdir <- '/Users/cpc45/Data/GBM/Henrik_spatial' 
 8 | datadir <- '/Users/cpc45/Data/GBM/Bernstein_SeqWell/Discrete_March2023'
 9 | metadata <- read.table(file.path(datadir, "Discrete4_Final_MetaData.txt"), 
10 |                    sep='\t', header = TRUE, row.names = 1) # load in annotation matrix
11 | counts <- fread(file.path(datadir, "discrete4_mgb_raw_counts.txt"), 
12 |                 sep='\t') # load in counts matrix
13 | 
14 | genes <- as.matrix(counts[,1]); counts[,1]<-NULL
15 | barcodes<-colnames(counts)
16 | cell_types<-metadata[barcodes,]$Annotation
17 | cell_types <- as.factor(cell_types) # convert to factor data type
18 | 
19 | names(cell_types)<- barcodes
20 | 
21 | counts <- apply(counts, 2, function(x) as.numeric(as.character(x)))
22 | counts<-as.data.frame(counts); rownames(counts)<-t(genes); colnames(counts)<-barcodes
23 | nUMI <- colSums(counts)
24 | 
25 | ### Create the Reference object
26 | reference <- Reference(counts, cell_types, nUMI)
27 | 
28 | ## Examine reference object (optional)
29 | print(dim(reference@counts)) #observe Digital Gene Expression matrix
30 | 
31 | table(reference@cell_types) # number of occurrences of each cell type
32 | 
33 | ## Save RDS object (optional)
34 | saveRDS(reference, file.path(workdir,'SCRef.rds'))
35 | 


--------------------------------------------------------------------------------
/Spatial_transcriptomics/07-RCTD_make_pucks_all.R:
--------------------------------------------------------------------------------
 1 | library(spacexr)
 2 | library(Matrix)
 3 | library(SPATA2)
 4 | library(anndata)
 5 | 
 6 | workdir<- '/Users/cpc45/Data/GBM/Henrik_spatial/' 
 7 | adatadir<- file.path(workdir,'CancerCell','adata')
 8 | puckdir <- file.path(workdir,'pucks') 
 9 | dir.create(puckdir)
10 | 
11 | for (f in list.files(adatadir) ) {
12 |   adata <- read_h5ad(file.path(adatadir,f))
13 |   file_split <- strsplit(f,"[.]")[[1]]
14 |   sample <- substr(file_split[1], 6, nchar(file_split[1]))
15 |   
16 |   # extract counts matrix
17 |   counts <- t(as.matrix(adata$X)) 
18 |   colnames(counts) <- row.names(adata$obs)
19 |   row.names(counts) <- row.names(adata$var)
20 |   
21 |   # extract coord
22 |   coords <- as.data.frame(adata$obsm$spatial)
23 |   rownames(coords) <- row.names(adata$obs)
24 |   nUMI <- colSums(counts) # In this case, total counts per pixel is nUMI
25 |   
26 |   ### Create SpatialRNA object
27 |   puck <- SpatialRNA(coords, counts, nUMI)
28 |   
29 |   print(head(puck@coords)) # start of coordinate data.frame
30 |   
31 |   saveRDS(puck, file.path(puckdir,sprintf('puck_%s.rds',sample)))
32 | }
33 | 


--------------------------------------------------------------------------------
/Spatial_transcriptomics/08-run_RCTD_all.R:
--------------------------------------------------------------------------------
 1 | library(spacexr)
 2 | library(Matrix)
 3 | 
 4 | workdir<- '/Users/cpc45/Data/GBM/Henrik_spatial'
 5 | puckdir <- file.path(workdir,'pucks')
 6 | RCTD_dir <- file.path(workdir,'RCTD')
 7 | dir.create(RCTD_dir, showWarnings = FALSE)
 8 | 
 9 | reference<- readRDS(file.path(workdir,'SCRef.rds'))
10 | 
11 | for (f in list.files(puckdir) ) {
12 |   patient_sample <- substr(f, 6, nchar(f)-4) # get sample name from file name
13 |   
14 |   puck<-readRDS(file.path(workdir,'pucks',sprintf('puck_%s.rds',patient_sample)))
15 |   
16 |   myRCTD <- create.RCTD(puck, reference, max_cores = 4)
17 |   myRCTD <- run.RCTD(myRCTD, doublet_mode = 'full')
18 |   saveRDS(myRCTD, file.path(RCTD_dir,sprintf('RCTD_%s.rds',patient_sample)))
19 | }
20 | 


--------------------------------------------------------------------------------
/Spatial_transcriptomics/09-RCTD_tocsv_all_patients.R:
--------------------------------------------------------------------------------
 1 | library(spacexr)
 2 | library(Matrix)
 3 | 
 4 | workdir<- '/Users/cpc45/Data/GBM/Henrik_spatial' 
 5 | RCTDir<-file.path(workdir,'RCTD')
 6 | write_dir<-file.path(workdir,'RCTD_csv')
 7 | dir.create(write_dir)
 8 | 
 9 | for (f in list.files(RCTDir) ) {
10 |   sample_id <- substr(f, 6, nchar(f)-4) # get sample name from file name
11 |   myRCTD<- readRDS(file.path(RCTDir,sprintf('RCTD_%s.rds',sample_id)))
12 |   results <- myRCTD@results
13 |   
14 |   # normalize the cell type proportions to sum to 1.
15 |   norm_weights = normalize_weights(results$weights) 
16 |   norm_weights = as.matrix(norm_weights)
17 |   
18 |   write.csv(norm_weights, file.path(write_dir,sprintf('RCTD_%s.csv',sample_id)))
19 | }
20 | 
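The per-sample CSVs written above hold one row per spot and one column per reference cell type, with each row summing to 1 after `normalize_weights()`. A Python sketch for collecting them for downstream niche analysis follows, assuming the `RCTD_<sample>.csv` naming used above; the loader name, the temporary directory and the toy cell-type columns are hypothetical.

```python
import os
import tempfile

import pandas as pd


def load_rctd_weights(csv_dir: str) -> dict:
    """Read every RCTD_<sample>.csv in csv_dir into {sample: DataFrame}.

    Like substr(f, 6, nchar(f) - 4) in R, the sample ID is the file name
    without the "RCTD_" prefix and the extension.
    """
    weights = {}
    for fname in sorted(os.listdir(csv_dir)):
        if fname.startswith("RCTD_") and fname.endswith(".csv"):
            sample = fname[len("RCTD_"):-len(".csv")]
            weights[sample] = pd.read_csv(os.path.join(csv_dir, fname), index_col=0)
    return weights


# Hypothetical demo: write one toy CSV to a temporary directory and read it back
with tempfile.TemporaryDirectory() as tmp:
    toy = pd.DataFrame(
        {"Myeloid": [0.7, 0.2], "Malignant": [0.3, 0.8]}, index=["spot1", "spot2"]
    )
    toy.to_csv(os.path.join(tmp, "RCTD_UKF265_C.csv"))
    weights = load_rctd_weights(tmp)

# normalize_weights() rescaled each spot's cell-type weights to sum to 1
row_sums = weights["UKF265_C"].sum(axis=1)
print(row_sums.round(6).tolist())
```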


--------------------------------------------------------------------------------
/Spatial_transcriptomics/14- ScatterPie Visualization of the niches:
--------------------------------------------------------------------------------
 1 | ######## Inside R #########
 2 | 
 3 | library(ggplot2)
 4 | library(scatterpie)
 5 | 
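 6 | # example for 1 sample; envdir (folder holding the per-sample env<sample>.csv files) and plotdir (output folder) must be defined beforehand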
 7 | 
 8 | sample = 'UKF265_C'
 9 | csvdir = file.path(envdir, sprintf('env%s.csv',sample))
10 | data_ <- read.csv(csvdir)
11 | 
12 | pdf(file.path(plotdir, sprintf('%s.sp.pdf',sample) ))
13 | ggplot() + geom_scatterpie(aes(x=x, y=y, r=50), 
14 |                            data=data_, 
15 |                            cols=colnames(data_)[c(1:6)],
16 |                            color=NA) + 
17 |   coord_equal() + 
18 |   scale_fill_manual(values = c('#b3b3b3','#050505','#f5f3ed','#0098d5','#e62c54','#ffe700')) +
19 |   theme(panel.grid.major = element_blank(), 
20 |                         panel.grid.minor = element_blank(), 
21 |                         panel.background = element_blank(),
22 |                         axis.text.x=element_blank(),
23 |                         axis.ticks.x=element_blank(),
24 |                         axis.text.y=element_blank(),  
25 |                         axis.ticks.y=element_blank()
26 |   ) +
27 |   labs(x="", y="")
28 | dev.off()
29 | 


--------------------------------------------------------------------------------
/scATAC-Seq Analyses/02- Processing GBM C3L_03405  snATAC-Seq library:
--------------------------------------------------------------------------------
  1 | library(Signac)
  2 | library(Seurat)
  3 | library(EnsDb.Hsapiens.v86)
  4 | library(BSgenome.Hsapiens.UCSC.hg38)
  5 | 
  6 | options(future.globals.maxSize = 8000 * 1024^2)
  7 | 
  8 | C3L_03405.data=CreateFragmentObject("./C3L-03405_CPT0224600013_snATAC_GBM/outs/fragments.tsv.gz")
  9 | 
 10 | 
 11 | features <- CallPeaks(C3L_03405.data, format='BED', outdir ='./C3L-03405_CPT0224600013_snATAC_GBM/outs/',name='C3L_03405', cleanup=FALSE)
 12 | 
 13 | C3L_03405_Peaks <- keepStandardChromosomes(features, pruning.mode = "coarse")
 14 | C3L_03405_Peaks2 <- subsetByOverlaps(x = C3L_03405_Peaks, ranges = blacklist_hg38, invert = TRUE)
 15 | 
 16 | saveRDS(C3L_03405_Peaks2, file="C3L_03405_Peaks2.rds")
 17 | 
 18 | C3L_03405.counts <- FeatureMatrix(
 19 |   fragments = C3L_03405.data,
 20 |   features = C3L_03405_Peaks  # unfiltered peaks; the blacklist-filtered set (C3L_03405_Peaks2) is only saved to disk above
 21 | )
 22 | 
 23 | 
 24 | C3L_03405.chrom_assay <- CreateChromatinAssay(
 25 |   counts = C3L_03405.counts,
 26 |   sep = c("-", "-"),
 27 |   fragments = './C3L-03405_CPT0224600013_snATAC_GBM/outs/fragments.tsv.gz',
 28 |   min.cells = 10,
 29 |   min.features = 200
 30 | )
 31 | 
 32 | C3L_03405 <- CreateSeuratObject(counts = C3L_03405.chrom_assay, assay = "peaks", project = "C3L_03405")
 33 | 
 34 | annotations.C3L_03405 <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
 35 | seqlevels(annotations.C3L_03405) <- paste0('chr', seqlevels(annotations.C3L_03405))
 36 | genome(annotations.C3L_03405) <- "hg38"
 37 | 
 38 | 
 39 | Annotation(C3L_03405) <- annotations.C3L_03405
 40 | 
 41 | C3L_03405 <- NucleosomeSignal(object = C3L_03405)
 42 | C3L_03405 <- TSSEnrichment(object = C3L_03405, fast = FALSE, assay='peaks')
 43 | 
 44 | C3L_03405$blacklist_fraction <- FractionCountsInRegion(
 45 |   object = C3L_03405, 
 46 |   assay = 'peaks',
 47 |   regions = blacklist_hg38
 48 | )
 49 | 
 50 | 
 51 | total_fragments <- CountFragments("./C3L-03405_CPT0224600013_snATAC_GBM/outs/fragments.tsv.gz")
 52 | rownames(total_fragments) <- total_fragments$CB
 53 | C3L_03405$fragments <- total_fragments[colnames(C3L_03405), "frequency_count"]
 54 | 
 55 | C3L_03405 <- FRiP(
 56 |   object = C3L_03405,
 57 |   assay = 'peaks',
 58 |   total.fragments = 'fragments'
 59 | )
 60 | 
 61 | 
 62 | pdf("C3L_03405_WT_ATAC_DensityScatter.pdf", height=5, width=9)
 63 | DensityScatter(C3L_03405, x = 'nCount_peaks', y = 'TSS.enrichment', log_x = TRUE, quantiles = TRUE)
 64 | dev.off()
 65 | 
 66 | 
 67 | pdf("C3L_03405_WT_ATAC_QC_BF.pdf", height=5, width=12)
 68 | VlnPlot(
 69 |   object = C3L_03405,
 70 |   features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
 71 |   pt.size = 0.1,
 72 |   ncol = 5
 73 | )
 74 | dev.off()
 75 | 
 76 | 
 77 | pdf("C3L_03405_WT_ATAC_TSS_Enrichment.pdf", height=5, width=6)
 78 | C3L_03405$high.tss <- ifelse(C3L_03405$TSS.enrichment > 1.5, 'High', 'Low')
 79 | TSSPlot(C3L_03405, group.by = 'high.tss') + NoLegend()
 80 | dev.off()
 81 | 
 82 | pdf("C3L_03405_WT_ATAC_Nucleosome_signal.pdf", height=5, width=6)
 83 | C3L_03405$nucleosome_group <- ifelse(C3L_03405$nucleosome_signal > 2.5, 'NS > 2.5', 'NS < 2.5')
 84 | FragmentHistogram(object = C3L_03405, group.by = 'nucleosome_group')
 85 | dev.off()
 86 | 
 87 | 
 88 | C3L_03405 <- subset(
 89 |   x = C3L_03405,
 90 |   subset = nCount_peaks > 350 &
 91 |     nCount_peaks < 20000 &
 92 |     FRiP > 0.15 &
 93 |     blacklist_fraction < 0.05 &
 94 |     nucleosome_signal < 2.5 &
 95 |     TSS.enrichment > 1.5
 96 | )
 97 | 
 98 | 
 99 | pdf("C3L_03405_WT_ATAC_QC_AF.pdf", height=5, width=12)
100 | VlnPlot(
101 |   object = C3L_03405,
102 |   features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
103 |   pt.size = 0.1,
104 |   ncol = 5
105 | )
106 | dev.off()
107 | 
108 | 
109 | 
110 | DefaultAssay(C3L_03405) <- "peaks"
111 | 
112 | 
113 | C3L_03405 <- RunTFIDF(C3L_03405)
114 | C3L_03405 <- FindTopFeatures(C3L_03405, min.cutoff = 'q0')
115 | C3L_03405 <- RunSVD(C3L_03405)
116 | 
117 | pdf("C3L_03405_WT_ATAC_DepthCor.pdf", height=5, width=9)
118 | DepthCor(C3L_03405)
119 | dev.off()
120 | 
121 | 
122 | C3L_03405 <- RunUMAP(object = C3L_03405, reduction = 'lsi', dims = 2:30)
123 | C3L_03405 <- FindNeighbors(object = C3L_03405, reduction = 'lsi', dims = 2:30)
124 | C3L_03405 <- FindClusters(object = C3L_03405, verbose = FALSE, algorithm = 3)
125 | 
126 | 
127 | pdf("C3L_03405_WT_ATAC_UMAP.pdf", height=5, width=7)
128 | DimPlot(object = C3L_03405, label = TRUE) + NoLegend()
129 | dev.off()
130 | 
131 | 
132 | gene.activities <- GeneActivity(C3L_03405)
133 | 
134 | 
135 | C3L_03405[['RNA']] <- CreateAssayObject(counts = gene.activities)
136 | C3L_03405 <- NormalizeData(
137 |   object = C3L_03405,
138 |   assay = 'RNA',
139 |   normalization.method = 'RC',
140 |   scale.factor = median(C3L_03405$nCount_RNA)
141 | )
142 | 
143 | write.table(gene.activities, file="C3L_03405_WT_ATAC_Gene_Activities.txt", sep="\t", col.names=NA, quote=FALSE)
144 | 
145 | saveRDS(C3L_03405, file="C3L_03405_WT_ATAC.rds")
146 | 
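`RunTFIDF()` followed by `RunSVD()` is latent semantic indexing (LSI), and `dims = 2:30` skips the first LSI component, which often tracks sequencing depth rather than biology (the `DepthCor()` plot checks this). The NumPy sketch below follows Signac's default TF-IDF variant as we understand it (`log1p(TF x IDF x 10^4)`, with TF normalized per cell and IDF = number of cells / counts per peak) and then takes an SVD. It runs on random toy counts, is not Signac's implementation, and omits any further scaling Signac applies to the embeddings; the helper name is hypothetical.

```python
import numpy as np


def tfidf_lsi(counts, n_components=30, scale_factor=1e4):
    """Return a cells x n_components LSI embedding of a peaks x cells count matrix."""
    counts = np.asarray(counts, dtype=float)
    counts = counts[counts.sum(axis=1) > 0]  # drop peaks with no counts (IDF undefined)
    tf = counts / counts.sum(axis=0, keepdims=True)            # per-cell term frequency
    idf = counts.shape[1] / counts.sum(axis=1, keepdims=True)  # per-peak inverse document frequency
    tfidf = np.log1p(tf * idf * scale_factor)
    # SVD of the peaks x cells matrix; cell coordinates are V * S, truncated to n_components
    _, s, vt = np.linalg.svd(tfidf, full_matrices=False)
    return vt[:n_components].T * s[:n_components]


rng = np.random.default_rng(0)
counts = rng.poisson(0.3, size=(500, 60))  # hypothetical 500 peaks x 60 cells
lsi = tfidf_lsi(counts)
lsi_for_umap = lsi[:, 1:30]  # analogous to dims = 2:30: drop the first component
print(lsi.shape, lsi_for_umap.shape)
```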


--------------------------------------------------------------------------------
/scATAC-Seq Analyses/03- Processing GBM C3L_03968  snATAC-Seq library:
--------------------------------------------------------------------------------
  1 | library(Signac)
  2 | library(Seurat)
  3 | library(EnsDb.Hsapiens.v86)
  4 | library(BSgenome.Hsapiens.UCSC.hg38)
  5 | 
  6 | options(future.globals.maxSize = 8000 * 1024^2)
  7 | 
  8 | C3L_03968.data=CreateFragmentObject("./C3L-03968_CPT0228220004_snATAC_GBM/outs/fragments.tsv.gz")
  9 | 
 10 | 
 11 | features <- CallPeaks(C3L_03968.data, format='BED', outdir ='./C3L-03968_CPT0228220004_snATAC_GBM/outs/',name='C3L_03968', cleanup=FALSE)
 12 | 
 13 | 
 14 | C3L_03968_Peaks <- keepStandardChromosomes(features, pruning.mode = "coarse")
 15 | C3L_03968_Peaks2 <- subsetByOverlaps(x = C3L_03968_Peaks, ranges = blacklist_hg38, invert = TRUE)
 16 | 
 17 | saveRDS(C3L_03968_Peaks2, file="C3L_03968_Peaks.rds")
 18 | 
 19 | C3L_03968.counts <- FeatureMatrix(
 20 |   fragments = C3L_03968.data,
 21 |   features = C3L_03968_Peaks  # unfiltered peaks; the blacklist-filtered set (C3L_03968_Peaks2) is only saved to disk above
 22 | )
 23 | 
 24 | 
 25 | C3L_03968.chrom_assay <- CreateChromatinAssay(
 26 |   counts = C3L_03968.counts,
 27 |   sep = c("-", "-"),
 28 |   fragments = './C3L-03968_CPT0228220004_snATAC_GBM/outs/fragments.tsv.gz',
 29 |   min.cells = 10,
 30 |   min.features = 200
 31 | )
 32 | 
 33 | C3L_03968 <- CreateSeuratObject(counts = C3L_03968.chrom_assay, assay = "peaks", project = "C3L_03968")
 34 | 
 35 | annotations.C3L_03968 <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
 36 | seqlevels(annotations.C3L_03968) <- paste0('chr', seqlevels(annotations.C3L_03968))
 37 | genome(annotations.C3L_03968) <- "hg38"
 38 | 
 39 | 
 40 | Annotation(C3L_03968) <- annotations.C3L_03968
 41 | 
 42 | C3L_03968 <- NucleosomeSignal(object = C3L_03968)
 43 | C3L_03968 <- TSSEnrichment(object = C3L_03968, fast = FALSE, assay='peaks')
 44 | 
 45 | C3L_03968$blacklist_fraction <- FractionCountsInRegion(
 46 |   object = C3L_03968, 
 47 |   assay = 'peaks',
 48 |   regions = blacklist_hg38
 49 | )
 50 | 
 51 | 
 52 | total_fragments <- CountFragments("./C3L-03968_CPT0228220004_snATAC_GBM/outs/fragments.tsv.gz")
 53 | rownames(total_fragments) <- total_fragments$CB
 54 | C3L_03968$fragments <- total_fragments[colnames(C3L_03968), "frequency_count"]
 55 | 
 56 | C3L_03968 <- FRiP(
 57 |   object = C3L_03968,
 58 |   assay = 'peaks',
 59 |   total.fragments = 'fragments'
 60 | )
 61 | 
 62 | 
 63 | pdf("C3L_03968_WT_ATAC_DensityScatter.pdf", height=5, width=9)
 64 | DensityScatter(C3L_03968, x = 'nCount_peaks', y = 'TSS.enrichment', log_x = TRUE, quantiles = TRUE)
 65 | dev.off()
 66 | 
 67 | 
 68 | pdf("C3L_03968_WT_ATAC_QC_BF.pdf", height=5, width=12)
 69 | VlnPlot(
 70 |   object = C3L_03968,
 71 |   features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
 72 |   pt.size = 0.1,
 73 |   ncol = 5
 74 | )
 75 | dev.off()
 76 | 
 77 | 
 78 | pdf("C3L_03968_WT_ATAC_TSS_Enrichment.pdf", height=5, width=6)
 79 | C3L_03968$high.tss <- ifelse(C3L_03968$TSS.enrichment > 1.5, 'High', 'Low')
 80 | TSSPlot(C3L_03968, group.by = 'high.tss') + NoLegend()
 81 | dev.off()
 82 | 
 83 | pdf("C3L_03968_WT_ATAC_Nucleosome_signal.pdf", height=5, width=6)
 84 | C3L_03968$nucleosome_group <- ifelse(C3L_03968$nucleosome_signal > 2.5, 'NS > 2.5', 'NS < 2.5')
 85 | FragmentHistogram(object = C3L_03968, group.by = 'nucleosome_group')
 86 | dev.off()
 87 | 
 88 | 
 89 | C3L_03968 <- subset(
 90 |   x = C3L_03968,
 91 |   subset = nCount_peaks > 350 &
 92 |     nCount_peaks < 25000 &
 93 |     FRiP > 0.15 &
 94 |     blacklist_fraction < 0.05 &
 95 |     nucleosome_signal < 2.5 &
 96 |     TSS.enrichment > 1.5
 97 | )
 98 | 
 99 | 
100 | pdf("C3L_03968_WT_ATAC_QC_AF.pdf", height=5, width=12)
101 | VlnPlot(
102 |   object = C3L_03968,
103 |   features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
104 |   pt.size = 0.1,
105 |   ncol = 5
106 | )
107 | dev.off()
108 | 
109 | 
110 | DefaultAssay(C3L_03968) <- "peaks"
111 | 
112 | 
113 | C3L_03968 <- RunTFIDF(C3L_03968)
114 | C3L_03968 <- FindTopFeatures(C3L_03968, min.cutoff = 'q0')
115 | C3L_03968 <- RunSVD(C3L_03968)
116 | 
117 | pdf("C3L_03968_WT_ATAC_DepthCor.pdf", height=5, width=9)
118 | DepthCor(C3L_03968)
119 | dev.off()
120 | 
121 | 
122 | C3L_03968 <- RunUMAP(object = C3L_03968, reduction = 'lsi', dims = 2:30)
123 | C3L_03968 <- FindNeighbors(object = C3L_03968, reduction = 'lsi', dims = 2:30)
124 | C3L_03968 <- FindClusters(object = C3L_03968, verbose = FALSE, algorithm = 3)
125 | 
126 | 
127 | pdf("C3L_03968_WT_ATAC_UMAP.pdf", height=5, width=7)
128 | DimPlot(object = C3L_03968, label = TRUE) + NoLegend()
129 | dev.off()
130 | 
131 | 
132 | gene.activities <- GeneActivity(C3L_03968)
133 | 
134 | 
135 | C3L_03968[['RNA']] <- CreateAssayObject(counts = gene.activities)
136 | C3L_03968 <- NormalizeData(
137 |   object = C3L_03968,
138 |   assay = 'RNA',
139 |   normalization.method = 'RC',
140 |   scale.factor = median(C3L_03968$nCount_RNA)
141 | )
142 | 
143 | write.table(gene.activities, file="C3L_03968_WT_ATAC_Gene_Activities.txt", sep="\t", col.names=NA, quote=FALSE)
144 | 
145 | saveRDS(C3L_03968, file="C3L_03968_WT_ATAC.rds")
146 | 
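The `NormalizeData(normalization.method = 'RC', scale.factor = median(nCount_RNA))` call in these scripts performs relative-counts normalization. It divides each cell's gene-activity counts by that cell's total, multiplies by the median total across cells, and applies no log transform.

Below is a base-R sketch of that arithmetic. It assumes a small dense toy matrix with genes in rows and cells in columns, laid out like `gene.activities`; Seurat itself works on sparse matrices. The helper name `rc_normalize` is hypothetical.

```r
# Hypothetical helper (not part of the original scripts): relative-counts (RC)
# normalization using the median per-cell total as the scale factor.
rc_normalize <- function(counts) {
  totals <- colSums(counts)        # per-cell total counts (what nCount_RNA holds)
  scale_factor <- median(totals)   # mirrors scale.factor = median(nCount_RNA)
  # divide each column by its total, then rescale; a cell with total 0 gives NaN
  sweep(counts, 2, totals, "/") * scale_factor
}

toy <- matrix(c(1, 3, 2, 2, 0, 8), nrow = 2,
              dimnames = list(c("GENE_A", "GENE_B"), c("cell1", "cell2", "cell3")))
rc_normalize(toy)
```

After RC normalization, every cell sums to the same value (here the median total, 4). Without a log step, values stay on a linear scale.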


--------------------------------------------------------------------------------
/scATAC-Seq Analyses/04- Processing GBM C3N_00662  snATAC-Seq library:
--------------------------------------------------------------------------------
  1 | library(Signac)
  2 | library(Seurat)
  3 | library(EnsDb.Hsapiens.v86)
  4 | library(BSgenome.Hsapiens.UCSC.hg38)
  5 | 
  6 | options(future.globals.maxSize = 8000 * 1024^2)
  7 | 
  8 | C3N_00662.data=CreateFragmentObject("./C3N-00662_CPT0087680014_snATAC_GBM/outs/fragments.tsv.gz")
  9 | 
 10 | 
 11 | features <- CallPeaks(C3N_00662.data, format='BED', outdir ='./C3N-00662_CPT0087680014_snATAC_GBM/outs/',name='C3N_00662', cleanup=FALSE)
 12 | 
 13 | 
 14 | C3N_00662_Peaks <- keepStandardChromosomes(features, pruning.mode = "coarse")
 15 | C3N_00662_Peaks2 <- subsetByOverlaps(x = C3N_00662_Peaks, ranges = blacklist_hg38, invert = TRUE)
 16 | 
 17 | saveRDS(C3N_00662_Peaks2, file="C3N_00662_Peaks.rds")
 18 | 
 19 | C3N_00662.counts <- FeatureMatrix(
 20 |   fragments = C3N_00662.data,
 21 |   features = C3N_00662_Peaks
 22 | )
 23 | 
 24 | 
 25 | C3N_00662.chrom_assay <- CreateChromatinAssay(
 26 |   counts = C3N_00662.counts,
 27 |   sep = c("-", "-"),
 28 |   fragments = './C3N-00662_CPT0087680014_snATAC_GBM/outs/fragments.tsv.gz',
 29 |   min.cells = 10,
 30 |   min.features = 200
 31 | )
 32 | 
 33 | C3N_00662 <- CreateSeuratObject(counts = C3N_00662.chrom_assay, assay = "peaks", project = "C3N_00662")
 34 | 
 35 | annotations.C3N_00662 <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
 36 | seqlevels(annotations.C3N_00662) <- paste0('chr', seqlevels(annotations.C3N_00662))
 37 | genome(annotations.C3N_00662) <- "hg38"
 38 | 
 39 | 
 40 | Annotation(C3N_00662) <- annotations.C3N_00662
 41 | 
 42 | C3N_00662 <- NucleosomeSignal(object = C3N_00662)
 43 | C3N_00662 <- TSSEnrichment(object = C3N_00662, fast = FALSE, assay='peaks')
 44 | 
 45 | C3N_00662$blacklist_fraction <- FractionCountsInRegion(
 46 |   object = C3N_00662, 
 47 |   assay = 'peaks',
 48 |   regions = blacklist_hg38
 49 | )
 50 | 
 51 | 
 52 | total_fragments <- CountFragments("./C3N-00662_CPT0087680014_snATAC_GBM/outs/fragments.tsv.gz")
 53 | rownames(total_fragments) <- total_fragments$CB
 54 | C3N_00662$fragments <- total_fragments[colnames(C3N_00662), "frequency_count"]
 55 | 
 56 | C3N_00662 <- FRiP(
 57 |   object = C3N_00662,
 58 |   assay = 'peaks',
 59 |   total.fragments = 'fragments'
 60 | )
 61 | 
 62 | 
 63 | pdf("C3N_00662_WT_ATAC_DensityScatter.pdf", height=5, width=9)
 64 | DensityScatter(C3N_00662, x = 'nCount_peaks', y = 'TSS.enrichment', log_x = TRUE, quantiles = TRUE)
 65 | dev.off()
 66 | 
 67 | 
 68 | pdf("C3N_00662_WT_ATAC_QC_BF.pdf", height=5, width=12)
 69 | VlnPlot(
 70 |   object = C3N_00662,
 71 |   features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
 72 |   pt.size = 0.1,
 73 |   ncol = 5
 74 | )
 75 | dev.off()
 76 | 
 77 | 
 78 | pdf("C3N_00662_WT_ATAC_TSS_Enrichment.pdf", height=5, width=6)
 79 | C3N_00662$high.tss <- ifelse(C3N_00662$TSS.enrichment > 1.5, 'High', 'Low')
 80 | TSSPlot(C3N_00662, group.by = 'high.tss') + NoLegend()
 81 | dev.off()
 82 | 
 83 | pdf("C3N_00662_WT_ATAC_Nucleosome_signal.pdf", height=5, width=6)
 84 | C3N_00662$nucleosome_group <- ifelse(C3N_00662$nucleosome_signal > 2, 'NS > 2', 'NS < 2')
 85 | FragmentHistogram(object = C3N_00662, group.by = 'nucleosome_group')
 86 | dev.off()
 87 | 
 88 | 
 89 | C3N_00662 <- subset(
 90 |   x = C3N_00662,
 91 |   subset = nCount_peaks > 350 &
 92 |     nCount_peaks < 25000 &
 93 |     FRiP > 0.15 &
 94 |     blacklist_fraction < 0.05 &
 95 |     nucleosome_signal < 2 &
 96 |     TSS.enrichment > 1.5
 97 | )
 98 | 
 99 | 
100 | pdf("C3N_00662_WT_ATAC_QC_AF.pdf", height=5, width=12)
101 | VlnPlot(
102 |   object = C3N_00662,
103 |   features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
104 |   pt.size = 0.1,
105 |   ncol = 5
106 | )
107 | dev.off()
108 | 
109 | 
110 | DefaultAssay(C3N_00662) <- "peaks"
111 | 
112 | 
113 | C3N_00662 <- RunTFIDF(C3N_00662)
114 | C3N_00662 <- FindTopFeatures(C3N_00662, min.cutoff = 'q0')
115 | C3N_00662 <- RunSVD(C3N_00662)
116 | 
117 | pdf("C3N_00662_WT_ATAC_DepthCor.pdf", height=5, width=9)
118 | DepthCor(C3N_00662)
119 | dev.off()
120 | 
121 | 
122 | C3N_00662 <- RunUMAP(object = C3N_00662, reduction = 'lsi', dims = 2:30)
123 | C3N_00662 <- FindNeighbors(object = C3N_00662, reduction = 'lsi', dims = 2:30)
124 | C3N_00662 <- FindClusters(object = C3N_00662, verbose = FALSE, algorithm = 3)
125 | 
126 | 
127 | pdf("C3N_00662_WT_ATAC_UMAP.pdf", height=5, width=7)
128 | DimPlot(object = C3N_00662, label = TRUE) + NoLegend()
129 | dev.off()
130 | 
131 | 
132 | gene.activities <- GeneActivity(C3N_00662)
133 | 
134 | 
135 | C3N_00662[['RNA']] <- CreateAssayObject(counts = gene.activities)
136 | C3N_00662 <- NormalizeData(
137 |   object = C3N_00662,
138 |   assay = 'RNA',
139 |   normalization.method = 'RC',
140 |   scale.factor = median(C3N_00662$nCount_RNA)
141 | )
142 | 
143 | write.table(gene.activities, file="C3N_00662_WT_ATAC_Gene_Activities.txt", sep="\t", col.names=NA, quote=FALSE)
144 | 
145 | saveRDS(C3N_00662, file="C3N_00662_WT_ATAC.rds")
146 | 
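The `nucleosome_group` labels in these scripts are typed by hand next to the cutoff. That lets a label drift from the threshold it describes. Note also that the `ifelse` else-branch covers values less than or equal to the cutoff, not strictly below it.

Below is a small sketch that derives both labels from a single cutoff value. `ns_group` is a hypothetical helper, not part of Signac.

```r
# Hypothetical helper: build the FragmentHistogram grouping labels from the
# same cutoff passed to subset(), so label text and threshold cannot diverge.
ns_group <- function(nucleosome_signal, cutoff) {
  ifelse(nucleosome_signal > cutoff,
         paste0("NS > ", cutoff),
         paste0("NS <= ", cutoff))   # else-branch includes values equal to the cutoff
}

ns_group(c(0.8, 2.1, 3.0), cutoff = 2)
# e.g. obj$nucleosome_group <- ns_group(obj$nucleosome_signal, cutoff = 2)
```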


--------------------------------------------------------------------------------
/scATAC-Seq Analyses/05 - Processing GBM C3N_00663  snATAC-Seq library:
--------------------------------------------------------------------------------
  1 | library(Signac)
  2 | library(Seurat)
  3 | library(EnsDb.Hsapiens.v86)
  4 | library(BSgenome.Hsapiens.UCSC.hg38)
  5 | 
  6 | options(future.globals.maxSize = 8000 * 1024^2)
  7 | 
  8 | C3N_00663.data=CreateFragmentObject("./C3N-00663_CPT0087730014_snATAC_GBM/outs/fragments.tsv.gz")
  9 | 
 10 | 
 11 | features <- CallPeaks(C3N_00663.data, format='BED', outdir ='./C3N-00663_CPT0087730014_snATAC_GBM/outs/',name='C3N_00663', cleanup=FALSE)
 12 | 
 13 | C3N_00663_Peaks <- keepStandardChromosomes(features, pruning.mode = "coarse")
 14 | C3N_00663_Peaks2 <- subsetByOverlaps(x = C3N_00663_Peaks, ranges = blacklist_hg38, invert = TRUE)
 15 | 
 16 | saveRDS(C3N_00663_Peaks2, file="C3N_00663_Peaks2.rds")
 17 | 
 18 | C3N_00663.counts <- FeatureMatrix(
 19 |   fragments = C3N_00663.data,
 20 |   features = C3N_00663_Peaks
 21 | )
 22 | 
 23 | 
 24 | C3N_00663.chrom_assay <- CreateChromatinAssay(
 25 |   counts = C3N_00663.counts,
 26 |   sep = c("-", "-"),
 27 |   fragments = './C3N-00663_CPT0087730014_snATAC_GBM/outs/fragments.tsv.gz',
 28 |   min.cells = 10,
 29 |   min.features = 200
 30 | )
 31 | 
 32 | C3N_00663 <- CreateSeuratObject(counts = C3N_00663.chrom_assay, assay = "peaks", project = "C3N_00663")
 33 | 
 34 | annotations.C3N_00663 <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
 35 | seqlevels(annotations.C3N_00663) <- paste0('chr', seqlevels(annotations.C3N_00663))
 36 | genome(annotations.C3N_00663) <- "hg38"
 37 | 
 38 | 
 39 | Annotation(C3N_00663) <- annotations.C3N_00663
 40 | 
 41 | C3N_00663 <- NucleosomeSignal(object = C3N_00663)
 42 | C3N_00663 <- TSSEnrichment(object = C3N_00663, fast = FALSE, assay='peaks')
 43 | 
 44 | C3N_00663$blacklist_fraction <- FractionCountsInRegion(
 45 |   object = C3N_00663, 
 46 |   assay = 'peaks',
 47 |   regions = blacklist_hg38
 48 | )
 49 | 
 50 | 
 51 | total_fragments <- CountFragments("./C3N-00663_CPT0087730014_snATAC_GBM/outs/fragments.tsv.gz")
 52 | rownames(total_fragments) <- total_fragments$CB
 53 | C3N_00663$fragments <- total_fragments[colnames(C3N_00663), "frequency_count"]
 54 | 
 55 | C3N_00663 <- FRiP(
 56 |   object = C3N_00663,
 57 |   assay = 'peaks',
 58 |   total.fragments = 'fragments'
 59 | )
 60 | 
 61 | 
 62 | pdf("C3N_00663_WT_ATAC_DensityScatter.pdf", height=5, width=9)
 63 | DensityScatter(C3N_00663, x = 'nCount_peaks', y = 'TSS.enrichment', log_x = TRUE, quantiles = TRUE)
 64 | dev.off()
 65 | 
 66 | 
 67 | pdf("C3N_00663_WT_ATAC_QC_BF.pdf", height=5, width=12)
 68 | VlnPlot(
 69 |   object = C3N_00663,
 70 |   features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
 71 |   pt.size = 0.1,
 72 |   ncol = 5
 73 | )
 74 | dev.off()
 75 | 
 76 | 
 77 | pdf("C3N_00663_WT_ATAC_TSS_Enrichment.pdf", height=5, width=6)
 78 | C3N_00663$high.tss <- ifelse(C3N_00663$TSS.enrichment > 1.5, 'High', 'Low')
 79 | TSSPlot(C3N_00663, group.by = 'high.tss') + NoLegend()
 80 | dev.off()
 81 | 
 82 | pdf("C3N_00663_WT_ATAC_Nucleosome_signal.pdf", height=5, width=6)
 83 | C3N_00663$nucleosome_group <- ifelse(C3N_00663$nucleosome_signal > 2.5, 'NS > 2.5', 'NS < 2.5')
 84 | FragmentHistogram(object = C3N_00663, group.by = 'nucleosome_group')
 85 | dev.off()
 86 | 
 87 | 
 88 | C3N_00663 <- subset(
 89 |   x = C3N_00663,
 90 |   subset = nCount_peaks > 350 &
 91 |     nCount_peaks < 12500 &
 92 |     FRiP > 0.15 &
 93 |     blacklist_fraction < 0.05 &
 94 |     nucleosome_signal < 2.5 &
 95 |     TSS.enrichment > 1.5
 96 | )
 97 | 
 98 | 
 99 | pdf("C3N_00663_WT_ATAC_QC_AF.pdf", height=5, width=12)
100 | VlnPlot(
101 |   object = C3N_00663,
102 |   features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
103 |   pt.size = 0.1,
104 |   ncol = 5
105 | )
106 | dev.off()
107 | 
108 | 
109 | 
110 | DefaultAssay(C3N_00663) <- "peaks"
111 | 
112 | 
113 | C3N_00663 <- RunTFIDF(C3N_00663)
114 | C3N_00663 <- FindTopFeatures(C3N_00663, min.cutoff = 'q0')
115 | C3N_00663 <- RunSVD(C3N_00663)
116 | 
117 | pdf("C3N_00663_WT_ATAC_DepthCor.pdf", height=5, width=9)
118 | DepthCor(C3N_00663)
119 | dev.off()
120 | 
121 | 
122 | C3N_00663 <- RunUMAP(object = C3N_00663, reduction = 'lsi', dims = 2:30)
123 | C3N_00663 <- FindNeighbors(object = C3N_00663, reduction = 'lsi', dims = 2:30)
124 | C3N_00663 <- FindClusters(object = C3N_00663, verbose = FALSE, algorithm = 3)
125 | 
126 | 
127 | pdf("C3N_00663_WT_ATAC_UMAP.pdf", height=5, width=7)
128 | DimPlot(object = C3N_00663, label = TRUE) + NoLegend()
129 | dev.off()
130 | 
131 | 
132 | gene.activities <- GeneActivity(C3N_00663)
133 | 
134 | 
135 | C3N_00663[['RNA']] <- CreateAssayObject(counts = gene.activities)
136 | C3N_00663 <- NormalizeData(
137 |   object = C3N_00663,
138 |   assay = 'RNA',
139 |   normalization.method = 'RC',
140 |   scale.factor = median(C3N_00663$nCount_RNA)
141 | )
142 | 
143 | write.table(gene.activities, file="C3N_00663_WT_ATAC_Gene_Activities.txt", sep="\t", col.names=NA, quote=FALSE)
144 | 
145 | saveRDS(C3N_00663, file="C3N_00663_WT_ATAC.rds")
146 | 
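Each script repeats the sample directory in several string literals: the `CreateFragmentObject`, `CallPeaks` `outdir`, `CreateChromatinAssay` and `CountFragments` paths. A single mismatched literal silently sends output into another sample's folder.

One way to guard against that is sketched below. It uses a hypothetical `sample_paths()` helper that derives every path from the Cell Ranger sample directory name.

```r
# Hypothetical helper (not in the original scripts): derive all per-sample
# paths from one directory name so they cannot point at different samples.
sample_paths <- function(sample_dir, root = ".") {
  outs <- file.path(root, sample_dir, "outs")
  list(
    outdir    = paste0(outs, "/"),                   # for CallPeaks(outdir = ...)
    fragments = file.path(outs, "fragments.tsv.gz")  # for CreateFragmentObject(), CountFragments(), etc.
  )
}

p <- sample_paths("C3N-00663_CPT0087730014_snATAC_GBM")
p$fragments
# e.g. CallPeaks(C3N_00663.data, format = 'BED', outdir = p$outdir,
#                name = 'C3N_00663', cleanup = FALSE)
```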


--------------------------------------------------------------------------------
/scATAC-Seq Analyses/07- Processing GBM C3N_01518  snATAC-Seq library:
--------------------------------------------------------------------------------
  1 | library(Signac)
  2 | library(Seurat)
  3 | library(EnsDb.Hsapiens.v86)
  4 | library(BSgenome.Hsapiens.UCSC.hg38)
  5 | 
  6 | options(future.globals.maxSize = 8000 * 1024^2)
  7 | 
  8 | C3N_01518.data=CreateFragmentObject("./C3N-01518_CPT0167640014_snATAC_GBM/outs/fragments.tsv.gz")
  9 | 
 10 | 
 11 | features <- CallPeaks(C3N_01518.data, format='BED', outdir ='./C3N-01518_CPT0167640014_snATAC_GBM/outs/',name='C3N_01518', cleanup=FALSE)
 12 | 
 13 | 
 14 | C3N_01518_Peaks <- keepStandardChromosomes(features, pruning.mode = "coarse")
 15 | C3N_01518_Peaks2 <- subsetByOverlaps(x = C3N_01518_Peaks, ranges = blacklist_hg38, invert = TRUE)
 16 | 
 17 | saveRDS(C3N_01518_Peaks2, file="C3N_01518_Peaks.rds")
 18 | 
 19 | C3N_01518.counts <- FeatureMatrix(
 20 |   fragments = C3N_01518.data,
 21 |   features = C3N_01518_Peaks
 22 | )
 23 | 
 24 | 
 25 | C3N_01518.chrom_assay <- CreateChromatinAssay(
 26 |   counts = C3N_01518.counts,
 27 |   sep = c("-", "-"),
 28 |   fragments = './C3N-01518_CPT0167640014_snATAC_GBM/outs/fragments.tsv.gz',
 29 |   min.cells = 10,
 30 |   min.features = 200
 31 | )
 32 | 
 33 | C3N_01518 <- CreateSeuratObject(counts = C3N_01518.chrom_assay, assay = "peaks", project = "C3N_01518")
 34 | 
 35 | annotations.C3N_01518 <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
 36 | seqlevels(annotations.C3N_01518) <- paste0('chr', seqlevels(annotations.C3N_01518))
 37 | genome(annotations.C3N_01518) <- "hg38"
 38 | 
 39 | 
 40 | Annotation(C3N_01518) <- annotations.C3N_01518
 41 | 
 42 | C3N_01518 <- NucleosomeSignal(object = C3N_01518)
 43 | C3N_01518 <- TSSEnrichment(object = C3N_01518, fast = FALSE, assay='peaks')
 44 | 
 45 | C3N_01518$blacklist_fraction <- FractionCountsInRegion(
 46 |   object = C3N_01518, 
 47 |   assay = 'peaks',
 48 |   regions = blacklist_hg38
 49 | )
 50 | 
 51 | 
 52 | total_fragments <- CountFragments("./C3N-01518_CPT0167640014_snATAC_GBM/outs/fragments.tsv.gz")
 53 | rownames(total_fragments) <- total_fragments$CB
 54 | C3N_01518$fragments <- total_fragments[colnames(C3N_01518), "frequency_count"]
 55 | 
 56 | C3N_01518 <- FRiP(
 57 |   object = C3N_01518,
 58 |   assay = 'peaks',
 59 |   total.fragments = 'fragments'
 60 | )
 61 | 
 62 | 
 63 | pdf("C3N_01518_WT_ATAC_DensityScatter.pdf", height=5, width=9)
 64 | DensityScatter(C3N_01518, x = 'nCount_peaks', y = 'TSS.enrichment', log_x = TRUE, quantiles = TRUE)
 65 | dev.off()
 66 | 
 67 | 
 68 | pdf("C3N_01518_WT_ATAC_QC_BF.pdf", height=5, width=12)
 69 | VlnPlot(
 70 |   object = C3N_01518,
 71 |   features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
 72 |   pt.size = 0.1,
 73 |   ncol = 5
 74 | )
 75 | dev.off()
 76 | 
 77 | 
 78 | pdf("C3N_01518_WT_ATAC_TSS_Enrichment.pdf", height=5, width=6)
 79 | C3N_01518$high.tss <- ifelse(C3N_01518$TSS.enrichment > 1.5, 'High', 'Low')
 80 | TSSPlot(C3N_01518, group.by = 'high.tss') + NoLegend()
 81 | dev.off()
 82 | 
 83 | pdf("C3N_01518_WT_ATAC_Nucleosome_signal.pdf", height=5, width=6)
 84 | C3N_01518$nucleosome_group <- ifelse(C3N_01518$nucleosome_signal > 2, 'NS > 2', 'NS < 2')
 85 | FragmentHistogram(object = C3N_01518, group.by = 'nucleosome_group')
 86 | dev.off()
 87 | 
 88 | 
 89 | C3N_01518 <- subset(
 90 |   x = C3N_01518,
 91 |   subset = nCount_peaks > 300 &
 92 |     nCount_peaks < 7500 &
 93 |     FRiP > 0.15 &
 94 |     blacklist_fraction < 0.05 &
 95 |     nucleosome_signal < 2 &
 96 |     TSS.enrichment > 1.5
 97 | )
 98 | 
 99 | 
100 | pdf("C3N_01518_WT_ATAC_QC_AF.pdf", height=5, width=12)
101 | VlnPlot(
102 |   object = C3N_01518,
103 |   features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
104 |   pt.size = 0.1,
105 |   ncol = 5
106 | )
107 | dev.off()
108 | 
109 | 
110 | DefaultAssay(C3N_01518) <- "peaks"
111 | 
112 | 
113 | C3N_01518 <- RunTFIDF(C3N_01518)
114 | C3N_01518 <- FindTopFeatures(C3N_01518, min.cutoff = 'q0')
115 | C3N_01518 <- RunSVD(C3N_01518)
116 | 
117 | pdf("C3N_01518_WT_ATAC_DepthCor.pdf", height=5, width=9)
118 | DepthCor(C3N_01518)
119 | dev.off()
120 | 
121 | 
122 | C3N_01518 <- RunUMAP(object = C3N_01518, reduction = 'lsi', dims = 2:30)
123 | C3N_01518 <- FindNeighbors(object = C3N_01518, reduction = 'lsi', dims = 2:30)
124 | C3N_01518 <- FindClusters(object = C3N_01518, verbose = FALSE, algorithm = 3)
125 | 
126 | 
127 | pdf("C3N_01518_WT_ATAC_UMAP.pdf", height=5, width=7)
128 | DimPlot(object = C3N_01518, label = TRUE) + NoLegend()
129 | dev.off()
130 | 
131 | 
132 | gene.activities <- GeneActivity(C3N_01518)
133 | 
134 | 
135 | C3N_01518[['RNA']] <- CreateAssayObject(counts = gene.activities)
136 | C3N_01518 <- NormalizeData(
137 |   object = C3N_01518,
138 |   assay = 'RNA',
139 |   normalization.method = 'RC',
140 |   scale.factor = median(C3N_01518$nCount_RNA)
141 | )
142 | 
143 | write.table(gene.activities, file="C3N_01518_WT_ATAC_Gene_Activities.txt", sep="\t", col.names=NA, quote=FALSE)
144 | 
145 | saveRDS(C3N_01518, file="C3N_01518_WT_ATAC.rds")
146 | 
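The scripts skip the first LSI component (`dims = 2:30`) after inspecting `DepthCor()`. That plot shows how strongly each reduced dimension correlates with per-cell sequencing depth.

Below is a base-R sketch of that check. It assumes a cells x components embedding matrix and a per-cell depth vector such as `nCount_peaks`. The helper `depth_correlated_components()` and its 0.5 cutoff are illustrative, and the sketch is not claimed to reproduce `DepthCor()` exactly.

```r
# Hypothetical helper: return the indices of components whose absolute
# Pearson correlation with depth exceeds a cutoff.
depth_correlated_components <- function(embeddings, depth, cutoff = 0.5) {
  r <- apply(embeddings, 2, function(comp) cor(comp, depth))  # one r per column
  which(abs(r) > cutoff)
}

depth <- c(1, 2, 3, 4, 5, 6)
emb <- cbind(LSI_1 = depth * 2,                 # perfectly depth-driven component
             LSI_2 = c(1, -1, 1, -1, 1, -1))   # weakly correlated component
depth_correlated_components(emb, depth)
# with a Seurat object, e.g.:
# depth_correlated_components(Embeddings(obj, reduction = 'lsi')[, 1:10], obj$nCount_peaks)
```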


--------------------------------------------------------------------------------
/scATAC-Seq Analyses/08- Processing GBM C3N_01798  snATAC-Seq library:
--------------------------------------------------------------------------------
  1 | library(Signac)
  2 | library(Seurat)
  3 | library(EnsDb.Hsapiens.v86)
  4 | library(BSgenome.Hsapiens.UCSC.hg38)
  5 | 
  6 | options(future.globals.maxSize = 8000 * 1024^2)
  7 | 
  8 | C3N_01798.data=CreateFragmentObject("./C3N-01798_CPT0167750015_snATAC_GBM/outs/fragments.tsv.gz")
  9 | 
 10 | 
 11 | features <- CallPeaks(C3N_01798.data, format='BED', outdir ='./C3N-01798_CPT0167750015_snATAC_GBM/outs/',name='C3N_01798', cleanup=FALSE)
 12 | 
 13 | 
 14 | C3N_01798_Peaks <- keepStandardChromosomes(features, pruning.mode = "coarse")
 15 | C3N_01798_Peaks2 <- subsetByOverlaps(x = C3N_01798_Peaks, ranges = blacklist_hg38, invert = TRUE)
 16 | 
 17 | saveRDS(C3N_01798_Peaks2, file="C3N_01798_Peaks.rds")
 18 | 
 19 | C3N_01798.counts <- FeatureMatrix(
 20 |   fragments = C3N_01798.data,
 21 |   features = C3N_01798_Peaks
 22 | )
 23 | 
 24 | 
 25 | C3N_01798.chrom_assay <- CreateChromatinAssay(
 26 |   counts = C3N_01798.counts,
 27 |   sep = c("-", "-"),
 28 |   fragments = './C3N-01798_CPT0167750015_snATAC_GBM/outs/fragments.tsv.gz',
 29 |   min.cells = 10,
 30 |   min.features = 200
 31 | )
 32 | 
 33 | C3N_01798 <- CreateSeuratObject(counts = C3N_01798.chrom_assay, assay = "peaks", project = "C3N_01798")
 34 | 
 35 | annotations.C3N_01798 <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
 36 | seqlevels(annotations.C3N_01798) <- paste0('chr', seqlevels(annotations.C3N_01798))
 37 | genome(annotations.C3N_01798) <- "hg38"
 38 | 
 39 | 
 40 | Annotation(C3N_01798) <- annotations.C3N_01798
 41 | 
 42 | C3N_01798 <- NucleosomeSignal(object = C3N_01798)
 43 | C3N_01798 <- TSSEnrichment(object = C3N_01798, fast = FALSE, assay='peaks')
 44 | 
 45 | C3N_01798$blacklist_fraction <- FractionCountsInRegion(
 46 |   object = C3N_01798, 
 47 |   assay = 'peaks',
 48 |   regions = blacklist_hg38
 49 | )
 50 | 
 51 | 
 52 | total_fragments <- CountFragments("./C3N-01798_CPT0167750015_snATAC_GBM/outs/fragments.tsv.gz")
 53 | rownames(total_fragments) <- total_fragments$CB
 54 | C3N_01798$fragments <- total_fragments[colnames(C3N_01798), "frequency_count"]
 55 | 
 56 | C3N_01798 <- FRiP(
 57 |   object = C3N_01798,
 58 |   assay = 'peaks',
 59 |   total.fragments = 'fragments'
 60 | )
 61 | 
 62 | 
 63 | pdf("C3N_01798_WT_ATAC_DensityScatter.pdf", height=5, width=9)
 64 | DensityScatter(C3N_01798, x = 'nCount_peaks', y = 'TSS.enrichment', log_x = TRUE, quantiles = TRUE)
 65 | dev.off()
 66 | 
 67 | 
 68 | pdf("C3N_01798_WT_ATAC_QC_BF.pdf", height=5, width=12)
 69 | VlnPlot(
 70 |   object = C3N_01798,
 71 |   features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
 72 |   pt.size = 0.1,
 73 |   ncol = 5
 74 | )
 75 | dev.off()
 76 | 
 77 | 
 78 | pdf("C3N_01798_WT_ATAC_TSS_Enrichment.pdf", height=5, width=6)
 79 | C3N_01798$high.tss <- ifelse(C3N_01798$TSS.enrichment > 1.5, 'High', 'Low')
 80 | TSSPlot(C3N_01798, group.by = 'high.tss') + NoLegend()
 81 | dev.off()
 82 | 
 83 | pdf("C3N_01798_WT_ATAC_Nucleosome_signal.pdf", height=5, width=6)
 84 | C3N_01798$nucleosome_group <- ifelse(C3N_01798$nucleosome_signal > 2, 'NS > 2', 'NS < 2')
 85 | FragmentHistogram(object = C3N_01798, group.by = 'nucleosome_group')
 86 | dev.off()
 87 | 
 88 | 
 89 | C3N_01798 <- subset(
 90 |   x = C3N_01798,
 91 |   subset = nCount_peaks > 350 &
 92 |     nCount_peaks < 25000 &
 93 |     FRiP > 0.15 &
 94 |     blacklist_fraction < 0.05 &
 95 |     nucleosome_signal < 2 &
 96 |     TSS.enrichment > 1.5
 97 | )
 98 | 
 99 | 
100 | pdf("C3N_01798_WT_ATAC_QC_AF.pdf", height=5, width=12)
101 | VlnPlot(
102 |   object = C3N_01798,
103 |   features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
104 |   pt.size = 0.1,
105 |   ncol = 5
106 | )
107 | dev.off()
108 | 
109 | 
110 | DefaultAssay(C3N_01798) <- "peaks"
111 | 
112 | 
113 | C3N_01798 <- RunTFIDF(C3N_01798)
114 | C3N_01798 <- FindTopFeatures(C3N_01798, min.cutoff = 'q0')
115 | C3N_01798 <- RunSVD(C3N_01798)
116 | 
117 | pdf("C3N_01798_WT_ATAC_DepthCor.pdf", height=5, width=9)
118 | DepthCor(C3N_01798)
119 | dev.off()
120 | 
121 | 
122 | C3N_01798 <- RunUMAP(object = C3N_01798, reduction = 'lsi', dims = 2:30)
123 | C3N_01798 <- FindNeighbors(object = C3N_01798, reduction = 'lsi', dims = 2:30)
124 | C3N_01798 <- FindClusters(object = C3N_01798, verbose = FALSE, algorithm = 3)
125 | 
126 | 
127 | pdf("C3N_01798_WT_ATAC_UMAP.pdf", height=5, width=7)
128 | DimPlot(object = C3N_01798, label = TRUE) + NoLegend()
129 | dev.off()
130 | 
131 | 
132 | gene.activities <- GeneActivity(C3N_01798)
133 | 
134 | 
135 | C3N_01798[['RNA']] <- CreateAssayObject(counts = gene.activities)
136 | C3N_01798 <- NormalizeData(
137 |   object = C3N_01798,
138 |   assay = 'RNA',
139 |   normalization.method = 'RC',
140 |   scale.factor = median(C3N_01798$nCount_RNA)
141 | )
142 | 
143 | write.table(gene.activities, file="C3N_01798_WT_ATAC_Gene_Activities.txt", sep="\t", col.names=NA, quote=FALSE)
144 | 
145 | saveRDS(C3N_01798, file="C3N_01798_WT_ATAC.rds")
146 | 
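`FRiP()` is called with `total.fragments = 'fragments'`. That column holds the per-barcode totals looked up from `CountFragments()` by barcode name.

Below is a base-R sketch of the lookup and the ratio. It assumes FRiP is computed as peak counts (`nCount_peaks`) divided by that total. The toy table only mimics the `CB` and `frequency_count` columns used in the scripts, and `frip_by_barcode()` is a hypothetical helper.

```r
# Hypothetical helper: fraction of reads in peaks per barcode.
frip_by_barcode <- function(peak_counts, fragment_table) {
  rownames(fragment_table) <- fragment_table$CB                    # index totals by barcode
  totals <- fragment_table[names(peak_counts), "frequency_count"]  # NA for an absent barcode
  peak_counts / totals
}

frag_tbl <- data.frame(CB = c("AAAC-1", "AAAG-1", "AACT-1"),
                       frequency_count = c(1000, 400, 250))
toy_peak_counts <- c("AAAG-1" = 100, "AAAC-1" = 300)
frip_by_barcode(toy_peak_counts, frag_tbl)
```

Indexing by barcode name rather than by position keeps the totals aligned with `colnames(obj)` even when the fragment table is in a different order.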


--------------------------------------------------------------------------------
/scATAC-Seq Analyses/11- Processing GBM C3N_01818  snATAC-Seq library:
--------------------------------------------------------------------------------
  1 | library(Signac)
  2 | library(Seurat)
  3 | library(EnsDb.Hsapiens.v86)
  4 | library(BSgenome.Hsapiens.UCSC.hg38)
  5 | 
  6 | options(future.globals.maxSize = 8000 * 1024^2)
  7 | 
  8 | C3N_01818.data=CreateFragmentObject("./C3N-01818_CPT0168270014_snATAC_GBM/outs/fragments.tsv.gz")
  9 | 
 10 | 
 11 | features <- CallPeaks(C3N_01818.data, format='BED', outdir ='./C3N-01818_CPT0168270014_snATAC_GBM/outs/',name='C3N_01818', cleanup=FALSE)
 12 | 
 13 | 
 14 | C3N_01818_Peaks <- keepStandardChromosomes(features, pruning.mode = "coarse")
 15 | C3N_01818_Peaks2 <- subsetByOverlaps(x = C3N_01818_Peaks, ranges = blacklist_hg38, invert = TRUE)
 16 | 
 17 | saveRDS(C3N_01818_Peaks2, file="C3N_01818_Peaks.rds")
 18 | 
 19 | C3N_01818.counts <- FeatureMatrix(
 20 |   fragments = C3N_01818.data,
 21 |   features = C3N_01818_Peaks
 22 | )
 23 | 
 24 | 
 25 | C3N_01818.chrom_assay <- CreateChromatinAssay(
 26 |   counts = C3N_01818.counts,
 27 |   sep = c("-", "-"),
 28 |   fragments = './C3N-01818_CPT0168270014_snATAC_GBM/outs/fragments.tsv.gz',
 29 |   min.cells = 10,
 30 |   min.features = 200
 31 | )
 32 | 
 33 | C3N_01818 <- CreateSeuratObject(counts = C3N_01818.chrom_assay, assay = "peaks", project = "C3N_01818")
 34 | 
 35 | annotations.C3N_01818 <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
 36 | seqlevels(annotations.C3N_01818) <- paste0('chr', seqlevels(annotations.C3N_01818))
 37 | genome(annotations.C3N_01818) <- "hg38"
 38 | 
 39 | 
 40 | Annotation(C3N_01818) <- annotations.C3N_01818
 41 | 
 42 | C3N_01818 <- NucleosomeSignal(object = C3N_01818)
 43 | C3N_01818 <- TSSEnrichment(object = C3N_01818, fast = FALSE, assay='peaks')
 44 | 
 45 | C3N_01818$blacklist_fraction <- FractionCountsInRegion(
 46 |   object = C3N_01818, 
 47 |   assay = 'peaks',
 48 |   regions = blacklist_hg38
 49 | )
 50 | 
 51 | 
 52 | total_fragments <- CountFragments("./C3N-01818_CPT0168270014_snATAC_GBM/outs/fragments.tsv.gz")
 53 | rownames(total_fragments) <- total_fragments$CB
 54 | C3N_01818$fragments <- total_fragments[colnames(C3N_01818), "frequency_count"]
 55 | 
 56 | C3N_01818 <- FRiP(
 57 |   object = C3N_01818,
 58 |   assay = 'peaks',
 59 |   total.fragments = 'fragments'
 60 | )
 61 | 
 62 | 
 63 | pdf("C3N_01818_WT_ATAC_DensityScatter.pdf", height=5, width=9)
 64 | DensityScatter(C3N_01818, x = 'nCount_peaks', y = 'TSS.enrichment', log_x = TRUE, quantiles = TRUE)
 65 | dev.off()
 66 | 
 67 | 
 68 | pdf("C3N_01818_WT_ATAC_QC_BF.pdf", height=5, width=12)
 69 | VlnPlot(
 70 |   object = C3N_01818,
 71 |   features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
 72 |   pt.size = 0.1,
 73 |   ncol = 5
 74 | )
 75 | dev.off()
 76 | 
 77 | 
 78 | pdf("C3N_01818_WT_ATAC_TSS_Enrichment.pdf", height=5, width=6)
 79 | C3N_01818$high.tss <- ifelse(C3N_01818$TSS.enrichment > 1.5, 'High', 'Low')
 80 | TSSPlot(C3N_01818, group.by = 'high.tss') + NoLegend()
 81 | dev.off()
 82 | 
 83 | pdf("C3N_01818_WT_ATAC_Nucleosome_signal.pdf", height=5, width=6)
 84 | C3N_01818$nucleosome_group <- ifelse(C3N_01818$nucleosome_signal > 2, 'NS > 2', 'NS < 2')
 85 | FragmentHistogram(object = C3N_01818, group.by = 'nucleosome_group')
 86 | dev.off()
 87 | 
 88 | 
 89 | C3N_01818 <- subset(
 90 |   x = C3N_01818,
 91 |   subset = nCount_peaks > 350 &
 92 |     nCount_peaks < 20000 &
 93 |     FRiP > 0.15 &
 94 |     blacklist_fraction < 0.05 &
 95 |     nucleosome_signal < 2 &
 96 |     TSS.enrichment > 1.5
 97 | )
 98 | 
 99 | 
100 | pdf("C3N_01818_WT_ATAC_QC_AF.pdf", height=5, width=12)
101 | VlnPlot(
102 |   object = C3N_01818,
103 |   features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
104 |   pt.size = 0.1,
105 |   ncol = 5
106 | )
107 | dev.off()
108 | 
109 | 
110 | DefaultAssay(C3N_01818) <- "peaks"
111 | 
112 | 
113 | C3N_01818 <- RunTFIDF(C3N_01818)
114 | C3N_01818 <- FindTopFeatures(C3N_01818, min.cutoff = 'q0')
115 | C3N_01818 <- RunSVD(C3N_01818)
116 | 
117 | pdf("C3N_01818_WT_ATAC_DepthCor.pdf", height=5, width=9)
118 | DepthCor(C3N_01818)
119 | dev.off()
120 | 
121 | 
122 | C3N_01818 <- RunUMAP(object = C3N_01818, reduction = 'lsi', dims = 2:30)
123 | C3N_01818 <- FindNeighbors(object = C3N_01818, reduction = 'lsi', dims = 2:30)
124 | C3N_01818 <- FindClusters(object = C3N_01818, verbose = FALSE, algorithm = 3)
125 | 
126 | 
127 | pdf("C3N_01818_WT_ATAC_UMAP.pdf", height=5, width=7)
128 | DimPlot(object = C3N_01818, label = TRUE) + NoLegend()
129 | dev.off()
130 | 
131 | 
132 | gene.activities <- GeneActivity(C3N_01818)
133 | 
134 | 
135 | C3N_01818[['RNA']] <- CreateAssayObject(counts = gene.activities)
136 | C3N_01818 <- NormalizeData(
137 |   object = C3N_01818,
138 |   assay = 'RNA',
139 |   normalization.method = 'RC',
140 |   scale.factor = median(C3N_01818$nCount_RNA)
141 | )
142 | 
143 | write.table(gene.activities, file="C3N_01818_WT_ATAC_Gene_Activities.txt", sep="\t", col.names=NA, quote=FALSE)
144 | 
145 | saveRDS(C3N_01818, file="C3N_01818_WT_ATAC.rds")
146 | 


--------------------------------------------------------------------------------
/scATAC-Seq Analyses/12- Processing GBM C3N_02181  snATAC-Seq library:
--------------------------------------------------------------------------------
  1 | library(Signac)
  2 | library(Seurat)
  3 | library(EnsDb.Hsapiens.v86)
  4 | library(BSgenome.Hsapiens.UCSC.hg38)
  5 | 
  6 | options(future.globals.maxSize = 8000 * 1024^2)
  7 | 
  8 | C3N_02181.data=CreateFragmentObject("./C3N-02181_CPT0168380014_snATAC_GBM/outs/fragments.tsv.gz")
  9 | 
 10 | 
 11 | features <- CallPeaks(C3N_02181.data, format='BED', outdir ='./C3N-02181_CPT0168380014_snATAC_GBM/outs/',name='C3N_02181', cleanup=FALSE)
 12 | 
 13 | 
 14 | C3N_02181_Peaks <- keepStandardChromosomes(features, pruning.mode = "coarse")
 15 | C3N_02181_Peaks2 <- subsetByOverlaps(x = C3N_02181_Peaks, ranges = blacklist_hg38, invert = TRUE)
 16 | 
 17 | saveRDS(C3N_02181_Peaks2, file="C3N_02181_Peaks.rds")
 18 | 
 19 | C3N_02181.counts <- FeatureMatrix(
 20 |   fragments = C3N_02181.data,
 21 |   features = C3N_02181_Peaks
 22 | )
 23 | 
 24 | 
 25 | C3N_02181.chrom_assay <- CreateChromatinAssay(
 26 |   counts = C3N_02181.counts,
 27 |   sep = c("-", "-"),
 28 |   fragments = './C3N-02181_CPT0168380014_snATAC_GBM/outs/fragments.tsv.gz',
 29 |   min.cells = 10,
 30 |   min.features = 200
 31 | )
 32 | 
 33 | C3N_02181 <- CreateSeuratObject(counts = C3N_02181.chrom_assay, assay = "peaks", project = "C3N_02181")
 34 | 
 35 | annotations.C3N_02181 <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
 36 | seqlevels(annotations.C3N_02181) <- paste0('chr', seqlevels(annotations.C3N_02181))
 37 | genome(annotations.C3N_02181) <- "hg38"
 38 | 
 39 | 
 40 | Annotation(C3N_02181) <- annotations.C3N_02181
 41 | 
 42 | C3N_02181 <- NucleosomeSignal(object = C3N_02181)
 43 | C3N_02181 <- TSSEnrichment(object = C3N_02181, fast = FALSE, assay='peaks')
 44 | 
 45 | C3N_02181$blacklist_fraction <- FractionCountsInRegion(
 46 |   object = C3N_02181, 
 47 |   assay = 'peaks',
 48 |   regions = blacklist_hg38
 49 | )
 50 | 
 51 | 
 52 | total_fragments <- CountFragments("./C3N-02181_CPT0168380014_snATAC_GBM/outs/fragments.tsv.gz")
 53 | rownames(total_fragments) <- total_fragments$CB
 54 | C3N_02181$fragments <- total_fragments[colnames(C3N_02181), "frequency_count"]
 55 | 
 56 | C3N_02181 <- FRiP(
 57 |   object = C3N_02181,
 58 |   assay = 'peaks',
 59 |   total.fragments = 'fragments'
 60 | )
 61 | 
 62 | 
 63 | pdf("C3N_02181_WT_ATAC_DensityScatter.pdf", height=5, width=9)
 64 | DensityScatter(C3N_02181, x = 'nCount_peaks', y = 'TSS.enrichment', log_x = TRUE, quantiles = TRUE)
 65 | dev.off()
 66 | 
 67 | 
 68 | pdf("C3N_02181_WT_ATAC_QC_BF.pdf", height=5, width=12)
 69 | VlnPlot(
 70 |   object = C3N_02181,
 71 |   features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
 72 |   pt.size = 0.1,
 73 |   ncol = 5
 74 | )
 75 | dev.off()
 76 | 
 77 | 
 78 | pdf("C3N_02181_WT_ATAC_TSS_Enrichment.pdf", height=5, width=6)
 79 | C3N_02181$high.tss <- ifelse(C3N_02181$TSS.enrichment > 1.5, 'High', 'Low')
 80 | TSSPlot(C3N_02181, group.by = 'high.tss') + NoLegend()
 81 | dev.off()
 82 | 
 83 | pdf("C3N_02181_WT_ATAC_Nucleosome_signal.pdf", height=5, width=6)
 84 | C3N_02181$nucleosome_group <- ifelse(C3N_02181$nucleosome_signal > 2, 'NS > 2', 'NS < 2')
 85 | FragmentHistogram(object = C3N_02181, group.by = 'nucleosome_group')
 86 | dev.off()
 87 | 
 88 | 
 89 | C3N_02181 <- subset(
 90 |   x = C3N_02181,
 91 |   subset = nCount_peaks > 350 &
 92 |     nCount_peaks < 4000 &
 93 |     FRiP > 0.15 &
 94 |     blacklist_fraction < 0.05 &
 95 |     nucleosome_signal < 2 &
 96 |     TSS.enrichment > 1.5
 97 | )
 98 | 
 99 | 
100 | pdf("C3N_02181_WT_ATAC_QC_AF.pdf", height=5, width=12)
101 | VlnPlot(
102 |   object = C3N_02181,
103 |   features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
104 |   pt.size = 0.1,
105 |   ncol = 5
106 | )
107 | dev.off()
108 | 
109 | 
110 | DefaultAssay(C3N_02181) <- "peaks"
111 | 
112 | 
113 | C3N_02181 <- RunTFIDF(C3N_02181)
114 | C3N_02181 <- FindTopFeatures(C3N_02181, min.cutoff = 'q0')
115 | C3N_02181 <- RunSVD(C3N_02181)
116 | 
117 | pdf("C3N_02181_WT_ATAC_DepthCor.pdf", height=5, width=9)
118 | DepthCor(C3N_02181)
119 | dev.off()
120 | 
121 | 
122 | C3N_02181 <- RunUMAP(object = C3N_02181, reduction = 'lsi', dims = 2:30)
123 | C3N_02181 <- FindNeighbors(object = C3N_02181, reduction = 'lsi', dims = 2:30)
124 | C3N_02181 <- FindClusters(object = C3N_02181, verbose = FALSE, algorithm = 3)
125 | 
126 | 
127 | pdf("C3N_02181_WT_ATAC_UMAP.pdf", height=5, width=7)
128 | DimPlot(object = C3N_02181, label = TRUE) + NoLegend()
129 | dev.off()
130 | 
131 | 
132 | gene.activities <- GeneActivity(C3N_02181)
133 | 
134 | 
135 | C3N_02181[['RNA']] <- CreateAssayObject(counts = gene.activities)
136 | C3N_02181 <- NormalizeData(
137 |   object = C3N_02181,
138 |   assay = 'RNA',
139 |   normalization.method = 'RC',
140 |   scale.factor = median(C3N_02181$nCount_RNA)
141 | )
142 | 
143 | write.table(as.matrix(gene.activities), file="C3N_02181_WT_ATAC_Gene_Activities.txt", sep="\t", col.names=NA, quote=FALSE)
144 | 
145 | saveRDS(C3N_02181, file="C3N_02181_WT_ATAC.rds")
146 | 
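
The per-cell filter in these scripts combines shared cutoffs (nCount_peaks > 350, FRiP > 0.15, blacklist_fraction < 0.05, TSS.enrichment > 1.5) with sample-specific upper bounds on nCount_peaks and nucleosome_signal. The sketch below expresses the same rule in pandas. It assumes the Seurat meta.data has been exported to a DataFrame with the same column names. `QC_CUTOFFS` and `qc_pass` are hypothetical helpers, and the per-sample values are copied from the `subset()` calls for C3N_02181, C3N_02186, C3N_02188 and C3N_02783 in this section.

```python
import pandas as pd

# Cutoffs shared by every sample in these scripts
MIN_PEAKS = 350
MIN_FRIP = 0.15
MAX_BLACKLIST = 0.05
MIN_TSS = 1.5

# Per-sample upper bounds, copied from the subset() calls
QC_CUTOFFS = {
    "C3N_02181": {"max_peaks": 4000, "max_ns": 2.0},
    "C3N_02186": {"max_peaks": 10000, "max_ns": 2.5},
    "C3N_02188": {"max_peaks": 35000, "max_ns": 5.0},
    "C3N_02783": {"max_peaks": 40000, "max_ns": 2.0},
}


def qc_pass(meta: pd.DataFrame, sample: str) -> pd.Series:
    """Return a boolean mask of cells that pass QC for `sample`.

    `meta` is a per-cell table using the Seurat meta.data column names.
    All comparisons are strict, matching the R expressions.
    """
    c = QC_CUTOFFS[sample]
    return (
        (meta["nCount_peaks"] > MIN_PEAKS)
        & (meta["nCount_peaks"] < c["max_peaks"])
        & (meta["FRiP"] > MIN_FRIP)
        & (meta["blacklist_fraction"] < MAX_BLACKLIST)
        & (meta["nucleosome_signal"] < c["max_ns"])
        & (meta["TSS.enrichment"] > MIN_TSS)
    )


# Toy per-cell metrics for three cells
meta = pd.DataFrame(
    {
        "nCount_peaks": [1000, 5000, 1000],
        "FRiP": [0.30, 0.30, 0.30],
        "blacklist_fraction": [0.01, 0.01, 0.01],
        "nucleosome_signal": [1.0, 1.0, 2.2],
        "TSS.enrichment": [3.0, 3.0, 3.0],
    },
    index=["cell_a", "cell_b", "cell_c"],
)
print(qc_pass(meta, "C3N_02181").tolist())  # [True, False, False]
```

All three toy cells pass under the looser C3N_02186 bounds. Comparing samples this way shows how much the sample-specific cutoffs affect the number of retained cells.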


--------------------------------------------------------------------------------
/scATAC-Seq Analyses/13- Processing GBM C3N_02186  snATAC-Seq library:
--------------------------------------------------------------------------------
  1 | library(Signac)
  2 | library(Seurat)
  3 | library(EnsDb.Hsapiens.v86)
  4 | library(BSgenome.Hsapiens.UCSC.hg38)
  5 | 
  6 | options(future.globals.maxSize = 8000 * 1024^2)
  7 | 
  8 | C3N_02186.data=CreateFragmentObject("./C3N-02186_CPT0168720014_snATAC_GBM/outs/fragments.tsv.gz")
  9 | 
 10 | 
 11 | features <- CallPeaks(C3N_02186.data, format='BED', outdir ='./C3N-02186_CPT0168720014_snATAC_GBM/outs/',name='C3N_02186', cleanup=FALSE)
 12 | 
 13 | 
 14 | C3N_02186_Peaks <- keepStandardChromosomes(features, pruning.mode = "coarse")
 15 | C3N_02186_Peaks2 <- subsetByOverlaps(x = C3N_02186_Peaks, ranges = blacklist_hg38, invert = TRUE)
 16 | 
 17 | saveRDS(C3N_02186_Peaks2, file="C3N_02186_Peaks.rds")
 18 | 
 19 | C3N_02186.counts <- FeatureMatrix(
 20 |   fragments = C3N_02186.data,
 21 |   features = C3N_02186_Peaks  # peaks before blacklist removal; only the filtered set (_Peaks2) is saved above
 22 | )
 23 | 
 24 | 
 25 | C3N_02186.chrom_assay <- CreateChromatinAssay(
 26 |   counts = C3N_02186.counts,
 27 |   sep = c("-", "-"),
 28 |   fragments = './C3N-02186_CPT0168720014_snATAC_GBM/outs/fragments.tsv.gz',
 29 |   min.cells = 10,
 30 |   min.features = 200
 31 | )
 32 | 
 33 | C3N_02186 <- CreateSeuratObject(counts = C3N_02186.chrom_assay, assay = "peaks", project = "C3N_02186")
 34 | 
 35 | annotations.C3N_02186 <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
 36 | seqlevels(annotations.C3N_02186) <- paste0('chr', seqlevels(annotations.C3N_02186))
 37 | genome(annotations.C3N_02186) <- "hg38"
 38 | 
 39 | 
 40 | Annotation(C3N_02186) <- annotations.C3N_02186
 41 | 
 42 | C3N_02186 <- NucleosomeSignal(object = C3N_02186)
 43 | C3N_02186 <- TSSEnrichment(object = C3N_02186, fast = FALSE, assay='peaks')
 44 | 
 45 | C3N_02186$blacklist_fraction <- FractionCountsInRegion(
 46 |   object = C3N_02186, 
 47 |   assay = 'peaks',
 48 |   regions = blacklist_hg38
 49 | )
 50 | 
 51 | 
 52 | total_fragments <- CountFragments("./C3N-02186_CPT0168720014_snATAC_GBM/outs/fragments.tsv.gz")
 53 | rownames(total_fragments) <- total_fragments$CB
 54 | C3N_02186$fragments <- total_fragments[colnames(C3N_02186), "frequency_count"]
 55 | 
 56 | C3N_02186 <- FRiP(
 57 |   object = C3N_02186,
 58 |   assay = 'peaks',
 59 |   total.fragments = 'fragments'
 60 | )
 61 | 
 62 | 
 63 | pdf("C3N_02186_WT_ATAC_DensityScatter.pdf", height=5, width=9)
 64 | DensityScatter(C3N_02186, x = 'nCount_peaks', y = 'TSS.enrichment', log_x = TRUE, quantiles = TRUE)
 65 | dev.off()
 66 | 
 67 | 
 68 | pdf("C3N_02186_WT_ATAC_QC_BF.pdf", height=5, width=12)
 69 | VlnPlot(
 70 |   object = C3N_02186,
 71 |   features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
 72 |   pt.size = 0.1,
 73 |   ncol = 5
 74 | )
 75 | dev.off()
 76 | 
 77 | 
 78 | pdf("C3N_02186_WT_ATAC_TSS_Enrichment.pdf", height=5, width=6)
 79 | C3N_02186$high.tss <- ifelse(C3N_02186$TSS.enrichment > 1.5, 'High', 'Low')
 80 | TSSPlot(C3N_02186, group.by = 'high.tss') + NoLegend()
 81 | dev.off()
 82 | 
 83 | pdf("C3N_02186_WT_ATAC_Nucleosome_signal.pdf", height=5, width=6)
 84 | C3N_02186$nucleosome_group <- ifelse(C3N_02186$nucleosome_signal > 2, 'NS > 2', 'NS < 2')
 85 | FragmentHistogram(object = C3N_02186, group.by = 'nucleosome_group')
 86 | dev.off()
 87 | 
 88 | 
 89 | C3N_02186 <- subset(
 90 |   x = C3N_02186,
 91 |   subset = nCount_peaks > 350 &
 92 |     nCount_peaks < 10000 &
 93 |     FRiP > 0.15 &
 94 |     blacklist_fraction < 0.05 &
 95 |     nucleosome_signal < 2.5 &
 96 |     TSS.enrichment > 1.5
 97 | )
 98 | 
 99 | 
100 | pdf("C3N_02186_WT_ATAC_QC_AF.pdf", height=5, width=12)
101 | VlnPlot(
102 |   object = C3N_02186,
103 |   features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
104 |   pt.size = 0.1,
105 |   ncol = 5
106 | )
107 | dev.off()
108 | 
109 | 
110 | DefaultAssay(C3N_02186) <- "peaks"
111 | 
112 | 
113 | C3N_02186 <- RunTFIDF(C3N_02186)
114 | C3N_02186 <- FindTopFeatures(C3N_02186, min.cutoff = 'q0')
115 | C3N_02186 <- RunSVD(C3N_02186)
116 | 
117 | pdf("C3N_02186_WT_ATAC_DepthCor.pdf", height=5, width=9)
118 | DepthCor(C3N_02186)
119 | dev.off()
120 | 
121 | 
122 | C3N_02186 <- RunUMAP(object = C3N_02186, reduction = 'lsi', dims = 2:30)
123 | C3N_02186 <- FindNeighbors(object = C3N_02186, reduction = 'lsi', dims = 2:30)
124 | C3N_02186 <- FindClusters(object = C3N_02186, verbose = FALSE, algorithm = 3)
125 | 
126 | 
127 | pdf("C3N_02186_WT_ATAC_UMAP.pdf", height=5, width=7)
128 | DimPlot(object = C3N_02186, label = TRUE) + NoLegend()
129 | dev.off()
130 | 
131 | 
132 | gene.activities <- GeneActivity(C3N_02186)
133 | 
134 | 
135 | C3N_02186[['RNA']] <- CreateAssayObject(counts = gene.activities)
136 | C3N_02186 <- NormalizeData(
137 |   object = C3N_02186,
138 |   assay = 'RNA',
139 |   normalization.method = 'RC',
140 |   scale.factor = median(C3N_02186$nCount_RNA)
141 | )
142 | 
143 | write.table(as.matrix(gene.activities), file="C3N_02186_WT_ATAC_Gene_Activities.txt", sep="\t", col.names=NA, quote=FALSE)
144 | 
145 | saveRDS(C3N_02186, file="C3N_02186_WT_ATAC.rds")
146 | 
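
FRiP is attached by looking up each barcode's total fragment count from `CountFragments()` and dividing the peak counts by that total. The pandas sketch below shows the lookup and the ratio. It assumes FRiP is `nCount_peaks / fragments` per cell, which is our reading of what Signac's `FRiP()` reports, not a copy of its code. `add_frip` and the toy tables are hypothetical.

```python
import pandas as pd


def add_frip(meta: pd.DataFrame, fragment_counts: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `meta` with `fragments` and `FRiP` columns added.

    `meta` is indexed by cell barcode and has an `nCount_peaks` column.
    `fragment_counts` has `CB` (barcode) and `frequency_count` columns,
    like the table returned by Signac's CountFragments().
    """
    totals = fragment_counts.set_index("CB")["frequency_count"]
    out = meta.copy()
    # Barcode lookup, like total_fragments[colnames(obj), "frequency_count"];
    # barcodes missing from the fragment table become NaN
    out["fragments"] = totals.reindex(out.index)
    # Fraction of each cell's fragments that were counted in peaks
    out["FRiP"] = out["nCount_peaks"] / out["fragments"]
    return out


meta = pd.DataFrame({"nCount_peaks": [30, 10]}, index=["AAAC-1", "AAAG-1"])
fragment_counts = pd.DataFrame(
    {"CB": ["AAAG-1", "AAAC-1", "TTTT-1"], "frequency_count": [100, 60, 5]}
)
print(add_frip(meta, fragment_counts)["FRiP"].tolist())  # [0.5, 0.1]
```

The sketch reindexes on barcodes instead of relying on row order. The R line `rownames(total_fragments) <- total_fragments$CB` serves the same purpose.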


--------------------------------------------------------------------------------
/scATAC-Seq Analyses/14- Processing GBM C3N_02188  snATAC-Seq library:
--------------------------------------------------------------------------------
  1 | library(Signac)
  2 | library(Seurat)
  3 | library(EnsDb.Hsapiens.v86)
  4 | library(BSgenome.Hsapiens.UCSC.hg38)
  5 | 
  6 | options(future.globals.maxSize = 8000 * 1024^2)
  7 | 
  8 | C3N_02188.data=CreateFragmentObject("./C3N-02188_CPT0168830014_snATAC_GBM/outs/fragments.tsv.gz")
  9 | 
 10 | 
 11 | features <- CallPeaks(C3N_02188.data, format='BED', outdir ='./C3N-02188_CPT0168830014_snATAC_GBM/outs/',name='C3N_02188', cleanup=FALSE)
 12 | 
 13 | 
 14 | C3N_02188_Peaks <- keepStandardChromosomes(features, pruning.mode = "coarse")
 15 | C3N_02188_Peaks2 <- subsetByOverlaps(x = C3N_02188_Peaks, ranges = blacklist_hg38, invert = TRUE)
 16 | 
 17 | saveRDS(C3N_02188_Peaks2, file="C3N_02188_Peaks.rds")
 18 | 
 19 | C3N_02188.counts <- FeatureMatrix(
 20 |   fragments = C3N_02188.data,
 21 |   features = C3N_02188_Peaks  # peaks before blacklist removal; only the filtered set (_Peaks2) is saved above
 22 | )
 23 | 
 24 | 
 25 | C3N_02188.chrom_assay <- CreateChromatinAssay(
 26 |   counts = C3N_02188.counts,
 27 |   sep = c("-", "-"),
 28 |   fragments = './C3N-02188_CPT0168830014_snATAC_GBM/outs/fragments.tsv.gz',
 29 |   min.cells = 10,
 30 |   min.features = 200
 31 | )
 32 | 
 33 | C3N_02188 <- CreateSeuratObject(counts = C3N_02188.chrom_assay, assay = "peaks", project = "C3N_02188")
 34 | 
 35 | annotations.C3N_02188 <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
 36 | seqlevels(annotations.C3N_02188) <- paste0('chr', seqlevels(annotations.C3N_02188))
 37 | genome(annotations.C3N_02188) <- "hg38"
 38 | 
 39 | 
 40 | Annotation(C3N_02188) <- annotations.C3N_02188
 41 | 
 42 | C3N_02188 <- NucleosomeSignal(object = C3N_02188)
 43 | C3N_02188 <- TSSEnrichment(object = C3N_02188, fast = FALSE, assay='peaks')
 44 | 
 45 | C3N_02188$blacklist_fraction <- FractionCountsInRegion(
 46 |   object = C3N_02188, 
 47 |   assay = 'peaks',
 48 |   regions = blacklist_hg38
 49 | )
 50 | 
 51 | 
 52 | total_fragments <- CountFragments("./C3N-02188_CPT0168830014_snATAC_GBM/outs/fragments.tsv.gz")
 53 | rownames(total_fragments) <- total_fragments$CB
 54 | C3N_02188$fragments <- total_fragments[colnames(C3N_02188), "frequency_count"]
 55 | 
 56 | C3N_02188 <- FRiP(
 57 |   object = C3N_02188,
 58 |   assay = 'peaks',
 59 |   total.fragments = 'fragments'
 60 | )
 61 | 
 62 | 
 63 | pdf("C3N_02188_WT_ATAC_DensityScatter.pdf", height=5, width=9)
 64 | DensityScatter(C3N_02188, x = 'nCount_peaks', y = 'TSS.enrichment', log_x = TRUE, quantiles = TRUE)
 65 | dev.off()
 66 | 
 67 | 
 68 | pdf("C3N_02188_WT_ATAC_QC_BF.pdf", height=5, width=12)
 69 | VlnPlot(
 70 |   object = C3N_02188,
 71 |   features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
 72 |   pt.size = 0.1,
 73 |   ncol = 5
 74 | )
 75 | dev.off()
 76 | 
 77 | 
 78 | pdf("C3N_02188_WT_ATAC_TSS_Enrichment.pdf", height=5, width=6)
 79 | C3N_02188$high.tss <- ifelse(C3N_02188$TSS.enrichment > 1.5, 'High', 'Low')
 80 | TSSPlot(C3N_02188, group.by = 'high.tss') + NoLegend()
 81 | dev.off()
 82 | 
 83 | pdf("C3N_02188_WT_ATAC_Nucleosome_signal.pdf", height=5, width=6)
 84 | C3N_02188$nucleosome_group <- ifelse(C3N_02188$nucleosome_signal > 5, 'NS > 5', 'NS < 5')
 85 | FragmentHistogram(object = C3N_02188, group.by = 'nucleosome_group')
 86 | dev.off()
 87 | 
 88 | 
 89 | C3N_02188 <- subset(
 90 |   x = C3N_02188,
 91 |   subset = nCount_peaks > 350 &
 92 |     nCount_peaks < 35000 &
 93 |     FRiP > 0.15 &
 94 |     blacklist_fraction < 0.05 &
 95 |     nucleosome_signal < 5 &
 96 |     TSS.enrichment > 1.5
 97 | )
 98 | 
 99 | 
100 | pdf("C3N_02188_WT_ATAC_QC_AF.pdf", height=5, width=12)
101 | VlnPlot(
102 |   object = C3N_02188,
103 |   features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
104 |   pt.size = 0.1,
105 |   ncol = 5
106 | )
107 | dev.off()
108 | 
109 | 
110 | DefaultAssay(C3N_02188) <- "peaks"
111 | 
112 | 
113 | C3N_02188 <- RunTFIDF(C3N_02188)
114 | C3N_02188 <- FindTopFeatures(C3N_02188, min.cutoff = 'q0')
115 | C3N_02188 <- RunSVD(C3N_02188)
116 | 
117 | pdf("C3N_02188_WT_ATAC_DepthCor.pdf", height=5, width=9)
118 | DepthCor(C3N_02188)
119 | dev.off()
120 | 
121 | 
122 | C3N_02188 <- RunUMAP(object = C3N_02188, reduction = 'lsi', dims = 2:30)
123 | C3N_02188 <- FindNeighbors(object = C3N_02188, reduction = 'lsi', dims = 2:30)
124 | C3N_02188 <- FindClusters(object = C3N_02188, verbose = FALSE, algorithm = 3)
125 | 
126 | 
127 | pdf("C3N_02188_WT_ATAC_UMAP.pdf", height=5, width=7)
128 | DimPlot(object = C3N_02188, label = TRUE) + NoLegend()
129 | dev.off()
130 | 
131 | 
132 | gene.activities <- GeneActivity(C3N_02188)
133 | 
134 | 
135 | C3N_02188[['RNA']] <- CreateAssayObject(counts = gene.activities)
136 | C3N_02188 <- NormalizeData(
137 |   object = C3N_02188,
138 |   assay = 'RNA',
139 |   normalization.method = 'RC',
140 |   scale.factor = median(C3N_02188$nCount_RNA)
141 | )
142 | 
143 | write.table(as.matrix(gene.activities), file="C3N_02188_WT_ATAC_Gene_Activities.txt", sep="\t", col.names=NA, quote=FALSE)
144 | 
145 | saveRDS(C3N_02188, file="C3N_02188_WT_ATAC.rds")
146 | 
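
`RunTFIDF()`, `RunSVD()` and `DepthCor()` together produce and inspect the LSI embedding. The scripts start at component 2 (`dims = 2:30`), which is the usual choice when the first component tracks sequencing depth. The NumPy sketch below illustrates the idea. It assumes the log1p(TF x IDF x 10,000) form of Signac's default TF-IDF method. Signac uses a truncated SVD and scales its embeddings differently, so compare only the structure, not the numbers. The function names are hypothetical.

```python
import numpy as np


def tfidf(counts: np.ndarray, scale_factor: float = 1e4) -> np.ndarray:
    """TF-IDF for a peaks x cells count matrix (every peak and cell must have counts)."""
    tf = counts / counts.sum(axis=0, keepdims=True)            # per-cell term frequency
    idf = counts.shape[1] / counts.sum(axis=1, keepdims=True)  # per-peak inverse document frequency
    return np.log1p(tf * idf * scale_factor)


def lsi(counts: np.ndarray, n_components: int) -> np.ndarray:
    """Cells x components embedding from an SVD of the TF-IDF matrix."""
    _, s, vt = np.linalg.svd(tfidf(counts), full_matrices=False)
    return vt[:n_components].T * s[:n_components]


def depth_correlation(embedding: np.ndarray, depth: np.ndarray) -> np.ndarray:
    """Pearson correlation of each component with per-cell depth, the quantity DepthCor() plots."""
    return np.array(
        [np.corrcoef(embedding[:, k], depth)[0, 1] for k in range(embedding.shape[1])]
    )


rng = np.random.default_rng(0)
counts = rng.integers(1, 6, size=(200, 40)).astype(float)  # toy peaks x cells matrix
emb = lsi(counts, n_components=10)
r = depth_correlation(emb, counts.sum(axis=0))
print(emb.shape, np.round(r[:3], 2))
```

On real data, a first component with |r| close to 1 is the pattern `DepthCor()` is meant to expose, and it is the reason that component is left out of `RunUMAP()` and `FindNeighbors()`.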


--------------------------------------------------------------------------------
/scATAC-Seq Analyses/16- Processing GBM C3N_02783  snATAC-Seq library:
--------------------------------------------------------------------------------
  1 | library(Signac)
  2 | library(Seurat)
  3 | library(EnsDb.Hsapiens.v86)
  4 | library(BSgenome.Hsapiens.UCSC.hg38)
  5 | 
  6 | options(future.globals.maxSize = 8000 * 1024^2)
  7 | 
  8 | C3N_02783.data=CreateFragmentObject("./C3N-02783_CPT0205890014_snATAC_GBM/outs/fragments.tsv.gz")
  9 | 
 10 | 
 11 | features <- CallPeaks(C3N_02783.data, format='BED', outdir ='./C3N-02783_CPT0205890014_snATAC_GBM/outs/',name='C3N_02783', cleanup=FALSE)
 12 | 
 13 | 
 14 | C3N_02783_Peaks <- keepStandardChromosomes(features, pruning.mode = "coarse")
 15 | C3N_02783_Peaks2 <- subsetByOverlaps(x = C3N_02783_Peaks, ranges = blacklist_hg38, invert = TRUE)
 16 | 
 17 | saveRDS(C3N_02783_Peaks2, file="C3N_02783_Peaks.rds")
 18 | 
 19 | C3N_02783.counts <- FeatureMatrix(
 20 |   fragments = C3N_02783.data,
 21 |   features = C3N_02783_Peaks  # peaks before blacklist removal; only the filtered set (_Peaks2) is saved above
 22 | )
 23 | 
 24 | 
 25 | C3N_02783.chrom_assay <- CreateChromatinAssay(
 26 |   counts = C3N_02783.counts,
 27 |   sep = c("-", "-"),
 28 |   fragments = './C3N-02783_CPT0205890014_snATAC_GBM/outs/fragments.tsv.gz',
 29 |   min.cells = 10,
 30 |   min.features = 200
 31 | )
 32 | 
 33 | C3N_02783 <- CreateSeuratObject(counts = C3N_02783.chrom_assay, assay = "peaks", project = "C3N_02783")
 34 | 
 35 | annotations.C3N_02783 <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
 36 | seqlevels(annotations.C3N_02783) <- paste0('chr', seqlevels(annotations.C3N_02783))
 37 | genome(annotations.C3N_02783) <- "hg38"
 38 | 
 39 | 
 40 | Annotation(C3N_02783) <- annotations.C3N_02783
 41 | 
 42 | C3N_02783 <- NucleosomeSignal(object = C3N_02783)
 43 | C3N_02783 <- TSSEnrichment(object = C3N_02783, fast = FALSE, assay='peaks')
 44 | 
 45 | C3N_02783$blacklist_fraction <- FractionCountsInRegion(
 46 |   object = C3N_02783, 
 47 |   assay = 'peaks',
 48 |   regions = blacklist_hg38
 49 | )
 50 | 
 51 | 
 52 | total_fragments <- CountFragments("./C3N-02783_CPT0205890014_snATAC_GBM/outs/fragments.tsv.gz")
 53 | rownames(total_fragments) <- total_fragments$CB
 54 | C3N_02783$fragments <- total_fragments[colnames(C3N_02783), "frequency_count"]
 55 | 
 56 | C3N_02783 <- FRiP(
 57 |   object = C3N_02783,
 58 |   assay = 'peaks',
 59 |   total.fragments = 'fragments'
 60 | )
 61 | 
 62 | 
 63 | pdf("C3N_02783_WT_ATAC_DensityScatter.pdf", height=5, width=9)
 64 | DensityScatter(C3N_02783, x = 'nCount_peaks', y = 'TSS.enrichment', log_x = TRUE, quantiles = TRUE)
 65 | dev.off()
 66 | 
 67 | 
 68 | pdf("C3N_02783_WT_ATAC_QC_BF.pdf", height=5, width=12)
 69 | VlnPlot(
 70 |   object = C3N_02783,
 71 |   features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
 72 |   pt.size = 0.1,
 73 |   ncol = 5
 74 | )
 75 | dev.off()
 76 | 
 77 | 
 78 | pdf("C3N_02783_WT_ATAC_TSS_Enrichment.pdf", height=5, width=6)
 79 | C3N_02783$high.tss <- ifelse(C3N_02783$TSS.enrichment > 1.5, 'High', 'Low')
 80 | TSSPlot(C3N_02783, group.by = 'high.tss') + NoLegend()
 81 | dev.off()
 82 | 
 83 | pdf("C3N_02783_WT_ATAC_Nucleosome_signal.pdf", height=5, width=6)
 84 | C3N_02783$nucleosome_group <- ifelse(C3N_02783$nucleosome_signal > 2, 'NS > 2', 'NS < 2')
 85 | FragmentHistogram(object = C3N_02783, group.by = 'nucleosome_group')
 86 | dev.off()
 87 | 
 88 | 
 89 | C3N_02783 <- subset(
 90 |   x = C3N_02783,
 91 |   subset = nCount_peaks > 350 &
 92 |     nCount_peaks < 40000 &
 93 |     FRiP > 0.15 &
 94 |     blacklist_fraction < 0.05 &
 95 |     nucleosome_signal < 2 &
 96 |     TSS.enrichment > 1.5
 97 | )
 98 | 
 99 | 
100 | pdf("C3N_02783_WT_ATAC_QC_AF.pdf", height=5, width=12)
101 | VlnPlot(
102 |   object = C3N_02783,
103 |   features = c('nCount_peaks', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal', 'FRiP'),
104 |   pt.size = 0.1,
105 |   ncol = 5
106 | )
107 | dev.off()
108 | 
109 | 
110 | DefaultAssay(C3N_02783) <- "peaks"
111 | 
112 | 
113 | C3N_02783 <- RunTFIDF(C3N_02783)
114 | C3N_02783 <- FindTopFeatures(C3N_02783, min.cutoff = 'q0')
115 | C3N_02783 <- RunSVD(C3N_02783)
116 | 
117 | pdf("C3N_02783_WT_ATAC_DepthCor.pdf", height=5, width=9)
118 | DepthCor(C3N_02783)
119 | dev.off()
120 | 
121 | 
122 | C3N_02783 <- RunUMAP(object = C3N_02783, reduction = 'lsi', dims = 2:30)
123 | C3N_02783 <- FindNeighbors(object = C3N_02783, reduction = 'lsi', dims = 2:30)
124 | C3N_02783 <- FindClusters(object = C3N_02783, verbose = FALSE, algorithm = 3)
125 | 
126 | 
127 | pdf("C3N_02783_WT_ATAC_UMAP.pdf", height=5, width=7)
128 | DimPlot(object = C3N_02783, label = TRUE) + NoLegend()
129 | dev.off()
130 | 
131 | 
132 | gene.activities <- GeneActivity(C3N_02783)
133 | 
134 | 
135 | C3N_02783[['RNA']] <- CreateAssayObject(counts = gene.activities)
136 | C3N_02783 <- NormalizeData(
137 |   object = C3N_02783,
138 |   assay = 'RNA',
139 |   normalization.method = 'RC',
140 |   scale.factor = median(C3N_02783$nCount_RNA)
141 | )
142 | 
143 | write.table(as.matrix(gene.activities), file="C3N_02783_WT_ATAC_Gene_Activities.txt", sep="\t", col.names=NA, quote=FALSE)
144 | 
145 | saveRDS(C3N_02783, file="C3N_02783_WT_ATAC.rds")
146 | 
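
The gene-activity assay is normalized with `normalization.method = 'RC'` and `scale.factor = median(nCount_RNA)`. Each cell's counts are divided by that cell's total and multiplied by the median total across cells, with no log transform. The NumPy sketch below shows that arithmetic for a genes x cells matrix. `relative_counts` is a hypothetical name, and cells with a total count of zero would have to be removed first.

```python
import numpy as np


def relative_counts(counts: np.ndarray) -> np.ndarray:
    """RC-normalize a genes x cells matrix, using the median cell total as scale factor."""
    totals = counts.sum(axis=0)   # per-cell totals, the analogue of nCount_RNA
    scale = np.median(totals)     # scale.factor = median(nCount_RNA)
    return counts / totals * scale


counts = np.array([[1.0, 3.0],
                   [1.0, 1.0]])
print(relative_counts(counts))
```

After this step every cell sums to the same value (the median total), which puts cells of different depth on a comparable scale.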


--------------------------------------------------------------------------------
/scATAC-Seq Analyses/22- Calculating all cell types programs usages in the combined gene activities matrix:
--------------------------------------------------------------------------------
 1 | ############### Python Scripts #####################
 2 | 
 3 | import csv
 4 | 
 5 | import numpy as np
 6 | import pandas as pd
 7 | from sklearn.decomposition import non_negative_factorization
 8 | 
 9 | # Gene activity matrix (genes x cells), transposed to cells x genes
10 | X = pd.read_table("Combined_snATAC_Glioma_Ding_Dataset_Gene_Activities.txt", index_col=0, sep='\t')
11 | X2 = X.T
12 | 
13 | # Consensus gene spectra from the cNMF run (18 programs x genes)
14 | H = np.load("cnmf_run.spectra.k_18.dt_0_015.consensus.df.npz", allow_pickle=True)
15 | H2 = pd.DataFrame(H['data'], columns=H['columns'], index=H['index'])
16 | 
17 | # Keep only genes present in both matrices, with matching column order
18 | H3 = H2.filter(items=X2.columns)
19 | X4 = X2.filter(items=H3.columns)
20 | 
21 | H4 = H3.to_numpy(dtype=np.float64)
22 | X6 = X4.to_numpy(dtype=np.float64)
23 | 
24 | # Solve for non-negative usages W with the spectra H held fixed (update_H=False).
25 | # The `alpha` and `regularization` arguments (deprecated in scikit-learn 1.0 and
26 | # later removed) are dropped; all regularization strengths are 0, so the result is unchanged.
27 | W, _, _ = non_negative_factorization(
28 |     X6, W=None, H=H4, n_components=18, init='random', update_H=False,
29 |     solver='cd', beta_loss='frobenius', tol=0.0001, max_iter=1000,
30 |     alpha_W=0.0, alpha_H='same', l1_ratio=0.0,
31 |     random_state=None, verbose=0, shuffle=False)
32 | 
33 | processed = pd.DataFrame(W, columns=H['index'], index=X4.index)
34 | 
35 | # Express each cell's usages as percentages summing to 100
36 | row_sums = processed.sum(axis=1)
37 | processed_data = processed.div(row_sums, axis=0) * 100
38 | 
39 | # Cell-type labels for the 18 programs, in the order of H['index']
40 | new_column_names = ['Tcells', 'AC', 'NPC1_OPC', 'Microglia', 'MES2', 'Vascular_MES1', 'Oligodendrocytes', 'MES1', 'CD14_Mono', 'cDC', 'Neutrophils', 'NPC2', 'Giant_Cell_GBM', 'Cycling', 'Pericytes', 'Plasma', 'Endothelial', 'Mast']
41 | processed_data.columns = new_column_names
42 | 
43 | processed_data.to_csv(path_or_buf="./Combined_snATAC_Glioma_Ding_Dataset_All_CellType_Usages.txt", sep="\t", quoting=csv.QUOTE_NONE)
44 | 
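
With `update_H=False`, `non_negative_factorization` keeps the spectra fixed and solves only for non-negative usages W that minimize the Frobenius norm of X - WH. This is how the gene activities are projected onto the 18 programs. The synthetic check below illustrates the idea and is written against the current scikit-learn keyword set. The toy data, `project_usages`, and the tighter `tol` are illustrative choices, not part of the original analysis.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import non_negative_factorization


def project_usages(X, H, programs, cells, tol=1e-6, max_iter=2000):
    """Solve X ~ W @ H for W >= 0 with H fixed; return row-wise percentages."""
    W, _, _ = non_negative_factorization(
        X, H=H, n_components=H.shape[0], init="random", update_H=False,
        solver="cd", beta_loss="frobenius", tol=tol, max_iter=max_iter,
    )
    usages = pd.DataFrame(W, index=cells, columns=programs)
    return usages.div(usages.sum(axis=1), axis=0) * 100


rng = np.random.default_rng(0)
H_true = rng.uniform(0.0, 1.0, size=(3, 20))  # 3 toy programs x 20 genes
W_true = rng.uniform(0.5, 2.0, size=(5, 3))   # 5 toy cells x 3 programs
X = W_true @ H_true                            # noiseless toy "gene activities"

usages = project_usages(X, H_true, ["P1", "P2", "P3"], [f"cell{i}" for i in range(5)])
expected = W_true / W_true.sum(axis=1, keepdims=True) * 100
print(np.abs(usages.to_numpy() - expected).max())
```

Because H has full row rank, the non-negative least-squares solution is unique, and the recovered percentages match the true mixing proportions up to solver tolerance.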


--------------------------------------------------------------------------------
/scATAC-Seq Analyses/23- Extracting Myeloid Cells from the Combined snATAC Object and Calculating Gene Activities:
--------------------------------------------------------------------------------
 1 | library(Signac)
 2 | library(Seurat)
 3 | library(EnsDb.Hsapiens.v86)
 4 | library(BSgenome.Hsapiens.UCSC.hg38)
 5 | library(JASPAR2020)
 6 | library(TFBSTools)
 7 | 
 8 | options(future.globals.maxSize = 8000 * 1024^2)
 9 | 
10 | annotation <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
11 | seqlevels(annotation) <- paste0('chr', seqlevels(annotation))
12 | genome(annotation) <- "hg38"
13 | 
14 | combined <- readRDS("Combined_snATAC_Glioma_ATAC.rds")
15 | 
16 | 
17 | Myeloid <- subset(combined, subset = Annotation == "Myeloid")
18 | 
19 | 
20 | Myeloid$allcells <- "all"
21 | 
22 | Discrete_Peaks <- CallPeaks(Myeloid, group.by = "allcells", format='BED', outdir ='.', fragment.tempdir=".", name='Myeloid_Ding', cleanup=FALSE)
23 | 
24 | Discrete_Peaks2 <- keepStandardChromosomes(Discrete_Peaks, pruning.mode = "coarse")
25 | Discrete_Peaks2 <- subsetByOverlaps(x = Discrete_Peaks2, ranges = blacklist_hg38, invert = TRUE)
26 | 
27 | 
28 | macs2_counts <- FeatureMatrix(
29 |   fragments = Fragments(Myeloid),
30 |   features = Discrete_Peaks2,
31 |   cells = colnames(Myeloid)
32 | )
33 | 
34 | Myeloid[["peaks_macs2"]] <- CreateChromatinAssay(
35 |     counts = macs2_counts,
36 |     fragments = Fragments(Myeloid),
37 |     annotation = annotation
38 | )
39 | 
40 | DefaultAssay(Myeloid) <- "peaks_macs2"
41 | 
42 | Myeloid <- RunTFIDF(Myeloid)
43 | Myeloid <- FindTopFeatures(Myeloid, min.cutoff = 'q0')
44 | Myeloid <- RunSVD(Myeloid)
45 | 
46 | pdf("Myeloid_Myeloid_Ding_ATAC_DepthCor.pdf", height=5, width=9)
47 | DepthCor(Myeloid)
48 | dev.off()
49 | 
50 | 
51 | Myeloid <- RunUMAP(object = Myeloid, reduction = 'lsi', dims = 2:30)
52 | Myeloid <- FindNeighbors(object = Myeloid, reduction = 'lsi', dims = 2:30)
53 | Myeloid <- FindClusters(object = Myeloid, verbose = FALSE, algorithm = 3)
54 | 
55 | 
56 | pdf("Myeloid_Myeloid_Ding_ATAC_UMAP_Clusters.pdf", height=5, width=7)
57 | DimPlot(object = Myeloid, label = TRUE)
58 | dev.off()
59 | 
60 | gene.activities <- GeneActivity(Myeloid)
61 | 
62 | 
63 | Myeloid[['RNA']] <- CreateAssayObject(counts = gene.activities)
64 | Myeloid <- NormalizeData(
 65 |   object = Myeloid,
66 |   assay = 'RNA',
67 |   normalization.method = 'RC',
68 |   scale.factor = median(Myeloid$nCount_RNA)
69 | )
70 | 
 71 | write.table(as.matrix(gene.activities), file="Combined_Myeloid_snATAC_Glioma_Gene_Activities.txt", sep="\t", col.names=NA, quote=FALSE)
72 | 
73 | saveRDS(Myeloid, file="Ding_Dataset_Myeloid.rds")
74 | 
75 | 


--------------------------------------------------------------------------------
/scATAC-Seq Analyses/24- Calculating myeloid program usages in myeloid cells of the combined snATAC object:
--------------------------------------------------------------------------------
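 1 | import csv
 2 | 
 3 | import numpy as np
 4 | import pandas as pd
 5 | from sklearn.decomposition import non_negative_factorization
 6 | 
 7 | # Myeloid gene activity matrix (genes x cells), transposed to cells x genes
 8 | X = pd.read_table("Combined_Myeloid_snATAC_Glioma_Gene_Activities.txt", index_col=0, sep='\t')
 9 | X2 = X.T
10 | 
11 | # Average myeloid gene spectra (genes x programs), transposed to programs x genes
12 | H = pd.read_table("Myeloid_NMF_Average_Gene_Spectra.txt", sep="\t", index_col=0)
13 | H2 = H.T
14 | 
15 | # Keep only genes present in both matrices, with matching column order
16 | X3 = X2.filter(items=H2.columns)
17 | H3 = H2.filter(items=X3.columns)
18 | 
19 | H4 = H3.to_numpy(dtype=np.float64)
20 | X5 = X3.to_numpy(dtype=np.float64)
21 | 
22 | # Solve for non-negative usages W with the 14 myeloid spectra held fixed (update_H=False).
23 | # The `alpha` and `regularization` arguments (deprecated in scikit-learn 1.0 and
24 | # later removed) are dropped; all regularization strengths are 0, so the result is unchanged.
25 | W, _, _ = non_negative_factorization(
26 |     X5, W=None, H=H4, n_components=14, init='random', update_H=False,
27 |     solver='cd', beta_loss='frobenius', tol=0.0001, max_iter=1000,
28 |     alpha_W=0.0, alpha_H='same', l1_ratio=0.0,
29 |     random_state=None, verbose=0, shuffle=False)
30 | 
31 | processed = pd.DataFrame(W, columns=H.columns, index=X3.index)
32 | 
33 | # Express each cell's usages as percentages summing to 100
34 | row_sums = processed.sum(axis=1)
35 | processed_data = processed.div(row_sums, axis=0) * 100
36 | 
37 | processed_data.to_csv(path_or_buf="./Combined_Myeloid_snATAC_Glioma_Ding_Dataset_Myeloid_Usages.txt", sep="\t", quoting=csv.QUOTE_NONE)
38 | 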
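
Both projection scripts pass `non_negative_factorization` a cells x genes matrix and a programs x genes matrix, and their gene columns must match one-to-one. The paired `filter(items=...)` calls arrange this. The pandas sketch below makes the alignment explicit and also reports spectrum genes that are missing from the gene-activity matrix. `align_genes` is a hypothetical helper.

```python
import pandas as pd


def align_genes(X: pd.DataFrame, H: pd.DataFrame):
    """Restrict cells x genes `X` and programs x genes `H` to shared genes, in the same order.

    Returns (X_aligned, H_aligned, missing), where `missing` lists spectrum
    genes that have no gene-activity column.
    """
    shared = H.columns.intersection(X.columns, sort=False)
    missing = H.columns.difference(X.columns)
    return X.loc[:, shared], H.loc[:, shared], missing


X = pd.DataFrame([[1.0, 2.0, 3.0]], index=["cell1"], columns=["g1", "g2", "g3"])
H = pd.DataFrame([[0.1, 0.2, 0.3]], index=["P1"], columns=["g3", "g2", "g4"])
X_aligned, H_aligned, missing = align_genes(X, H)
print(list(X_aligned.columns), list(missing))
```

Checking `len(missing)` before the projection shows how much of each spectrum the filter drops without warning.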


--------------------------------------------------------------------------------
/scATAC-Seq Analyses/25- Extracting Discrete Myeloid Cells from the combined GBM snATAC object:
--------------------------------------------------------------------------------
 1 | library(Signac)
 2 | library(Seurat)
 3 | library(EnsDb.Hsapiens.v86)
 4 | library(BSgenome.Hsapiens.UCSC.hg38)
 5 | library(JASPAR2020)
 6 | library(TFBSTools)
 7 | 
 8 | options(future.globals.maxSize = 8000 * 1024^2)
 9 | 
10 | annotation <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
11 | seqlevels(annotation) <- paste0('chr', seqlevels(annotation))
12 | genome(annotation) <- "hg38"
13 | 
18 | combined <- readRDS("Combined_snATAC_Glioma_ATAC.rds")
19 | 
 20 | ##### File with the discrete myeloid label of every cell (one entry per cell, in the same order as colnames(combined)) #########
21 | Myeloid_Discrete_Status4 <- scan("Myeloid_Discrete_Status_top400_WithIdentity.txt", what="")
22 | 
23 | combined@meta.data$Myeloid_Discrete_Status4 <- Myeloid_Discrete_Status4
24 | 
25 | 
 26 | Discrete4 <- subset(combined, subset = Myeloid_Discrete_Status4 %in% c("Systemic", "Tissue_resident", "Complement", "Scavenger", "Monocyte", "Microglia"))
27 | 
28 | Discrete4$allcells <- "all"
29 | 
30 | Discrete_Peaks <- CallPeaks(Discrete4, group.by = "Myeloid_Discrete_Status4", format='BED', outdir ='.', fragment.tempdir=".", name='Myeloid_Discrete_Status4', cleanup=FALSE)
31 | 
32 | 
33 | Discrete_Peaks2 <- keepStandardChromosomes(Discrete_Peaks, pruning.mode = "coarse")
34 | Discrete_Peaks2 <- subsetByOverlaps(x = Discrete_Peaks2, ranges = blacklist_hg38, invert = TRUE)
35 | 
36 | macs2_counts <- FeatureMatrix(
37 |   fragments = Fragments(Discrete4),
38 |   features = Discrete_Peaks2,
39 |   cells = colnames(Discrete4)
40 | )
41 | 
42 | Discrete4[["peaks_macs2"]] <- CreateChromatinAssay(
43 |     counts = macs2_counts,
44 |     fragments = Fragments(Discrete4),
45 |     annotation = annotation
46 | )
47 | 
48 | DefaultAssay(Discrete4) <- "peaks_macs2"
49 | 
50 | Discrete4 <- RunTFIDF(Discrete4)
51 | Discrete4 <- FindTopFeatures(Discrete4, min.cutoff = 'q0')
52 | Discrete4 <- RunSVD(Discrete4)
53 | 
54 | pdf("Discrete4_Myeloid_Ding_ATAC_DepthCor.pdf", height=5, width=9)
55 | DepthCor(Discrete4)
56 | dev.off()
57 | 
58 | 
59 | Discrete4 <- RunUMAP(object = Discrete4, reduction = 'lsi', dims = 2:30)
60 | Discrete4 <- FindNeighbors(object = Discrete4, reduction = 'lsi', dims = 2:30)
61 | Discrete4 <- FindClusters(object = Discrete4, verbose = FALSE, algorithm = 3)
62 | 
63 | 
64 | pdf("Discrete4_Myeloid_Ding_ATAC_UMAP_Annotation.pdf", height=5, width=7)
65 | DimPlot(object = Discrete4, label = TRUE, group.by="Myeloid_Discrete_Status4")
66 | dev.off()
67 | 
68 | 
69 | Pseudobulk <- AggregateExpression(Discrete4, assays = "peaks_macs2", return.seurat = TRUE, group.by = "Myeloid_Discrete_Status4", normalization.method = "LogNormalize", scale.factor = 10000, margin = 1)
70 | 
71 | 
72 | Matrix2 <- LayerData(Pseudobulk, assay = "peaks_macs2", layer = "data")
73 | write.table(as.matrix(Matrix2), file="Discrete4_Pseudobulked_Normalized_Counts.txt", sep="\t", col.names=NA, quote=FALSE)
74 | 
75 | saveRDS(Discrete4, file="Discrete_Myeloid_GBM_snATAC.rds")
76 | 
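
`AggregateExpression()` collapses the cells into one pseudobulk profile per discrete annotation. With `return.seurat = TRUE` and `normalization.method = "LogNormalize"`, the exported `data` layer holds log1p(count / column total x 10,000) of the summed counts. The pandas sketch below shows that arithmetic. It assumes raw peak counts are summed within each group before normalization. `pseudobulk_lognorm` is hypothetical.

```python
import numpy as np
import pandas as pd


def pseudobulk_lognorm(counts: pd.DataFrame, groups: pd.Series, scale_factor: float = 1e4):
    """Sum a peaks x cells matrix per group, then log-normalize each group column."""
    summed = counts.T.groupby(groups.reindex(counts.columns)).sum().T  # peaks x groups
    return np.log1p(summed / summed.sum(axis=0) * scale_factor)


counts = pd.DataFrame({"c1": [1, 3], "c2": [1, 1], "c3": [2, 0]}, index=["p1", "p2"])
groups = pd.Series({"c1": "Monocyte", "c2": "Monocyte", "c3": "Microglia"})
print(pseudobulk_lognorm(counts, groups).round(3))
```

Applying `expm1` to any column of the result recovers values that sum to the scale factor. The specificity rule applied to this file relies on that back-transformation.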


--------------------------------------------------------------------------------
/scATAC-Seq Analyses/27- Generating Deeptools heatmaps for specific peaks for the immunomodulatory discrete myeloid cells:
--------------------------------------------------------------------------------
1 | ######### Specific peaks used as input for the -R option of computeMatrix were identified from the normalized pseudo-bulked peak file. The log1p-normalized values were converted back to the linear scale, and a peak was considered specific to a discrete annotation if its value was at least 2.5-fold higher than the mean across the other annotations.
2 | 
3 | computeMatrix reference-point -S ../sorted_Discrete_Fragments/Monocyte-TileSize-50-normMethod-nCount_peaks_macs2.bw ../sorted_Discrete_Fragments/Systemic-TileSize-50-normMethod-nCount_peaks_macs2.bw ../sorted_Discrete_Fragments/Scavenger-TileSize-50-normMethod-nCount_peaks_macs2.bw ../sorted_Discrete_Fragments/Complement-TileSize-50-normMethod-nCount_peaks_macs2.bw ../sorted_Discrete_Fragments/Tissue_resident-TileSize-50-normMethod-nCount_peaks_macs2.bw ../sorted_Discrete_Fragments/Microglia-TileSize-50-normMethod-nCount_peaks_macs2.bw -R Monocytes_Specific.bed Systemic_Specific.bed Scavenger_Specific.bed Complement_Specific.bed Tissue_Resident_Specific.bed Microglia_Specific.bed -b 1000 -a 1000 --skipZeros --smartLabels -o Ding_Discrete_Supervised_Specific_Peaks_Centered_Signac_Normalized_Version4_ArchR.gz --referencePoint center;
4 | 
5 | plotHeatmap -m Ding_Discrete_Supervised_Specific_Peaks_Centered_Signac_Normalized_Version4_ArchR.gz -o Ding_Discrete_Supervised_Specific_Peaks_Centered_Signac_Normalized_Version4_ArchR.pdf --startLabel 5 --endLabel 3 --whatToShow "heatmap and colorbar" --colorMap Blues --outFileSortedRegions Ding_Discrete_Supervised_Specific_Peaks_Centered_Signac_Normalized_Version4_ArchR.bed;
6 | 
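
The specificity rule above translates directly into code. Undo the log1p transform, then call a peak specific to an annotation when its value is at least 2.5 times the mean of the other annotations. The pandas sketch below assumes a peaks x annotations table like the pseudobulk export. `specific_peaks` is hypothetical. The extra requirement of a non-zero value, which keeps peaks that are empty everywhere from being called specific, is an addition not stated in the original rule.

```python
import numpy as np
import pandas as pd


def specific_peaks(lognorm: pd.DataFrame, fold: float = 2.5) -> dict:
    """Map each annotation (column) to peaks whose value is >= `fold` x the mean of the others."""
    linear = np.expm1(lognorm)  # back to the linear scale
    result = {}
    for ann in linear.columns:
        others_mean = linear.drop(columns=ann).mean(axis=1)
        is_specific = (linear[ann] >= fold * others_mean) & (linear[ann] > 0)
        result[ann] = is_specific[is_specific].index.tolist()
    return result


values = pd.DataFrame(
    {"Systemic": [10.0, 1.0, 0.0], "Scavenger": [1.0, 1.0, 0.0], "Complement": [1.0, 1.0, 5.0]},
    index=["chr1-100-600", "chr1-900-1400", "chr2-50-550"],
)
print(specific_peaks(np.log1p(values)))
# {'Systemic': ['chr1-100-600'], 'Scavenger': [], 'Complement': ['chr2-50-550']}
```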


--------------------------------------------------------------------------------
/scATAC-Seq Analyses/28- Identifying enriched motifs in specific immunomodulatory peaks using monaLISA:
--------------------------------------------------------------------------------
 1 | ######### Specific peaks used as input for the motif enrichment analysis were identified from the normalized pseudo-bulked peak file. The log1p-normalized values were converted back to the linear scale, and a peak was considered specific to a discrete annotation if its value was at least 2.5-fold higher than the mean across the other annotations.
 2 | 
 3 | R;
 4 | 
 5 | ############################ Inside R ##############################
 6 | library(monaLisa)
 7 | library(GenomicRanges)
 8 | library(SummarizedExperiment)
 9 | library(JASPAR2024)
10 | library(TFBSTools)
11 | library(BSgenome.Hsapiens.UCSC.hg38)
12 | library(ComplexHeatmap)
13 | library(circlize)
14 | library(universalmotif)
15 | 
16 | 
17 | Systemic <- rtracklayer::import(con = "Systemic_Specific.bed", format = "bed")
18 | 
19 | Scavenger <- rtracklayer::import(con = "Scavenger_Specific.bed", format = "bed")
20 | 
21 | Complement <- rtracklayer::import(con = "Complement_Specific.bed", format = "bed")
22 | 
23 | Tissue_Resident <- rtracklayer::import(con = "Tissue_Resident_Specific.bed", format = "bed")
24 | 
25 | 
26 | Peaks <- c(Systemic, Scavenger, Complement, Tissue_Resident)
27 | 
28 | Peaks2 <- trim(resize(Peaks, width = median(width(Peaks)), fix = "center"))
29 | summary(width(Peaks2))
30 | 
31 | 
32 | bins2 <- rep(c("Systemic", "Scavenger", "Complement", "Tissue_Resident"), c(length(Systemic), length(Scavenger), length(Complement), length(Tissue_Resident)))
33 | 
34 | desired_order <- c("Systemic", "Scavenger", "Complement", "Tissue_Resident")
35 | 
36 | bins2 <- factor(bins2, levels = desired_order)
37 | 
38 | table(bins2)
39 | 
40 | Peakseqs <- getSeq(BSgenome.Hsapiens.UCSC.hg38, Peaks2)
41 | 
42 | 
43 | 
44 | 
45 | db <- file.path(system.file("extdata", package="JASPAR2024"), 
46 |                     "JASPAR2024.sqlite")
47 | opts <- list()
48 | opts[["tax_group"]] <- "vertebrates"
49 | opts[["matrixtype"]] <- "PWM"
50 | opts[["collection"]] <- "CORE"
51 | pwms <- getMatrixSet(db, opts)
52 | 
53 | 
54 | hg38 <- Hsapiens
55 | 
56 | 
57 | se2 <- calcBinnedMotifEnrR(seqs = Peakseqs, bins = bins2, pwmL = pwms, background = "genome", genome = hg38, genome.oversample = 500)
58 | 
59 | background_matrix_pvalue <- assay(se2, "negLog10Padj")
60 | write.table(background_matrix_pvalue, file="LISA_Supervised_Specific_Peaks_Background_Mode_log10pvalues_NoCollapsing_Background500.txt", col.names=NA, sep="\t", quote=FALSE)
61 | 
62 | 
63 | background_matrix <- assay(se2, "log2enr")
64 | write.table(background_matrix, file="LISA_Supervised_Specific_Peaks_Background_Mode_log2enrichment_No_Collapsing_Background500.txt", col.names=NA, sep="\t", quote=FALSE)
65 | 
66 | saveRDS(se2, file="LISA_Supervised_Specific_Peaks_Background_Mode_500.rds")
67 | 
68 | 
69 | 
70 | Motifs <- scan("Motifs_OI3.txt", what="")
71 | 
72 | seSel2 <- se2[Motifs, ]
73 | 
74 | pdf("LISA_Heatmap_Supervised_Specific_Peaks_Background_Mode_NoCollapsing_Selected_Motifs_Version6_No_Clustering.pdf", height=18, width=40)
75 | plotMotifHeatmaps(x = seSel2, which.plots = c("log2enr", "negLog10Padj"), 
76 |                   width = 4, cluster = FALSE, maxEnr = 2, maxSig = 10,
77 |                   show_dendrogram = TRUE, show_seqlogo = TRUE,
78 |                   width.seqlogo = 1.75, show_motif_GC = TRUE)
79 | dev.off()
80 | 
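
Before motif counting, every specific peak is resized to the median peak width around its center. `bins2` then labels each sequence with the annotation it came from, in the order the GRanges were concatenated. The pure-Python sketch below shows both steps on 0-based half-open intervals. The helper names are hypothetical. Rounding for odd widths may differ from `GenomicRanges::resize(fix = "center")`, and clipping is applied only at position 0, whereas `trim()` also clips at chromosome ends.

```python
import statistics


def median_width(intervals):
    """Median width of (chrom, start, end) intervals, truncated to an int."""
    return int(statistics.median(end - start for _, start, end in intervals))


def resize_center(intervals, width):
    """Return windows of `width` bp centred on each interval, clipped at position 0."""
    resized = []
    for chrom, start, end in intervals:
        new_start = (start + end) // 2 - width // 2
        resized.append((chrom, max(0, new_start), new_start + width))
    return resized


def bin_labels(groups):
    """One label per interval, in the order the groups are concatenated (like rep())."""
    return [label for label, intervals in groups for _ in intervals]


systemic = [("chr1", 100, 300), ("chr1", 1000, 1100)]
scavenger = [("chr2", 50, 350), ("chr3", 0, 20)]
peaks = systemic + scavenger
width = median_width(peaks)
print(width, resize_center(peaks, width))
print(bin_labels([("Systemic", systemic), ("Scavenger", scavenger)]))
```

Equal-width sequences keep per-bin motif hit counts comparable. The labels must follow the same concatenation order as the peaks, as `rep()` does in the R script.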


--------------------------------------------------------------------------------