├── .github ├── .gitignore └── workflows │ └── R-CMD-check.yaml ├── .gitignore ├── DESCRIPTION ├── NAMESPACE ├── NEWS.md ├── R ├── balanced_dataset.R ├── building_networks.R ├── neighborhood_analysis.R ├── post_RCTD.R ├── purification.R ├── split_dataset.R ├── utils.R └── visualization.R ├── README.md ├── doc ├── Run_RCTD_and_SPLIT_on_Xenium.R ├── Run_RCTD_and_SPLIT_on_Xenium.Rmd └── Run_RCTD_and_SPLIT_on_Xenium.html ├── inst └── CITATION └── vignettes ├── Run_RCTD_and_SPLIT_on_Xenium.Rmd └── plots └── SPLIT_schema.png /.github/.gitignore: -------------------------------------------------------------------------------- 1 | *.html 2 | -------------------------------------------------------------------------------- /.github/workflows/R-CMD-check.yaml: -------------------------------------------------------------------------------- 1 | # Workflow derived from https://github.com/r-lib/actions/tree/v2/examples 2 | # Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help 3 | on: 4 | push: 5 | branches: [main, master] 6 | paths-ignore: 7 | - 'README.md' 8 | - 'README.Rmd' 9 | - 'docs/**' 10 | - '*.md' 11 | - '*.Rproj' 12 | - '.gitignore' 13 | - '.github/ISSUE_TEMPLATE/**' 14 | pull_request: 15 | 16 | name: R-CMD-check.yaml 17 | 18 | permissions: read-all 19 | 20 | jobs: 21 | R-CMD-check: 22 | runs-on: ubuntu-latest 23 | env: 24 | GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} 25 | R_KEEP_PKG_SOURCE: yes 26 | steps: 27 | - uses: actions/checkout@v4 28 | 29 | - uses: r-lib/actions/setup-r@v2 30 | with: 31 | use-public-rspm: true 32 | 33 | - uses: r-lib/actions/setup-r-dependencies@v2 34 | with: 35 | extra-packages: any::rcmdcheck 36 | needs: check 37 | 38 | - uses: r-lib/actions/check-r-package@v2 39 | with: 40 | upload-snapshots: true 41 | error-on: '"error"' 42 | args: 'c("--no-build-vignettes", "--no-manual", "--no-stop-on-warning")' 43 | build_args: 'c("--no-build-vignettes", "--no-manual","--compact-vignettes=gs+qpdf")' 44 | 45 | 46 | 47 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # R-specific files 2 | .Rhistory 3 | .Rapp.history 4 | .RData 5 | .Ruserdata 6 | *.Rproj.user 7 | 8 | # RStudio files 9 | .Rproj 10 | *.Rproj.user/ 11 | 12 | # Compiled files 13 | *.o 14 | *.a 15 | *.dll 16 | *.so 17 | *.dylib 18 | 19 | # Package binary files 20 | *.tar.gz 21 | *.zip 22 | 23 | # R package build artifacts 24 | /inst/doc/ 25 | /inst/examples/ 26 | /inst/test/ 27 | /man/*.Rd 28 | 29 | # Temporary files 30 | *.log 31 | *.tmp 32 | *.bak 33 | *.swp 34 | 35 | # .Rbuildignore file should also be included 36 | /.Rbuildignore 37 | 38 | # Other artifacts generated by devtools and roxygen 39 | .Rbuildignore 40 | dev/ 41 | 42 | # Test files 43 | /tests/testthat/*.Rout 44 | 45 | # RStudio project files 46 | *.Rproj 47 | *.Rproj.user/ 48 | 49 | # VSCode settings 50 | .vscode/ 51 | 52 | # Other directories and files 53 | .DS_Store 54 | Thumbs.db 55 | .Rproj.user 56 | #/doc/ 57 | /Meta/ 58 | -------------------------------------------------------------------------------- /DESCRIPTION: -------------------------------------------------------------------------------- 1 | Package: SPLIT 2 | Type: Package 3 | Title: Spatial Purification of Layered Intracellular Transcripts 4 | Version: 0.1.0 5 | Authors@R: c( 6 | person( 7 | "Mariia", "Bilous", 8 | email = "Mariia.Bilous@chuv.ch", 9 | role = c("aut", "cre") 10 | ) 11 | ) 12 | Description: 13 | The SPLIT (Spatial Purification of Layered Intracellular Transcripts) package provides 14 | tools to purify spatial transcriptomics data using decomposition results from the RCTD 15 | (Robust Cell Type Decomposition) algorithm. It offers functionality to adjust cell-type 16 | profiles by removing contributions attributed to background contamination. SPLIT also 17 | computes contamination scores and estimates local diffusion of contaminating profiles 18 | to help distinguish between contamination and phenotypic variation. Additionally, 19 | the package includes methods to refine cell-type annotations based on transcriptomic 20 | neighborhood homogeneity. The resulting output contains profiles with reduced 21 | contamination and increased cell-type specificity. 22 | License: MIT 23 | Encoding: UTF-8 24 | LazyData: true 25 | Depends: R (>= 4.0.0) 26 | Imports: 27 | Matrix, 28 | dplyr, 29 | forcats, 30 | entropy, 31 | spacexr, 32 | ggplot2, 33 | scatterpie, 34 | Seurat, 35 | BiocParallel, 36 | igraph 37 | Suggests: 38 | testthat, 39 | knitr, 40 | rmarkdown 41 | VignetteBuilder: knitr 42 | RoxygenNote: 7.3.2 43 | -------------------------------------------------------------------------------- /NAMESPACE: -------------------------------------------------------------------------------- 1 | # Generated by roxygen2: do not edit by hand 2 | 3 | export(add_annotation_from_neighbors) 4 | export(add_cell_types_neighborhood_weights) 5 | export(add_individual_w1_of_second_type_in_neighborhood) 6 | export(add_individual_w2_of_second_type_in_neighborhood) 7 | export(add_infiltration_metrics_to_neighborhood) 8 | export(add_max_weight_of_spilling_type_in_neighborhood) 9 | export(add_neigborhood_nCount_of_spilling_cell_type) 10 | export(add_neigborhood_weight_composition_of_spilling_cell_type) 11 | export(add_neighborhood_annotation_certainty) 12 | export(add_neighborhood_weight_composition) 13 | export(add_rctd_to_neighborhood) 14 | export(add_spatial_metric) 15 | export(add_transcriptomics_metric) 16 | export(balance_raw_and_purified_data_by_score) 17 | export(balance_raw_and_purified_data_by_spot_class) 18 | export(balance_split) 19 | export(build_spatial_network) 20 | export(build_transcriptomics_network) 21 | export(compute_neighborhood) 22 | export(crop_pie_df_to_cell) 23 | export(decompose_doublet) 24 | export(get_pieplot_df) 25 | export(get_singlet_data) 26 | export(neighborhood_analysis_to_metadata) 27 | export(plot_pie) 28 | export(plot_pie_around_cell) 29 | export(plot_pie_by_coordinates) 30 | export(purify) 31 | export(purify_counts_with_rctd) 32 | export(run_post_process_RCTD) 33 | export(save_pieplot) 34 | export(split) 35 | import(BiocParallel) 36 | import(Seurat) 37 | import(dplyr) 38 | import(spacexr) 39 | importFrom(dplyr,"%>%") 40 | importFrom(dplyr,filter) 41 | importFrom(dplyr,select) 42 | importFrom(entropy,entropy) 43 | importFrom(ggplot2,aes) 44 | importFrom(ggplot2,annotate) 45 | importFrom(ggplot2,coord_fixed) 46 | importFrom(ggplot2,ggplot) 47 | importFrom(ggplot2,guides) 48 | importFrom(ggplot2,scale_color_manual) 49 | importFrom(ggplot2,scale_fill_manual) 50 | importFrom(ggplot2,theme_void) 51 | importFrom(scatterpie,geom_scatterpie) 52 | -------------------------------------------------------------------------------- /NEWS.md: -------------------------------------------------------------------------------- 1 | # SPLIT 0.1.0 2 | 3 | Initial release of the SPLIT (Spatial Purification of Layered Intracellular Transcripts) package. 4 | 5 | ## Features 6 | 7 | - Purifies spatial transcriptomics data by removing background contamination using RCTD deconvolution results. 8 | - Computes local contamination scores and models background diffusion. 9 | - Refines cell-type annotations based on neighborhood transcriptomic homogeneity (SPLIT-shift). 10 | - Outputs a purified count matrix with improved cell-type specificity. 11 | - Scores computed by SPLIT can be used to identify highly contaminating cell types and assess cell co-localization patterns. 12 | 13 | ⚠️ **IMPORTANT:**\ 14 | SPLIT currently requires **doublet-mode** RCTD results from the original [spacexr GitHub repository](https://github.com/dmcable/spacexr) or its faster [HD fork](https://github.com/jpromeror/spacexr/tree/HD), **not** from the newly released [Bioconductor version](https://www.bioconductor.org/packages/release/bioc/html/spacexr.html).\ 15 | **Compatibility with Bioconductor's spacexr is coming soon.** 16 | -------------------------------------------------------------------------------- /R/balanced_dataset.R: -------------------------------------------------------------------------------- 1 | #' @keywords internal 2 | swap_expr <- rlang::expr({ 3 | cells_to_swap_label <- meta_data %>% 4 | filter(!first_type_neighborhood_agreement & !first_type_class_neighborhood_agreement) %>% 5 | filter(second_type_class == first_type_class_neighborhood) %>% 6 | rownames() 7 | 8 | meta_data <- meta_data %>% 9 | mutate(across(c(first_type, second_type, weight_first_type, weight_second_type), 10 | .fns = list(before_swap = identity))) %>% # Create backup in one step 11 | mutate( 12 | swap = rownames(.) %in% cells_to_swap_label, # Logical column for swapping 13 | first_type = if_else(swap, second_type_before_swap, first_type), 14 | second_type = if_else(swap, first_type_before_swap, second_type), 15 | weight_first_type = if_else(swap, weight_second_type_before_swap, weight_first_type), 16 | weight_second_type = if_else(swap, weight_first_type_before_swap, weight_second_type) 17 | ) 18 | 19 | # Replace corrected counts for swapped cells 20 | cells_to_swap_correcred_profile <- cells_to_swap_label[cells_to_swap_label %in% colnames(xe_purified)] #intersect(cells_to_swap_label, cells_to_replace_with_purified) 21 | count_matrix[, cells_to_swap_correcred_profile] <- 22 | GetAssayData(xe_raw, assay = default_assay, layer = "counts")[common_genes, cells_to_swap_correcred_profile] - 23 | GetAssayData(xe_purified, assay = default_assay, layer = "counts")[common_genes, cells_to_swap_correcred_profile] 24 | count_matrix[count_matrix < 0] <- 0 25 | }) 26 | 27 | #' Balance raw and purified data by merging high quality raw data and otherwise purified data 28 | #' 29 | #' Merges raw and purified data into one dataset by keeping raw counts 30 | #' for high quality cells and replacing contaminated cells with their purified profiles. Reject cells are removed 31 | #' 32 | #' 33 | #' @param xe_raw raw seurat object 34 | #' @param xe_purified purified seurat object (post \code{purify_counts_with_rctd}) 35 | #' @param threshold value below which cell is considered as contaminated and is replaced by purified profile. For the moment, it's a single value, but should accept cell-type-specific vector later on #TODO 36 | #' @param score_name name of the param to threshold on 37 | #' @param DO_swap_lables A logical indicating whether to swap first and second cell types for cells which label does not agree with its transcriptomic neighborhod lable 38 | #' 39 | #' @export 40 | 41 | balance_raw_and_purified_data_by_score <- function( 42 | xe_raw, 43 | xe_purified, 44 | threshold = .15, 45 | score_name = c("neighborhood_weights_second_type", "second_type_neighbors_N", "second_type_neighbors_no_reject_N"), 46 | spot_class_key = "spot_class", 47 | DO_swap_lables = FALSE, 48 | default_assay = "Xenium" 49 | ){ 50 | score_name <- score_name[1] 51 | if(!score_name %in% colnames(xe_raw@meta.data)){ 52 | stop("score", score_name, "is not available in `xe_raw`, please compute it first!") 53 | } 54 | 55 | if(!"first_type" %in% colnames(xe_raw@meta.data)){ 56 | stop("`first_type` is not available in `xe_raw`, please compute it first!") 57 | } 58 | 59 | if(!spot_class_key %in% colnames(xe_raw@meta.data)){ 60 | stop("`spot_class` is not available in `xe_raw`, please compute it first!") 61 | } 62 | 63 | cells_to_replace_with_purified <- 64 | xe_raw@meta.data %>% 65 | filter((.data[[spot_class_key]] != "reject" & .data[[score_name]] > threshold) | 66 | .data[[spot_class_key]] == "doublet_uncertain") %>% 67 | rownames() 68 | 69 | cells_to_remove <- 70 | xe_raw@meta.data %>% 71 | filter(.data[[spot_class_key]] == "reject") %>% 72 | rownames() 73 | 74 | cells_to_keep_raw <- setdiff(colnames(xe_raw), c(cells_to_replace_with_purified, cells_to_remove)) 75 | cells_to_replace_with_purified <- setdiff(cells_to_replace_with_purified, cells_to_remove) 76 | # restrict to cells present in purified data 77 | cells_to_replace_with_purified <- cells_to_replace_with_purified[cells_to_replace_with_purified %in% colnames(xe_purified)] 78 | 79 | xe_raw$purification_status <- "raw" 80 | xe_raw@meta.data[cells_to_replace_with_purified,"purification_status"] <- xe_purified@meta.data[cells_to_replace_with_purified,"purification_status"] 81 | xe_raw@meta.data[cells_to_remove,"purification_status"] <- "removed" 82 | 83 | common_genes <- intersect(rownames(xe_raw), rownames(xe_purified)) 84 | count_matrix <- cbind(GetAssayData(xe_raw, assay = default_assay, layer = "counts")[common_genes, cells_to_keep_raw], 85 | GetAssayData(xe_purified, assay = default_assay, layer = "counts")[common_genes, cells_to_replace_with_purified]) 86 | 87 | meta_data = xe_raw@meta.data[colnames(count_matrix),] 88 | 89 | if(DO_swap_lables){ 90 | eval(swap_expr) # Expands the swap logic in-place 91 | } else { 92 | meta_data$swap <- FALSE 93 | } 94 | 95 | xe_balanced <- CreateSeuratObject(counts = count_matrix, assay = default_assay, meta.data = meta_data) 96 | 97 | return(xe_balanced) 98 | } 99 | 100 | #' Balance raw and purified data using spot class categories 101 | #' 102 | #' Merges raw and purified data into one dataset by retaining raw counts 103 | #' for singlet cells and replacing other cell categories with their purified profiles. 104 | #' Reject cells are removed from the dataset. 105 | #' 106 | #' @param xe_raw A raw Seurat object containing the original spatial transcriptomics data. 107 | #' @param xe_purified A purified Seurat object, typically post \code{purify_counts_with_rctd}, containing corrected cell profiles. 108 | #' @param spot_class_key A character string specifying the metadata column name in \code{xe_raw} that indicates the spot classification (default: `"spot_class"`). 109 | #' @param DO_swap_lables A logical indicating whether to swap first and second cell types for cells which label does not agree with its transcriptomic neighborhod lable 110 | #' 111 | #' @return A balanced Seurat object with a merged count matrix where: 112 | #' \itemize{ 113 | #' \item Singlet cells retain their raw counts. 114 | #' \item Cells classified as non-singlet (but not rejected) are replaced with purified counts. 115 | #' \item Rejected cells are removed. 116 | #' } 117 | #' 118 | #' @details The function categorizes cells into three groups: 119 | #' \itemize{ 120 | #' \item Cells classified as `"singlet"` retain their raw data. 121 | #' \item Cells classified as anything other than `"singlet"` (excluding `"reject"`) are replaced with their purified profiles. 122 | #' \item Cells classified as `"reject"` are completely removed from the dataset. 123 | #' } 124 | #' 125 | #' @export 126 | 127 | balance_raw_and_purified_data_by_spot_class <- function( 128 | xe_raw, 129 | xe_purified, 130 | spot_class_key = "spot_class", 131 | DO_swap_lables = FALSE, 132 | default_assay = "Xenium" 133 | ){ 134 | 135 | if(!spot_class_key %in% colnames(xe_raw@meta.data)){ 136 | stop("spot_class_key", spot_class_key, "is not available in `xe_raw`, please compute it first!") 137 | } 138 | 139 | if(!"first_type" %in% colnames(xe_raw@meta.data)){ 140 | stop("`first_type` is not available in `xe_raw`, please compute it first!") 141 | } 142 | 143 | cells_to_remove <- 144 | xe_raw@meta.data %>% 145 | filter(.data[[spot_class_key]] == "reject") %>% 146 | rownames() 147 | 148 | cells_to_replace_with_purified <- 149 | xe_raw@meta.data %>% 150 | filter(.data[[spot_class_key]] != "singlet") %>% 151 | rownames() 152 | cells_to_replace_with_purified <- setdiff(cells_to_replace_with_purified, cells_to_remove) 153 | 154 | cells_to_keep_raw <- 155 | xe_raw@meta.data %>% 156 | filter(.data[[spot_class_key]] == "singlet") %>% 157 | rownames() 158 | 159 | 160 | xe_raw$purification_status <- "raw" 161 | xe_raw@meta.data[cells_to_replace_with_purified,"purification_status"] <- xe_purified@meta.data[cells_to_replace_with_purified,"purification_status"] 162 | xe_raw@meta.data[cells_to_remove,"purification_status"] <- "removed" 163 | 164 | common_genes <- intersect(rownames(xe_raw), rownames(xe_purified)) 165 | count_matrix <- cbind(GetAssayData(xe_raw, assay = default_assay, layer = "counts")[common_genes, cells_to_keep_raw], 166 | GetAssayData(xe_purified, assay = default_assay, layer = "counts")[common_genes, cells_to_replace_with_purified]) 167 | 168 | meta_data = xe_raw@meta.data[colnames(count_matrix),] 169 | 170 | if(DO_swap_lables){ 171 | eval(swap_expr) # Expands the swap logic in-place 172 | } else { 173 | meta_data$swap <- FALSE 174 | } 175 | 176 | xe_balanced <- CreateSeuratObject(counts = count_matrix, assay = default_assay, meta.data = meta_data) 177 | return(xe_balanced) 178 | } 179 | 180 | -------------------------------------------------------------------------------- /R/building_networks.R: -------------------------------------------------------------------------------- 1 | #' Compute Neighborhood Graph with Optional Pruning 2 | #' 3 | #' This function computes or retrieves a k-nearest neighbors (KNN) graph from a given object, 4 | #' using a specified dimensional reduction or features. Optionally, it prunes edges based on 5 | #' a specified distance threshold. 6 | #' 7 | #' @param obj An object (e.g., Seurat object) containing data to compute the KNN graph. 8 | #' @param neighbors_name A character string specifying the name of an existing neighbors object 9 | #' in `obj`. If `NULL`, neighbors are computed. 10 | #' @param k_knn An integer specifying the number of nearest neighbors to use. Default is 20. 11 | #' @param reduction A character string specifying the dimensional reduction to use 12 | #' (e.g., "pca", "spatial"). Default is "pca". 13 | #' @param dims A numeric vector specifying the dimensions to use for the reduction. Default is `1:50`. 14 | #' @param features A character vector of feature names to use for neighbor computation. Default is `NULL`. 15 | #' @param graph_name A character string specifying the name of the graph to store in `obj`. Default is `"transcriptomics_knn"`. 16 | #' @param DO_prune A logical value indicating whether to prune edges in the graph based on a distance threshold. Default is `FALSE`. 17 | #' @param rad_pruning A numeric value specifying the maximum distance for retaining edges. Edges with distances greater than this value will be pruned. Default is `Inf`. 18 | #' @param ... Additional parameters passed to `Seurat::FindNeighbors`. 19 | #' 20 | #' @return A list containing the following elements: 21 | #' \describe{ 22 | #' \item{`graph`}{An igraph object representing the (possibly pruned) KNN graph.} 23 | #' \item{`nn_idx`}{A matrix of nearest-neighbor indices.} 24 | #' \item{`nn_dist`}{A matrix of distances to the nearest neighbors.} 25 | #' \item{`cell_id`}{A vector of cell IDs corresponding to the rows of `nn_idx` and `nn_dist`.} 26 | #' } 27 | #' 28 | #' @details 29 | #' If `DO_prune` is `TRUE`, the function removes edges from the graph where the 30 | #' distance exceeds the `rad_pruning` threshold. A warning is issued if `rad_pruning` is 31 | #' larger than the maximum distance in the graph, as no edges will be pruned in that case. 32 | #' 33 | #' @examples 34 | #' \dontrun{ 35 | #' # Example usage with a Seurat object 36 | #' result <- compute_neighborhood(obj = seurat_obj, reduction = "pca", k_knn = 15) 37 | #' plot(result$graph) 38 | #' 39 | #' # Example with pruning 40 | #' result <- compute_neighborhood( 41 | #' obj = seurat_obj, 42 | #' reduction = "pca", 43 | #' k_knn = 15, 44 | #' DO_prune = TRUE, 45 | #' rad_pruning = 0.2 46 | #' ) 47 | #' plot(result$graph) 48 | #' } 49 | #' 50 | #' @export 51 | #' 52 | 53 | compute_neighborhood <- function( 54 | obj, 55 | neighbors_name = NULL, 56 | k_knn = 20, 57 | reduction = "pca", 58 | dims = 1:50, 59 | features = NULL, 60 | graph_name = "transcriptomics_knn", 61 | DO_prune = FALSE, 62 | rad_pruning = Inf, 63 | ... 64 | ) { 65 | # Check if neighbors exist or compute them 66 | knn_neighbors <- if (!is.null(neighbors_name)) { 67 | if (!(neighbors_name %in% obj@neighbors)) { 68 | stop(paste("Neighbors", neighbors_name, "do not exist. Compute one or set `neighbors_name` to `NULL` to re-compute neighbors.")) 69 | } 70 | obj@neighbors[[neighbors_name]] 71 | } else { 72 | obj <- Seurat::FindNeighbors( 73 | obj, 74 | reduction = reduction, 75 | dims = dims, 76 | features = features, 77 | k = k_knn, 78 | return.neighbor = TRUE, 79 | graph.name = graph_name, 80 | ... 81 | ) 82 | obj@neighbors[[graph_name]] 83 | } 84 | 85 | # Extract adjacency list and clean up NAs 86 | adjacency_knn <- apply(knn_neighbors@nn.idx[, -1, drop = FALSE], 1, function(x) x[!is.na(x)]) 87 | 88 | # Create graph from adjacency list 89 | graph <- igraph::graph_from_adj_list(adjacency_knn) 90 | n_edge <- igraph::ecount(graph) 91 | 92 | max_dist <- max(knn_neighbors@nn.dist) 93 | 94 | # Pruning 95 | if(DO_prune){ 96 | if(rad_pruning > max_dist){ 97 | warning("No puning as `rad_pruning` is larger than any distance in the graph ") 98 | } else { 99 | 100 | # prune edges with large distance (remove distant neighbors) 101 | knn_neighbors@nn.idx[knn_neighbors@nn.dist > rad_pruning] <- NA 102 | knn_neighbors@nn.dist[knn_neighbors@nn.dist > rad_pruning] <- NA 103 | 104 | adjacency_knn <- apply(knn_neighbors@nn.idx[, -1, drop = FALSE], 1, function(x) x[!is.na(x)]) 105 | 106 | graph <- igraph::graph_from_adj_list(adjacency_knn) 107 | n_edge_prune <- igraph::ecount(graph) 108 | delta_n_edge <- n_edge-n_edge_prune 109 | message(paste("N =", delta_n_edge, "(", round(100*delta_n_edge/n_edge), "%) edges were pruned")) 110 | } 111 | } 112 | 113 | # Prepare results 114 | result <- list( 115 | graph = graph, 116 | nn_idx = knn_neighbors@nn.idx, 117 | nn_dist = knn_neighbors@nn.dist, 118 | cell_id = knn_neighbors@cell.names 119 | ) 120 | 121 | return(result) 122 | } 123 | 124 | #' Build Spatial Network 125 | #' 126 | #' A wrapper function around `compute_neighborhood` to create a spatial KNN graph 127 | #' using spatial reduction data. 128 | #' 129 | #' @param obj An object (e.g., Seurat object) containing spatial data. 130 | #' @param ... Additional parameters passed to `compute_neighborhood`. 131 | #' 132 | #' @return A list containing the following elements: 133 | #' \describe{ 134 | #' \item{`graph`}{An igraph object representing the KNN graph.} 135 | #' \item{`nn_idx`}{A matrix of nearest-neighbor indices.} 136 | #' \item{`nn_dist`}{A matrix of distances to the nearest neighbors.} 137 | #' \item{`cell_id`}{A vector of cell IDs corresponding to the rows of `nn_idx` and `nn_dist`.} 138 | #' } 139 | #' 140 | #' @examples 141 | #' \dontrun{ 142 | #' # Example usage with a Seurat object containing spatial data 143 | #' spatial_result <- build_spatial_network(obj = seurat_obj) 144 | #' plot(spatial_result$graph) 145 | #' } 146 | #' 147 | #' @export 148 | 149 | build_spatial_network <- function(obj, reduction = "spatial", dims = 1:2, graph_name = "spatial_knn", DO_prune = T, rad_pruning = 30, ...) { 150 | compute_neighborhood(obj = obj, reduction = reduction, dims = dims, DO_prune = DO_prune, rad_pruning = rad_pruning, ...) 151 | } 152 | 153 | #' Build Transcriptomics Network 154 | #' 155 | #' A wrapper function around `compute_neighborhood` to create a transcriptomics KNN graph 156 | #' using transcriptomics reduction data. 157 | #' 158 | #' @param obj An object (e.g., Seurat object) containing transcriptomics data. 159 | #' @param ... Additional parameters passed to `compute_neighborhood`. 160 | #' 161 | #' @return A list containing the following elements: 162 | #' \describe{ 163 | #' \item{`graph`}{An igraph object representing the KNN graph.} 164 | #' \item{`nn_idx`}{A matrix of nearest-neighbor indices.} 165 | #' \item{`nn_dist`}{A matrix of distances to the nearest neighbors.} 166 | #' \item{`cell_id`}{A vector of cell IDs corresponding to the rows of `nn_idx` and `nn_dist`.} 167 | #' } 168 | #' 169 | #' @examples 170 | #' \dontrun{ 171 | #' # Example usage with a Seurat object containing transcriptomics data 172 | #' transcriptomics_result <- build_transcriptomics_network(obj = seurat_obj, k_knn = 15) 173 | #' plot(spatial_result$graph) 174 | #' } 175 | #' @export 176 | build_transcriptomics_network <- function(obj, reduction = "pca", graph_name = "transcriptomics_knn", ...){ 177 | compute_neighborhood(obj = obj, reduction = reduction, graph_name = graph_name, ...) 178 | } 179 | 180 | -------------------------------------------------------------------------------- /R/neighborhood_analysis.R: -------------------------------------------------------------------------------- 1 | #' Add RCTD Results to Neighborhood Graph 2 | #' 3 | #' This function integrates RCTD (Robust Cell Type Decomposition) results into a neighborhood graph. 4 | #' It ensures that the RCTD results align with the graph's cells and formats them for downstream use. 5 | #' 6 | #' @param graph A list representing a neighborhood graph, typically created by \code{compute_neighborhood}. 7 | #' Must contain \code{nn_idx} (nearest neighbor indices) and \code{cell_id} (cell identifiers). 8 | #' @param rctd An RCTD object containing cell decomposition results. The results are expected in 9 | #' \code{rctd@results$results_df}. 10 | #' 11 | #' @return A list containing the original neighborhood graph and a matrix for each column in 12 | #' \code{rctd@results$results_df}, where rows correspond to cells and columns correspond to neighbors. 13 | #' 14 | #' @details 15 | #' The function extracts the decomposition results from the RCTD object and maps them to the 16 | #' neighborhood graph structure, ensuring consistency in cell alignment. It then organizes 17 | #' the results into matrices where each matrix corresponds to a specific result column from the RCTD results. 18 | #' 19 | #' @examples 20 | #' \dontrun{ 21 | #' # Assuming `graph` is a neighborhood graph and `rctd` is an RCTD object 22 | #' updated_graph <- add_rctd_res_to_neighborhood(graph = graph, rctd = rctd) 23 | #' } 24 | #' 25 | #' @export 26 | 27 | add_rctd_to_neighborhood <- function( 28 | graph, 29 | rctd 30 | ){ 31 | # Ensure RCTD results align with graph cells 32 | results_df <- rctd@results$results_df[graph$cell_id, ] # Match cells 33 | rownames(results_df) <- graph$cell_id 34 | 35 | # Reshape results to align with nearest neighbor indices 36 | results_df <- results_df[as.vector(graph$nn_idx), ] 37 | num_neighbors <- ncol(graph$nn_idx) 38 | 39 | # Prepare results as a list of matrices 40 | result_list <- lapply(colnames(results_df), function(column_name) { 41 | matrix(results_df[[column_name]], ncol = num_neighbors) 42 | }) 43 | 44 | # Combine graph and results 45 | names(result_list) <- colnames(results_df) 46 | return(c(graph, result_list)) 47 | } 48 | 49 | 50 | #' Add Neighborhood Infiltration Metrics to Spatial Neighborhood 51 | #' 52 | #' This function calculates various infiltration metrics for cell neighborhoods 53 | #' within a spatial context. It computes metrics related to the number and type 54 | #' of neighbors annotated to a cell's first or second type, as well as infiltration 55 | #' metrics based on cell classes. 56 | #' 57 | #' @param neighborhood A list containing spatial neighborhood data. This should 58 | #' include matrices for `first_type`, `second_type`, `nn.idx`, and `spot_class`. 59 | #' 60 | #' @return A modified version of the input `neighborhood` list with additional 61 | #' metrics. The added fields include: 62 | #' \item{`total_neighbors_N`}{Total number of neighbors for each cell.} 63 | #' \item{`total_singlets_neighbors_N`}{Number of singlet neighbors for each cell.} 64 | #' \item{`second_type_neighbors`}{Indices of neighbors annotated to the second cell type.} 65 | #' \item{`second_type_neighbors_N`}{Count of neighbors annotated to the second cell type.} 66 | #' \item{`second_type_singlets_neighbors`}{Indices of singlet neighbors annotated to the second cell type.} 67 | #' \item{`second_type_singlets_neighbors_N`}{Count of singlet neighbors annotated to the second cell type.} 68 | #' \item{`first_type_neighbors`}{Indices of neighbors annotated to the first cell type.} 69 | #' \item{`first_type_neighbors_N`}{Count of neighbors annotated to the first cell type.} 70 | #' \item{`first_type_singlets_neighbors`}{Indices of singlet neighbors annotated to the first cell type.} 71 | #' \item{`first_type_singlets_neighbors_N`}{Count of singlet neighbors annotated to the first cell type.} 72 | #' \item{`second_type_class_neighbors`}{Indices of neighbors annotated to the second cell type class.} 73 | #' \item{`second_type_class_neighbors_N`}{Count of neighbors annotated to the second cell type class.} 74 | #' \item{`first_type_class_neighbors`}{Indices of neighbors annotated to the first cell type class.} 75 | #' \item{`first_type_class_neighbors_N`}{Count of neighbors annotated to the first cell type class.} 76 | #' \item{`same_second_type_neighbors`}{Indices of neighbors annotated to the same second cell type.} 77 | #' \item{`same_second_type_neighbors_N`}{Count of neighbors annotated to the same second cell type.} 78 | #' 79 | #' 80 | #' @export 81 | #' 82 | add_infiltration_metrics_to_neighborhood <- function(neighborhood) { 83 | 84 | 85 | # Compute the total number of neighbors 86 | neighborhood$total_neighbors_N <- rowSums(!is.na(neighborhood$nn_idx)) - 1 87 | neighborhood$annotated_neighbors_N <- rowSums(!is.na(neighborhood$first_type)) - 1 88 | 89 | # Compute total number of singlet neighbors 90 | neighborhood$total_singlets_neighbors_N <- rowSums(neighborhood$spot_class == "singlet", na.rm = TRUE) 91 | 92 | ## Generate vector of annotation of cell's neighbors ignoring reject cells 93 | neighborhood$first_type_no_reject <- neighborhood$first_type 94 | neighborhood$first_type_no_reject[neighborhood$spot_class == "reject"] <- "NA" 95 | 96 | # Create first_type_fist_element_second_type (with first element as second_type) 97 | # to facilitate identification of the number of the second cell type in the neighborhood 98 | neighborhood$first_type_fist_element_second_type <- cbind(neighborhood$second_type[, 1], neighborhood$first_type[, -1]) 99 | neighborhood$first_type_fist_element_second_type_no_reject <- cbind(neighborhood$second_type[, 1], neighborhood$first_type_no_reject[, -1]) 100 | 101 | 102 | # Helper function to calculate neighbors for a given type 103 | get_neighbors <- function(neighborhood_col) { 104 | apply(neighborhood_col, 105 | 1, 106 | FUN = function(x) { 107 | which(x[-1] == x[1]) + 1 108 | }) 109 | } 110 | 111 | # SECOND type neighbors: indices where neighbors are annotated to the residual second type 112 | neighborhood$second_type_neighbors <- get_neighbors(neighborhood$first_type_fist_element_second_type) 113 | neighborhood$second_type_neighbors_N <- sapply(neighborhood$second_type_neighbors, length) 114 | 115 | neighborhood$second_type_neighbors_no_reject <- get_neighbors(neighborhood$first_type_fist_element_second_type_no_reject) 116 | neighborhood$second_type_neighbors_no_reject_N <- sapply(neighborhood$second_type_neighbors_no_reject, length) 117 | 118 | # SECOND type singlet neighbors: neighbors annotated to the residual cell type and are singlets 119 | neighborhood$second_type_singlets_neighbors <- sapply( 120 | seq(nrow(neighborhood$first_type_fist_element_second_type)), 121 | FUN = function(i){ 122 | x = neighborhood$first_type_fist_element_second_type[i,] 123 | y = neighborhood$spot_class[i,] 124 | which((x[-1] == x[1]) & (y[-1] == "singlet")) + 1 125 | }) 126 | 127 | neighborhood$second_type_singlets_neighbors_N <- sapply(neighborhood$second_type_singlets_neighbors, length) 128 | 129 | # Compute infiltration of second type CLASS in the neighborhood 130 | neighborhood$first_type_class_fist_element_second_type_class <- cbind(neighborhood$second_type_class[, 1], neighborhood$first_type_class[, -1]) 131 | neighborhood$second_type_class_neighbors <- get_neighbors(neighborhood$first_type_class_fist_element_second_type_class) 132 | neighborhood$second_type_class_neighbors_N <- sapply(neighborhood$second_type_class_neighbors, length) 133 | 134 | # FIRST type neighbors: indices where neighbors are annotated to the main first type 135 | neighborhood$first_type_neighbors <- get_neighbors(neighborhood$first_type) 136 | neighborhood$first_type_neighbors_N <- sapply(neighborhood$first_type_neighbors, length) 137 | 138 | # FIRST type singlet neighbors: neighbors annotated to the main cell type and are singlets 139 | neighborhood$first_type_singlets_neighbors <- sapply( 140 | seq(nrow(neighborhood$first_type)), 141 | FUN = function(i){ 142 | x = neighborhood$first_type[i,] 143 | y = neighborhood$spot_class[i,] 144 | which((x[-1] == x[1]) & (y[-1] == "singlet")) + 1 145 | }) 146 | neighborhood$first_type_singlets_neighbors_N <- sapply(neighborhood$first_type_singlets_neighbors, length) 147 | 148 | # Compute infiltration of first type CLASS in the neighborhood 149 | neighborhood$first_type_class_neighbors <- get_neighbors(neighborhood$first_type_class) 150 | neighborhood$first_type_class_neighbors_N <- sapply(neighborhood$first_type_class_neighbors, length) 151 | 152 | # SAME SECOND type neighbors: indices where neighbors are annotated to the same second type 153 | neighborhood$same_second_type_neighbors <- get_neighbors(neighborhood$second_type) 154 | neighborhood$same_second_type_neighbors_N <- sapply(neighborhood$same_second_type_neighbors, length) 155 | 156 | return(neighborhood) 157 | } 158 | 159 | 160 | #' Add Neighborhood Weight Composition 161 | #' 162 | #' This function computes the spatial composition of a neighborhood by calculating the weights of different cell types. 163 | #' It adds the computed composition as a new column (`neighborhood_weight_composition`) to the input `neighborhood` object. 164 | #' 165 | #' @param neighborhood A list containing information about the neighborhood, including `nn_idx` (indices of neighbors), 166 | #' `first_type` and `second_type` (cell types of the neighbors), and `weight_first_type` and `weight_second_type` (weights of the corresponding cell types). 167 | #' 168 | #' @return A modified `neighborhood` object with an added column `neighborhood_weight_composition`, which is a matrix where each row corresponds 169 | #' to the vector of cell-type weights for a neighborhood. 170 | #' 171 | #' @importFrom dplyr %>% 172 | #' @export 173 | #' 174 | add_neighborhood_weight_composition <- function( 175 | neighborhood 176 | ){ 177 | 178 | # Neighborhood composition 179 | compute_neighborhood_weight_composition <- function( 180 | nbhd 181 | ){ 182 | composition <- rep(0, length(cell_types)) 183 | names(composition) <- cell_types 184 | nbhd <- nbhd[!is.na(neighborhood$first_type[nbhd, 1])] # exclude cells not annotated by RCTD 185 | nbhd <- nbhd[-1] 186 | 187 | if(length(nbhd) == 0) 188 | return(composition) 189 | 190 | for(i in nbhd){ 191 | ft <- neighborhood$first_type[i, 1] 192 | st <- neighborhood$second_type[i, 1] 193 | w1 <- neighborhood$weight_first_type[i, 1] 194 | w2 <- neighborhood$weight_second_type[i, 1] 195 | if(!is.na(st)){ 196 | composition[ft] <- composition[ft] + w1 197 | composition[st] <- composition[st] + w2 198 | } else { 199 | composition[ft] <- composition[ft] + 1 200 | } 201 | } 202 | #composition <- unname(composition) 203 | return(composition/sum(composition)) 204 | } 205 | 206 | # Extract unique cell types 207 | cell_types <- neighborhood$first_type %>% as.vector() %>% unique() %>% sort() 208 | # Store `neighborhood_weight_composition` -- the global weight composition of the neighborhood 209 | neighborhood$neighborhood_weight_composition <- apply(neighborhood$nn_idx, 1, compute_neighborhood_weight_composition, simplify = T) %>% t() 210 | 211 | return(neighborhood) 212 | } 213 | 214 | 215 | #' Add Neighborhood Weights for Cell Types 216 | #' 217 | #' This function calculates the weights of the first and second cell types for each neighborhood based on the neighborhood composition. 218 | #' It assigns a weight of 0 to the second cell type if it is `NA`. 219 | #' 220 | #' @param neighborhood A list containing the neighborhood data, including: 221 | #' - `neighborhood_weight_composition`: A matrix where each row corresponds to a neighborhood, and columns represent cell types with their respective weights. 222 | #' - `first_type`: A matrix or vector indicating the first cell type for each neighborhood. 223 | #' - `second_type`: A matrix or vector indicating the second cell type for each neighborhood (can contain `NA` values). 224 | #' 225 | #' @return The input `neighborhood` object with an additional element `neighborhood_weights`, 226 | #' a matrix of dimensions `nrow(neighborhood$neighborhood_weight_composition) x 2`. 227 | #' Each row contains the weights of the `first_type` and `second_type` for the corresponding neighborhood. 228 | #' 229 | #' @export 230 | add_cell_types_neighborhood_weights <- function( 231 | neighborhood 232 | ){ 233 | 234 | neighborhood_weights <- sapply( 235 | 1:nrow(neighborhood$neighborhood_weight_composition), 236 | FUN = function(i){ 237 | ft <- neighborhood$first_type[i,1] 238 | st <- neighborhood$second_type[i,1] 239 | 240 | if(!is.na(st)){ 241 | neighborhood_weights <- c(neighborhood$neighborhood_weight_composition[i,ft], neighborhood$neighborhood_weight_composition[i, st]) 242 | } else{ 243 | if(!is.na(ft)){ 244 | neighborhood_weights <- c(neighborhood$neighborhood_weight_composition[i,ft], 0) 245 | } else { 246 | neighborhood_weights <- c(NA, NA) 247 | } 248 | } 249 | } 250 | ) 251 | neighborhood_weights <- neighborhood_weights %>% t() 252 | colnames(neighborhood_weights) <- c("first_type", "second_type") 253 | 254 | neighborhood$neighborhood_weights_first_type <- neighborhood_weights[, "first_type"] 255 | neighborhood$neighborhood_weights_second_type <- neighborhood_weights[, "second_type"] 256 | return(neighborhood) 257 | } 258 | 259 | 260 | #' Add Weights for First Type of Second Type Neighbors 261 | #' 262 | #' This function calculates the weight of the first cell type (`w1`) for each second-type neighbor in a neighborhood. 263 | #' It adds this information to the `neighborhood` object as a new element. 264 | #' 265 | #' @param neighborhood A list containing the neighborhood data, including: 266 | #' - `second_type_neighbors`: A list where each element contains indices of neighbors corresponding to the second type. 267 | #' - `weight_first_type`: A matrix where rows correspond to neighborhoods, and columns represent weights of the first type for each neighbor. 268 | #' 269 | #' @return The input `neighborhood` object with an additional element, 270 | #' - `w1_second_type_in_neighborhood`, which is a vector of weights for the first cell type 271 | #' corresponding to each second-type neighbor. 272 | #' - `sum_w1_second_type_in_neighborhood` : A vector with sum of `w1_second_type_in_neighborhood` 273 | #' 274 | #' @export 275 | 276 | add_individual_w1_of_second_type_in_neighborhood <- function( 277 | neighborhood 278 | ){ 279 | neighborhood$w1_second_type_in_neighborhood <- sapply(1:length(neighborhood$second_type_neighbors), function(i){ 280 | neighborhood$weight_first_type[i, neighborhood$second_type_neighbors[[i]]] 281 | }) 282 | 283 | neighborhood$sum_w1_second_type_in_neighborhood <- lapply(neighborhood$w1_second_type_in_neighborhood, sum) %>% unlist() 284 | 285 | return(neighborhood) 286 | } 287 | 288 | # Functions to retrieve from `xenium_analysis_pipeline` project 289 | 290 | # add scores: max w1 as second type 291 | #' Add Weights of Second Cell Type's Second Annotation in the Neighborhood 292 | #' 293 | #' This function calculates the weights associated with the second type's second annotation 294 | #' (w2) for neighbors of the same second cell type within a spatial neighborhood. 295 | #' 296 | #' @param neighborhood A list containing spatial neighborhood data. This list must include: 297 | #' - `same_second_type_neighbors`: Indices of neighbors annotated to the same second cell type. 298 | #' - `weight_second_type`: A matrix of weights for the second cell type annotations. 299 | #' 300 | #' @return The modified `neighborhood` list with an added field: 301 | #' - `w2_second_type_in_neighborhood`: A list where each entry contains the `w2` weights 302 | #' of neighbors annotated to the same second cell type for each cell. 303 | #' - `sum_w2_second_type_in_neighborhood` : A vector with sum of `w2_second_type_in_neighborhood` 304 | #' 305 | #' @export 306 | add_individual_w2_of_second_type_in_neighborhood <- function( 307 | neighborhood 308 | ){ 309 | neighborhood$w2_second_type_in_neighborhood <- sapply(1:length(neighborhood$same_second_type_neighbors), function(i){ 310 | neighborhood$weight_second_type[i, neighborhood$same_second_type_neighbors[[i]]] 311 | }) 312 | 313 | neighborhood$sum_w2_second_type_in_neighborhood <- lapply(neighborhood$w2_second_type_in_neighborhood, sum) %>% unlist() 314 | 315 | return(neighborhood) 316 | } 317 | 318 | #' Add Maximum Weight of Spilling Cell Type in the Neighborhood 319 | #' 320 | #' This function computes the maximum weight of a spilling cell type 321 | #' (i.e., the largest value among `w1` and `w2` weights for neighbors of the same second cell type) 322 | #' within a spatial neighborhood. 323 | #' 324 | #' @param neighborhood A list containing spatial neighborhood data. This list must include: 325 | #' - `same_second_type_neighbors`: Indices of neighbors annotated to the same second cell type. 326 | #' - `w1_second_type_in_neighborhood`: A list of `w1` weights for the same second cell type neighbors. 327 | #' - `w2_second_type_in_neighborhood`: A list of `w2` weights for the same second cell type neighbors. 328 | #' 329 | #' @return The modified `neighborhood` list with an added field: 330 | #' - `max_weight_of_spilling_type_in_neighborhood`: A numeric vector containing the maximum 331 | #' weight of the spilling cell type for each cell in the neighborhood. 332 | #' 333 | #' @export 334 | add_max_weight_of_spilling_type_in_neighborhood <- function( 335 | neighborhood 336 | ){ 337 | N_cells <- length(neighborhood$same_second_type_neighbors) 338 | neighborhood$max_weight_of_spilling_type_in_neighborhood <- sapply(1:N_cells, function(i){ 339 | max(0, neighborhood$w1_second_type_in_neighborhood[[i]], neighborhood$w2_second_type_in_neighborhood[[i]]) 340 | }) 341 | return(neighborhood) 342 | } 343 | 344 | 345 | #' Add Neighborhood Weight Composition of Spilling Cell Type 346 | #' 347 | #' This function computes and adds several metrics related to the weight composition of spilling 348 | #' cell types within a spatial neighborhood. It integrates multiple helper functions to calculate 349 | #' neighborhood composition, cell type weights, and the maximum weight of spilling cell types. 350 | #' 351 | #' @param neighborhood A list containing spatial neighborhood data. The `neighborhood` list 352 | #' should include the required fields for the following computations: 353 | #' - `add_neighborhood_weight_composition`: Calculates neighborhood cell-type weight composition. 354 | #' - `add_cell_types_neighborhood_weights`: Extracts weights for the first and second cell types. 355 | #' - `add_individual_w1_of_second_type_in_neighborhood`: Extracts individual `w1` weights of 356 | #' the second type neighbors. 357 | #' - `add_individual_w2_of_second_type_in_neighborhood`: Extracts individual `w2` weights of 358 | #' the second type neighbors. 359 | #' - `add_max_weight_of_spilling_type_in_neighborhood`: Computes the maximum weight of spilling 360 | #' cell types in the neighborhood. 361 | #' 362 | #' @return The modified `neighborhood` list, augmented with additional metrics: 363 | #' - `neighborhood_weight_composition`: Cell-type weight composition in the neighborhood. 364 | #' - `neighborhood_weights`: Weights of the first and second cell types. 365 | #' - `w1_second_type_in_neighborhood`: List of `w1` weights for the second cell type neighbors. 366 | #' - `w2_second_type_in_neighborhood`: List of `w2` weights for the second cell type neighbors. 367 | #' - `max_weight_of_spilling_type_in_neighborhood`: Maximum weight of spilling cell types in 368 | #' the neighborhood. 369 | #' 370 | #' @seealso 371 | #' - \code{\link{add_neighborhood_weight_composition}} 372 | #' - \code{\link{add_cell_types_neighborhood_weights}} 373 | #' - \code{\link{add_individual_w1_of_second_type_in_neighborhood}} 374 | #' - \code{\link{add_individual_w2_of_second_type_in_neighborhood}} 375 | #' - \code{\link{add_max_weight_of_spilling_type_in_neighborhood}} 376 | #' 377 | #' @export 378 | 379 | add_neigborhood_weight_composition_of_spilling_cell_type <- function( 380 | neighborhood 381 | ){ 382 | neighborhood <- add_neighborhood_weight_composition(neighborhood = neighborhood) 383 | neighborhood <- add_cell_types_neighborhood_weights(neighborhood = neighborhood) 384 | neighborhood <- add_individual_w1_of_second_type_in_neighborhood(neighborhood = neighborhood) 385 | neighborhood <- add_individual_w2_of_second_type_in_neighborhood(neighborhood = neighborhood) 386 | neighborhood <- add_max_weight_of_spilling_type_in_neighborhood(neighborhood = neighborhood) 387 | return(neighborhood) 388 | } 389 | 390 | #' Add Neighborhood nCount Metrics for Spilling Cell Types 391 | #' 392 | #' This function calculates the sum of `nCount` values for cells in a spatial neighborhood, 393 | #' as well as the sum of `nCount` values specifically for cells of the spilling cell type 394 | #' within each neighborhood. 395 | #' 396 | #' @param neighborhood A list containing spatial neighborhood data. The list should include: 397 | #' \itemize{ 398 | #' \item{\code{nn_idx}}{A matrix of indices representing neighborhood relationships. The first column represents the focal cell, and subsequent columns represent its neighbors.} 399 | #' \item{\code{second_type_neighbors}}{A list where each element contains indices of neighbors annotated to the second cell type for the corresponding focal cell.} 400 | #' \item{\code{nCount}}{A list of numeric vectors where each vector contains `nCount` values for cells in the dataset.} 401 | #' } 402 | #' 403 | #' @return A modified version of the input \code{neighborhood} list with the following additional fields: 404 | #' \itemize{ 405 | #' \item{\code{sum_nCount_neighborhood}}{A numeric vector containing the sum of `nCount` values for all neighbors of each focal cell.} 406 | #' \item{\code{sum_nCount_neighborhood_spilling_type}}{A numeric vector containing the sum of `nCount` values for neighbors of each focal cell that belong to the spilling cell type.} 407 | #' } 408 | #' 409 | #' @export 410 | 411 | add_neigborhood_nCount_of_spilling_cell_type <- function( 412 | neighborhood 413 | ){ 414 | N_cells <- length(neighborhood$second_type_neighbors) 415 | neighborhood$sum_nCount_neighborhood <- sapply(1:N_cells, function(i){ 416 | sum(neighborhood$nCount[i, -1]) 417 | }) 418 | 419 | neighborhood$sum_nCount_neighborhood_spilling_type <- sapply(1:N_cells, function(i){ 420 | sum(neighborhood$nCount[i, neighborhood$second_type_neighbors[[i]]]) 421 | }) 422 | 423 | return(neighborhood) 424 | } 425 | 426 | 427 | #' Add Spatial Metrics to Neighborhood 428 | #' 429 | #' This function enhances a spatial neighborhood structure by integrating RCTD results and adding various spatial metrics. 430 | #' It incorporates infiltration metrics, weight compositions, and neighborhood count metrics, facilitating in-depth spatial analysis. 431 | #' 432 | #' @param spatial_neighborhood A list representing the spatial neighborhood data. It should include fields required for neighborhood and infiltration analysis. 433 | #' @param rctd An RCTD object containing cell type decomposition results. 434 | #' 435 | #' @return A modified spatial neighborhood list enriched with additional metrics, including: 436 | #' \itemize{ 437 | #' \item Infiltration metrics. 438 | #' \item Weight composition of spilling cell types. 439 | #' \item Sum of \code{nCount} values for neighborhood and spilling cell types. 440 | #' } 441 | #' 442 | #' @details This function sequentially: 443 | #' \enumerate{ 444 | #' \item Integrates RCTD results into the spatial neighborhood. 445 | #' \item Calculates infiltration metrics based on cell type and class. 446 | #' \item Computes the weight composition for spilling cell types. 447 | #' \item Adds neighborhood \code{nCount} metrics for spilling cell types. 448 | #' } 449 | #' 450 | #' @seealso 451 | #' \itemize{ 452 | #' \item \code{\link{add_rctd_to_neighborhood}}: Integrates RCTD results into the neighborhood. 453 | #' \item \code{\link{add_infiltration_metrics_to_neighborhood}}: Computes infiltration metrics. 454 | #' \item \code{\link{add_neigborhood_weight_composition_of_spilling_cell_type}}: Adds weight composition for spilling cell types. 455 | #' \item \code{\link{add_neigborhood_nCount_of_spilling_cell_type}}: Computes neighborhood \code{nCount} metrics. 456 | #' } 457 | #' 458 | #' @export 459 | #' 460 | 461 | 462 | add_spatial_metric <- function(spatial_neighborhood, rctd){ 463 | 464 | spatial_neighborhood <- add_rctd_to_neighborhood(graph = spatial_neighborhood, rctd = rctd) 465 | spatial_neighborhood <- add_infiltration_metrics_to_neighborhood(neighborhood = spatial_neighborhood) 466 | spatial_neighborhood <- add_neigborhood_weight_composition_of_spilling_cell_type(neighborhood = spatial_neighborhood) 467 | spatial_neighborhood <- add_neigborhood_nCount_of_spilling_cell_type(neighborhood = spatial_neighborhood) 468 | return(spatial_neighborhood) 469 | } 470 | 471 | 472 | #' Annotate Transcriptomics Neighborhood with Dominant Cell Types 473 | #' 474 | #' This function annotates each neighborhood in a transcriptomics dataset by determining 475 | #' the most frequent cell type (or class) for each neighborhood category (`first_type` and `second_type`). 476 | #' 477 | #' @param neighborhood A data frame or list containing neighborhood-level transcriptomics data. 478 | #' It must contain `first_type`, `second_type`, and optionally `first_type_class` and `second_type_class`. 479 | #' 480 | #' @return The updated `neighborhood` object with the following added columns: 481 | #' \itemize{ 482 | #' \item `first_type_neighborhood`: The most frequent first type in the neighborhood. 483 | #' \item `first_type_class_neighborhood`: The most frequent first type class in the neighborhood (if available). 484 | #' \item `second_type_neighborhood`: The most frequent second type in the neighborhood. 485 | #' \item `second_type_class_neighborhood`: The most frequent second type class in the neighborhood (if available). 486 | #' } 487 | #' 488 | #' @details The function applies a helper function `find_most_frequent()` that determines 489 | #' the most frequently occurring value in a given set of neighborhood annotations. If the `first_type_class` 490 | #' or `second_type_class` column is missing, a warning is issued, and the respective computation is skipped. 491 | #' 492 | #' @export 493 | #' 494 | add_annotation_from_neighbors <- function(neighborhood) { 495 | 496 | # Helper function to determine the most frequent annotation in a neighborhood 497 | find_most_frequent <- function(values) { 498 | value_counts <- values[-1] 499 | most_frequent <- table(value_counts, useNA = "ifany") %>% which.max %>% names %>% .[1] 500 | return(most_frequent) 501 | } 502 | 503 | # Annotate first type neighborhood 504 | neighborhood$first_type_neighborhood <- apply( 505 | neighborhood$first_type, 506 | 1, 507 | find_most_frequent 508 | ) 509 | 510 | neighborhood$first_type_neighborhood_agreement <- neighborhood$first_type[,1] == neighborhood$first_type_neighborhood 511 | 512 | # Annotate first type class neighborhood if available 513 | if ("first_type_class" %in% names(neighborhood)) { 514 | neighborhood$first_type_class_neighborhood <- apply( 515 | neighborhood$first_type_class, 516 | 1, 517 | find_most_frequent 518 | ) 519 | 520 | neighborhood$first_type_class_neighborhood_agreement <- neighborhood$first_type_class[,1] == neighborhood$first_type_class_neighborhood 521 | 522 | } else { 523 | warning("`first_type_class` does not exist in the neighborhood object, `first_type_class_neighborhood` was not computed") 524 | } 525 | 526 | # Annotate second type neighborhood 527 | neighborhood$second_type_neighborhood <- apply( 528 | neighborhood$second_type, 529 | 1, 530 | find_most_frequent 531 | ) 532 | 533 | neighborhood$second_type_neighborhood_agreement <- neighborhood$second_type[,1] == neighborhood$second_type_neighborhood 534 | 535 | 536 | # Annotate second type class neighborhood if available 537 | if ("second_type_class" %in% names(neighborhood)) { 538 | neighborhood$second_type_class_neighborhood <- apply( 539 | neighborhood$second_type_class, 540 | 1, 541 | find_most_frequent 542 | ) 543 | 544 | neighborhood$second_type_class_neighborhood_agreement <- neighborhood$second_type_class[,1] == neighborhood$second_type_class_neighborhood 545 | 546 | } else { 547 | warning("`second_type_class` does not exist in the neighborhood object, `second_type_class_neighborhood` was not computed") 548 | } 549 | 550 | return(neighborhood) 551 | } 552 | 553 | 554 | #' Annotate Neighborhood with Certainty Scores 555 | #' 556 | #' This function calculates a certainty score for each neighborhood based on 557 | #' the entropy of its cell type composition. The certainty score ranges from 0 558 | #' (high entropy, diverse cell types) to 1 (low entropy, dominated by a single cell type). 559 | #' 560 | #' @param neighborhood 561 | #' A data frame or list containing neighborhood-level cell type data. It must 562 | #' contain `first_type` and `second_type`, and optionally 563 | #' `first_type_class` and `second_type_class`. 564 | #' 565 | #' @return 566 | #' The updated `neighborhood` object with the following additional columns: 567 | #' \itemize{ 568 | #' \item `first_type_neighborhood_certainty`: Certainty score for the first type. 569 | #' \item `first_type_class_neighborhood_certainty`: Certainty score for the first type class (if available). 570 | #' \item `second_type_neighborhood_certainty`: Certainty score for the second type. 571 | #' \item `second_type_class_neighborhood_certainty`: Certainty score for the second type class (if available). 572 | #' } 573 | #' 574 | #' @details 575 | #' Certainty is computed as: 576 | #' \deqn{1 - \frac{H}{\log(N)}}{ 577 | #' 1 - entropy / log(N) 578 | #' } 579 | #' where \eqn{H} is the entropy of the neighborhood's cell type distribution, 580 | #' and \eqn{N} is the total number of unique cell types. A higher score 581 | #' indicates a more homogenous neighborhood. 582 | #' 583 | #' If `first_type_class` or `second_type_class` is missing, a warning is issued 584 | #' and the respective certainty score is not computed. 585 | #' 586 | #' @importFrom entropy entropy 587 | #' @export 588 | #' 589 | 590 | add_neighborhood_annotation_certainty <- function( 591 | neighborhood 592 | ) { 593 | 594 | # Helper function to compute normalized certainty score based on entropy 595 | neighborhood_normalized_certainty <- function(x) { 596 | y <- x[-1] 597 | fr <- table(y) %>% as.numeric() 598 | res <- 1 - entropy::entropy(fr) / log(length(y)) 599 | # More accurate denominator: log(min(length(y), N_cell_types)) 600 | return(res) 601 | } 602 | 603 | # Compute certainty scores for first type neighborhoods 604 | neighborhood$first_type_neighborhood_certainty <- apply( 605 | neighborhood$first_type, 606 | 1, 607 | neighborhood_normalized_certainty 608 | ) 609 | 610 | # Compute certainty scores for first type class neighborhoods if available 611 | if ("first_type_class" %in% names(neighborhood)) { 612 | neighborhood$first_type_class_neighborhood_certainty <- apply( 613 | neighborhood$first_type_class, 614 | 1, 615 | neighborhood_normalized_certainty 616 | ) 617 | } else { 618 | warning("`first_type_class` does not exist in the neighborhood object, `first_type_class_neighborhood_certainty` was not computed") 619 | } 620 | 621 | # Compute certainty scores for second type neighborhoods 622 | neighborhood$second_type_neighborhood_certainty <- apply( 623 | neighborhood$second_type, 624 | 1, 625 | FUN = neighborhood_normalized_certainty 626 | ) 627 | 628 | # Compute certainty scores for second type class neighborhoods if available 629 | if ("second_type_class" %in% names(neighborhood)) { 630 | neighborhood$second_type_class_neighborhood_certainty <- apply( 631 | neighborhood$second_type_class, 632 | 1, 633 | FUN = neighborhood_normalized_certainty 634 | ) 635 | } else { 636 | warning("`second_type_class` does not exist in the neighborhood object, `second_type_class_neighborhood_certainty` was not computed") 637 | } 638 | 639 | return(neighborhood) 640 | } 641 | 642 | 643 | 644 | #' Add Transcriptomics-Based Metrics to Neighborhood Data 645 | #' 646 | #' This function enriches a transcriptomics neighborhood dataset by incorporating 647 | #' additional metrics from an `rctd` object, propagating annotations from neighboring cells, 648 | #' and computing neighborhood annotation certainty scores. 649 | #' 650 | #' @param transcriptomics_neighborhood 651 | #' A data frame or list representing the transcriptomics neighborhood, 652 | #' containing spatial relationships between cells. 653 | #' @param rctd 654 | #' An object containing **RCTD** (Robust Cell Type Decomposition) results, 655 | #' which provide cell type compositions inferred from spatial transcriptomics data. 656 | #' 657 | #' @return 658 | #' The updated `transcriptomics_neighborhood` object with additional transcriptomics-based metrics: 659 | #' \itemize{ 660 | #' \item **RCTD-derived cell type information** integrated into the neighborhood. 661 | #' \item **Propagated annotations from neighboring cells.** 662 | #' \item **Certainty scores** quantifying the confidence of neighborhood-level annotations. 663 | #' } 664 | #' 665 | #' @details 666 | #' The function performs the following steps: 667 | #' \enumerate{ 668 | #' \item **Integrates RCTD data** into the transcriptomics neighborhood using `add_rctd_to_neighborhood()`. 669 | #' \item **Propagates annotations from neighboring cells** using `add_annotation_from_neighbors()`. 670 | #' \item **Computes neighborhood annotation certainty** using `add_neighborhood_annotation_certainty()`. 671 | #' } 672 | #' 673 | #' @seealso 674 | #' \code{\link{add_rctd_to_neighborhood}}, 675 | #' \code{\link{add_annotation_from_neighbors}}, 676 | #' \code{\link{add_neighborhood_annotation_certainty}} 677 | #' 678 | #' @export 679 | #' 680 | 681 | add_transcriptomics_metric <- function(transcriptomics_neighborhood, rctd) { 682 | 683 | # Add RCTD data to the transcriptomics neighborhood 684 | transcriptomics_neighborhood <- add_rctd_to_neighborhood( 685 | graph = transcriptomics_neighborhood, 686 | rctd = rctd 687 | ) 688 | 689 | # Propagate annotations from neighboring cells 690 | transcriptomics_neighborhood <- add_annotation_from_neighbors( 691 | neighborhood = transcriptomics_neighborhood 692 | ) 693 | 694 | # Compute neighborhood annotation certainty 695 | transcriptomics_neighborhood <- add_neighborhood_annotation_certainty( 696 | neighborhood = transcriptomics_neighborhood 697 | ) 698 | 699 | return(transcriptomics_neighborhood) 700 | } 701 | 702 | 703 | 704 | #' Convert Neighborhood Analysis to Metadata 705 | #' 706 | #' This function extracts specific elements from a neighborhood list and 707 | #' converts them into a data frame format suitable for further analysis or 708 | #' integration with metadata. It processes variables that are either vectors 709 | #' or matrices, retaining relevant columns and applying necessary transformations. 710 | #' 711 | #' @param neighborhood A list containing neighborhood data, with various 712 | #' variables, including vectors and matrices. 713 | #' 714 | #' @return A data frame with the following columns: 715 | #' - Variables from the input `neighborhood` list that are either vectors 716 | #' or matrices, with matrices reduced to their first column. 717 | #' - The data frame is indexed by the `cell_id` column from the input list. 718 | #' 719 | #' @details 720 | #' - The function checks for the class of each element in the `neighborhood` 721 | #' list and processes vectors and matrices. 722 | #' - For matrices, only the first column is kept. 723 | #' - A data frame is created from the selected variables, using `cell_id` 724 | #' as the row names. 725 | #' 726 | #' @export 727 | #' 728 | 729 | neighborhood_analysis_to_metadata <- function( 730 | neighborhood 731 | ){ 732 | var_names <- names(neighborhood) 733 | var_class <- lapply(neighborhood, function(x){class(x)[1]}) %>% unlist() 734 | is_vct <- var_class[which(!var_class %in% c("matrix", "list", "igraph"))] %>% names() 735 | 736 | variables_to_keep <- c(is_vct) 737 | neighborhood_df <- neighborhood[variables_to_keep] 738 | neighborhood_df <- as.data.frame(neighborhood_df, row.names = neighborhood_df$cell_id) 739 | 740 | return(neighborhood_df) 741 | 742 | } 743 | -------------------------------------------------------------------------------- /R/post_RCTD.R: -------------------------------------------------------------------------------- 1 | #' Updates `score_mat` of the `Run.RCTD` output 2 | #' 3 | #' Adds `singlet_scores` as a diagonal and removes cell types with a low weight in full cell-type decomposition. 4 | #' 5 | #' This function modifies the `score_mat` by adding `singlet_scores` as a diagonal matrix and removes cell types 6 | #' from the candidate list if their corresponding weight is below a user-defined threshold. The updated 7 | #' `score_mat` and `singlet_scores` are saved in the `rctd` object. 8 | #' 9 | #' @param rctd An object resulting from \link[spacerx]{Run.RCTD}. 10 | #' @param min_weight A threshold (numeric) to keep cell types as candidates. Cell types with a weight below this 11 | #' threshold are removed from the `score_mat`. Default is 0.01, which is the same as in `Run.RCTD`. 12 | #' @param verbose Logical. If `TRUE`, the function will print messages about removed low-weight cell types. 13 | #' Default is `FALSE`. 14 | #' 15 | #' @return An updated `rctd` object with modified `score_mat` and `singlet_scores`. 16 | #' 17 | 18 | 19 | update_score_mat_RCTD <- function( 20 | rctd, 21 | min_weight = .01, # same as in run.RCTD, use higher values to remove non-relevant candidates 22 | verbose = FALSE, 23 | BPPARAM = bpparam(), 24 | n_workers = NULL 25 | ){ 26 | score_mat <- rctd@results$score_mat 27 | weights <- rctd@results$weights 28 | singlet_scores <- rctd@results$singlet_scores 29 | 30 | # update score_mat by adding `singlet_scores` as a diagonal and then removing 31 | # candidate cell types that do not have sufficient weight 32 | 33 | cell_types <- colnames(weights) 34 | 35 | if (is.null(n_workers)) { 36 | n_workers <- multicoreWorkers() - 1 37 | } 38 | param <- MulticoreParam(workers = n_workers) 39 | 40 | result_list <- bplapply(seq_along(score_mat), function(i) { 41 | smat <- as.matrix(score_mat[[i]]) 42 | diag(smat) <- singlet_scores[[i]] 43 | 44 | new_candidates <- cell_types[weights[i, ] > min_weight] 45 | cur_ct <- colnames(smat) 46 | keep_ct <- intersect(cur_ct, new_candidates) 47 | 48 | smat_sub <- smat[keep_ct, keep_ct, drop = FALSE] 49 | sscore_sub <- singlet_scores[[i]][keep_ct] 50 | 51 | list(score_mat = smat_sub, singlet_scores = sscore_sub) 52 | }, BPPARAM = BPPARAM) 53 | 54 | score_mat_xe <- lapply(result_list, function(x) x$score_mat) 55 | singlet_scores_xe <- lapply(result_list, function(x) x$singlet_scores) 56 | 57 | rctd@results$score_mat_xe <- score_mat_xe 58 | rctd@results$singlet_scores_xe <- singlet_scores_xe 59 | return(rctd) 60 | } 61 | 62 | #' Correct First Type in Very Confident Singlets 63 | #' 64 | #' This function updates the `first_type` and `spot_class` fields in the results of an RCTD object for cells classified as highly confident singlets. It ensures accurate cell-type annotations for singlets with only one candidate cell type and updates related fields accordingly. 65 | #' 66 | #' @param rctd An `RCTD` object containing spatial transcriptomics annotation results. The object must have the `singlet_scores_xe` field in `rctd@results`. Run `update_score_mat()` prior to using this function if the field is missing. 67 | #' 68 | #' @return An updated `RCTD` object with the following fields modified in `rctd@results$results_df_xe`: 69 | #' \itemize{ 70 | #' \item \code{first_type}: Updated for confident singlets to reflect correct annotation. 71 | #' \item \code{second_type}: Set to \code{NA} for confident singlets. 72 | #' \item \code{spot_class}: Updated to "singlet" for highly confident cells. 73 | #' \item \code{max_doublet_weight}: Added to capture the maximum weight for doublets. 74 | #' \item \code{n_candidates}: Number of candidate cell types for each spot. 75 | #' \item \code{rctd_weights_entropy}: Entropy of rctd weights in full cell-type decomposition. 76 | #' 77 | #' } 78 | #' 79 | #' @details 80 | #' The function identifies cells with only one candidate cell type (i.e., highly confident singlets). It corrects the `first_type` assignment, removes the `second_type` (assignes to NA), and updates the `spot_class` to "singlet." This ensures proper classification of confident cells while leaving other cells unchanged. 81 | #' 82 | #' @examples 83 | #' \dontrun{ 84 | #' rctd <- update_score_mat(rctd) 85 | #' rctd <- correct_singlets(rctd) 86 | #' } 87 | #' 88 | #' @import dplyr 89 | 90 | 91 | correct_singlets <- function( 92 | rctd 93 | ){ 94 | if(!("singlet_scores_xe" %in% names(rctd@results))) 95 | stop("No `singlet_scores_xe` field in rctd@results, run `update_score_mat()` first!") 96 | 97 | if("scond_type" %in% colnames(rctd@results$results_df)){ 98 | rctd@results$results_df$second_type <- rctd@results$results_df$scond_type 99 | } 100 | df <- rctd@results$results_df 101 | 102 | # select cells with only one candidate cell type 103 | len_mat <- lapply(rctd@results$singlet_scores_xe, FUN = length) %>% as.numeric() # number of cell type candidates (dont use singlet score, as that one has not been updated) 104 | confident_singlet_idx <- which(len_mat == 1) 105 | no_cell_type_idx <- which(len_mat == 0) 106 | rejects_idx <- which(rctd@results$results_df$spot_class == "reject") 107 | 108 | #names(rctd@results$singlet_scores_xe) <- rctd@results$results_df %>% rownames() 109 | 110 | #get their singlet scores 111 | argmin_singlet_score <- sapply(rctd@results$singlet_scores_xe, 112 | function(x){ 113 | r <- x %>% which.min() %>% names() 114 | if(is.null(r)) 115 | r <- NA 116 | return(r) 117 | }, simplify = T) %>% unlist() 118 | 119 | # update first type to make sure highly confined cells have correct annotation 120 | first_type_updated <- rctd@results$results_df$first_type 121 | first_type_updated[confident_singlet_idx] <- argmin_singlet_score[confident_singlet_idx] 122 | first_type_updated[no_cell_type_idx] <- NA_character_ 123 | 124 | # replace second type with NA as now RCTD assigns it to a random cell type (usually the first of the available cell types) 125 | second_type_updated <- rctd@results$results_df$second_type 126 | second_type_updated[confident_singlet_idx] <- NA_character_ 127 | second_type_updated[no_cell_type_idx] <- NA_character_ 128 | 129 | # update spot class of highly confident cells to singlets 130 | spot_class_upd <- rctd@results$results_df$spot_class 131 | spot_class_upd[setdiff(confident_singlet_idx, rejects_idx)] <- "singlet" 132 | 133 | spot_class_upd[no_cell_type_idx] <- "reject" 134 | spot_class_upd <- ordered(spot_class_upd, levels = c("reject", "doublet_uncertain", "doublet_certain", "singlet")) 135 | max_doublet_weight <- apply(rctd@results$weights_doublet, 1, max) %>% unname() 136 | 137 | # compute entropy of rctd weight score 138 | weights <- rctd@results$weights 139 | weights[weights<0] <- 0 140 | weights <- spacexr::normalize_weights(weights) 141 | weights_entr <- apply(weights, 1, entropy::entropy) %>% unname() 142 | 143 | df <- rctd@results$results_df %>% 144 | mutate( 145 | first_type = first_type_updated, 146 | second_type = second_type_updated, 147 | max_doublet_weight = max_doublet_weight, 148 | spot_class = spot_class_upd, 149 | n_candidates = len_mat, 150 | rctd_weights_entropy = weights_entr 151 | ) 152 | 153 | rctd@results$results_df_xe <- df 154 | return(rctd) 155 | } 156 | 157 | 158 | #' Update scores in RCTD results data frame 159 | #' 160 | #' This function updates existing scores in `rctd@results$results_df_xe`, computes new scores, and adds flags to facilitate further analysis. The function integrates additional metadata, refines singlet scores for cell type annotations, computes score differences, and checks for relationships between cell types. 161 | #' 162 | #' @param rctd An `RCTD` object containing spatial transcriptomics data and results. 163 | #' 164 | #' @return An updated `RCTD` object with modified and new fields in `rctd@results$results_df_xe`, including: 165 | #' \itemize{ 166 | #' \item \code{singlet_score_first}: Singlet score for the cell type assigned to `first_type`. 167 | #' \item \code{singlet_score_second}: Singlet score for the cell type assigned to `second_type` (if applicable). 168 | #' \item \code{delta_singlet_score_first_second}: Difference between \code{singlet_score_first} and \code{singlet_score_second}. 169 | #' \item \code{score_diff}: Difference between \code{singlet_score_first} and \code{min_score}. 170 | #' \item \code{delta_singlet_score}: Difference between the smallest and second smallest singlet scores. 171 | #' \item \code{delta_singlet_score_class}: Similar to \code{delta_singlet_score}, but excludes scores from the same class as \code{first_type}. 172 | #' \item \code{weight_first_type}: Weight for the cell type assigned to `first_type`. 173 | #' \item \code{weight_second_type}: Weight for the cell type assigned to `second_type` (if applicable). 174 | #' \item \code{same_class}: Logical flag indicating whether `first_type` and `second_type` belong to the same class. 175 | #' } 176 | #' 177 | #' @details 178 | #' The function refines singlet scores for `first_type` and `second_type`, computes score differences, and introduces new metrics such as `delta_singlet_score` and `delta_singlet_score_class`. These metrics help distinguish between highly confident cell types and ambiguous cases. Additionally, the function evaluates whether `first_type` and `second_type` belong to the same class, using an internal class mapping. 179 | #' 180 | 181 | update_scores_RCTD <- function(rctd, lite = TRUE){ 182 | 183 | df <- rctd@results$results_df_xe 184 | 185 | if(!lite){ 186 | # Ensure first_type and second_type are available in the data frame 187 | df <- df %>% mutate( 188 | singlet_score_first = sapply(1:nrow(df), function(i) { 189 | ft <- df$first_type[i] %>% as.vector() 190 | return(rctd@results$singlet_scores[[i]][ft] %>% unname()) 191 | }), 192 | singlet_score_second = sapply(1:nrow(df), function(i) { 193 | st <- df$second_type[i] %>% as.vector() 194 | if (is.na(st)) return(NA) 195 | return(rctd@results$singlet_scores[[i]][st] %>% unname()) 196 | }), 197 | delta_singlet_score_first_second = singlet_score_second - singlet_score_first 198 | ) 199 | 200 | # Update score_diff to be consistent with the new singlet_score 201 | df <- df %>% mutate( 202 | score_diff = df$singlet_score_first - df$min_score, # Use this one 203 | score_diff_old = df$singlet_score - df$min_score, # Keep this to track 204 | delta_singlet_score_original_first_class = singlet_score_first - singlet_score # Difference between singlet_score and singlet_score_first 205 | ) 206 | } 207 | 208 | # Add weight_first_type, weight_second_type 209 | df <- df %>% mutate( 210 | weight_first_type = rctd@results$weights_doublet[,"first_type"] %>% unname(), 211 | weight_second_type = rctd@results$weights_doublet[,"second_type"] %>% unname() 212 | ) 213 | 214 | if(!lite){ 215 | # Calculate delta_singlet_score: difference between first and second smallest singlet scores 216 | sorted_singlet_scores <- sapply(rctd@results$singlet_scores, FUN = function(x) sort(x, decreasing = F)) 217 | delta_singlet_score <- sapply(sorted_singlet_scores, FUN = function(x) { 218 | if (length(x) == 1) return(Inf) 219 | return(x[2] - x[1]) 220 | }) 221 | df$delta_singlet_score <- delta_singlet_score 222 | 223 | # Calculate delta_singlet_score_class (ignore same class elements when computing delta score) 224 | sorted_singlet_scores_class <- sapply(sorted_singlet_scores, function(x) { 225 | class_vec <- rctd@internal_vars$class_df[names(x), "class"] 226 | mask <- class_vec != class_vec[1] # Keep elements where their class != class[1] 227 | mask[1] <- TRUE # Keep the first element 228 | return(x[mask]) 229 | }) 230 | 231 | delta_singlet_score_class <- sapply(sorted_singlet_scores_class, FUN = function(x) { 232 | if (length(x) == 1) return(0) 233 | return(x[2] - x[1]) 234 | }) 235 | 236 | df$delta_singlet_score_class <- delta_singlet_score_class 237 | } 238 | 239 | # Check whether first_type and second_type come from the same class 240 | df <- df %>% 241 | mutate( 242 | first_type_class = factor( 243 | rctd@internal_vars$class_df[df$first_type %>% as.vector(), "class"], 244 | levels = unique(rctd@internal_vars$class_df$class) 245 | ), 246 | second_type_class = factor( 247 | rctd@internal_vars$class_df[df$second_type %>% as.vector(), "class"], 248 | levels = unique(rctd@internal_vars$class_df$class) 249 | ), 250 | same_class = first_type_class == second_type_class 251 | ) 252 | 253 | rctd@results$results_df_xe <- df 254 | return(rctd) 255 | } 256 | 257 | 258 | #' Normalize `score_diff` to Compensate for Feature Count and Update `spot_class` 259 | #' 260 | #' This function normalizes `score_diff` in the RCTD results to account for the positive correlation between the number of features (`nFeature`) and the goodness-of-fit metrics (`singlet_score` and `min_score`). The normalization helps to reduce the bias introduced by feature count and adjusts the `spot_class` accordingly. 261 | #' 262 | #' @param rctd An `RCTD` object containing spatial transcriptomics data and results. 263 | #' @param nFeature_doublet_threshold Numeric. Threshold for `score_diff_normalized` to classify a spot as a singlet. Default is 0.5. 264 | #' @param nFeature a vector of nFeatures for `rctd@results$results_df_xe` rows, if not provided, will be computed from `rctd@spatialRNA@counts` 265 | #' @param nCount a vector of nCount for `rctd@results$results_df_xe` rows, if not provided, will be computed from `rctd@spatialRNA@counts` 266 | #' 267 | #' @return An updated `RCTD` object with the following fields added or updated in `rctd@results$results_df_xe`: 268 | #' \itemize{ 269 | #' \item \code{nCount}: The count of reads or molecules for each spot, retrieved from \code{xe}. 270 | #' \item \code{nFeature}: The number of features (e.g., genes) detected for each spot, retrieved from \code{xe}. 271 | #' \item \code{score_diff_normalized}: The normalized \code{score_diff}, computed as \code{score_diff / nFeature}. 272 | #' \item \code{is_singlet_in_normalized_thresh}: Logical flag indicating whether a spot's normalized score falls below the doublet threshold. 273 | #' \item \code{spot_class_normalized}: Updated `spot_class` that classifies spots as "singlet" if they pass the normalized threshold and were not previously rejected. 274 | #' } 275 | #' 276 | #' @details 277 | #' The function compensates for the effect of feature count (`nFeature`) on `score_diff` by normalizing it. Spots are reclassified as singlets if their normalized score (`score_diff_normalized`) is below the threshold specified by \code{nFeature_doublet_threshold}, and they are not rejected. 278 | 279 | 280 | normalize_score_diff_by_nFeature <- function( 281 | rctd, 282 | nFeature_doublet_threshold = 0.5, 283 | nFeature = NULL, 284 | nCount = NULL 285 | ) { 286 | 287 | if(is.null(nFeature) || is.null(nCount)){ 288 | # nCount, nFeature 289 | rctd@results$results_df_xe <- rctd@results$results_df_xe %>% mutate( 290 | nCount = Matrix::colSums(rctd@spatialRNA@counts) %>% unname(), 291 | nFeature = Matrix::colSums(rctd@spatialRNA@counts > 0) %>% unname() 292 | ) 293 | } else { 294 | rctd@results$results_df_xe <- rctd@results$results_df_xe %>% mutate( 295 | nCount = nCount, 296 | nFeature = nFeature 297 | ) 298 | } 299 | 300 | # Normalization 301 | rctd@results$results_df_xe <- rctd@results$results_df_xe %>% 302 | mutate( 303 | score_diff_normalized = score_diff / nFeature, 304 | is_singlet_in_normalized_thresh = score_diff_normalized < nFeature_doublet_threshold, 305 | spot_class_normalized = if_else(is_singlet_in_normalized_thresh & spot_class != "reject", 306 | "singlet", spot_class) %>% ordered(levels = levels(spot_class)) 307 | ) 308 | return(rctd) 309 | } 310 | 311 | #' Compute Alternative Annotations 312 | #' 313 | #' This function computes three alternative annotations for each cell based on the results in the `rctd` object: 314 | #' 315 | #' - `annot_min_singlet_score`: The cell type with the minimum singlet score. 316 | #' - `annot_max_weight`: The cell type with the maximum weight. 317 | #' - `annot_max_doublet_weight`: The cell type with the maximum doublet weight, choosing the first type if the first weight is greater than the second, or the second type otherwise. 318 | #' 319 | #' These annotations are added to the `results_df_xe` slot of the `rctd` object. 320 | #' 321 | #' @param rctd An object of class `RCTD` containing the results from the spatial transcriptomics analysis. 322 | #' 323 | #' @return The input `rctd` object with the updated `results_df_xe` slot containing the three alternative annotations. 324 | #' 325 | 326 | compute_alternative_annotations <- function(rctd){ 327 | 328 | # Compute alternative annotations 329 | annot_min_singlet_score <- sapply(rctd@results$singlet_scores, function(x) { 330 | names(x)[which.min(x)] # directly find the name with min singlet score 331 | }) %>% unname() 332 | 333 | annot_max_weight <- apply(rctd@results$weights, 1, function(x) { 334 | names(x)[which.max(x)] # find the name with max weight 335 | }) %>% unname() 336 | 337 | annot_max_doublet_weight <- ifelse( 338 | rctd@results$results_df_xe$weight_first_type > rctd@results$results_df_xe$weight_second_type | 339 | is.na(rctd@results$results_df_xe$weight_second_type), 340 | rctd@results$results_df_xe$first_type %>% as.vector(), 341 | rctd@results$results_df_xe$second_type %>% as.vector() 342 | ) %>% unname() 343 | 344 | rctd@results$results_df_xe <- rctd@results$results_df_xe %>% 345 | mutate( 346 | annot_min_singlet_score = annot_min_singlet_score, 347 | annot_max_weight = annot_max_weight, 348 | annot_max_doublet_weight = annot_max_doublet_weight, 349 | w1_larger_w2 = first_type == annot_max_doublet_weight 350 | ) 351 | return(rctd) 352 | } 353 | 354 | #' Computes Normalized Entropy of Annotations and Stores in `rctd@results$results_df_xe` 355 | #' 356 | #' This function computes the normalized entropy of the annotations for each cell type in the `rctd@results$results_df_xe` dataframe. 357 | #' The entropy is computed based on the relative frequencies of each annotation field and normalized by the maximum possible entropy. 358 | #' 359 | #' The following entropy values are computed: 360 | #' - `entropy_first_type`: Normalized entropy for the `first_type` annotation, considering annotations such as `annot_min_singlet_score`, `annot_max_weight`, and `annot_max_doublet_weight`. 361 | #' - `entropy_second_type`: Normalized entropy for the `second_type` annotation, computed in the same way as `entropy_first_type`. 362 | #' 363 | #' Entropy values indicate the diversity of the annotations for each cell type. A high entropy value suggests more diverse annotations, while a low value indicates more certainty in the annotation. 364 | #' 365 | #' @param rctd An object of class `RCTD` containing the results of the spatial transcriptomics analysis. 366 | #' 367 | #' @return The input `rctd` object with two additional columns in `results_df_xe`: 368 | #' - `entropy_first_type`: The normalized entropy for the `first_type` annotation. 369 | #' - `entropy_second_type`: The normalized entropy for the `second_type` annotation. 370 | 371 | 372 | compute_annotation_entropy <- function(rctd){ 373 | annotation_fields <- grep("annot_", colnames(rctd@results$results_df_xe), value = T) 374 | 375 | normalized_entropy <- function(x){ 376 | vec <- x %>% factor() %>% forcats::fct_infreq() %>% as.numeric() 377 | freqs <- table(vec) / length(vec) 378 | entropy_value <- entropy::entropy(freqs, unit = "log2") 379 | # Calculate maximum entropy 380 | max_entropy <- log2(length(vec)) 381 | 382 | if(is.na(entropy_value)) # missing argmax doublet weight because the first type has low doublet weight,but very high overall weight such that ther is only one candidate cell type (instability in RCTD decomposition) 383 | entropy_value <- 0 384 | 385 | return(entropy_value / max_entropy) 386 | } 387 | 388 | entropy_first_type <- rctd@results$results_df_xe %>% select(all_of(c("first_type", annotation_fields))) %>% 389 | apply(1, FUN = normalized_entropy) 390 | entropy_second_type <- rctd@results$results_df_xe %>% select(all_of(c("second_type", annotation_fields))) %>% 391 | apply(1, FUN = normalized_entropy) 392 | 393 | rctd@results$results_df_xe$entropy_first_type <- entropy_first_type 394 | rctd@results$results_df_xe$entropy_second_type <- entropy_second_type 395 | return(rctd) 396 | } 397 | 398 | #' Computes Annotation Confidence Based on Normalized Entropy of Original and Alternative Annotations 399 | #' 400 | #' This function computes the annotation confidence for each cell based on the normalized entropy of the original and alternative annotations (`entropy_first_type` and `entropy_second_type`), as well as the cell's weight (`weight_first_type`). 401 | #' The confidence for each cell type is computed as follows: 402 | #' - `confidence_first_type = (1 - entropy_first_type) * weight_first_type` 403 | #' - `confidence_second_type = (1 - entropy_second_type) * (1 - weight_first_type)` 404 | #' 405 | #' These confidence values reflect the reliability of the annotations, with higher confidence indicating greater certainty in the cell's type assignment. The entropy values capture the diversity of the annotations, and the weights are used to adjust the confidence based on the strength of the cell-type assignment. 406 | #' 407 | #' @param rctd An object of class `RCTD` containing the results of the spatial transcriptomics analysis. 408 | #' 409 | #' @return The input `rctd` object with two additional columns in `results_df_xe`: 410 | #' - `confidence_first_type`: The computed confidence for the `first_type` annotation. 411 | #' - `confidence_second_type`: The computed confidence for the `second_type` annotation. 412 | 413 | 414 | compute_annotation_confidence <- function(rctd){ 415 | rctd@results$results_df_xe <- rctd@results$results_df_xe %>% 416 | mutate( 417 | confidence_first_type = (1 - entropy_first_type) * weight_first_type, 418 | confidence_second_type = (1 - entropy_second_type) * (1-weight_first_type) 419 | ) 420 | return(rctd) 421 | } 422 | 423 | 424 | #' Runs the post-processing pipeline for RCTD. 425 | #' 426 | #' This function sequentially applies a series of processing steps to the RCTD results, 427 | #' including score matrix updates, singlet correction, score updates, normalization, 428 | #' alternative annotation computation, entropy calculation, and annotation confidence computation. 429 | #' 430 | #' The following processing steps are performed in order: 431 | #' - Updates the score matrix using `update_score_mat_RCTD()` 432 | #' - Corrects singlets using `correct_singlets()` 433 | #' - Updates scores with `update_scores_RCTD()` 434 | #' - Normalizes the score difference by the number of features using `normalize_score_diff_by_nFeature()` 435 | #' - Computes alternative annotations using `compute_alternative_annotations()` 436 | #' - Computes normalized entropy for each annotation using `compute_annotation_entropy()` 437 | #' - Computes annotation confidence using `compute_annotation_confidence()` 438 | #' 439 | #' The final results are stored in the `rctd@results$results_df` slot, and the previous results are saved in 440 | #' `rctd@results$results_df_old` for tracking purposes. The intermediate results in `results_df_xe` are cleared. 441 | #' 442 | #' @param rctd RCTD object containing the results. 443 | #' @param min_weight Minimum weight threshold for the score matrix update (default: 0.05). 444 | #' @param nFeature_doublet_threshold Threshold for doublet classification when normalizing the score difference 445 | #' by the number of features (default: 0.5). 446 | #' @param nFeature Optional numeric vector providing the number of features for each cell, 447 | #' overrides calculation based on the data if provided (default: NULL). 448 | #' @param nCount Optional numeric vector providing the count of features for each cell, 449 | #' overrides calculation based on the data if provided (default: NULL). 450 | #' 451 | #' @return Updated RCTD object with processed results. 452 | #' @export 453 | 454 | run_post_process_RCTD <- function( 455 | rctd, 456 | min_weight = 0.05, 457 | nFeature_doublet_threshold = 0.5, 458 | nFeature = NULL, 459 | nCount = NULL, 460 | n_workers = NULL, 461 | lite = TRUE 462 | ){ 463 | message("Updating score_mat ...") 464 | rctd <- update_score_mat_RCTD( 465 | rctd = rctd, 466 | min_weight = min_weight, 467 | n_workers = n_workers 468 | ) 469 | 470 | message("Correcting singlets ...") 471 | rctd <- correct_singlets(rctd = rctd) 472 | 473 | message("Updating scores ...") 474 | rctd <- update_scores_RCTD(rctd = rctd, lite = lite) 475 | 476 | message("Add coordinates to results ...") 477 | rctd@results$results_df_xe$x <- rctd@spatialRNA@coords[rownames(rctd@results$results_df_xe), "x"] 478 | rctd@results$results_df_xe$y <- rctd@spatialRNA@coords[rownames(rctd@results$results_df_xe), "y"] 479 | 480 | if(!lite){ 481 | message("Normalizing score_diff by nFeature ...") 482 | rctd <- normalize_score_diff_by_nFeature( 483 | rctd = rctd, 484 | nFeature_doublet_threshold = nFeature_doublet_threshold, 485 | nFeature = nFeature, 486 | nCount = nCount 487 | ) 488 | } 489 | 490 | message("Computing alternative annotations ...") 491 | rctd <- compute_alternative_annotations(rctd) 492 | 493 | #message("Computing annotation entropy ...") 494 | #rctd <- compute_annotation_entropy(rctd) 495 | 496 | #message("Computing annotation confidence ...") 497 | #rtrd <- compute_annotation_confidence(rctd) 498 | 499 | message("Replacing results_df ...") 500 | # store the old results to keep track and replace them with the new ones 501 | rctd@results$results_df_old <- rctd@results$results_df 502 | rctd@results$results_df <- rctd@results$results_df_xe 503 | rctd@results$results_df_xe <- NULL 504 | 505 | return(rctd) 506 | } 507 | 508 | -------------------------------------------------------------------------------- /R/purification.R: -------------------------------------------------------------------------------- 1 | # Functions below were adapted and modified from: 2 | 3 | #get_decomposed_data() # https://github.com/dmcable/spacexr/blob/0a0861e3d1e16014a20e9b743d0e19d3b42231f3/R/postProcessing.R#L81C1-L81C20 4 | #get_decomposed_data_full_doublet() # https://github.com/dmcable/spacexr/blob/0a0861e3d1e16014a20e9b743d0e19d3b42231f3/R/postProcessing.R#L44 5 | #decompose_doublet_fast() # https://github.com/dmcable/spacexr/blob/0a0861e3d1e16014a20e9b743d0e19d3b42231f3/R/RCTD_helper.R#L219C1-L219C23 6 | 7 | #' Decompose doublet into 2 profiles 8 | #' 9 | #' @param bead (counts) profile to decompose 10 | #' @param weights cell-type weights (expected to sum up to 1) 11 | #' @param gene_list vector of genes (should be (a subset of) common genes between reference data and query data) 12 | #' @param cell_type_info slot of \link[spacexr]{run.RCDT} output containing reference profile matrix 13 | #' @param type1 name of the main cell-type (expected `first_type`) 14 | #' @param cell_types name(s) of all cell types present in the cell (expects `second_type` in case of decomposition of `doublet_certain` and all present cell types in case of `doublet_uncertain`) 15 | #' 16 | #' @return description list of the profile of the first and the second profiles, 17 | #'in case of decomposition of `doublet_uncertain` (i.e., when `length(cell_types)>1`), the second profile corresponds to the complement (bead - first_profile) 18 | #' @export 19 | 20 | decompose_doublet <- function( 21 | bead, weights, gene_list, cell_type_info, type1, cell_types 22 | ){ 23 | bead <- bead[gene_list] 24 | 25 | N_genes <- length(gene_list) 26 | epsilon <- 1e-10 27 | 28 | if(!(type1 %in% cell_types)){ 29 | cell_types <- c(type1, cell_types) 30 | } 31 | 32 | cell_types <- unique(cell_types) 33 | N_cell_types <- length(cell_types) 34 | 35 | denom <- rowSums(sweep(as.matrix(cell_type_info[[1]][gene_list,cell_types]), 2, weights[cell_types], FUN = "*")) + epsilon 36 | posterior_1 <- (weights[type1] * cell_type_info[[1]][gene_list,type1] + epsilon/N_cell_types) / denom 37 | expect_1 <- posterior_1 * bead 38 | # expect_2 <- bead - expect_1 39 | variance <- expect_1 * (1 - posterior_1) 40 | 41 | return(list(expect_1 = expect_1, 42 | # expect_2 = expect_2, 43 | variance = variance)) 44 | } 45 | 46 | 47 | #' Purify Counts Using RCTD Output 48 | #' 49 | #' This function purifies a query count matrix using the RCTD output, handling both certain and uncertain doublets, as well as optionally processing singlets. 50 | #' 51 | #' @param counts A matrix of gene expression counts, where rows represent genes and columns represent cell IDs. 52 | #' @param results_df A data frame from the `results_df` slot of \link[spacexr]{run.RCTD} output, containing cell-type assignments and classifications. 53 | #' @param ct_weights A matrix of cell-type weights for each spot from the RCTD output. 54 | #' @param cell_type_info A list containing cell-type information used for purification. 55 | #' @param DO_purify_singlets Logical; if `TRUE`, singlets will also be purified. 56 | #' @param n_workers Integer; the number of parallel workers to use. If `NULL`, it defaults to the number of available cores minus one. 57 | #' @param chunk_size Integer; the number of barcodes processed in each batch for parallelization. Default is 10,000. 58 | #' 59 | #' @return A list containing: 60 | #' \describe{ 61 | #' \item{purified_counts}{A matrix of purified gene expression counts.} 62 | #' \item{cell_meta}{A data frame with metadata for each cell, including purification status.} 63 | #' } 64 | #' 65 | #' @import BiocParallel 66 | #' @export 67 | 68 | purify_counts_with_rctd <- function(counts, results_df, ct_weights, cell_type_info, DO_purify_singlets, n_workers = NULL, chunk_size = 10000) { 69 | 70 | is.certain <- c("doublet_certain") 71 | if(DO_purify_singlets){ 72 | is.certain <- c(is.certain, "singlet") 73 | } 74 | 75 | doublets_certain <- results_df[results_df$spot_class %in% is.certain,] %>% rownames() 76 | doublets_uncertain <- results_df[results_df$spot_class %in% c("doublet_uncertain"),] %>% rownames() 77 | 78 | if (is.null(n_workers)) { 79 | n_workers <- multicoreWorkers() - 1 80 | } 81 | param <- MulticoreParam(workers = n_workers) 82 | 83 | gene_list <- intersect(rownames(counts), rownames(cell_type_info[[1]])) 84 | print(length(gene_list)) 85 | # Function to decompose certain doublets 86 | decompose_certain <- function(bead, results_df_bead, ct_weights_bead, gene_list, cell_type_info) { 87 | tryCatch({ 88 | 89 | bead <- bead[gene_list, ] 90 | barcode <- rownames(results_df_bead)[1] 91 | type1 <- as.vector(results_df_bead[barcode, "first_type"]) 92 | type2 <- as.vector(results_df_bead[barcode, "second_type"]) 93 | 94 | w1 <- as.vector(results_df_bead[barcode, "weight_first_type"]) 95 | w2 <- as.vector(results_df_bead[barcode, "weight_second_type"]) 96 | 97 | if(is.na(type2)){ # highly confident singlet -> Do Not Purify! 98 | return(list(barcode = barcode, res = bead)) 99 | } 100 | 101 | if(is.na(w2)){ 102 | w1 <- 1 103 | w2 <- 0 104 | } 105 | 106 | wgts <- c(w1, w2) 107 | wgts <- wgts / sum(wgts) 108 | 109 | names(wgts) <- c(type1, type2) 110 | 111 | doub_res <- decompose_doublet( 112 | bead, 113 | wgts, 114 | gene_list, 115 | cell_type_info, 116 | type1, 117 | c(type1, type2) 118 | ) 119 | return(list(barcode = barcode, res = doub_res$expect_1)) 120 | }, error = function(e) { 121 | message(sprintf("Error processing barcode %s: %s", barcode, e$message)) 122 | return(NULL) 123 | }) 124 | } 125 | 126 | # Function to decompose uncertain doublets 127 | decompose_uncertain <- function(bead, results_df_bead, ct_weights_bead, gene_list, cell_type_info) { 128 | tryCatch({ 129 | 130 | bead <- bead[gene_list,] 131 | barcode <- rownames(results_df_bead)[1] 132 | type1 <- as.vector(results_df_bead[barcode, "first_type"]) 133 | type2 <- as.vector(results_df_bead[barcode, "second_type"]) 134 | 135 | if(is.na(type2)){ # highly confident singlet -> Do Not Purify! 136 | return(list(barcode = barcode, res = bead)) 137 | } 138 | 139 | wgts <- ct_weights_bead 140 | wgts <- wgts / sum(wgts) 141 | types <- names(wgts) 142 | 143 | doub_res <- decompose_doublet( 144 | bead, 145 | wgts, 146 | gene_list, 147 | cell_type_info, 148 | type1, 149 | types 150 | ) 151 | return(list(barcode = barcode, res = doub_res$expect_1)) 152 | }, error = function(e) { 153 | message(sprintf("Error processing barcode %s: %s", barcode, e$message)) 154 | return(NULL) 155 | }) 156 | } 157 | 158 | # Helper function to process chunks 159 | process_chunks <- function(barcodes, decompose_func) { 160 | results_list <- list() 161 | 162 | for (i in seq(1, length(barcodes), by = chunk_size)) { 163 | cat(round(i * 100 / length(barcodes)), "%\n") 164 | 165 | # Select only relevant barcodes for this chunk 166 | chunk_barcodes <- barcodes[i:min(i + chunk_size - 1, length(barcodes))] 167 | 168 | 169 | 170 | # Process in parallel, passing only the subsetted counts 171 | chunk_results <- bplapply( 172 | chunk_barcodes, 173 | function(barcode) { 174 | decompose_func(counts[, barcode, drop = FALSE], results_df[barcode,], ct_weights[barcode,], gene_list, cell_type_info) 175 | }, 176 | BPPARAM = param 177 | ) 178 | 179 | results_list <- c(results_list, chunk_results) 180 | } 181 | 182 | results_list <- Filter(Negate(is.null), results_list) 183 | return(results_list) 184 | } 185 | 186 | 187 | # Process certain doublets 188 | cat("Processing certain doublets...\n") 189 | certain_results <- process_chunks(doublets_certain, decompose_certain) 190 | res_certain_mtrx <- matrix(NA, nrow = length(gene_list), ncol = length(doublets_certain), dimnames = list(gene_list, doublets_certain)) 191 | for (res in certain_results) { 192 | res_certain_mtrx[, res$barcode] <- res$res 193 | } 194 | 195 | # Process uncertain doublets 196 | cat("Processing uncertain doublets...\n") 197 | uncertain_results <- process_chunks(doublets_uncertain, decompose_uncertain) 198 | res_uncertain_mtrx <- matrix(NA, nrow = length(gene_list), ncol = length(doublets_uncertain), dimnames = list(gene_list, doublets_uncertain)) 199 | for (res in uncertain_results) { 200 | res_uncertain_mtrx[, res$barcode] <- res$res 201 | } 202 | 203 | # Combine results 204 | cat("Combaning doublets results ...\n") 205 | purified <- cbind(res_certain_mtrx, res_uncertain_mtrx) 206 | cell_ids <- c(colnames(res_certain_mtrx), colnames(res_uncertain_mtrx)) 207 | 208 | # Process singlets 209 | if(!DO_purify_singlets){ 210 | cat("Processing singlets...\n") 211 | singlets <- results_df[results_df$spot_class == "singlet",] 212 | res_singlet_mtrx <- counts[gene_list, rownames(singlets)] 213 | purified <- cbind(purified, res_singlet_mtrx) 214 | cell_ids <- c(cell_ids, colnames(res_singlet_mtrx)) 215 | } 216 | 217 | colnames(purified) <- cell_ids 218 | rownames(purified) <- gene_list 219 | 220 | #purified <- purified[,colnames(counts)] 221 | cell_meta <- results_df[colnames(purified),] 222 | 223 | cell_meta$purification_status <- "purified" 224 | if(!DO_purify_singlets){ 225 | cell_meta$purification_status[cell_meta$spot_class == "singlet"] <- "raw" 226 | } 227 | cell_meta$purification_status[is.na(cell_meta$second_type)] <- "raw" # Very confident singlets are not purified 228 | 229 | cell_meta$cell_id <- rownames(cell_meta) 230 | 231 | return(list(purified_counts = purified, cell_meta = cell_meta)) 232 | } 233 | 234 | 235 | #' Purify Data Using RCTD Output 236 | #' 237 | #' This function purifies a query count matrix using the RCTD output, handling both certain and uncertain doublets, as well as optionally processing singlets. 238 | #' 239 | #' @param counts A matrix of gene expression counts, where rows represent genes and columns represent cell IDs. 240 | #' @param rctd RCTD output 241 | #' @param DO_purify_singlets Logical; if `TRUE`, singlets will also be purified. 242 | #' @param n_workers Integer; the number of parallel workers to use. If `NULL`, it defaults to the number of available cores minus one. 243 | #' @param chunk_size Integer; the number of barcodes processed in each batch for parallelization. Default is 10,000. 244 | #' 245 | #' @return A list containing: 246 | #' \describe{ 247 | #' \item{purified_counts}{A matrix of purified gene expression counts.} 248 | #' \item{cell_meta}{A data frame with metadata for each cell, including purification status.} 249 | #' } 250 | #' 251 | #' @import BiocParallel 252 | #' @export 253 | 254 | purify <- function(counts, rctd, DO_purify_singlets, n_workers = NULL, chunk_size = 10000) { 255 | 256 | results_df <- rctd@results$results_df 257 | 258 | common_cells <- intersect(colnames(counts), rownames(results_df)) 259 | results_df <- results_df[common_cells, ] 260 | 261 | cell_type_info <- rctd@cell_type_info[[1]] 262 | ct_weights <- rctd@results$weights 263 | ct_weights <- ct_weights[common_cells, colnames(cell_type_info[[1]])] 264 | 265 | counts <- counts[,common_cells] 266 | 267 | return(purify_counts_with_rctd( 268 | counts = counts, 269 | results_df = results_df, 270 | ct_weights = ct_weights, 271 | cell_type_info = cell_type_info, 272 | DO_purify_singlets = DO_purify_singlets, 273 | n_workers = n_workers, 274 | chunk_size = chunk_size 275 | ) 276 | ) 277 | } 278 | 279 | 280 | 281 | 282 | 283 | 284 | 285 | 286 | 287 | 288 | 289 | 290 | -------------------------------------------------------------------------------- /R/split_dataset.R: -------------------------------------------------------------------------------- 1 | #' Split Purified Count Data 2 | #' 3 | #' This function splits purified count data into two components based on the first and second cell type annotations. It ensures that negative values in the residual split profile are replaced with zeros. 4 | #' 5 | #' @param counts A matrix of raw count data where rows represent genes and columns represent cells. 6 | #' @param rctd An object containing RCTD results, including spatial and cell type information. 7 | #' @param DO_purify_singlets A logical value indicating whether to purify singlet cells. 8 | #' @param n_workers An optional integer specifying the number of workers for parallel processing. Defaults to `NULL`. 9 | #' @param chunk_size An integer defining the chunk size for processing. Defaults to `10000`. 10 | #' 11 | #' @return A list containing: 12 | #' \itemize{ 13 | #' \item `purified_counts`: A matrix of purified count data, with two components labeled `_1` and `_2` for each cell. 14 | #' \item `cell_meta`: A data frame containing metadata for the purified cells, including cell type annotations and decomposition order. 15 | #' } 16 | #' 17 | #' @note The function modifies negative residual values in `purified_2`, replacing them with zeros. 18 | #' @export 19 | 20 | 21 | split <- function(counts, rctd, DO_purify_singlets, n_workers = NULL, chunk_size = 10000){ 22 | 23 | purified <- purify(counts = counts, rctd = rctd, DO_purify_singlets = DO_purify_singlets, n_workers = n_workers, chunk_size = chunk_size) 24 | 25 | purified_1 <- purified$purified_counts 26 | cell_meta <- purified$cell_meta 27 | cell_meta$cell_id <- rownames(cell_meta) 28 | 29 | purified_2 <- counts[rownames(purified_1), colnames(purified_1)] - purified_1 30 | 31 | if(sum(purified_2<0) > 0){ 32 | warning("Residual split profile has ", sum(purified_2<0), "negative values that are replaced with 0s") 33 | purified_2[purified_2<0] <- 0 34 | } 35 | 36 | cell_meta_1 <- cell_meta 37 | cell_meta_1$cell_type <- cell_meta_1$first_type 38 | 39 | colnames(purified_1) <- paste0(colnames(purified_1), "_1") 40 | rownames(cell_meta_1) <- paste0(rownames(cell_meta_1), "_1") 41 | cell_meta_1$decomposition_order <- "first" 42 | 43 | cell_meta_2 <- cell_meta 44 | cell_meta_2$cell_type <- cell_meta_2$second_type 45 | cell_meta_2$purification_status[cell_meta_2$purification_status == "raw"] <- "null" 46 | colnames(purified_2) <- paste0(colnames(purified_2), "_2") 47 | rownames(cell_meta_2) <- paste0(rownames(cell_meta_2), "_2") 48 | cell_meta_2$decomposition_order <- "second" 49 | 50 | 51 | return(list( 52 | purified_counts = cbind(purified_1, purified_2), 53 | cell_meta = rbind(cell_meta_1, cell_meta_2) 54 | )) 55 | } 56 | 57 | 58 | #' Split Purified Count Data 59 | #' @export 60 | 61 | 62 | balance_split <- function( 63 | xe_raw, 64 | xe_purified, 65 | spot_class_key = "spot_class", 66 | DO_purify_singlets = TRUE, 67 | DO_split_singlets = TRUE, 68 | DO_split_doublets_uncertain = FALSE, # keep first only 69 | default_assay = "Xenium" 70 | ){ 71 | 72 | if(!spot_class_key %in% colnames(xe_raw@meta.data)){ 73 | stop("spot_class_key", spot_class_key, "is not available in `xe_raw`, please compute it first!") 74 | } 75 | 76 | if(!"first_type" %in% colnames(xe_raw@meta.data)){ 77 | stop("`first_type` is not available in `xe_raw`, please compute it first!") 78 | } 79 | 80 | cells_to_remove <- 81 | xe_raw@meta.data %>% 82 | filter(.data[[spot_class_key]] == "reject") %>% 83 | rownames() 84 | 85 | if(DO_purify_singlets){ 86 | cells_to_keep_raw <- 87 | xe_raw@meta.data %>% 88 | filter(is.na(second_type)) %>% 89 | rownames() 90 | cat(cells_to_keep_raw[1:10], "\n") 91 | cat(length(cells_to_keep_raw)) 92 | } else { 93 | cells_to_keep_raw <- 94 | xe_raw@meta.data %>% 95 | filter(.data[[spot_class_key]] == "singlet") %>% 96 | rownames() 97 | cat(cells_to_keep_raw[1:10], "\n") 98 | cat(length(cells_to_keep_raw)) 99 | } 100 | cells_to_keep_raw <- setdiff(cells_to_keep_raw, cells_to_remove) 101 | 102 | cells_to_replace_with_purified <- setdiff(colnames(xe_raw), c(cells_to_keep_raw, cells_to_remove)) 103 | cat(length(cells_to_replace_with_purified), " - all cells_to_replace_with_purified") 104 | cells_to_replace_with_purified <- cells_to_replace_with_purified[cells_to_replace_with_purified %in% colnames(xe_purified)] 105 | cat(length(cells_to_replace_with_purified), " - in colnames(xe_purified)") 106 | 107 | xe_raw$purification_status <- "raw" 108 | xe_raw@meta.data[cells_to_replace_with_purified,"purification_status"] <- xe_purified@meta.data[cells_to_replace_with_purified,"purification_status"] 109 | xe_raw@meta.data[cells_to_remove,"purification_status"] <- "removed" 110 | 111 | common_genes <- intersect(rownames(xe_raw), rownames(xe_purified)) 112 | 113 | message("raw... \n") 114 | raw <- GetAssayData(xe_raw, assay = default_assay, layer = "counts")[common_genes, cells_to_keep_raw] 115 | cell_meta_raw <- xe_raw@meta.data[cells_to_keep_raw,] 116 | cell_meta_raw$decomposition_order <- "raw" 117 | cell_meta_raw$cell_id <- rownames(cell_meta_raw) 118 | cell_meta_raw$cell_type <- cell_meta_raw$first_type 119 | 120 | message("pur_1 ... \n") 121 | cat(cells_to_replace_with_purified[1:5], "\n") 122 | cat(length(cells_to_replace_with_purified)) 123 | purified_1 <- GetAssayData(xe_purified, assay = default_assay, layer = "counts")[common_genes, cells_to_replace_with_purified] 124 | cell_meta <- xe_purified@meta.data[colnames(purified_1), ] 125 | cell_meta$cell_id <- rownames(cell_meta) 126 | 127 | message("pur_2 ... \n") 128 | purified_2 <- GetAssayData(xe_raw, assay = default_assay, layer = "counts")[rownames(purified_1), colnames(purified_1)] - purified_1 129 | if(sum(purified_2<0) > 0){ 130 | warning(purified_2[purified_2<0] %>% as.vector() %>% summary()) 131 | warning("Residual split profile has ", sum(purified_2<0), "negative values that are replaced with 0s") 132 | purified_2[purified_2<0] <- 0 133 | } 134 | 135 | cell_meta_1 <- cell_meta 136 | cell_meta_1$cell_type <- cell_meta_1$first_type 137 | colnames(purified_1) <- paste0(colnames(purified_1), "_1") 138 | rownames(cell_meta_1) <- paste0(rownames(cell_meta_1), "_1") 139 | cell_meta_1$decomposition_order <- "first" 140 | 141 | cell_meta_2 <- cell_meta 142 | cell_meta_2$cell_type <- cell_meta_2$second_type 143 | cell_meta_2$purification_status[cell_meta_2$purification_status == "raw"] <- "null" 144 | colnames(purified_2) <- paste0(colnames(purified_2), "_2") 145 | rownames(cell_meta_2) <- paste0(rownames(cell_meta_2), "_2") 146 | cell_meta_2$decomposition_order <- "second" 147 | 148 | 149 | if(!DO_split_singlets){ 150 | if(DO_purify_singlets){ 151 | # remove second profile of singlets 152 | cells_to_keep_second_profile <- cell_meta_2 %>% filter(spot_class != "singlet") %>% rownames() 153 | 154 | purified_2 <- purified_2[,cells_to_keep_second_profile] 155 | cell_meta_2 <- cell_meta_2[cells_to_keep_second_profile,] 156 | } 157 | } else { 158 | if(!DO_purify_singlets){ 159 | warning("Cannot split singlets as they are raw") 160 | } 161 | } 162 | 163 | if(!DO_split_doublets_uncertain){ 164 | cells_to_keep_second_profile <- cell_meta_2 %>% filter(spot_class != "doublet_uncertain") %>% rownames() 165 | 166 | purified_2 <- purified_2[,cells_to_keep_second_profile] 167 | cell_meta_2 <- cell_meta_2[cells_to_keep_second_profile,] 168 | } 169 | 170 | common_cols <- intersect(colnames(cell_meta_raw), colnames(cell_meta_1)) 171 | 172 | count_matrix <- cbind(raw, cbind(purified_1, purified_2)) 173 | meta_data = rbind(cell_meta_raw[,common_cols], rbind(cell_meta_1[,common_cols], cell_meta_2[,common_cols])) 174 | 175 | xe_balanced <- CreateSeuratObject(counts = count_matrix, assay = default_assay, meta.data = meta_data) 176 | return(xe_balanced) 177 | } 178 | 179 | 180 | -------------------------------------------------------------------------------- /R/utils.R: -------------------------------------------------------------------------------- 1 | #' Extract Singlet Cell Data 2 | #' 3 | #' This function filters an `xe` object to retain only the cells classified as "singlet" 4 | #' based on the classification stored in the RCTD results. 5 | #' 6 | #' @param xe A Seurat object containing spatial transcriptomics data. 7 | #' @param rctd An object containing RCTD results. 8 | #' @param spot_class_key A character string specifying the column name in `rctd@results$results_df` 9 | #' that contains spot classification information. Default is `"spot_class"`. 10 | #' 11 | #' @return A subset of `xe` containing only singlet-classified cells. 12 | #' 13 | #' @import spacexr 14 | #' @import Seurat 15 | #' @import dplyr 16 | #' @export 17 | #' 18 | get_singlet_data <- function( 19 | xe, 20 | rctd, 21 | spot_class_key = "spot_class" 22 | ){ 23 | results_df <- rctd@results$results_df 24 | singlets_ids <- results_df %>% filter(.data[[spot_class_key]] == "singlet") %>% rownames() 25 | xe <- subset(xe, cells = singlets_ids) 26 | return(xe) 27 | } 28 | -------------------------------------------------------------------------------- /R/visualization.R: -------------------------------------------------------------------------------- 1 | #' Generate Pie Plot Data Frame 2 | #' 3 | #' This function generates a dsata frame for creating pie plots, where each row corresponds to a spatial cell with information about cell types, weights, and coordinates. The function computes weights for each cell type based on the first and second cell type annotations from RCTD results and supports different modes of analysis. 4 | #' 5 | #' @param rctd An object containing RCTD results, including spatial and cell type data. 6 | #' @param mode A character string specifying the mode of computation. Currently, only `"doublet"` mode is implemented. Available options include: 7 | #' \itemize{ 8 | #' \item `"doublet"`: Computes weights for first and second cell types for doublet cells. 9 | #' \item `"all"`: Not yet implemented. 10 | #' \item `"by_spot_class"`: Not yet implemented. 11 | #' } 12 | #' 13 | #' @return A data frame containing the following columns: 14 | #' \itemize{ 15 | #' \item `cell_id`: Unique identifier for each spatial cell. 16 | #' \item `x`, `y`: Spatial coordinates of the cell. 17 | #' \item `weight_first_type`, `weight_second_type`: Weights for the first and second cell types. 18 | #' \item `first_type`, `second_type`: Annotations of the first and second cell types. 19 | #' \item `w1_larger_w2`: Logical indicator of whether the first type weight is larger than the second type weight. 20 | #' \item One column for each cell type, containing the computed weight of that cell type for each spatial cell. 21 | #' } 22 | #' 23 | #' @note Only the `"doublet"` mode is currently implemented. Other modes will result in an error. 24 | #' 25 | #' @importFrom dplyr %>% select 26 | #' 27 | #' @export 28 | 29 | get_pieplot_df <- function( 30 | rctd, 31 | mode = c("doublet", "all", "by_spot_class")[1] 32 | ){ 33 | 34 | cell_types <- rctd@cell_type_info[[1]][[2]] 35 | cell_types_N <- length(cell_types) 36 | 37 | pie_df <- rctd@results$results_df %>% 38 | select(spot_class, weight_first_type, weight_second_type, first_type, second_type, w1_larger_w2, x, y) 39 | pie_df$cell_id <- rownames(pie_df) 40 | 41 | if(mode == "doublet"){ 42 | pie_celltype_weigts <- apply(pie_df, 1, FUN = function(row){ 43 | res <- rep(0, cell_types_N) 44 | names(res) <- cell_types 45 | ft <- row["first_type"] %>% unname() 46 | st <- row["second_type"] %>% unname() 47 | w1 <- row["weight_first_type"] %>% as.numeric() 48 | 49 | if(!is.na(ft)){ 50 | if(is.na(st)){ 51 | res[ft] <- 1 52 | } else { 53 | res[ft] <- w1 54 | res[st] <- 1 - w1 55 | } 56 | } else { 57 | return(res) 58 | } 59 | return(res) 60 | 61 | }) %>% t() 62 | } else { 63 | stop("Only doublet mode is implemented so far!") 64 | } 65 | 66 | pie_celltype_weigts <- as.data.frame(pie_celltype_weigts) 67 | pie_celltype_weigts$cell_id <- rownames(pie_celltype_weigts) 68 | 69 | 70 | # Merge the two data frames by `cell_id` 71 | merged_pie_df <- merge( 72 | pie_df, 73 | pie_celltype_weigts, 74 | by = "cell_id", 75 | all.x = TRUE # Keep all rows from pie_df 76 | ) 77 | rownames(merged_pie_df) <- merged_pie_df$cell_id 78 | return(merged_pie_df) 79 | } 80 | 81 | #' Crop pie_df Data to a Specific Cell 82 | #' 83 | #' This function filters a data frame to retain only the rows within a specified radius around a given cell. 84 | #' 85 | #' @param df A data frame containing spatial cell information. 86 | #' @param cell_id The unique identifier of the central cell. 87 | #' @param radius A numeric value specifying the radius for cropping (default: 100). 88 | #' 89 | #' @importFrom dplyr %>% filter 90 | #' @return A filtered data frame containing only the cells within the specified radius. 91 | #' @export 92 | #' 93 | crop_pie_df_to_cell <- function( 94 | df, 95 | cell_id, 96 | radius = 100 97 | ){ 98 | x_center <- df[cell_id, "x"] 99 | y_center <- df[cell_id, "y"] 100 | print(x_center) 101 | print(y_center) 102 | result <- df %>% filter( 103 | (x < (x_center + radius) & x > (x_center - radius) & 104 | y < (y_center + radius) & y > (y_center - radius)) 105 | ) 106 | return(result) 107 | } 108 | 109 | #' Generate a Pie Plot 110 | #' 111 | #' This function creates a pie plot from a given data frame containing spatial cell type proportions. 112 | #' 113 | #' @param pie_df A data frame containing spatial cell type proportions and coordinates. 114 | #' @param cols An optional named vector of colors for cell types. 115 | #' @param pie_scale A numeric value to scale pie sizes (default: 1). 116 | #' 117 | #' @return A ggplot object representing the spatial pie chart. 118 | #' 119 | #' @importFrom scatterpie geom_scatterpie 120 | #' @importFrom ggplot2 ggplot coord_fixed scale_color_manual scale_fill_manual aes theme_void guides 121 | #' @importFrom dplyr %>% 122 | 123 | #' @export 124 | 125 | plot_pie <- function(pie_df, cols = NULL, pie_scale = 1){ 126 | 127 | cell_types <- unique(c(pie_df$first_type,pie_df$second_type)) %>% as.vector() 128 | cell_types <- cell_types[!is.na(cell_types)] 129 | 130 | p <- ggplot() + 131 | geom_scatterpie( 132 | aes(x = x, y = -y, group = cell_id, color = first_type), # Define the position and grouping 133 | data = pie_df, 134 | cols = cell_types , # Specify columns for the pie slices 135 | pie_scale = pie_scale 136 | ) 137 | 138 | pie_df_cell_type <- pie_df %>% filter(spot_class != "singlet") 139 | 140 | if(!is.null(pie_df_cell_type)){ 141 | pie_df_cell_type <- pie_df_cell_type %>% 142 | mutate( 143 | second_type_spot_class = case_when( 144 | spot_class %in% c("reject", "doublet_uncertain") ~ as.character(spot_class), 145 | spot_class == "doublet_certain" ~ as.character(second_type), 146 | TRUE ~ as.character(second_type) 147 | ) 148 | ) 149 | 150 | p <- p + geom_point(data = pie_df_cell_type, aes(x = x, y = -y, pch = spot_class, color = second_type_spot_class), size = 3*pie_scale) 151 | } 152 | 153 | if(!is.null(cols)){ 154 | cols["doublet_uncertain"] <- "black" 155 | cols["reject"] <- "black" 156 | 157 | p <- p + 158 | scale_color_manual(values = cols) + 159 | scale_fill_manual(values = cols) 160 | } 161 | 162 | p <- p + scale_shape_manual(values = c("doublet_certain" = 16, "doublet_uncertain" = 16, "reject" = 4)) 163 | 164 | p <- p + 165 | coord_fixed() + 166 | theme_void() + 167 | guides(color = "none") # Hide the color legend but keep the fill legend 168 | 169 | return(p) 170 | } 171 | 172 | 173 | #' Plot Pie Chart by Coordinate Range 174 | #' 175 | #' This function filters the pie data frame by x and y limits and generates a pie plot. 176 | #' 177 | #' @param pie_df A data frame containing spatial cell type proportions and coordinates. 178 | #' @param x_lims A numeric vector specifying x-axis limits. 179 | #' @param y_lims A numeric vector specifying y-axis limits. 180 | #' @param cols An optional named vector of colors for cell types. 181 | #' @param pie_scale A numeric value to scale pie sizes (default: 1). 182 | #' 183 | #' @importFrom dplyr %>% filter 184 | #' @return A ggplot object representing the cropped spatial pie chart. 185 | #' @export 186 | 187 | plot_pie_by_coordinates <- function( 188 | pie_df, 189 | x_lims, y_lims, 190 | cols = NULL, 191 | pie_scale = 1 192 | ){ 193 | 194 | pie_df_crop <- pie_df %>% filter(x > min(x_lims) & x < max(x_lims) & y > min(y_lims) & y < max(y_lims)) 195 | p <- plot_pie(pie_df_crop, cols = cols, pie_scale = pie_scale) 196 | 197 | return(p) 198 | } 199 | 200 | #' Plot Pie Chart Around a Specific Cell 201 | #' 202 | #' This function crops the data around a specified cell within a given radius and generates a pie plot. 203 | #' 204 | #' @param pie_df A data frame containing spatial cell type proportions and coordinates. 205 | #' @param cell_id The unique identifier of the central cell. 206 | #' @param radius A numeric value specifying the radius for cropping (default: 100). 207 | #' @param pie_scale A numeric value to scale pie sizes (default: 1). 208 | #' @param cols An optional named vector of colors for cell types. 209 | #' @param DO_highlight_cell A logical indicating whether to highlight the central cell (default: TRUE). 210 | #' 211 | #' @importFrom ggplot2 annotate 212 | #' @return A ggplot object representing the spatial pie chart around the specified cell. 213 | #' @export 214 | 215 | plot_pie_around_cell <- function( 216 | pie_df, 217 | cell_id, 218 | radius = 100, 219 | pie_scale = 1, 220 | cols = NULL, 221 | DO_highlight_cell = TRUE 222 | ){ 223 | 224 | pie_df_crop <- crop_pie_df_to_cell(pie_df, cell_id = cell_id, radius = radius) 225 | p <- plot_pie(pie_df_crop, cols = cols, pie_scale = pie_scale) 226 | 227 | if(DO_highlight_cell){ 228 | x_center <- pie_df_crop[cell_id, "x"] 229 | y_center <- pie_df_crop[cell_id, "y"] 230 | p <- p + 231 | annotate("point", x = x_center, y = -y_center, color = "black", size = 1, shape = 8) 232 | } 233 | 234 | return(p) 235 | } 236 | 237 | #' Save Pie Plot to File 238 | #' 239 | #' This function saves a given ggplot object as an image file. 240 | #' 241 | #' @param p A ggplot object representing the pie chart. 242 | #' @param filename A character string specifying the output file name. 243 | #' @param ... Additional arguments passed to `ggsave()`, such as width, height, and dpi. 244 | #' 245 | #' @return Saves the plot as a file with a transparent background. 246 | #' @export 247 | 248 | save_pieplot <- function( 249 | p, 250 | filename, 251 | ... 252 | ){ 253 | ggsave(filename, plot = p, bg = "transparent", ... ) 254 | } 255 | 256 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | [![R-CMD-check](https://github.com/BDSC-tds/SPLIT/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/BDSC-tds/SPLIT/actions/workflows/R-CMD-check.yaml) 3 | [![DOI](https://img.shields.io/badge/DOI%3A-10.1101%2F2025.04.23.649965-brightgreen)](https://doi.org/10.1101/2025.04.23.649965) 4 | [![Version](https://img.shields.io/badge/version-0.1.0-blue)](https://github.com/bdsc-tds/SPLIT/releases/tag/v0.1.0) 5 | 6 | 7 | # SPLIT: Spatial Purification of Layered Intracellular Transcripts 8 | 9 | ![](vignettes/plots/SPLIT_schema.png) 10 | 11 | 🚧 **This package is under active development.**\ 12 | ⚡ Use the **Quick Start** guide below to get up and running quickly.\ 13 | 📖 A **comprehensive tutorial** of running SPLIT on Xenium data is now available as [.Rmd](https://github.com/bdsc-tds/SPLIT/blob/main/vignettes/Run_RCTD_and_SPLIT_on_Xenium.Rmd) and [.html](https://github.com/bdsc-tds/SPLIT/blob/main/doc/Run_RCTD_and_SPLIT_on_Xenium.html). 14 | 15 | ------------------------------------------------------------------------ 16 | 17 | ## 📦 Installation 18 | 19 | To install SPLIT from GitHub: 20 | 21 | ``` r 22 | # Install SPLIT 23 | remotes::install_github("bdsc-tds/SPLIT") 24 | ``` 25 | 26 | ## 🚀 Quick Start 27 | 28 | ⚠️ **IMPORTANT:**\ 29 | SPLIT currently requires **doublet-mode** RCTD results from the original [spacexr GitHub repository](https://github.com/dmcable/spacexr) or its faster [HD fork](https://github.com/jpromeror/spacexr/tree/HD), **not** from the newly released [Bioconductor version](https://www.bioconductor.org/packages/release/bioc/html/spacexr.html).\ 30 | 🚧 **Compatibility with Bioconductor's spacexr is coming soon.** 31 | 32 | If you already have your **Xenium** dataset as a Seurat object (`xe`) and **RCTD** results from **doublet-mode** decomposition in `RCTD`, you can run SPLIT purification like this: 33 | 34 | ```{r} 35 | library(SPLIT) 36 | library(spacexr) 37 | library(dplyr) 38 | library(Seurat) 39 | library(ggplot2) 40 | 41 | # Post-process RCTD output 42 | RCTD <- SPLIT::run_post_process_RCTD(RCTD) 43 | 44 | # Run SPLIT purification 45 | res_split <- SPLIT::purify( 46 | counts = GetAssayData(xe, assay = 'Xenium', layer = 'counts'), # or any gene x cells counts matrix 47 | rctd = RCTD, 48 | DO_purify_singlets = TRUE # optional 49 | ) 50 | 51 | # Create a purified Seurat object 52 | xe_purified <- CreateSeuratObject( 53 | counts = res_split$purified_counts, 54 | meta.data = res_split$cell_meta, 55 | assay = "Xenium" 56 | ) 57 | 58 | # Optional: Filter, normalize and visualize 59 | xe_purified <- subset(xe_purified, subset = nCount_Xenium > 5) 60 | xe_purified <- xe_purified %>% 61 | SCTransform(assay = "Xenium") %>% 62 | RunPCA() %>% 63 | RunUMAP(dims = 1:20) 64 | UMAPPlot(xe_purified, group.by = c("first_type"), label = T, repel = T) + theme(aspect.ratio = 1) 65 | ``` 66 | 67 | ## Citation 68 | 69 | If you use **SPLIT** in your work, please cite: 70 | 71 | > **From Transcripts to Cells: Dissecting Sensitivity, Signal Contamination, and Specificity in Xenium Spatial Transcriptomics**\ 72 | > Mariia Bilous, Daria Buszta, Jonathan Bac, Senbai Kang, Yixing Dong, Stephanie Tissot, Sylvie Andre, Marina Alexandre-Gaveta, Christel Voize, Solange Peters, Krisztian Homicsko, Raphael Gottardo\ 73 | > *bioRxiv* (2025). 74 | 75 | ## Contact 76 | 77 | If you have any questions about the package, feel free to [open an issue](https://github.com/bdsc-tds/SPLIT/issues) or contact **Mariia Bilous** at [Mariia.Bilous\@chuv.ch](mailto:Mariia.Bilous@chuv.ch). 78 | -------------------------------------------------------------------------------- /doc/Run_RCTD_and_SPLIT_on_Xenium.R: -------------------------------------------------------------------------------- 1 | ## ----libs, message=FALSE------------------------------------------------------ 2 | if(!requireNamespace("spacexr", quietly = TRUE)){ 3 | remotes::install_github("dmcable/spacexr") ## or remotes::install_github("jpromeror/spacexr@HD") for implementation of the doublet mode. 4 | } 5 | library(spacexr) 6 | 7 | if(!requireNamespace("SPLIT", quietly = TRUE)){ 8 | remotes::install_github("bdsc-tds/SPLIT") 9 | } 10 | library(SPLIT) 11 | 12 | library(dplyr) 13 | library(Seurat) 14 | library(readxl) 15 | library(SingleCellExperiment) 16 | library(httr) 17 | library(ggplot2) 18 | 19 | ## ----load-chormium-metadata--------------------------------------------------- 20 | # read metadata 21 | url <- "https://static-content.springer.com/esm/art%3A10.1038%2Fs41467-023-43458-x/MediaObjects/41467_2023_43458_MOESM4_ESM.xlsx" 22 | temp_file <- tempfile(fileext = ".xlsx") 23 | GET(url, write_disk(temp_file, overwrite = TRUE)) 24 | 25 | chrom_metadata <- read_excel(temp_file, sheet = 1) %>% as.data.frame() 26 | rownames(chrom_metadata) <- chrom_metadata$Barcode 27 | 28 | ## ----load-chromium------------------------------------------------------------ 29 | # read Chromium 30 | url <- "https://cf.10xgenomics.com/samples/cell-exp/7.0.1/Chromium_FFPE_Human_Breast_Cancer_Chromium_FFPE_Human_Breast_Cancer/Chromium_FFPE_Human_Breast_Cancer_Chromium_FFPE_Human_Breast_Cancer_count_sample_filtered_feature_bc_matrix.h5" 31 | temp_file <- tempfile(fileext = ".h5") 32 | GET(url, write_disk(temp_file, overwrite = TRUE)) 33 | 34 | chrom_counts <- Read10X_h5(temp_file) 35 | chrom <- CreateSeuratObject(counts = chrom_counts, assay = "RNA", meta.data = chrom_metadata) 36 | 37 | chrom$QC <- !is.na(chrom$Annotation) 38 | chrom$is_hybrid <- grepl("Hybrid", chrom$Annotation, ignore.case = TRUE) 39 | chrom <- subset(chrom, subset = QC == TRUE & is_hybrid == FALSE) # remove not annotated cells that cells that have sign of doublets 40 | 41 | 42 | ## ----class-df----------------------------------------------------------------- 43 | cell_type_to_class <- c( 44 | "B_Cells" = "B cell", 45 | "CD4+_T_Cells" = "T cell", 46 | "CD8+_T_Cells" = "T cell", 47 | "IRF7+_DCs" = "Myeloid", 48 | "LAMP3+_DCs" = "Myeloid", 49 | "Macrophages_1" = "Myeloid", 50 | "Macrophages_2" = "Myeloid", 51 | "Mast_Cells" = "Myeloid", 52 | "DCIS 1" = "Epithelial", 53 | "DCIS 2" = "Epithelial", 54 | "Invasive_Tumor" = "Epithelial", 55 | "Prolif_Invasive_Tumor" = "Epithelial", 56 | "Myoepi_ACTA2+" = "Myoepithelial", 57 | "Myoepi_KRT15+" = "Myoepithelial", 58 | "Stromal" = "Stromal", 59 | "Perivascular-Like" = "Stromal", 60 | "Endothelial" = "Endothelial" 61 | ) 62 | 63 | class_df <- data.frame(class = cell_type_to_class) 64 | 65 | # and define colors for reproducibility 66 | library(RColorBrewer) 67 | 68 | cell_types <- unique(chrom$Annotation) 69 | colors <- brewer.pal(n = max(3, min(length(cell_types), 12)), name = "Set3") 70 | # Recycle colors if not enough 71 | colors <- rep(colors, length.out = length(cell_types)) 72 | pal <- setNames(colors, cell_types) 73 | 74 | ## ----load-xenium-------------------------------------------------------------- 75 | if(!requireNamespace("STexampleData", quietly = TRUE)) 76 | remotes::install_github("lmweber/STexampleData") 77 | xe_full_seu <- STexampleData::Janesick_breastCancer_Xenium_rep1() 78 | 79 | ## Convert to Seurat to stay consistent with chromium object 80 | sp_coords <- spatialCoords(xe_full_seu) 81 | colnames(sp_coords) <- c("ST_1", "ST_2") 82 | 83 | xe_full <- CreateSeuratObject( 84 | counts = counts(xe_full_seu), 85 | assay = "Xenium", 86 | meta.data = as.data.frame(colData(xe_full_seu)) 87 | ) 88 | 89 | xe_full[["spatial"]] <- CreateDimReducObject(sp_coords, assay = "Xenium", key = "ST_") 90 | 91 | xe_full$x <- sp_coords[,1] 92 | xe_full$y <- sp_coords[,2] 93 | rm(xe_full_seu) 94 | 95 | ## ----downsampling------------------------------------------------------------- 96 | DO_subset_xe <- FALSE 97 | X_lim <- c(6000, Inf) # cropping area 98 | Y_lim <- c(4000, Inf) # cropping area 99 | 100 | if(DO_subset_xe){ 101 | xe <- subset(xe_full, subset = x > min(X_lim) & x < max(X_lim) & y > min(Y_lim) & y < max(Y_lim)) 102 | } else { 103 | xe <- xe_full 104 | } 105 | 106 | ## ----rctd--------------------------------------------------------------------- 107 | DO_run_RCTC <- FALSE # FALSE to load pre-computed results 108 | 109 | common_genes <- intersect(rownames(xe), rownames(chrom)) 110 | ref_labels <- chrom$Annotation %>% as.factor() 111 | 112 | ref.obj <- Reference(GetAssayData(chrom, "RNA", "counts")[common_genes, ], 113 | cell_types = ref_labels, min_UMI = 10, require_int = TRUE) 114 | 115 | test.obj <- SpatialRNA(coords = xe@reductions$spatial@cell.embeddings %>% as.data.frame(), 116 | counts = GetAssayData(xe, assay = "Xenium", layer = "counts")[common_genes, ], 117 | require_int = TRUE) 118 | 119 | if(!exists("class_df")) 120 | class_df <- NULL 121 | 122 | RCTD <- create.RCTD( 123 | test.obj, 124 | ref.obj, 125 | UMI_min = 10, 126 | counts_MIN = 10, 127 | UMI_min_sigma = 100, 128 | max_cores = BiocParallel::multicoreWorkers() - 1, 129 | CELL_MIN_INSTANCE = 25, 130 | class_df = class_df # highly recommended 131 | ) 132 | 133 | if(DO_run_RCTC){ 134 | RCTD <- run.RCTD(RCTD, doublet_mode = "doublet") 135 | saveRDS(RCTD, "~/precomp_rctd_class_aware.rds") 136 | } else { 137 | message("reading precomp RCTD results") 138 | 139 | # Install googledrive if you haven't already 140 | if (!requireNamespace("googledrive", quietly = TRUE)) { 141 | install.packages("googledrive") 142 | } 143 | library(googledrive) 144 | drive_deauth() 145 | # Define the file ID from the Google Drive link 146 | file_id <- "1pTUKq49JbUFwVk7vttjZIFqkx-AKznRF" #"1DCalFIZJywOvrSGBSPqrHh9QINeQp8aq" 147 | local_path <- tempfile(fileext = ".rds") 148 | drive_download(as_id(file_id), path = local_path, overwrite = TRUE) 149 | RCTD <- readRDS(local_path) 150 | 151 | } 152 | 153 | 154 | ## ----post-rctd---------------------------------------------------------------- 155 | RCTD <- SPLIT::run_post_process_RCTD(RCTD) 156 | xe <- AddMetaData(xe, RCTD@results$results_df) 157 | xe <- subset(xe, subset = nCount_Xenium >= 10) 158 | 159 | cat("Proprtion of spot classes") 160 | (xe$spot_class %>% table())/ncol(xe)*100 161 | 162 | ## ----plot-raw-xenium, fig.width=16, message=FALSE----------------------------- 163 | 164 | xe <- xe %>% SCTransform(assay = "Xenium", verbose = FALSE) %>% RunPCA(verbose = FALSE) %>% RunUMAP(dims = 1:50, verbose = FALSE) 165 | p1 <- UMAPPlot(xe, group.by = "first_type", label = T, repel = T, cols = pal) + theme_void() + theme(aspect.ratio = 1, legend.position = "none") 166 | p2 <- UMAPPlot(xe, group.by = "second_type", cols = pal) + theme_void() + theme(aspect.ratio = 1, legend.position = "bottom") 167 | p3 <- UMAPPlot(xe, group.by = "spot_class") + theme_void() + theme(aspect.ratio = 1, legend.position = "right") 168 | 169 | p1 | p2 | p3 170 | 171 | 172 | ## ----spatial-plot, fig.width=12, warning=FALSE-------------------------------- 173 | DimPlot(xe, reduction = "spatial", group.by = "first_type", raster = TRUE, cols = pal) + coord_fixed() 174 | 175 | ## ----SPLIT, message=FALSE----------------------------------------------------- 176 | # Run SPLIT purification 177 | res_split <- SPLIT::purify( 178 | counts = GetAssayData(xe, assay = 'Xenium', layer = 'counts'), # or any gene x cells counts matrix 179 | rctd = RCTD, 180 | DO_purify_singlets = TRUE # Optional. If TRUE, singlets with an available secondary type are purified the same way as doublets_certain; otherwise, left unchanged. 181 | ) 182 | 183 | # Create a purified Seurat object 184 | xe_purified <- CreateSeuratObject( 185 | counts = res_split$purified_counts, 186 | meta.data = res_split$cell_meta, 187 | assay = "Xenium" 188 | ) 189 | 190 | # Optional: Filter, normalize and visualize 191 | xe_purified <- subset(xe_purified, subset = nCount_Xenium > 5) 192 | xe_purified <- xe_purified %>% 193 | SCTransform(assay = "Xenium", verbose = FALSE) %>% 194 | RunPCA(verbose = FALSE) %>% 195 | RunUMAP(dims = 1:20, verbose = FALSE) 196 | 197 | #UMAPPlot(xe_purified, group.by = c("first_type"), label = T, repel = T) + theme_void() + theme(aspect.ratio = 1, legend.position = "none") 198 | 199 | ## ----plot-raw-split-purified, fig.width=12, message=FALSE--------------------- 200 | p1 <- UMAPPlot(xe, group.by = c("first_type"), label = T, repel = T, cols = pal) + theme_void() + theme(aspect.ratio = 1, legend.position = "none") + ggtitle("Raw Xenium data") 201 | 202 | p2 <- UMAPPlot(xe_purified, group.by = c("first_type"), label = T, repel = T, cols = pal) + theme_void() + theme(aspect.ratio = 1, legend.position = "right") + ggtitle("SPLIT-purified Xenium data") 203 | 204 | p3 <- UMAPPlot(xe_purified, group.by = c("spot_class")) + theme_void() + theme(aspect.ratio = 1, legend.position = "bottom") + ggtitle("SPLIT-purified Xenium data colored by spot class") 205 | p4 <- UMAPPlot(xe_purified, group.by = c("purification_status")) + theme_void() + theme(aspect.ratio = 1, legend.position = "bottom") + ggtitle("SPLIT-purified Xenium data colored by purification status") 206 | 207 | 208 | (p1|p2) 209 | (p3|p4) 210 | 211 | 212 | ## ----spatial-nw--------------------------------------------------------------- 213 | sp_nw <- SPLIT::build_spatial_network( 214 | xe, 215 | reduction = "spatial", 216 | dims = 1:2, 217 | DO_prune = TRUE, 218 | rad_pruning = 15, # remove connections further than 15um 219 | k_knn = 20 220 | ) 221 | 222 | sp_nw <- SPLIT::add_spatial_metric(spatial_neighborhood = sp_nw, rctd = RCTD) 223 | sp_neigh_df <- SPLIT::neighborhood_analysis_to_metadata(sp_nw) 224 | 225 | xe <- AddMetaData(xe, sp_neigh_df) 226 | 227 | rm(sp_nw, sp_neigh_df) 228 | 229 | ## ----plot-neigh-weight-second-type, message=FALSE----------------------------- 230 | # Plot magnitude of local diffusion on UMAP 231 | FeaturePlot(xe, features = c("neighborhood_weights_second_type")) + theme_void() + theme(aspect.ratio = 1) 232 | 233 | # Plot distribution of local diffusion value 234 | hist(xe$neighborhood_weights_second_type) 235 | 236 | # Plot distribution of local diffusion value per `spot_class` 237 | xe@meta.data %>% filter(!is.na(spot_class)) %>% 238 | ggplot(aes(x = spot_class, y = neighborhood_weights_second_type, color = spot_class)) + geom_boxplot() + labs(title = "Local neighbohood diffusion by spot class") + theme_minimal() 239 | 240 | ## ----spatially-aware-split, fig.width=12, message=FALSE----------------------- 241 | xe_purified_balanced_score <- SPLIT::balance_raw_and_purified_data_by_score( 242 | xe_raw = xe, 243 | xe_purified = xe_purified, 244 | default_assay = "Xenium", # should be param, but can wait 245 | spot_class_key = "spot_class", 246 | threshold = 0.05, # lower -> more cells will be purified 247 | score_name = "neighborhood_weights_second_type" 248 | ) 249 | 250 | # Optional: Filter, normalize and visualize 251 | xe_purified_balanced_score <- subset(xe_purified_balanced_score, subset = nCount_Xenium > 5) 252 | xe_purified_balanced_score <- xe_purified_balanced_score %>% 253 | SCTransform(assay = "Xenium", verbose = FALSE) %>% 254 | RunPCA(verbose = FALSE) %>% 255 | RunUMAP(dims = 1:20, verbose = FALSE) 256 | 257 | p5 <- UMAPPlot(xe_purified_balanced_score, group.by = c("first_type"), label = T, repel = T, cols = pal) + theme_void() + theme(aspect.ratio = 1, legend.position = "none") + ggtitle("Spatially-aware SPLIT-purified Xenium data") 258 | p6 <- UMAPPlot(xe_purified_balanced_score, group.by = c("purification_status")) + theme_void() + theme(aspect.ratio = 1, legend.position = "bottom") 259 | 260 | p5|p6 261 | 262 | ## ----transcriptomics-nw------------------------------------------------------- 263 | tr_nw <- build_transcriptomics_network( 264 | xe, 265 | DO_prune = FALSE, 266 | k_knn = 100 267 | ) 268 | tr_nw <- add_transcriptomics_metric(transcriptomics_neighborhood = tr_nw, rctd = RCTD) 269 | tr_neigh_df <- neighborhood_analysis_to_metadata(tr_nw) 270 | xe <- AddMetaData(xe, tr_neigh_df) 271 | 272 | rm(tr_nw, tr_neigh_df) 273 | 274 | ## ----run-split-shift---------------------------------------------------------- 275 | xe_split_shift <- SPLIT::balance_raw_and_purified_data_by_score( 276 | xe_raw = xe, 277 | xe_purified = xe_purified, 278 | default_assay = "Xenium", 279 | spot_class_key = "spot_class", 280 | threshold = 0.05, # to be consistent with spatially-aware SPLIT results 281 | score_name = "neighborhood_weights_second_type", 282 | DO_swap_lables = TRUE 283 | ) 284 | 285 | # Optional: Filter, normalize and visualize 286 | xe_split_shift <- subset(xe_split_shift, subset = nCount_Xenium > 5) 287 | xe_split_shift <- xe_split_shift %>% 288 | SCTransform(assay = "Xenium", verbose = FALSE) %>% 289 | RunPCA(verbose = FALSE) %>% 290 | RunUMAP(dims = 1:20, verbose = FALSE) 291 | 292 | ## ----vis-split-shift, fig.width=16, message=FALSE----------------------------- 293 | p7 <- UMAPPlot(xe_split_shift, group.by = c("first_type"), label = T, repel = T, cols = pal) + theme_void() + theme(aspect.ratio = 1, legend.position = "none") + ggtitle("SPLIT-shift-purified Xenium data") 294 | p8 <- UMAPPlot(xe_split_shift, group.by = c("purification_status")) + theme_void() + theme(aspect.ratio = 1, legend.position = "bottom") 295 | p9 <- UMAPPlot(xe_split_shift, group.by = c("swap")) + theme_void() + theme(aspect.ratio = 1, legend.position = "bottom") 296 | 297 | p7|p8|p9 298 | 299 | ## ----vis-split-shift-swap-facet, fig.width=12, message=FALSE------------------ 300 | # Visualize results faceted by swapping status 301 | p10 <- UMAPPlot(xe_split_shift, group.by = c("first_type"), split.by = "swap", raster = F, cols = pal) + theme_void() + theme(aspect.ratio = 1, legend.position = "bottom") + ggtitle("SPLIT-shift-purified Xenium data faceted by lable swapping status") 302 | p10 303 | 304 | ## ----summary-plot, fig.width=18, fig.height=7, message=FALSE------------------ 305 | p1 | p2+theme(legend.position = "bottom") | p5 | p7 306 | 307 | ## ----fig.width=12------------------------------------------------------------- 308 | pie_df <- get_pieplot_df(rctd = RCTD) 309 | cois <- xe@meta.data %>% filter(neighborhood_weights_second_type > .2, first_type %in% c("CD4+_T_Cells", "CD8+_T_Cells"), spot_class != "reject") %>% rownames() 310 | 311 | p <- plot_pie_around_cell(pie_df, cell_id = cois[1], radius = 50, cols = pal) 312 | plot(p) 313 | 314 | -------------------------------------------------------------------------------- /doc/Run_RCTD_and_SPLIT_on_Xenium.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "RCTD Annotation and SPLIT Purification for Xenium Sample" 3 | output: 4 | rmarkdown::github_document: 5 | toc: true 6 | toc_depth: 2 7 | vignette: > 8 | %\VignetteIndexEntry{Run RCTD and SPLIT on Xenium} 9 | %\VignetteEngine{knitr::rmarkdown} 10 | %\VignetteEncoding{UTF-8} 11 | --- 12 | 13 | # Introduction 14 | 15 | This vignette demonstrates how to annotate Xenium spatial transcriptomics data using RCTD, followed by purification with SPLIT. 16 | 17 | ⚠️ **Important Notice** 18 | 19 | SPLIT currently requires **doublet-mode** RCTD results generated with the original [spacexr GitHub repository](https://github.com/dmcable/spacexr) or the faster [HD fork](https://github.com/jpromeror/spacexr/tree/HD). 20 | 🚧 **Compatibility with the newly released [Bioconductor version](https://www.bioconductor.org/packages/release/bioc/html/spacexr.html) of spacexr is under development.** 21 | 22 | # Overview 23 | 24 | In this vignette, we: 25 | 26 | 1. **Run RCTD Annotation** 27 | - We begin by running RCTD annotation on a Xenium sample, using matched Chromium data from a public 10x Genomics dataset as the reference. 28 | 29 | 2. **Apply Default SPLIT Purification** 30 | - The default SPLIT purification method is then applied to clean the annotated Xenium sample, refining the initial annotations. 31 | 32 | 3. **Apply Spatially-Aware SPLIT** 33 | - We apply spatially-aware SPLIT, which purifies cells showing signs of contamination based on local spatial diffusion patterns. 34 | 35 | 4. **Apply SPLIT-Shift** 36 | - Finally, SPLIT-shift is applied to swap primary and secondary labels based on transcriptomic neighborhood heterogeneity, improving the accuracy of cell type assignments. 37 | 38 | This pipeline assumes that cell type assignments — originally derived from the Chromium reference — are refined and reliable for downstream analysis. 39 | 40 | 41 | ```{r libs, message=FALSE} 42 | if(!requireNamespace("spacexr", quietly = TRUE)){ 43 | remotes::install_github("dmcable/spacexr") ## or remotes::install_github("jpromeror/spacexr@HD") for implementation of the doublet mode. 44 | } 45 | library(spacexr) 46 | 47 | if(!requireNamespace("SPLIT", quietly = TRUE)){ 48 | remotes::install_github("bdsc-tds/SPLIT") 49 | } 50 | library(SPLIT) 51 | 52 | library(dplyr) 53 | library(Seurat) 54 | library(readxl) 55 | library(SingleCellExperiment) 56 | library(httr) 57 | library(ggplot2) 58 | ``` 59 | 60 | ## Load Data 61 | 62 | For this vignette, we use a publicly available Xenium dataset from the 10x Genomics database, originating from: 63 | 64 | > **Janesick, A., Shelansky, R., Gottscho, A.D. et al.** 65 | > *High resolution mapping of the tumor microenvironment using integrated single-cell, spatial and in situ analysis.* 66 | > *Nature Communications* 14, 8353 (2023). 67 | > [https://doi.org/10.1038/s41467-023-43458-x](https://doi.org/10.1038/s41467-023-43458-x) 68 | 69 | This dataset provides high-resolution spatial transcriptomics data suitable for downstream analysis with RCTD and SPLIT. 70 | 71 | --- 72 | 73 | ### Load Chromium Dataset (Reference) 74 | 75 | We manually load metadata for the Chromium single-cell dataset and metadata from the same study, which will serve as the **reference** for RCTD annotation. 76 | 77 | ```{r load-chormium-metadata} 78 | # read metadata 79 | url <- "https://static-content.springer.com/esm/art%3A10.1038%2Fs41467-023-43458-x/MediaObjects/41467_2023_43458_MOESM4_ESM.xlsx" 80 | temp_file <- tempfile(fileext = ".xlsx") 81 | GET(url, write_disk(temp_file, overwrite = TRUE)) 82 | 83 | chrom_metadata <- read_excel(temp_file, sheet = 1) %>% as.data.frame() 84 | rownames(chrom_metadata) <- chrom_metadata$Barcode 85 | ``` 86 | 87 | Manually load Chromium from 10x 88 | ```{r load-chromium} 89 | # read Chromium 90 | url <- "https://cf.10xgenomics.com/samples/cell-exp/7.0.1/Chromium_FFPE_Human_Breast_Cancer_Chromium_FFPE_Human_Breast_Cancer/Chromium_FFPE_Human_Breast_Cancer_Chromium_FFPE_Human_Breast_Cancer_count_sample_filtered_feature_bc_matrix.h5" 91 | temp_file <- tempfile(fileext = ".h5") 92 | GET(url, write_disk(temp_file, overwrite = TRUE)) 93 | 94 | chrom_counts <- Read10X_h5(temp_file) 95 | chrom <- CreateSeuratObject(counts = chrom_counts, assay = "RNA", meta.data = chrom_metadata) 96 | 97 | chrom$QC <- !is.na(chrom$Annotation) 98 | chrom$is_hybrid <- grepl("Hybrid", chrom$Annotation, ignore.case = TRUE) 99 | chrom <- subset(chrom, subset = QC == TRUE & is_hybrid == FALSE) # remove not annotated cells that cells that have sign of doublets 100 | 101 | ``` 102 | 103 | #### Generate broader level annoation (optional, but **highly recommended** for more robust RCTD annotation) 104 | Providing higher-level cell type classes improves RCTD accuracy and **significantly** reduces the number of rejected cells, preserving more cells (rejects are excluded from the downstream analyses and do not undergo SPLIT purification). 105 | ```{r class-df} 106 | cell_type_to_class <- c( 107 | "B_Cells" = "B cell", 108 | "CD4+_T_Cells" = "T cell", 109 | "CD8+_T_Cells" = "T cell", 110 | "IRF7+_DCs" = "Myeloid", 111 | "LAMP3+_DCs" = "Myeloid", 112 | "Macrophages_1" = "Myeloid", 113 | "Macrophages_2" = "Myeloid", 114 | "Mast_Cells" = "Myeloid", 115 | "DCIS 1" = "Epithelial", 116 | "DCIS 2" = "Epithelial", 117 | "Invasive_Tumor" = "Epithelial", 118 | "Prolif_Invasive_Tumor" = "Epithelial", 119 | "Myoepi_ACTA2+" = "Myoepithelial", 120 | "Myoepi_KRT15+" = "Myoepithelial", 121 | "Stromal" = "Stromal", 122 | "Perivascular-Like" = "Stromal", 123 | "Endothelial" = "Endothelial" 124 | ) 125 | 126 | class_df <- data.frame(class = cell_type_to_class) 127 | 128 | # and define colors for reproducibility 129 | library(RColorBrewer) 130 | 131 | cell_types <- unique(chrom$Annotation) 132 | colors <- brewer.pal(n = max(3, min(length(cell_types), 12)), name = "Set3") 133 | # Recycle colors if not enough 134 | colors <- rep(colors, length.out = length(cell_types)) 135 | pal <- setNames(colors, cell_types) 136 | ``` 137 | 138 | 139 | ### Load Xenium Dataset 140 | 141 | We load the Xenium spatial transcriptomics data using the `STexampleData` package, which provides convenient access to example spatial datasets for analysis and demonstration. 142 | 143 | ```{r load-xenium} 144 | if(!requireNamespace("STexampleData", quietly = TRUE)) 145 | remotes::install_github("lmweber/STexampleData") 146 | xe_full_seu <- STexampleData::Janesick_breastCancer_Xenium_rep1() 147 | 148 | ## Convert to Seurat to stay consistent with chromium object 149 | sp_coords <- spatialCoords(xe_full_seu) 150 | colnames(sp_coords) <- c("ST_1", "ST_2") 151 | 152 | xe_full <- CreateSeuratObject( 153 | counts = counts(xe_full_seu), 154 | assay = "Xenium", 155 | meta.data = as.data.frame(colData(xe_full_seu)) 156 | ) 157 | 158 | xe_full[["spatial"]] <- CreateDimReducObject(sp_coords, assay = "Xenium", key = "ST_") 159 | 160 | xe_full$x <- sp_coords[,1] 161 | xe_full$y <- sp_coords[,2] 162 | rm(xe_full_seu) 163 | ``` 164 | 165 | #### Downsample Xenium Dataset for Faster RCTD (Optional) 166 | 167 | Running RCTD can be time-consuming, especially on large datasets like Xenium. To speed up computation during this tutorial, we optionally provide code to downsample the dataset. We recommend spatial cropping (rather than random sampling) to preserve neighborhood structure, which is important for downstream analysis. 168 | 169 | That said, downsampling is **not required **, we provide precomputed RCTD results below so you can skip running RCTD altogether if desired. 170 | 171 | ```{r downsampling} 172 | DO_subset_xe <- FALSE 173 | X_lim <- c(6000, Inf) # cropping area 174 | Y_lim <- c(4000, Inf) # cropping area 175 | 176 | if(DO_subset_xe){ 177 | xe <- subset(xe_full, subset = x > min(X_lim) & x < max(X_lim) & y > min(Y_lim) & y < max(Y_lim)) 178 | } else { 179 | xe <- xe_full 180 | } 181 | ``` 182 | 183 | # RCDT annotation 184 | Run RCTD Annotation on Xenium 185 | Running RCTD on large datasets can be computationally intensive and may take several hours. To streamline the workflow, we provide the full code for reproducibility. However, we recommend loading a pre-computed RCTD object by setting `DO_run_RCTD <- FALSE`. 186 | ```{r rctd} 187 | DO_run_RCTC <- FALSE # FALSE to load pre-computed results 188 | 189 | common_genes <- intersect(rownames(xe), rownames(chrom)) 190 | ref_labels <- chrom$Annotation %>% as.factor() 191 | 192 | ref.obj <- Reference(GetAssayData(chrom, "RNA", "counts")[common_genes, ], 193 | cell_types = ref_labels, min_UMI = 10, require_int = TRUE) 194 | 195 | test.obj <- SpatialRNA(coords = xe@reductions$spatial@cell.embeddings %>% as.data.frame(), 196 | counts = GetAssayData(xe, assay = "Xenium", layer = "counts")[common_genes, ], 197 | require_int = TRUE) 198 | 199 | if(!exists("class_df")) 200 | class_df <- NULL 201 | 202 | RCTD <- create.RCTD( 203 | test.obj, 204 | ref.obj, 205 | UMI_min = 10, 206 | counts_MIN = 10, 207 | UMI_min_sigma = 100, 208 | max_cores = BiocParallel::multicoreWorkers() - 1, 209 | CELL_MIN_INSTANCE = 25, 210 | class_df = class_df # highly recommended 211 | ) 212 | 213 | if(DO_run_RCTC){ 214 | RCTD <- run.RCTD(RCTD, doublet_mode = "doublet") 215 | saveRDS(RCTD, "~/precomp_rctd_class_aware.rds") 216 | } else { 217 | message("reading precomp RCTD results") 218 | 219 | # Install googledrive if you haven't already 220 | if (!requireNamespace("googledrive", quietly = TRUE)) { 221 | install.packages("googledrive") 222 | } 223 | library(googledrive) 224 | drive_deauth() 225 | # Define the file ID from the Google Drive link 226 | file_id <- "1pTUKq49JbUFwVk7vttjZIFqkx-AKznRF" #"1DCalFIZJywOvrSGBSPqrHh9QINeQp8aq" 227 | local_path <- tempfile(fileext = ".rds") 228 | drive_download(as_id(file_id), path = local_path, overwrite = TRUE) 229 | RCTD <- readRDS(local_path) 230 | 231 | } 232 | 233 | ``` 234 | 235 | Visualize RCTD Annotation 236 | Post-process RCDT output and add reselts into Xenium object 237 | ```{r post-rctd} 238 | RCTD <- SPLIT::run_post_process_RCTD(RCTD) 239 | xe <- AddMetaData(xe, RCTD@results$results_df) 240 | xe <- subset(xe, subset = nCount_Xenium >= 10) 241 | 242 | cat("Proprtion of spot classes") 243 | (xe$spot_class %>% table())/ncol(xe)*100 244 | ``` 245 | 246 | ```{r plot-raw-xenium, fig.width=16, message=FALSE} 247 | 248 | xe <- xe %>% SCTransform(assay = "Xenium", verbose = FALSE) %>% RunPCA(verbose = FALSE) %>% RunUMAP(dims = 1:50, verbose = FALSE) 249 | p1 <- UMAPPlot(xe, group.by = "first_type", label = T, repel = T, cols = pal) + theme_void() + theme(aspect.ratio = 1, legend.position = "none") 250 | p2 <- UMAPPlot(xe, group.by = "second_type", cols = pal) + theme_void() + theme(aspect.ratio = 1, legend.position = "bottom") 251 | p3 <- UMAPPlot(xe, group.by = "spot_class") + theme_void() + theme(aspect.ratio = 1, legend.position = "right") 252 | 253 | p1 | p2 | p3 254 | 255 | ``` 256 | Spatial Visualization 257 | ```{r spatial-plot, fig.width=12, warning=FALSE} 258 | DimPlot(xe, reduction = "spatial", group.by = "first_type", raster = TRUE, cols = pal) + coord_fixed() 259 | ``` 260 | 261 | # Purification 262 | ## SPLIT (default) 263 | This section runs the default SPLIT purification and visualizes purified data. 264 | 265 | ```{r SPLIT, message=FALSE} 266 | # Run SPLIT purification 267 | res_split <- SPLIT::purify( 268 | counts = GetAssayData(xe, assay = 'Xenium', layer = 'counts'), # or any gene x cells counts matrix 269 | rctd = RCTD, 270 | DO_purify_singlets = TRUE # Optional. If TRUE, singlets with an available secondary type are purified the same way as doublets_certain; otherwise, left unchanged. 271 | ) 272 | 273 | # Create a purified Seurat object 274 | xe_purified <- CreateSeuratObject( 275 | counts = res_split$purified_counts, 276 | meta.data = res_split$cell_meta, 277 | assay = "Xenium" 278 | ) 279 | 280 | # Optional: Filter, normalize and visualize 281 | xe_purified <- subset(xe_purified, subset = nCount_Xenium > 5) 282 | xe_purified <- xe_purified %>% 283 | SCTransform(assay = "Xenium", verbose = FALSE) %>% 284 | RunPCA(verbose = FALSE) %>% 285 | RunUMAP(dims = 1:20, verbose = FALSE) 286 | 287 | #UMAPPlot(xe_purified, group.by = c("first_type"), label = T, repel = T) + theme_void() + theme(aspect.ratio = 1, legend.position = "none") 288 | ``` 289 | ### Visually compare results of Raw and SPLIT-Purified data 290 | ```{r plot-raw-split-purified, fig.width=12, message=FALSE} 291 | p1 <- UMAPPlot(xe, group.by = c("first_type"), label = T, repel = T, cols = pal) + theme_void() + theme(aspect.ratio = 1, legend.position = "none") + ggtitle("Raw Xenium data") 292 | 293 | p2 <- UMAPPlot(xe_purified, group.by = c("first_type"), label = T, repel = T, cols = pal) + theme_void() + theme(aspect.ratio = 1, legend.position = "right") + ggtitle("SPLIT-purified Xenium data") 294 | 295 | p3 <- UMAPPlot(xe_purified, group.by = c("spot_class")) + theme_void() + theme(aspect.ratio = 1, legend.position = "bottom") + ggtitle("SPLIT-purified Xenium data colored by spot class") 296 | p4 <- UMAPPlot(xe_purified, group.by = c("purification_status")) + theme_void() + theme(aspect.ratio = 1, legend.position = "bottom") + ggtitle("SPLIT-purified Xenium data colored by purification status") 297 | 298 | 299 | (p1|p2) 300 | (p3|p4) 301 | 302 | ``` 303 | 304 | ## Spatially-aware SPLIT 305 | SPLIT can leverage spatial information to assess the abundance of secondary signals in the local neighborhood (i.e., local diffusion potential), enabling selective decomposition only when contamination is likely. This spatially informed strategy helps prevent overcorrection of phenotypes that may be underrepresented or absent in the reference. 306 | Specifically, we first compute the spatial neighborhood for each cell, then identify and purify cells that have sign of local diffusion of the secondary cell type. 307 | ```{r spatial-nw} 308 | sp_nw <- SPLIT::build_spatial_network( 309 | xe, 310 | reduction = "spatial", 311 | dims = 1:2, 312 | DO_prune = TRUE, 313 | rad_pruning = 15, # remove connections further than 15um 314 | k_knn = 20 315 | ) 316 | 317 | sp_nw <- SPLIT::add_spatial_metric(spatial_neighborhood = sp_nw, rctd = RCTD) 318 | sp_neigh_df <- SPLIT::neighborhood_analysis_to_metadata(sp_nw) 319 | 320 | xe <- AddMetaData(xe, sp_neigh_df) 321 | 322 | rm(sp_nw, sp_neigh_df) 323 | ``` 324 | 325 | ### Visualize local diffusion of secondary cell type 326 | The score `neighborhood_weights_second_type` corresponds to the average weight of the secondary cell type in cell's spatial neighborhood. 327 | ```{r plot-neigh-weight-second-type, message=FALSE} 328 | # Plot magnitude of local diffusion on UMAP 329 | FeaturePlot(xe, features = c("neighborhood_weights_second_type")) + theme_void() + theme(aspect.ratio = 1) 330 | 331 | # Plot distribution of local diffusion value 332 | hist(xe$neighborhood_weights_second_type) 333 | 334 | # Plot distribution of local diffusion value per `spot_class` 335 | xe@meta.data %>% filter(!is.na(spot_class)) %>% 336 | ggplot(aes(x = spot_class, y = neighborhood_weights_second_type, color = spot_class)) + geom_boxplot() + labs(title = "Local neighbohood diffusion by spot class") + theme_minimal() 337 | ``` 338 | 339 | 340 | We now purify cells that have secondary signal in their spatial neighborhood (e.g., `neighborhood_weights_second_type`) and keep other cells unchanged 341 | ```{r spatially-aware-split, fig.width=12, message=FALSE} 342 | xe_purified_balanced_score <- SPLIT::balance_raw_and_purified_data_by_score( 343 | xe_raw = xe, 344 | xe_purified = xe_purified, 345 | default_assay = "Xenium", # should be param, but can wait 346 | spot_class_key = "spot_class", 347 | threshold = 0.05, # lower -> more cells will be purified 348 | score_name = "neighborhood_weights_second_type" 349 | ) 350 | 351 | # Optional: Filter, normalize and visualize 352 | xe_purified_balanced_score <- subset(xe_purified_balanced_score, subset = nCount_Xenium > 5) 353 | xe_purified_balanced_score <- xe_purified_balanced_score %>% 354 | SCTransform(assay = "Xenium", verbose = FALSE) %>% 355 | RunPCA(verbose = FALSE) %>% 356 | RunUMAP(dims = 1:20, verbose = FALSE) 357 | 358 | p5 <- UMAPPlot(xe_purified_balanced_score, group.by = c("first_type"), label = T, repel = T, cols = pal) + theme_void() + theme(aspect.ratio = 1, legend.position = "none") + ggtitle("Spatially-aware SPLIT-purified Xenium data") 359 | p6 <- UMAPPlot(xe_purified_balanced_score, group.by = c("purification_status")) + theme_void() + theme(aspect.ratio = 1, legend.position = "bottom") 360 | 361 | p5|p6 362 | ``` 363 | 364 | ## SPLIT-shift 365 | In some cases, the contamination signal is so strong that RCTD assigns the cell to its secondary cell type. To address this, we introduce SPLIT-shift—an approach that refines phenotype assignments by swapping the primary and secondary cell type labels based on transcriptional neighborhood homogeneity. 366 | 367 | For this, we need to compute transcriptomics neighborhood 368 | ```{r transcriptomics-nw} 369 | tr_nw <- build_transcriptomics_network( 370 | xe, 371 | DO_prune = FALSE, 372 | k_knn = 100 373 | ) 374 | tr_nw <- add_transcriptomics_metric(transcriptomics_neighborhood = tr_nw, rctd = RCTD) 375 | tr_neigh_df <- neighborhood_analysis_to_metadata(tr_nw) 376 | xe <- AddMetaData(xe, tr_neigh_df) 377 | 378 | rm(tr_nw, tr_neigh_df) 379 | ``` 380 | 381 | And then, we set `DO_swap_lables = TRUE` to allow SPLIT-shift 382 | ```{r run-split-shift} 383 | xe_split_shift <- SPLIT::balance_raw_and_purified_data_by_score( 384 | xe_raw = xe, 385 | xe_purified = xe_purified, 386 | default_assay = "Xenium", 387 | spot_class_key = "spot_class", 388 | threshold = 0.05, # to be consistent with spatially-aware SPLIT results 389 | score_name = "neighborhood_weights_second_type", 390 | DO_swap_lables = TRUE 391 | ) 392 | 393 | # Optional: Filter, normalize and visualize 394 | xe_split_shift <- subset(xe_split_shift, subset = nCount_Xenium > 5) 395 | xe_split_shift <- xe_split_shift %>% 396 | SCTransform(assay = "Xenium", verbose = FALSE) %>% 397 | RunPCA(verbose = FALSE) %>% 398 | RunUMAP(dims = 1:20, verbose = FALSE) 399 | ``` 400 | 401 | ### Visualize SPLIT-shift 402 | ```{r vis-split-shift, fig.width=16, message=FALSE} 403 | p7 <- UMAPPlot(xe_split_shift, group.by = c("first_type"), label = T, repel = T, cols = pal) + theme_void() + theme(aspect.ratio = 1, legend.position = "none") + ggtitle("SPLIT-shift-purified Xenium data") 404 | p8 <- UMAPPlot(xe_split_shift, group.by = c("purification_status")) + theme_void() + theme(aspect.ratio = 1, legend.position = "bottom") 405 | p9 <- UMAPPlot(xe_split_shift, group.by = c("swap")) + theme_void() + theme(aspect.ratio = 1, legend.position = "bottom") 406 | 407 | p7|p8|p9 408 | ``` 409 | 410 | ```{r vis-split-shift-swap-facet, fig.width=12, message=FALSE} 411 | # Visualize results faceted by swapping status 412 | p10 <- UMAPPlot(xe_split_shift, group.by = c("first_type"), split.by = "swap", raster = F, cols = pal) + theme_void() + theme(aspect.ratio = 1, legend.position = "bottom") + ggtitle("SPLIT-shift-purified Xenium data faceted by lable swapping status") 413 | p10 414 | ``` 415 | 416 | # Summary 417 | 418 | To run SPLIT, you need a single-cell reference with reliable cell type labels, which is used to annotate Xenium data using RCTD in doublet mode. 419 | We strongly recommend providing a broader-level mapping of the reference cell types to higher-level classes. This helps RCTD produce more robust results and reduces the number of rejected cells, which are excluded from downstream analysis. 420 | 421 | After annotation, SPLIT can be run in several **combinable** modes to purify the data: 422 | 423 | 1. **Default SPLIT** purifies all `doublets_certain`, all `doublets_uncertain`, and—if `DO_purify_singlets = TRUE` — singlets that show signs of a secondary cell type. 424 | Rejected cells are always removed as unreliable. 425 | 426 | 2. **Spatially-aware SPLIT** purifies any cells that show signs of contamination based on local spatial diffusion — i.e., having secondary signal in their spatial neighborhood. 427 | 428 | 3. **SPLIT-shift** allows swapping the primary and secondary cell type labels based on transcriptional neighborhood homogeneity. 429 | 430 | ```{r summary-plot, fig.width=18, fig.height=7, message=FALSE} 431 | p1 | p2+theme(legend.position = "bottom") | p5 | p7 432 | ``` 433 | 434 | # Bonus: Pie visualization 435 | 436 | We can visualize individual cells as pie charts, where colors represent the primary and secondary cell types, and sector areas are proportional to their respective weights. 437 | 438 | In this example, we display a randomly selected T cell that exhibits strong local diffusion of its secondary cell type (ie., there is high signal of secondary cell type in its spatial neighborhood). The plot includes this cell and all neighboring cells within a 50μm radius. 439 | 440 | ```{r, fig.width=12} 441 | pie_df <- get_pieplot_df(rctd = RCTD) 442 | cois <- xe@meta.data %>% filter(neighborhood_weights_second_type > .2, first_type %in% c("CD4+_T_Cells", "CD8+_T_Cells"), spot_class != "reject") %>% rownames() 443 | 444 | p <- plot_pie_around_cell(pie_df, cell_id = cois[1], radius = 50, cols = pal) 445 | plot(p) 446 | ``` 447 | 448 | The **border color** of each pie indicates the **primary cell type**. Additional markers within the pies reflect RCTD spot class (ie., annotation confidence): 449 | 450 | - A **black dot** indicates a `doublet_uncertain` cell. 451 | - A **colored dot** (matching the secondary cell type) indicates a `doublet_certain`. 452 | - A **cross (×)** marks a `reject` cell. 453 | - Pies with no internal symbol represent confidently assigned `singlet` cells. 454 | -------------------------------------------------------------------------------- /inst/CITATION: -------------------------------------------------------------------------------- 1 | citHeader("To cite SPLIT in publications, please use:") 2 | 3 | bibentry( 4 | bibtype = "Article", 5 | title = "From Transcripts to Cells: Dissecting Sensitivity, Signal Contamination, and Specificity in Xenium Spatial Transcriptomics", 6 | author = c( 7 | person("Mariia", "Bilous"), 8 | person("Daria", "Buszta"), 9 | person("Jonathan", "Bac"), 10 | person("Senbai", "Kang"), 11 | person("Yixing", "Dong"), 12 | person("Stephanie", "Tissot"), 13 | person("Sylvie", "Andre"), 14 | person("Marina", "Alexandre-Gaveta"), 15 | person("Christel", "Voize"), 16 | person("Solange", "Peters"), 17 | person("Krisztian", "Homicsko"), 18 | person("Raphael", "Gottardo") 19 | ), 20 | year = "2025", 21 | journal = "bioRxiv", 22 | note = "doi:10.1101/2025.04.23.649965", 23 | url = "https://doi.org/10.1101/2025.04.23.649965" 24 | ) 25 | -------------------------------------------------------------------------------- /vignettes/Run_RCTD_and_SPLIT_on_Xenium.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "RCTD Annotation and SPLIT Purification for Xenium Sample" 3 | output: 4 | rmarkdown::github_document: 5 | toc: true 6 | toc_depth: 2 7 | vignette: > 8 | %\VignetteIndexEntry{Run RCTD and SPLIT on Xenium} 9 | %\VignetteEngine{knitr::rmarkdown} 10 | %\VignetteEncoding{UTF-8} 11 | --- 12 | 13 | # Introduction 14 | 15 | This vignette demonstrates how to annotate Xenium spatial transcriptomics data using RCTD, followed by purification with SPLIT. 16 | 17 | ⚠️ **Important Notice** 18 | 19 | SPLIT currently requires **doublet-mode** RCTD results generated with the original [spacexr GitHub repository](https://github.com/dmcable/spacexr) or the faster [HD fork](https://github.com/jpromeror/spacexr/tree/HD). 20 | 🚧 **Compatibility with the newly released [Bioconductor version](https://www.bioconductor.org/packages/release/bioc/html/spacexr.html) of spacexr is under development.** 21 | 22 | # Overview 23 | 24 | In this vignette, we: 25 | 26 | 1. **Run RCTD Annotation** 27 | - We begin by running RCTD annotation on a Xenium sample, using matched Chromium data from a public 10x Genomics dataset as the reference. 28 | 29 | 2. **Apply Default SPLIT Purification** 30 | - The default SPLIT purification method is then applied to clean the annotated Xenium sample, refining the initial annotations. 31 | 32 | 3. **Apply Spatially-Aware SPLIT** 33 | - We apply spatially-aware SPLIT, which purifies cells showing signs of contamination based on local spatial diffusion patterns. 34 | 35 | 4. **Apply SPLIT-Shift** 36 | - Finally, SPLIT-shift is applied to swap primary and secondary labels based on transcriptomic neighborhood heterogeneity, improving the accuracy of cell type assignments. 37 | 38 | This pipeline assumes that cell type assignments — originally derived from the Chromium reference — are refined and reliable for downstream analysis. 39 | 40 | 41 | ```{r libs, message=FALSE} 42 | if(!requireNamespace("spacexr", quietly = TRUE)){ 43 | remotes::install_github("dmcable/spacexr") ## or remotes::install_github("jpromeror/spacexr@HD") for implementation of the doublet mode. 44 | } 45 | library(spacexr) 46 | 47 | if(!requireNamespace("SPLIT", quietly = TRUE)){ 48 | remotes::install_github("bdsc-tds/SPLIT") 49 | } 50 | library(SPLIT) 51 | 52 | library(dplyr) 53 | library(Seurat) 54 | library(readxl) 55 | library(SingleCellExperiment) 56 | library(httr) 57 | library(ggplot2) 58 | ``` 59 | 60 | ## Load Data 61 | 62 | For this vignette, we use a publicly available Xenium dataset from the 10x Genomics database, originating from: 63 | 64 | > **Janesick, A., Shelansky, R., Gottscho, A.D. et al.** 65 | > *High resolution mapping of the tumor microenvironment using integrated single-cell, spatial and in situ analysis.* 66 | > *Nature Communications* 14, 8353 (2023). 67 | > [https://doi.org/10.1038/s41467-023-43458-x](https://doi.org/10.1038/s41467-023-43458-x) 68 | 69 | This dataset provides high-resolution spatial transcriptomics data suitable for downstream analysis with RCTD and SPLIT. 70 | 71 | --- 72 | 73 | ### Load Chromium Dataset (Reference) 74 | 75 | We manually load metadata for the Chromium single-cell dataset and metadata from the same study, which will serve as the **reference** for RCTD annotation. 76 | 77 | ```{r load-chormium-metadata} 78 | # read metadata 79 | url <- "https://static-content.springer.com/esm/art%3A10.1038%2Fs41467-023-43458-x/MediaObjects/41467_2023_43458_MOESM4_ESM.xlsx" 80 | temp_file <- tempfile(fileext = ".xlsx") 81 | GET(url, write_disk(temp_file, overwrite = TRUE)) 82 | 83 | chrom_metadata <- read_excel(temp_file, sheet = 1) %>% as.data.frame() 84 | rownames(chrom_metadata) <- chrom_metadata$Barcode 85 | ``` 86 | 87 | Manually load Chromium from 10x 88 | ```{r load-chromium} 89 | # read Chromium 90 | url <- "https://cf.10xgenomics.com/samples/cell-exp/7.0.1/Chromium_FFPE_Human_Breast_Cancer_Chromium_FFPE_Human_Breast_Cancer/Chromium_FFPE_Human_Breast_Cancer_Chromium_FFPE_Human_Breast_Cancer_count_sample_filtered_feature_bc_matrix.h5" 91 | temp_file <- tempfile(fileext = ".h5") 92 | GET(url, write_disk(temp_file, overwrite = TRUE)) 93 | 94 | chrom_counts <- Read10X_h5(temp_file) 95 | chrom <- CreateSeuratObject(counts = chrom_counts, assay = "RNA", meta.data = chrom_metadata) 96 | 97 | chrom$QC <- !is.na(chrom$Annotation) 98 | chrom$is_hybrid <- grepl("Hybrid", chrom$Annotation, ignore.case = TRUE) 99 | chrom <- subset(chrom, subset = QC == TRUE & is_hybrid == FALSE) # remove not annotated cells that cells that have sign of doublets 100 | 101 | ``` 102 | 103 | #### Generate broader level annoation (optional, but **highly recommended** for more robust RCTD annotation) 104 | Providing higher-level cell type classes improves RCTD accuracy and **significantly** reduces the number of rejected cells, preserving more cells (rejects are excluded from the downstream analyses and do not undergo SPLIT purification). 105 | ```{r class-df} 106 | cell_type_to_class <- c( 107 | "B_Cells" = "B cell", 108 | "CD4+_T_Cells" = "T cell", 109 | "CD8+_T_Cells" = "T cell", 110 | "IRF7+_DCs" = "Myeloid", 111 | "LAMP3+_DCs" = "Myeloid", 112 | "Macrophages_1" = "Myeloid", 113 | "Macrophages_2" = "Myeloid", 114 | "Mast_Cells" = "Myeloid", 115 | "DCIS 1" = "Epithelial", 116 | "DCIS 2" = "Epithelial", 117 | "Invasive_Tumor" = "Epithelial", 118 | "Prolif_Invasive_Tumor" = "Epithelial", 119 | "Myoepi_ACTA2+" = "Myoepithelial", 120 | "Myoepi_KRT15+" = "Myoepithelial", 121 | "Stromal" = "Stromal", 122 | "Perivascular-Like" = "Stromal", 123 | "Endothelial" = "Endothelial" 124 | ) 125 | 126 | class_df <- data.frame(class = cell_type_to_class) 127 | 128 | # and define colors for reproducibility 129 | library(RColorBrewer) 130 | 131 | cell_types <- unique(chrom$Annotation) 132 | colors <- brewer.pal(n = max(3, min(length(cell_types), 12)), name = "Set3") 133 | # Recycle colors if not enough 134 | colors <- rep(colors, length.out = length(cell_types)) 135 | pal <- setNames(colors, cell_types) 136 | ``` 137 | 138 | 139 | ### Load Xenium Dataset 140 | 141 | We load the Xenium spatial transcriptomics data using the `STexampleData` package, which provides convenient access to example spatial datasets for analysis and demonstration. 142 | 143 | ```{r load-xenium} 144 | if(!requireNamespace("STexampleData", quietly = TRUE)) 145 | remotes::install_github("lmweber/STexampleData") 146 | xe_full_seu <- STexampleData::Janesick_breastCancer_Xenium_rep1() 147 | 148 | ## Convert to Seurat to stay consistent with chromium object 149 | sp_coords <- spatialCoords(xe_full_seu) 150 | colnames(sp_coords) <- c("ST_1", "ST_2") 151 | 152 | xe_full <- CreateSeuratObject( 153 | counts = counts(xe_full_seu), 154 | assay = "Xenium", 155 | meta.data = as.data.frame(colData(xe_full_seu)) 156 | ) 157 | 158 | xe_full[["spatial"]] <- CreateDimReducObject(sp_coords, assay = "Xenium", key = "ST_") 159 | 160 | xe_full$x <- sp_coords[,1] 161 | xe_full$y <- sp_coords[,2] 162 | rm(xe_full_seu) 163 | ``` 164 | 165 | #### Downsample Xenium Dataset for Faster RCTD (Optional) 166 | 167 | Running RCTD can be time-consuming, especially on large datasets like Xenium. To speed up computation during this tutorial, we optionally provide code to downsample the dataset. We recommend spatial cropping (rather than random sampling) to preserve neighborhood structure, which is important for downstream analysis. 168 | 169 | That said, downsampling is **not required **, we provide precomputed RCTD results below so you can skip running RCTD altogether if desired. 170 | 171 | ```{r downsampling} 172 | DO_subset_xe <- FALSE 173 | X_lim <- c(6000, Inf) # cropping area 174 | Y_lim <- c(4000, Inf) # cropping area 175 | 176 | if(DO_subset_xe){ 177 | xe <- subset(xe_full, subset = x > min(X_lim) & x < max(X_lim) & y > min(Y_lim) & y < max(Y_lim)) 178 | } else { 179 | xe <- xe_full 180 | } 181 | ``` 182 | 183 | # RCDT annotation 184 | Run RCTD Annotation on Xenium 185 | Running RCTD on large datasets can be computationally intensive and may take several hours. To streamline the workflow, we provide the full code for reproducibility. However, we recommend loading a pre-computed RCTD object by setting `DO_run_RCTD <- FALSE`. 186 | ```{r rctd} 187 | DO_run_RCTC <- FALSE # FALSE to load pre-computed results 188 | 189 | common_genes <- intersect(rownames(xe), rownames(chrom)) 190 | ref_labels <- chrom$Annotation %>% as.factor() 191 | 192 | ref.obj <- Reference(GetAssayData(chrom, "RNA", "counts")[common_genes, ], 193 | cell_types = ref_labels, min_UMI = 10, require_int = TRUE) 194 | 195 | test.obj <- SpatialRNA(coords = xe@reductions$spatial@cell.embeddings %>% as.data.frame(), 196 | counts = GetAssayData(xe, assay = "Xenium", layer = "counts")[common_genes, ], 197 | require_int = TRUE) 198 | 199 | if(!exists("class_df")) 200 | class_df <- NULL 201 | 202 | RCTD <- create.RCTD( 203 | test.obj, 204 | ref.obj, 205 | UMI_min = 10, 206 | counts_MIN = 10, 207 | UMI_min_sigma = 100, 208 | max_cores = BiocParallel::multicoreWorkers() - 1, 209 | CELL_MIN_INSTANCE = 25, 210 | class_df = class_df # highly recommended 211 | ) 212 | 213 | if(DO_run_RCTC){ 214 | RCTD <- run.RCTD(RCTD, doublet_mode = "doublet") 215 | saveRDS(RCTD, "~/precomp_rctd_class_aware.rds") 216 | } else { 217 | message("reading precomp RCTD results") 218 | 219 | # Install googledrive if you haven't already 220 | if (!requireNamespace("googledrive", quietly = TRUE)) { 221 | install.packages("googledrive") 222 | } 223 | library(googledrive) 224 | drive_deauth() 225 | # Define the file ID from the Google Drive link 226 | file_id <- "1pTUKq49JbUFwVk7vttjZIFqkx-AKznRF" #"1DCalFIZJywOvrSGBSPqrHh9QINeQp8aq" 227 | local_path <- tempfile(fileext = ".rds") 228 | drive_download(as_id(file_id), path = local_path, overwrite = TRUE) 229 | RCTD <- readRDS(local_path) 230 | 231 | } 232 | 233 | ``` 234 | 235 | Visualize RCTD Annotation 236 | Post-process RCDT output and add reselts into Xenium object 237 | ```{r post-rctd} 238 | RCTD <- SPLIT::run_post_process_RCTD(RCTD) 239 | xe <- AddMetaData(xe, RCTD@results$results_df) 240 | xe <- subset(xe, subset = nCount_Xenium >= 10) 241 | 242 | cat("Proprtion of spot classes") 243 | (xe$spot_class %>% table())/ncol(xe)*100 244 | ``` 245 | 246 | ```{r plot-raw-xenium, fig.width=16, message=FALSE} 247 | 248 | xe <- xe %>% SCTransform(assay = "Xenium", verbose = FALSE) %>% RunPCA(verbose = FALSE) %>% RunUMAP(dims = 1:50, verbose = FALSE) 249 | p1 <- UMAPPlot(xe, group.by = "first_type", label = T, repel = T, cols = pal) + theme_void() + theme(aspect.ratio = 1, legend.position = "none") 250 | p2 <- UMAPPlot(xe, group.by = "second_type", cols = pal) + theme_void() + theme(aspect.ratio = 1, legend.position = "bottom") 251 | p3 <- UMAPPlot(xe, group.by = "spot_class") + theme_void() + theme(aspect.ratio = 1, legend.position = "right") 252 | 253 | p1 | p2 | p3 254 | 255 | ``` 256 | Spatial Visualization 257 | ```{r spatial-plot, fig.width=12, warning=FALSE} 258 | DimPlot(xe, reduction = "spatial", group.by = "first_type", raster = TRUE, cols = pal) + coord_fixed() 259 | ``` 260 | 261 | # Purification 262 | ## SPLIT (default) 263 | This section runs the default SPLIT purification and visualizes purified data. 264 | 265 | ```{r SPLIT, message=FALSE} 266 | # Run SPLIT purification 267 | res_split <- SPLIT::purify( 268 | counts = GetAssayData(xe, assay = 'Xenium', layer = 'counts'), # or any gene x cells counts matrix 269 | rctd = RCTD, 270 | DO_purify_singlets = TRUE # Optional. If TRUE, singlets with an available secondary type are purified the same way as doublets_certain; otherwise, left unchanged. 271 | ) 272 | 273 | # Create a purified Seurat object 274 | xe_purified <- CreateSeuratObject( 275 | counts = res_split$purified_counts, 276 | meta.data = res_split$cell_meta, 277 | assay = "Xenium" 278 | ) 279 | 280 | # Optional: Filter, normalize and visualize 281 | xe_purified <- subset(xe_purified, subset = nCount_Xenium > 5) 282 | xe_purified <- xe_purified %>% 283 | SCTransform(assay = "Xenium", verbose = FALSE) %>% 284 | RunPCA(verbose = FALSE) %>% 285 | RunUMAP(dims = 1:20, verbose = FALSE) 286 | 287 | #UMAPPlot(xe_purified, group.by = c("first_type"), label = T, repel = T) + theme_void() + theme(aspect.ratio = 1, legend.position = "none") 288 | ``` 289 | ### Visually compare results of Raw and SPLIT-Purified data 290 | ```{r plot-raw-split-purified, fig.width=12, message=FALSE} 291 | p1 <- UMAPPlot(xe, group.by = c("first_type"), label = T, repel = T, cols = pal) + theme_void() + theme(aspect.ratio = 1, legend.position = "none") + ggtitle("Raw Xenium data") 292 | 293 | p2 <- UMAPPlot(xe_purified, group.by = c("first_type"), label = T, repel = T, cols = pal) + theme_void() + theme(aspect.ratio = 1, legend.position = "right") + ggtitle("SPLIT-purified Xenium data") 294 | 295 | p3 <- UMAPPlot(xe_purified, group.by = c("spot_class")) + theme_void() + theme(aspect.ratio = 1, legend.position = "bottom") + ggtitle("SPLIT-purified Xenium data colored by spot class") 296 | p4 <- UMAPPlot(xe_purified, group.by = c("purification_status")) + theme_void() + theme(aspect.ratio = 1, legend.position = "bottom") + ggtitle("SPLIT-purified Xenium data colored by purification status") 297 | 298 | 299 | (p1|p2) 300 | (p3|p4) 301 | 302 | ``` 303 | 304 | ## Spatially-aware SPLIT 305 | SPLIT can leverage spatial information to assess the abundance of secondary signals in the local neighborhood (i.e., local diffusion potential), enabling selective decomposition only when contamination is likely. This spatially informed strategy helps prevent overcorrection of phenotypes that may be underrepresented or absent in the reference. 306 | Specifically, we first compute the spatial neighborhood for each cell, then identify and purify cells that have sign of local diffusion of the secondary cell type. 307 | ```{r spatial-nw} 308 | sp_nw <- SPLIT::build_spatial_network( 309 | xe, 310 | reduction = "spatial", 311 | dims = 1:2, 312 | DO_prune = TRUE, 313 | rad_pruning = 15, # remove connections further than 15um 314 | k_knn = 20 315 | ) 316 | 317 | sp_nw <- SPLIT::add_spatial_metric(spatial_neighborhood = sp_nw, rctd = RCTD) 318 | sp_neigh_df <- SPLIT::neighborhood_analysis_to_metadata(sp_nw) 319 | 320 | xe <- AddMetaData(xe, sp_neigh_df) 321 | 322 | rm(sp_nw, sp_neigh_df) 323 | ``` 324 | 325 | ### Visualize local diffusion of secondary cell type 326 | The score `neighborhood_weights_second_type` corresponds to the average weight of the secondary cell type in cell's spatial neighborhood. 327 | ```{r plot-neigh-weight-second-type, message=FALSE} 328 | # Plot magnitude of local diffusion on UMAP 329 | FeaturePlot(xe, features = c("neighborhood_weights_second_type")) + theme_void() + theme(aspect.ratio = 1) 330 | 331 | # Plot distribution of local diffusion value 332 | hist(xe$neighborhood_weights_second_type) 333 | 334 | # Plot distribution of local diffusion value per `spot_class` 335 | xe@meta.data %>% filter(!is.na(spot_class)) %>% 336 | ggplot(aes(x = spot_class, y = neighborhood_weights_second_type, color = spot_class)) + geom_boxplot() + labs(title = "Local neighbohood diffusion by spot class") + theme_minimal() 337 | ``` 338 | 339 | 340 | We now purify cells that have secondary signal in their spatial neighborhood (e.g., `neighborhood_weights_second_type`) and keep other cells unchanged 341 | ```{r spatially-aware-split, fig.width=12, message=FALSE} 342 | xe_purified_balanced_score <- SPLIT::balance_raw_and_purified_data_by_score( 343 | xe_raw = xe, 344 | xe_purified = xe_purified, 345 | default_assay = "Xenium", # should be param, but can wait 346 | spot_class_key = "spot_class", 347 | threshold = 0.05, # lower -> more cells will be purified 348 | score_name = "neighborhood_weights_second_type" 349 | ) 350 | 351 | # Optional: Filter, normalize and visualize 352 | xe_purified_balanced_score <- subset(xe_purified_balanced_score, subset = nCount_Xenium > 5) 353 | xe_purified_balanced_score <- xe_purified_balanced_score %>% 354 | SCTransform(assay = "Xenium", verbose = FALSE) %>% 355 | RunPCA(verbose = FALSE) %>% 356 | RunUMAP(dims = 1:20, verbose = FALSE) 357 | 358 | p5 <- UMAPPlot(xe_purified_balanced_score, group.by = c("first_type"), label = T, repel = T, cols = pal) + theme_void() + theme(aspect.ratio = 1, legend.position = "none") + ggtitle("Spatially-aware SPLIT-purified Xenium data") 359 | p6 <- UMAPPlot(xe_purified_balanced_score, group.by = c("purification_status")) + theme_void() + theme(aspect.ratio = 1, legend.position = "bottom") 360 | 361 | p5|p6 362 | ``` 363 | 364 | ## SPLIT-shift 365 | In some cases, the contamination signal is so strong that RCTD assigns the cell to its secondary cell type. To address this, we introduce SPLIT-shift—an approach that refines phenotype assignments by swapping the primary and secondary cell type labels based on transcriptional neighborhood homogeneity. 366 | 367 | For this, we need to compute transcriptomics neighborhood 368 | ```{r transcriptomics-nw} 369 | tr_nw <- build_transcriptomics_network( 370 | xe, 371 | DO_prune = FALSE, 372 | k_knn = 100 373 | ) 374 | tr_nw <- add_transcriptomics_metric(transcriptomics_neighborhood = tr_nw, rctd = RCTD) 375 | tr_neigh_df <- neighborhood_analysis_to_metadata(tr_nw) 376 | xe <- AddMetaData(xe, tr_neigh_df) 377 | 378 | rm(tr_nw, tr_neigh_df) 379 | ``` 380 | 381 | And then, we set `DO_swap_lables = TRUE` to allow SPLIT-shift 382 | ```{r run-split-shift} 383 | xe_split_shift <- SPLIT::balance_raw_and_purified_data_by_score( 384 | xe_raw = xe, 385 | xe_purified = xe_purified, 386 | default_assay = "Xenium", 387 | spot_class_key = "spot_class", 388 | threshold = 0.05, # to be consistent with spatially-aware SPLIT results 389 | score_name = "neighborhood_weights_second_type", 390 | DO_swap_lables = TRUE 391 | ) 392 | 393 | # Optional: Filter, normalize and visualize 394 | xe_split_shift <- subset(xe_split_shift, subset = nCount_Xenium > 5) 395 | xe_split_shift <- xe_split_shift %>% 396 | SCTransform(assay = "Xenium", verbose = FALSE) %>% 397 | RunPCA(verbose = FALSE) %>% 398 | RunUMAP(dims = 1:20, verbose = FALSE) 399 | ``` 400 | 401 | ### Visualize SPLIT-shift 402 | ```{r vis-split-shift, fig.width=16, message=FALSE} 403 | p7 <- UMAPPlot(xe_split_shift, group.by = c("first_type"), label = T, repel = T, cols = pal) + theme_void() + theme(aspect.ratio = 1, legend.position = "none") + ggtitle("SPLIT-shift-purified Xenium data") 404 | p8 <- UMAPPlot(xe_split_shift, group.by = c("purification_status")) + theme_void() + theme(aspect.ratio = 1, legend.position = "bottom") 405 | p9 <- UMAPPlot(xe_split_shift, group.by = c("swap")) + theme_void() + theme(aspect.ratio = 1, legend.position = "bottom") 406 | 407 | p7|p8|p9 408 | ``` 409 | 410 | ```{r vis-split-shift-swap-facet, fig.width=12, message=FALSE} 411 | # Visualize results faceted by swapping status 412 | p10 <- UMAPPlot(xe_split_shift, group.by = c("first_type"), split.by = "swap", raster = F, cols = pal) + theme_void() + theme(aspect.ratio = 1, legend.position = "bottom") + ggtitle("SPLIT-shift-purified Xenium data faceted by lable swapping status") 413 | p10 414 | ``` 415 | 416 | # Summary 417 | 418 | To run SPLIT, you need a single-cell reference with reliable cell type labels, which is used to annotate Xenium data using RCTD in doublet mode. 419 | We strongly recommend providing a broader-level mapping of the reference cell types to higher-level classes. This helps RCTD produce more robust results and reduces the number of rejected cells, which are excluded from downstream analysis. 420 | 421 | After annotation, SPLIT can be run in several **combinable** modes to purify the data: 422 | 423 | 1. **Default SPLIT** purifies all `doublets_certain`, all `doublets_uncertain`, and—if `DO_purify_singlets = TRUE` — singlets that show signs of a secondary cell type. 424 | Rejected cells are always removed as unreliable. 425 | 426 | 2. **Spatially-aware SPLIT** purifies any cells that show signs of contamination based on local spatial diffusion — i.e., having secondary signal in their spatial neighborhood. 427 | 428 | 3. **SPLIT-shift** allows swapping the primary and secondary cell type labels based on transcriptional neighborhood homogeneity. 429 | 430 | ```{r summary-plot, fig.width=18, fig.height=7, message=FALSE} 431 | p1 | p2+theme(legend.position = "bottom") | p5 | p7 432 | ``` 433 | 434 | # Bonus: Pie visualization 435 | 436 | We can visualize individual cells as pie charts, where colors represent the primary and secondary cell types, and sector areas are proportional to their respective weights. 437 | 438 | In this example, we display a randomly selected T cell that exhibits strong local diffusion of its secondary cell type (ie., there is high signal of secondary cell type in its spatial neighborhood). The plot includes this cell and all neighboring cells within a 50μm radius. 439 | 440 | ```{r, fig.width=12} 441 | pie_df <- get_pieplot_df(rctd = RCTD) 442 | cois <- xe@meta.data %>% filter(neighborhood_weights_second_type > .2, first_type %in% c("CD4+_T_Cells", "CD8+_T_Cells"), spot_class != "reject") %>% rownames() 443 | 444 | p <- plot_pie_around_cell(pie_df, cell_id = cois[1], radius = 50, cols = pal) 445 | plot(p) 446 | ``` 447 | 448 | The **border color** of each pie indicates the **primary cell type**. Additional markers within the pies reflect RCTD spot class (ie., annotation confidence): 449 | 450 | - A **black dot** indicates a `doublet_uncertain` cell. 451 | - A **colored dot** (matching the secondary cell type) indicates a `doublet_certain`. 452 | - A **cross (×)** marks a `reject` cell. 453 | - Pies with no internal symbol represent confidently assigned `singlet` cells. 454 | -------------------------------------------------------------------------------- /vignettes/plots/SPLIT_schema.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bdsc-tds/SPLIT/cd90fa7572b7a2ad6ac864e07659b8777a5d8ac0/vignettes/plots/SPLIT_schema.png --------------------------------------------------------------------------------