├── Example
    ├── GEMIL_GitHub_testing.png
    ├── GEMLI_GitHub_cancer_lineage_overview_upsetR.png
    ├── GEMLI_GitHub_crypts_lineage_overview_bubble.png
    ├── GEMLI_GitHub_crypts_lineage_overview_upsetR.png
    ├── GEMLI_GitHub_crypts_network_70_cell_type_colors.png
    ├── GEMLI_GitHub_crypts_network_70_custom_cell_type_colors.png
    ├── GEMLI_GitHub_crypts_network_70_fr.png
    ├── GEMLI_GitHub_crypts_network_70_grid.png
    ├── GEMLI_GitHub_crypts_network_70_kk.png
    ├── GEMLI_GitHub_network_50.png
    ├── GEMLI_GitHub_network_50_GT.png
    ├── GEMLI_GitHub_network_50_GT_ST.png
    ├── GEMLI_GitHub_network_50_trim.png
    ├── GEMLI_GitHub_network_50_trim_GT.png
    ├── GEMLI_GitHub_network_90.png
    ├── GEMLI_GitHub_network_90_GT.png
    ├── GEMLI_GitHub_volcano_asym_inv_tumor_asym_DCIS.png
    ├── GEMLI_GitHub_volcano_sym_DCIS_asym_DCIS.png
    ├── GEMLI_cancer_example_cell_type_annotation.RData
    ├── GEMLI_cancer_example_norm_count.RData
    ├── GEMLI_cancer_example_predicted_lineages.RData
    ├── GEMLI_crypts_example_barcode_information.RData
    ├── GEMLI_crypts_example_cell_type_annotation.RData
    ├── GEMLI_crypts_example_data_matrix.RData
    ├── GEMLI_example_barcode_information.RData
    ├── GEMLI_example_data_matrix.RData
    └── Scheme_Phillips.png
├── GEMLI_package_v0
    ├── DESCRIPTION
    ├── GEMLI.Rproj
    ├── NAMESPACE
    ├── R
    │   ├── DEG_volcano_plot.R
    │   ├── calculate_correlations.R
    │   ├── cell_fate_DEG_calling.R
    │   ├── cell_type_composition_plot.R
    │   ├── cluster_stability_plot.R
    │   ├── extract_cell_fate_lineages.R
    │   ├── memory_gene_calling.R
    │   ├── potential_markers.R
    │   ├── predict_lineages.R
    │   ├── predict_lineages_multiple_sizes.R
    │   ├── predict_lineages_with_known_markers.R
    │   ├── prediction_to_lineage_information.R
    │   ├── quantify_clusters_iterative.R
    │   ├── suggest_network_trimming_to_size.R
    │   ├── test_lineages.R
    │   ├── trim_network_to_size.R
    │   └── visualize_as_network.R
    ├── man
    │   ├── DEG_volcano_plot.Rd
    │   ├── calculate_correlations.Rd
    │   ├── cell_fate_DEG_calling.Rd
    │   ├── cell_type_composition_plot.Rd
    │   ├── cluster_stability_plot.Rd
    │   ├── extract_cell_fate_lineages.Rd
    │   ├── memory_gene_calling.Rd
    │   ├── potential_markers.Rd
    │   ├── predict_lineages.Rd
    │   ├── predict_lineages_multiple_sizes.Rd
    │   ├── predict_lineages_with_known_markers.Rd
    │   ├── prediction_to_lineage_information.Rd
    │   ├── quantify_clusters_iterative.Rd
    │   ├── suggest_network_trimming_to_size.Rd
    │   ├── test_lineages.Rd
    │   ├── trim_network_to_size.Rd
    │   └── visualize_as_network.Rd
    └── vignettes
    │   ├── Example1_simple_lineage_predictions.Rmd
    │   ├── Example2_predicting_multicellular_structures.Rmd
    │   └── Example3_cell_fate_analysis.Rmd
└── README.md

/Example/GEMIL_GitHub_testing.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMIL_GitHub_testing.png
--------------------------------------------------------------------------------
/Example/GEMLI_GitHub_cancer_lineage_overview_upsetR.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_cancer_lineage_overview_upsetR.png
--------------------------------------------------------------------------------
/Example/GEMLI_GitHub_crypts_lineage_overview_bubble.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_crypts_lineage_overview_bubble.png
--------------------------------------------------------------------------------
/Example/GEMLI_GitHub_crypts_lineage_overview_upsetR.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_crypts_lineage_overview_upsetR.png
--------------------------------------------------------------------------------
/Example/GEMLI_GitHub_crypts_network_70_cell_type_colors.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_crypts_network_70_cell_type_colors.png -------------------------------------------------------------------------------- /Example/GEMLI_GitHub_crypts_network_70_custom_cell_type_colors.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_crypts_network_70_custom_cell_type_colors.png -------------------------------------------------------------------------------- /Example/GEMLI_GitHub_crypts_network_70_fr.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_crypts_network_70_fr.png -------------------------------------------------------------------------------- /Example/GEMLI_GitHub_crypts_network_70_grid.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_crypts_network_70_grid.png -------------------------------------------------------------------------------- /Example/GEMLI_GitHub_crypts_network_70_kk.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_crypts_network_70_kk.png -------------------------------------------------------------------------------- /Example/GEMLI_GitHub_network_50.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_network_50.png -------------------------------------------------------------------------------- /Example/GEMLI_GitHub_network_50_GT.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_network_50_GT.png -------------------------------------------------------------------------------- /Example/GEMLI_GitHub_network_50_GT_ST.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_network_50_GT_ST.png -------------------------------------------------------------------------------- /Example/GEMLI_GitHub_network_50_trim.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_network_50_trim.png -------------------------------------------------------------------------------- /Example/GEMLI_GitHub_network_50_trim_GT.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_network_50_trim_GT.png -------------------------------------------------------------------------------- /Example/GEMLI_GitHub_network_90.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_network_90.png -------------------------------------------------------------------------------- /Example/GEMLI_GitHub_network_90_GT.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_network_90_GT.png -------------------------------------------------------------------------------- /Example/GEMLI_GitHub_volcano_asym_inv_tumor_asym_DCIS.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_volcano_asym_inv_tumor_asym_DCIS.png -------------------------------------------------------------------------------- /Example/GEMLI_GitHub_volcano_sym_DCIS_asym_DCIS.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_volcano_sym_DCIS_asym_DCIS.png -------------------------------------------------------------------------------- /Example/GEMLI_cancer_example_cell_type_annotation.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_cancer_example_cell_type_annotation.RData -------------------------------------------------------------------------------- /Example/GEMLI_cancer_example_norm_count.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_cancer_example_norm_count.RData -------------------------------------------------------------------------------- /Example/GEMLI_cancer_example_predicted_lineages.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_cancer_example_predicted_lineages.RData 
-------------------------------------------------------------------------------- /Example/GEMLI_crypts_example_barcode_information.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_crypts_example_barcode_information.RData -------------------------------------------------------------------------------- /Example/GEMLI_crypts_example_cell_type_annotation.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_crypts_example_cell_type_annotation.RData -------------------------------------------------------------------------------- /Example/GEMLI_crypts_example_data_matrix.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_crypts_example_data_matrix.RData -------------------------------------------------------------------------------- /Example/GEMLI_example_barcode_information.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_example_barcode_information.RData -------------------------------------------------------------------------------- /Example/GEMLI_example_data_matrix.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_example_data_matrix.RData -------------------------------------------------------------------------------- /Example/Scheme_Phillips.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/Scheme_Phillips.png
--------------------------------------------------------------------------------
/GEMLI_package_v0/DESCRIPTION:
--------------------------------------------------------------------------------
 1 | Package: GEMLI
 2 | Type: Package
 3 | Title: Gene expression memory based lineage inference
 4 | Version: 0.1.0
 5 | Author: Who wrote it
 6 | Maintainer: The package maintainer
 7 | Description: Uses general characteristics of genes with lineage-specific expression
 8 |     to predict cell lineages in scRNA-seq datasets.
 9 | URL: https://github.com/UPSUTER/GEMLI
10 | Imports:
11 |     igraph (>= 1.4.3),
12 |     HiClimR (>= 2.2.1),
13 |     dplyr (>= 1.1.2),
14 |     Seurat (>= 4.3.0),
15 |     ggplot2 (>= 3.4.2),
16 |     ggrepel (>= 0.9.3),
17 |     clustree (>= 0.5.0),
18 |     UpSetR (>= 1.4.0),
19 |     reshape (>= 0.8.9),
20 |     tidyr (>= 1.3.0)
21 | License: What license is it under?
22 | Encoding: UTF-8
23 | LazyData: true
24 |
--------------------------------------------------------------------------------
/GEMLI_package_v0/GEMLI.Rproj:
--------------------------------------------------------------------------------
 1 | Version: 1.0
 2 |
 3 | RestoreWorkspace: Default
 4 | SaveWorkspace: Default
 5 | AlwaysSaveHistory: Default
 6 |
 7 | EnableCodeIndexing: Yes
 8 | UseSpacesForTab: Yes
 9 | NumSpacesForTab: 2
10 | Encoding: UTF-8
11 |
12 | RnwWeave: Sweave
13 | LaTeX: pdfLaTeX
14 |
15 | AutoAppendNewline: Yes
16 | StripTrailingWhitespace: Yes
17 |
18 | BuildType: Package
19 | PackageUseDevtools: Yes
20 | PackageInstallArgs: --no-multiarch --with-keep.source
21 |
--------------------------------------------------------------------------------
/GEMLI_package_v0/NAMESPACE:
--------------------------------------------------------------------------------
1 | exportPattern("^[[:alpha:]]+")
2 |
--------------------------------------------------------------------------------
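Note on the NAMESPACE file above: `exportPattern("^[[:alpha:]]+")` exports every object whose name starts with a letter, so all functions under R/ become part of the public interface. The sketch below only illustrates how that regular expression matches names; `.internal_helper` and `2nd_pass` are hypothetical names that do not exist in this repository.

```r
# Illustration only: which object names would the NAMESPACE pattern export?
# The last two names are made up to show non-matching cases.
object_names <- c("predict_lineages", "DEG_volcano_plot",
                  ".internal_helper", "2nd_pass")

# Same regular expression as in exportPattern(): name must begin with a letter.
exported <- grepl("^[[:alpha:]]+", object_names)
names(exported) <- object_names

print(exported)
```

Names beginning with a dot or a digit do not match, so prefixing a helper with `.` is one way to keep it internal without changing the NAMESPACE file.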
/GEMLI_package_v0/R/DEG_volcano_plot.R:
--------------------------------------------------------------------------------
 1 | DEG_volcano_plot<-function(GEMLI_items, name1, name2){
 2 |   DEG<-GEMLI_items[['DEG']]
 3 |   DEG$change = ifelse(DEG$p_val_adj <= 0.05 & abs(DEG$avg_log2FC) >= 0.5, ifelse(DEG$avg_log2FC > 0, name1, name2), 'Stable') # sign decides the group, so |log2FC| == 0.5 is labelled correctly
 4 |   DEG$label=rownames(DEG)
 5 |   plt<-ggplot(data = DEG, aes(x = avg_log2FC , y = -log10(p_val_adj), colour=change, label=label)) +
 6 |     geom_point(alpha=0.4, size=3.5)+
 7 |     xlim(c(-4.5, 4.5)) +
 8 |     scale_color_manual(values=c("#5386BD", "darkred","grey"))+
 9 |     geom_vline(xintercept=c(-0.5,0.5),lty=4,col="black",lwd=0.8) +
10 |     geom_hline(yintercept = 1.301,lty=4,col="black",lwd=0.8)+ # 1.301 = -log10(0.05)
11 |     labs(x="log2(fold change)",y="-log10(adjusted p-value)", title=paste0("DEG"," ",name1," vs ",name2)) +
12 |     theme_bw()+
13 |     theme(plot.title = element_text(hjust = 0.5),
14 |           legend.position="right",
15 |           legend.title = element_blank()) +
16 |     geom_text_repel(data = subset(DEG, abs(avg_log2FC) >= 0.5), aes(label = label), max.overlaps = 15)
17 |   suppressWarnings(print(plt))
18 | }
19 |
--------------------------------------------------------------------------------
/GEMLI_package_v0/R/calculate_correlations.R:
--------------------------------------------------------------------------------
 1 | calculate_correlations <- function(data, fast = FALSE)
 2 | {
 3 |   tmp = t(apply(data, 1, rank)); tmp = as.matrix(tmp); tmp = tmp - rowMeans(tmp); tmp = tmp / sqrt(rowSums(tmp^2)) # rank, center and scale each row: row-wise rank (Spearman) correlation
 4 |   if(fast == TRUE){
 5 |     r = fastCor(t(tmp), nSplit = 10, upperTri = TRUE, optBLAS = TRUE, verbose=FALSE)}
 6 |   else{
 7 |     r = tcrossprod(tmp)}
 8 |   diag(r) <- 0
 9 |   return(r)
10 | }
11 |
--------------------------------------------------------------------------------
/GEMLI_package_v0/R/cell_fate_DEG_calling.R:
--------------------------------------------------------------------------------
1 | cell_fate_DEG_calling<-function(GEMLI_items, ident1, ident2, min.pct=0.05,
logfc.threshold=0.1) 2 | { 3 | GEMLI_Seurat<-CreateSeuratObject(GEMLI_items[['gene_expression']], project = "SeuratProject", assay = "RNA") 4 | Metadata<-GEMLI_items[['cell_fate_analysis']];Metadata$ident<-NA 5 | Metadata$ident[Metadata$cell.fate %in% ident1]<-"ident1" 6 | Metadata$ident[Metadata$cell.fate%in%ident2]<-"ident2" 7 | Meta<-as.data.frame(Metadata[,c(5)]); rownames(Meta)<-Metadata$cell.ID; colnames(Meta)<-c("cell.fate") 8 | GEMLI_Seurat<-AddMetaData(GEMLI_Seurat, Meta, col.name = NULL); DefaultAssay(object = GEMLI_Seurat) <- "RNA"; Idents(GEMLI_Seurat) <- GEMLI_Seurat$cell.fate 9 | DEG <- FindMarkers(object = GEMLI_Seurat, ident.1 = "ident1", ident.2 = "ident2", min.pct =min.pct, logfc.threshold = logfc.threshold) 10 | GEMLI_items[['DEG']]<-DEG 11 | return(GEMLI_items) 12 | } 13 | -------------------------------------------------------------------------------- /GEMLI_package_v0/R/cell_type_composition_plot.R: -------------------------------------------------------------------------------- 1 | cell_type_composition_plot <- function(GEMLI_items, ground_truth=F, cell_type_colors=F, type, intersections=NULL) 2 | { 3 | base_colors = rep(c('#a50026','#d73027','#f46d43','#fdae61','#fee090','#e0f3f8','#abd9e9','#74add1','#4575b4','#313695','#40004b','#762a83','#9970ab','#c2a5cf','#e7d4e8','#d9f0d3','#a6dba0','#5aae61','#1b7837','#00441b','#543005','#8c510a','#bf812d','#dfc27d','#f6e8c3','#e0e0e0','#bababa','#878787','#4d4d4d','#1a1a1a'), 100) 4 | if (cell_type_colors==F){ 5 | cell.type<-unique(GEMLI_items[['cell_type']]$cell.type) 6 | color<-base_colors[rank(unique(GEMLI_items[['cell_type']]$cell.type))] 7 | GEMLI_items[['cell_type_color']] = data.frame(cell.type, color)} 8 | 9 | if (ground_truth){cell.ID<-names(GEMLI_items[['barcodes']]); clone.ID<-unname(GEMLI_items[['barcodes']]); GT<-as.data.frame(cbind(clone.ID,cell.ID)); 10 | Lookup<-merge(GT, GEMLI_items[['cell_type']], by="cell.ID", all=TRUE)} else { 11 | 
    Lookup<-merge(as.data.frame(GEMLI_items[['predicted_lineage_table']]), GEMLI_items[['cell_type']], by="cell.ID", all=TRUE)
12 |   }
13 |
14 |   if (type == "bubble"){
15 |     Lookup <- Lookup %>% group_by(clone.ID, cell.type) %>% summarise(cnt = n()) %>% mutate(freq = round(cnt / sum(cnt), 3)); Lookup <- reshape::cast(Lookup, clone.ID~cell.type, value="freq");
16 |     base_colors = GEMLI_items[['cell_type_color']]$color[match(colnames(Lookup[,2:length(Lookup)]),GEMLI_items[['cell_type_color']]$cell.type)]
17 |     p<-Lookup %>% gather(cell.type, percentage, -clone.ID)%>% ggplot(group=cell.type) + geom_point(aes(x = cell.type, y = clone.ID, size = percentage, col= cell.type))+ theme_classic()+ scale_colour_manual(values = base_colors)}
18 |
19 |   if (type == "upsetR"){
20 |     Lookup_list <- split(Lookup$clone.ID, Lookup$cell.type)
21 |     p<-upset(fromList(Lookup_list), order.by = "freq", nsets = length(Lookup_list),
22 |              sets.x.label = "Lineages in cell type", mainbar.y.label = "Number of lineages",
23 |              nintersects = NA, intersections = intersections, point.size=5, mb.ratio = c(0.5, 0.5), text.scale = 2, # forward the user-supplied intersections (default NULL)
24 |              set_size.show = TRUE, set_size.numbers_size = 7, set_size.scale_max = length(unique(Lookup$clone.ID)))}
25 |
26 |   if (type == "plain"){
27 |     Lookup<-unique(Lookup[,-c(1)])
28 |     p<-Lookup %>% group_by(clone.ID) %>% arrange(clone.ID, cell.type) %>% summarize(combi = paste0(cell.type, collapse = "__"), .groups = "drop") %>% count(combi)}
29 |
30 |   return(p)
31 | }
32 |
--------------------------------------------------------------------------------
/GEMLI_package_v0/R/cluster_stability_plot.R:
--------------------------------------------------------------------------------
1 | cluster_stability_plot <- function(GEMLI_items) # check
2 | {
3 |   data_matrix<-GEMLI_items[['prediction_multiple_sizes']]
4 |   clustree<-clustree(data_matrix, prefix = "K")
5 |   clustree<-clustree[["data"]][which(clustree[["data"]]$size != 1),]
6 |   plot(clustree$K, clustree$sc3_stability, xlab="lineage size",
ylab="clustree_stability_index") 7 | } 8 | -------------------------------------------------------------------------------- /GEMLI_package_v0/R/extract_cell_fate_lineages.R: -------------------------------------------------------------------------------- 1 | extract_cell_fate_lineages<- function(GEMLI, selection, unique=FALSE, threshold) 2 | { 3 | Lookup<-merge(as.data.frame(GEMLI[['predicted_lineage_table']]), GEMLI[['cell_type']], by="cell.ID", all=TRUE) 4 | if (unique){ 5 | Lookup$cell.fate<-NA 6 | Lookup<-Lookup %>% group_by(clone.ID) %>% mutate(cell.fate=case_when((n_distinct(cell.type)==length(selection)& all(cell.type %in% selection)& is.na(clone.ID)==FALSE)~ "asym", (n_distinct(cell.type)==1& all(cell.type %in% selection) & n_distinct(cell.ID)>1& is.na(clone.ID)==FALSE)~"sym")) 7 | } else { 8 | Lookup2<-Lookup[Lookup$cell.type %in% selection, ] 9 | Lookup2<-Lookup2 %>% group_by(clone.ID) %>% mutate(cell.fate=case_when((n_distinct(cell.type)==length(selection)& all(cell.type %in% selection)& is.na(clone.ID)==FALSE)~ "asym", (n_distinct(cell.type)==1& all(cell.type %in% selection) & n_distinct(cell.ID)>1& is.na(clone.ID)==FALSE)~"sym")) 10 | Lookup2<-Lookup2[,c(1,4)]; Lookup<-merge(Lookup, Lookup2, by=c("cell.ID" ), all=TRUE) 11 | } 12 | # filter by threshold 13 | Lookup <- Lookup %>% group_by(cell.fate, clone.ID) %>% mutate(cnt = n()); Lookup<-Lookup%>% group_by(cell.fate, clone.ID, cell.type) %>% mutate(per= (n()/cnt)*100) 14 | for(i in 1:length(selection)){Lookup <- Lookup %>% group_by(cell.fate, clone.ID) %>% mutate(cell.fate=case_when(((cell.fate=="asym") & (cell.type==selection[i] ) & (per < threshold[i]))~"filtered", TRUE~cell.fate))} 15 | Lookup<-Lookup %>% group_by(clone.ID) %>% mutate(cell.fate=case_when(any(cell.fate=="filtered")~NA, TRUE~cell.fate)) 16 | Lookup$cell.fate <- paste(Lookup$cell.fate,Lookup$cell.type,sep = "_") 17 | GEMLI[['cell_fate_analysis']]<-Lookup[,1:4] 18 | return(GEMLI) 19 | } 20 | 
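Note on extract_cell_fate_lineages above: with `unique=TRUE`, a lineage is labelled "asym" when its cells cover every cell type in `selection` and nothing else, and "sym" when it has more than one cell, all of a single selected type. The following is a minimal base-R sketch of that rule only, assuming one row per cell; it omits the `threshold` filtering and is not the package's implementation. The helper `classify_lineage` and the toy table are hypothetical.

```r
# Toy lineage table; IDs and cell types are invented for illustration.
lookup <- data.frame(
  cell.ID   = paste0("c", 1:7),
  clone.ID  = c(1, 1, 2, 2, 3, 3, 4),
  cell.type = c("DCIS", "DCIS", "DCIS", "tumor", "tumor", "stroma", "DCIS"),
  stringsAsFactors = FALSE
)
selection <- c("DCIS", "tumor")

# Hypothetical helper: classify one lineage from the cell types of its members.
classify_lineage <- function(types, selection) {
  if (!all(types %in% selection)) return(NA_character_)   # contains other types
  n_types <- length(unique(types))
  if (n_types == length(selection)) return("asym")         # all selected types present
  if (n_types == 1 && length(types) > 1) return("sym")     # several cells, one type
  NA_character_                                            # e.g. a single-cell lineage
}

# One label per clone.ID.
fate_by_clone <- vapply(split(lookup$cell.type, lookup$clone.ID),
                        classify_lineage, character(1), selection = selection)
print(fate_by_clone)
```

In this toy table, clone 1 comes out as "sym", clone 2 as "asym", and clones 3 and 4 stay NA (a non-selected cell type and a single-cell lineage, respectively).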
-------------------------------------------------------------------------------- /GEMLI_package_v0/R/memory_gene_calling.R: -------------------------------------------------------------------------------- 1 | memory_gene_calling <- function(GEMLI_items, valid_lineage_sizes=(2:5), use_median=T, ground_truth=F, cell_fate) 2 | { 3 | markers_by_cvsq_of_lineage_means <- function(data_matrix, lineage_dict, valid_lineage_sizes=(2:5), use_median=T) 4 | { 5 | cv_sq <- function(data_matrix) 6 | { 7 | sd = apply(data_matrix, 1, sd, na.rm = TRUE) 8 | mean = apply(data_matrix, 1, mean, na.rm = TRUE) 9 | noise = (sd/mean)**2 10 | return(noise) 11 | } 12 | lineage_dict_filt = lineage_dict[intersect(colnames(data_matrix), names(lineage_dict))] 13 | valid_lineage_dict = lineage_dict_filt[as.character(lineage_dict_filt) %in% names(table(lineage_dict_filt))[table(lineage_dict_filt) %in% valid_lineage_sizes]] 14 | lineage_center = matrix(NA, ncol=length(unique(valid_lineage_dict)), nrow=nrow(data_matrix)); colnames(lineage_center) = unique(valid_lineage_dict); rownames(lineage_center) = rownames(data_matrix) 15 | for (lineage in as.character(unique(valid_lineage_dict))) 16 | { 17 | if (use_median){lineage_center[,lineage] = apply(data_matrix[,names(valid_lineage_dict)[valid_lineage_dict==lineage]], 1, quantile, probs = 0.5, na.rm = TRUE)} 18 | else {lineage_center[,lineage] = rowMeans(data_matrix[,names(valid_lineage_dict)[valid_lineage_dict==lineage]])} 19 | } 20 | x = rowMeans(lineage_center); y = cv_sq(lineage_center); x = log2(x); y = log2(y) 21 | filter = is.na(x) | is.na(y) | is.infinite(x) | is.infinite(y) | (x==0); x=x[!filter]; y=y[!filter]; loess_means = loess(y ~ x, span=0.75, control=loess.control(surface="direct")) 22 | filter = names(which(!filter)) 23 | lineage_center_variation = loess_means$residuals 24 | return(sort(lineage_center_variation[filter], decreasing=T)) 25 | } 26 | data_matrix = GEMLI_items[['gene_expression']] 27 | 28 | if (ground_truth) {lineage_dict = 
GEMLI_items[['barcodes']]} else { 29 | if (length(GEMLI_items[['predicted_lineages']])>0){lineage_dict = GEMLI_items[['predicted_lineages']]} else { 30 | lineage_dict = GEMLI_items[['predicted_lineage_table']]$clone.ID; names(lineage_dict) = GEMLI_items[['predicted_lineage_table']]$cell.ID}} 31 | if (hasArg(cell_fate)){match<-GEMLI_items[['cell_fate_analysis']][GEMLI_items[['cell_fate_analysis']]$cell.fate ==cell_fate,] 32 | lineage_dict=lineage_dict[names(lineage_dict)%in% match$cell.ID]} 33 | 34 | lineage_center_variation = markers_by_cvsq_of_lineage_means(data_matrix, lineage_dict, valid_lineage_sizes=valid_lineage_sizes, use_median=use_median) 35 | 36 | data_matrix_control = matrix(NA, ncol=20, nrow=nrow(data_matrix)); rownames(data_matrix_control) = rownames(data_matrix) 37 | for (i in c(1:20)) 38 | { 39 | lineage_dict_sampled = lineage_dict; names(lineage_dict_sampled) = sample(names(lineage_dict)) 40 | tmp = markers_by_cvsq_of_lineage_means(as.matrix(data_matrix), lineage_dict_sampled, valid_lineage_sizes=valid_lineage_sizes, use_median=use_median) 41 | data_matrix_control[names(tmp),i] = tmp 42 | } 43 | markers_pvalue = rowSums(data_matrix_control[intersect(rownames(data_matrix_control), names(lineage_center_variation)),]>lineage_center_variation[intersect(rownames(data_matrix_control), names(lineage_center_variation))], na.rm=T)/20 44 | 45 | shared_genes = intersect(names(lineage_center_variation), names(markers_pvalue)) 46 | marker_table = data.frame(cbind(lineage_center_variation[shared_genes], markers_pvalue[shared_genes])); rownames(marker_table) = shared_genes; colnames(marker_table) = c("var","p") 47 | marker_table = marker_table[with(marker_table, order(p, -var)),] 48 | 49 | GEMLI_items[["memory_genes"]] = marker_table 50 | return(GEMLI_items) 51 | } 52 | -------------------------------------------------------------------------------- /GEMLI_package_v0/R/potential_markers.R: 
-------------------------------------------------------------------------------- 1 | potential_markers <- function(data_matrix) # check 2 | { 3 | means = rowMeans(data_matrix) 4 | variation = (apply(data_matrix, 1, sd, na.rm=T) / apply(data_matrix, 1, mean, na.rm=T))**2 5 | filter = names(which(!(is.na(means) | is.na(variation) | is.infinite(means) | is.infinite(variation) | (means==0)))); x=means[filter]; y=variation[filter] 6 | linear_fit = lm(log2(y) ~ log2(x)) 7 | variation_residuals = residuals(linear_fit) 8 | means = means[filter] 9 | mean_quantiles = quantile(means, seq(0.01,1,0.01)) 10 | variation_quantiles = quantile(variation_residuals, seq(0.01,1,0.01)) 11 | memory_genes = names(which((means>=mean_quantiles[98]) | (means>=mean_quantiles[90] & variation_residuals>=variation_quantiles[40]) | (means>=mean_quantiles[80] & variation_residuals>=variation_quantiles[80]) |(means>=mean_quantiles[60] & variation_residuals>=variation_quantiles[90]))) 12 | return(memory_genes) 13 | } 14 | -------------------------------------------------------------------------------- /GEMLI_package_v0/R/predict_lineages.R: -------------------------------------------------------------------------------- 1 | predict_lineages <- function(GEMLI_items, repetitions=100, sample_size=(2/3), desired_cluster_size=c(2,3), fast=FALSE) # check 2 | { 3 | data_matrix = GEMLI_items[['gene_expression']] 4 | marker_genes = potential_markers(data_matrix) 5 | results = data.matrix(matrix(0, nrow=ncol(data_matrix), ncol=ncol(data_matrix))); rownames(results) = colnames(data_matrix); colnames(results) = colnames(data_matrix) 6 | for (i in seq(1,repetitions)) 7 | { 8 | marker_genes_sample = sample(intersect(marker_genes, rownames(data_matrix)), round(length(intersect(marker_genes, rownames(data_matrix)))*sample_size,0)) 9 | cell_clusters = quantify_clusters_iterative(data_matrix, marker_genes_sample, N=2, fast) 10 | cell_clusters_unique_name = cell_clusters; for (colname in 
1:ncol(cell_clusters)){cell_clusters_unique_name[!is.na(cell_clusters_unique_name[,colname]),colname] = paste0(colname,'_',cell_clusters_unique_name[!is.na(cell_clusters_unique_name[,colname]),colname])} 11 | clustersize_dict = table(cell_clusters_unique_name) 12 | smallest_clusters = names(clustersize_dict)[clustersize_dict %in% desired_cluster_size] 13 | best_prediction = data.matrix(matrix(F, nrow=ncol(data_matrix), ncol=ncol(data_matrix))); rownames(best_prediction) = colnames(data_matrix); colnames(best_prediction) = colnames(data_matrix) 14 | for (cluster in smallest_clusters){cells_in_cluster = rownames(best_prediction)[rowSums(cell_clusters_unique_name==cluster, na.rm=T)>0]; best_prediction[cells_in_cluster,cells_in_cluster] <- T} 15 | diag(best_prediction) = F 16 | results = results + best_prediction 17 | } 18 | GEMLI_items[["prediction"]] = results 19 | return(GEMLI_items) 20 | } 21 | -------------------------------------------------------------------------------- /GEMLI_package_v0/R/predict_lineages_multiple_sizes.R: -------------------------------------------------------------------------------- 1 | predict_lineages_multiple_sizes <- function(GEMLI_items, repetitions=10, sample_size=(2/3), minimal_maximal_cluster_size=c(2,50), cutoff=5) # check 2 | { 3 | # split out the minimal_maximal_cluster_size into vectors encompassing all combis between the min and max value 4 | # for each of the vectors then generate the lineage prediction and combine them into one table 5 | desired_sizes<-list() 6 | for (m in 2:minimal_maximal_cluster_size[2]){desired_sizes[[m-1]]<-c(1:m)} 7 | GEMLI_prediction_list<-list() 8 | for (j in 1:length(desired_sizes)) { 9 | progress((100*j)/length(desired_sizes)) 10 | sub_desired_cluster_size<-desired_sizes[[j]] 11 | data_matrix = GEMLI_items[['gene_expression']] 12 | marker_genes = potential_markers(data_matrix) 13 | results = data.matrix(matrix(0, nrow=ncol(data_matrix), ncol=ncol(data_matrix))); rownames(results) = 
colnames(data_matrix); colnames(results) = colnames(data_matrix) 14 | for (i in seq(1,repetitions)) 15 | { 16 | marker_genes_sample = sample(intersect(marker_genes, rownames(data_matrix)), round(length(intersect(marker_genes, rownames(data_matrix)))*sample_size,0)) 17 | cell_clusters = quantify_clusters_iterative(data_matrix, marker_genes_sample, N=2) 18 | cell_clusters_unique_name = cell_clusters; for (colname in 1:ncol(cell_clusters)){cell_clusters_unique_name[!is.na(cell_clusters_unique_name[,colname]),colname] = paste0(colname,'_',cell_clusters_unique_name[!is.na(cell_clusters_unique_name[,colname]),colname])} 19 | clustersize_dict = table(cell_clusters_unique_name) 20 | 21 | # This is where the desired_cluster_size comes into play 22 | smallest_clusters = names(clustersize_dict)[clustersize_dict %in% sub_desired_cluster_size] 23 | best_prediction = data.matrix(matrix(F, nrow=ncol(data_matrix), ncol=ncol(data_matrix))); rownames(best_prediction) = colnames(data_matrix); colnames(best_prediction) = colnames(data_matrix) 24 | for (cluster in smallest_clusters){cells_in_cluster = rownames(best_prediction)[rowSums(cell_clusters_unique_name==cluster, na.rm=T)>0]; best_prediction[cells_in_cluster,cells_in_cluster] <- T} 25 | diag(best_prediction) = F 26 | results = results + best_prediction 27 | } 28 | # Transform the prediction matrix into a lineage info 29 | network = (results >= cutoff) 30 | network_edges = as.matrix(network) 31 | network_graph = igraph::graph.adjacency(network_edges, mode="undirected", weighted=NULL) 32 | prediction_dict = igraph::clusters(network_graph)$membership 33 | family_table = cbind(names(prediction_dict), as.vector(prediction_dict)) 34 | colnames(family_table) = c("cell.ID", "clone.ID") 35 | GEMLI_prediction_list[[j]] = family_table 36 | } 37 | 38 | # combine in one dataframe that can be used for cluster stability calculation 39 | for(u in 1:length(GEMLI_prediction_list)){colnames(GEMLI_prediction_list[[u]]) <- c("cell.ID",paste0("K", 
u)) }
40 |   GEMLI_prediction_multiple_sizes_output <- Reduce(function(x,y)merge(x,y,by="cell.ID"), GEMLI_prediction_list)
41 |   GEMLI_items[['prediction_multiple_sizes']]<-GEMLI_prediction_multiple_sizes_output
42 |   return(GEMLI_items)
43 | }
44 |
--------------------------------------------------------------------------------
/GEMLI_package_v0/R/predict_lineages_with_known_markers.R:
--------------------------------------------------------------------------------
 1 | predict_lineages_with_known_markers <- function(GEMLI_items, repetitions=100, sample_size=(2/3), desired_cluster_size=c(2,3), fast=FALSE)
 2 | {
 3 |   norm_data = GEMLI_items[['gene_expression']]
 4 |   marker_genes = GEMLI_items[['known_markers']]
 5 |   results = data.matrix(matrix(0, nrow=ncol(norm_data), ncol=ncol(norm_data))); rownames(results) = colnames(norm_data); colnames(results) = colnames(norm_data)
 6 |   for (i in seq(1,repetitions))
 7 |   {
 8 |     marker_genes_sample = sample(intersect(marker_genes, rownames(norm_data)), round(length(intersect(marker_genes, rownames(norm_data)))*sample_size,0))
 9 |     cell_clusters = quantify_clusters_iterative(norm_data, marker_genes_sample, N=2, fast=fast) # forward `fast`, as predict_lineages does
10 |     cell_clusters_unique_name = cell_clusters; for (colname in 1:ncol(cell_clusters)){cell_clusters_unique_name[!is.na(cell_clusters_unique_name[,colname]),colname] = paste0(colname,'_',cell_clusters_unique_name[!is.na(cell_clusters_unique_name[,colname]),colname])}
11 |     clustersize_dict = table(cell_clusters_unique_name)
12 |     smallest_clusters = names(clustersize_dict)[clustersize_dict %in% desired_cluster_size]
13 |     best_prediction = data.matrix(matrix(F, nrow=ncol(norm_data), ncol=ncol(norm_data))); rownames(best_prediction) = colnames(norm_data); colnames(best_prediction) = colnames(norm_data)
14 |     for (cluster in smallest_clusters){cells_in_cluster = rownames(best_prediction)[rowSums(cell_clusters_unique_name==cluster, na.rm=T)>0]; best_prediction[cells_in_cluster,cells_in_cluster] <- T}
15 |
diag(best_prediction) = F 16 | results = results + best_prediction 17 | } 18 | GEMLI_items[["prediction"]] = results 19 | return(GEMLI_items) 20 | } 21 | -------------------------------------------------------------------------------- /GEMLI_package_v0/R/prediction_to_lineage_information.R: -------------------------------------------------------------------------------- 1 | library(igraph) 2 | prediction_to_lineage_information <- function(GEMLI_items, cutoff, output_as_dict=T) # check 3 | { 4 | lineage_predictions_matrix = GEMLI_items[["prediction"]] 5 | network = (lineage_predictions_matrix >= cutoff) 6 | network_edges = as.matrix(network) 7 | network_graph = igraph::graph.adjacency(network_edges, mode="undirected", weighted=NULL) 8 | family_dict = igraph::clusters(network_graph)$membership 9 | GEMLI_items[["predicted_lineages"]] = family_dict 10 | family_table = cbind(names(family_dict), as.vector(family_dict)); colnames(family_table) = c("cell.ID", "clone.ID") 11 | GEMLI_items[["predicted_lineage_table"]] = family_table 12 | return(GEMLI_items) 13 | } 14 | -------------------------------------------------------------------------------- /GEMLI_package_v0/R/quantify_clusters_iterative.R: -------------------------------------------------------------------------------- 1 | quantify_clusters_iterative = function(data_matrix, marker_genes, N=2, fast=FALSE) 2 | { 3 | iterate = T; i = 2 4 | genes = intersect(marker_genes, rownames(data_matrix)[rowMeans(data_matrix)>0]) 5 | data_matrix = data_matrix[genes,] 6 | corr_expr_raw = calculate_correlations(t(data_matrix), fast=fast); corr_expr = (1 - corr_expr_raw)/2 # honour the 'fast' argument; (1 - r)/2 maps correlations to distances in [0, 1] 7 | cell_clusters = data.matrix(matrix(0, nrow=ncol(data_matrix), ncol=1)); rownames(cell_clusters) = colnames(data_matrix) 8 | cell_clusters[,1] = rep(1, ncol(data_matrix)) 9 | while (iterate) 10 | { 11 | cell_clusters = cbind(cell_clusters, rep(0,nrow(cell_clusters))) 12 | for (cluster in setdiff(unique(cell_clusters[,(i-1)]),0)) 13 | { 14 | cells_in_cluster = 
rownames(cell_clusters)[cell_clusters[,(i-1)]==cluster] 15 | if (length(cells_in_cluster) >= 4) # this line ends the sub clustering # min of desired cluster size 16 | { 17 | correlation = mean((corr_expr_raw[cells_in_cluster,cells_in_cluster])[lower.tri(corr_expr_raw[cells_in_cluster,cells_in_cluster], diag=F)]) 18 | corr_expr_subset = corr_expr[cells_in_cluster,cells_in_cluster] 19 | clustering = cutree(hclust(as.dist(corr_expr_subset), method = "ward.D2", ), k=N) 20 | cell_clusters[names(clustering),i] = as.vector(clustering) + max(c(0, cell_clusters[,i]), na.rm=T) 21 | } 22 | else {cell_clusters[cells_in_cluster,i] = 0} 23 | } 24 | if (sum(cell_clusters[,i], na.rm=T)==0) {iterate = F} 25 | i = i+1 26 | } 27 | cell_clusters[cell_clusters==0] = NA 28 | return(cell_clusters) 29 | } 30 | -------------------------------------------------------------------------------- /GEMLI_package_v0/R/suggest_network_trimming_to_size.R: -------------------------------------------------------------------------------- 1 | suggest_network_trimming_to_size <- function(GEMLI_items, max_size=4, cutoff=70, max_edge_width=5, display_orphan=F, include_labels=F, ground_truth=F, layout_style="fr") 2 | { 3 | lineage_predictions_matrix_original = GEMLI_items[["prediction"]] 4 | predicted_lineages_original = GEMLI_items[["predicted_lineages"]] 5 | GEMLI_items_in_trimming = GEMLI_items 6 | predicted_lineages = prediction_to_lineage_information(GEMLI_items, cutoff, output_as_dict=T)$predicted_lineages 7 | if (ground_truth) {lineage_dict = GEMLI_items[['barcodes']]} else {lineage_dict = prediction_to_lineage_information(GEMLI_items, cutoff, output_as_dict=T)$predicted_lineages} 8 | while (sum(table(GEMLI_items_in_trimming$predicted_lineages)>max_size)!=0) 9 | { predicted_lineages = GEMLI_items_in_trimming$predicted_lineages; 10 | lineage_predictions_matrix = GEMLI_items_in_trimming$prediction 11 | oversized_families = names(table(predicted_lineages))[table(predicted_lineages)>max_size] 12 | for 
(oversized_family in oversized_families) 13 | { 14 | oversized_familiy_scores = lineage_predictions_matrix[names(which(predicted_lineages==oversized_family)), names(which(predicted_lineages==oversized_family))] 15 | weakest_link = min(oversized_familiy_scores[oversized_familiy_scores!=0]) 16 | oversized_familiy_scores[oversized_familiy_scores==weakest_link] <- 0 17 | lineage_predictions_matrix[names(which(predicted_lineages==oversized_family)), names(which(predicted_lineages==oversized_family))] = oversized_familiy_scores 18 | } 19 | GEMLI_items_in_trimming$prediction = lineage_predictions_matrix 20 | GEMLI_items_in_trimming$predicted_lineages = prediction_to_lineage_information(GEMLI_items_in_trimming, cutoff, output_as_dict=T)$predicted_lineages 21 | } 22 | # visualization 23 | par(mar=c(0,0,3.5,0)) 24 | layout(mat = matrix(c(1, 2), nrow = 2, ncol = 1), heights = c(3, 1)) 25 | base_colors = rep(c('#a50026','#d73027','#f46d43','#fdae61','#fee090','#e0f3f8','#abd9e9','#74add1','#4575b4','#313695','#40004b','#762a83','#9970ab','#c2a5cf','#e7d4e8','#d9f0d3','#a6dba0','#5aae61','#1b7837','#00441b','#543005','#8c510a','#bf812d','#dfc27d','#f6e8c3','#e0e0e0','#bababa','#878787','#4d4d4d','#1a1a1a'), 100) 26 | network_edges = as.matrix(lineage_predictions_matrix_original) 27 | network_edges[lineage_predictions_matrix[rownames(network_edges),rownames(network_edges)]==0] <- ((-1) * network_edges[lineage_predictions_matrix[rownames(network_edges),rownames(network_edges)]==0]) 28 | network_edges[abs(network_edges)cutoff)])), col = c("black", "black"), bty = "n", lty=1:1, lwd=c(max(edge_width),(min(edge_width)+ (max_edge_width*0.1))), title = "Confidence", title.adj =0, horiz=F, xpd=TRUE, inset=c(.15,0),ncol=1) 50 | # Vertex_color 51 | if (include_labels==T) {legend("topright", legend=c("Color - prediction","Number - cell ID"), bty = "n", title = "Vertex ", title.adj =0.5, horiz=F, xpd=TRUE, inset=c(.05, 0),ncol=1)} else 52 | {legend("topright", legend=c("Color - 
prediction"), bty = "n", title = "Vertex ", title.adj =0.5, horiz=F, xpd=TRUE, inset=c(.05, 0),ncol=1)} 53 | } 54 | -------------------------------------------------------------------------------- /GEMLI_package_v0/R/test_lineages.R: -------------------------------------------------------------------------------- 1 | test_lineages <- function(GEMLI_items, valid_fam_sizes=(1:5), max_interval=100, plot_results=F) 2 | { 3 | lineage_predictions_matrix = GEMLI_items[['prediction']] 4 | lineage_dict_bc = GEMLI_items[['barcodes']] 5 | valid_family_dict = lineage_dict_bc[as.character(lineage_dict_bc) %in% names(table(lineage_dict_bc))[table(lineage_dict_bc) %in% valid_fam_sizes]] 6 | cell_with_annotation = intersect(rownames(lineage_predictions_matrix), names(valid_family_dict)) 7 | family_dict_filt = valid_family_dict[cell_with_annotation] 8 | real_family_matrix = outer(family_dict_filt[cell_with_annotation], family_dict_filt[cell_with_annotation], FUN='=='); diag(real_family_matrix) = F 9 | results_repeated_annotated = lineage_predictions_matrix[cell_with_annotation, cell_with_annotation] 10 | if (is.na(max_interval)) {intervals = unique(round(seq(0,1,0.1)*max(results_repeated_annotated),0))} else {intervals = unique(round(seq(0,1,0.1)*max_interval,0))} 11 | output_matrix = matrix(NA, ncol=4, nrow=length(intervals)); colnames(output_matrix) = c('precision','TP','FP','sensitivity'); rownames(output_matrix) = intervals 12 | for (interval in intervals){output_matrix[as.character(interval),1:3] = c(sum(real_family_matrix & (results_repeated_annotated>=interval)) / sum(results_repeated_annotated>=interval), sum(real_family_matrix & (results_repeated_annotated>=interval)), sum((!real_family_matrix) & (results_repeated_annotated>=interval)))} 13 | # replace this with an apply function 14 | output_matrix[,"sensitivity"] = output_matrix[,"TP"]/output_matrix["0","TP"] 15 | output_matrix = output_matrix[,c('TP','FP','precision','sensitivity')] 16 | if (plot_results) 17 | { 18 | 
par(mar=c(4.5, 4.5, 3.5, 4.5)); plot(as.numeric(rownames(output_matrix)), output_matrix[,"precision"], type="o", pch=16, lwd=3, col="darkred", xlab="confidence level", ylab="precision (red)", log="", main="testing lineage prediction", ylim=c(0,1)); par(new=T); plot(as.numeric(rownames(output_matrix)), output_matrix[,"sensitivity"], type="o", pch=16, lwd=3, axes=F, bty="n", xlab="", ylab="", col="grey", log="", ylim=c(0,1)); axis(side=4, at=pretty(range(output_matrix[,"sensitivity"]))); mtext("sensitivity (grey)", side=4, line=3) 19 | } 20 | GEMLI_items[['testing_results']] = output_matrix 21 | return(GEMLI_items) 22 | } 23 | -------------------------------------------------------------------------------- /GEMLI_package_v0/R/trim_network_to_size.R: -------------------------------------------------------------------------------- 1 | trim_network_to_size <- function(GEMLI_items, max_size=4, cutoff=70) 2 | { 3 | GEMLI_items_in_trimming = GEMLI_items 4 | while (sum(table(GEMLI_items_in_trimming$predicted_lineages)>max_size)!=0) 5 | { 6 | predicted_lineages = GEMLI_items_in_trimming$predicted_lineages; lineage_predictions_matrix = GEMLI_items_in_trimming$prediction 7 | oversized_families = names(table(predicted_lineages))[table(predicted_lineages)>max_size] 8 | for (oversized_family in oversized_families) 9 | { 10 | oversized_familiy_scores = lineage_predictions_matrix[names(which(predicted_lineages==oversized_family)), names(which(predicted_lineages==oversized_family))] 11 | weakest_link = min(oversized_familiy_scores[oversized_familiy_scores!=0]) 12 | oversized_familiy_scores[oversized_familiy_scores==weakest_link] <- 0 13 | lineage_predictions_matrix[names(which(predicted_lineages==oversized_family)), names(which(predicted_lineages==oversized_family))] = oversized_familiy_scores 14 | } 15 | GEMLI_items_in_trimming$prediction = lineage_predictions_matrix 16 | GEMLI_items_in_trimming = prediction_to_lineage_information(GEMLI_items_in_trimming, cutoff) 17 | } 18 | 
return(GEMLI_items_in_trimming) 19 | } 20 | -------------------------------------------------------------------------------- /GEMLI_package_v0/R/visualize_as_network.R: -------------------------------------------------------------------------------- 1 | visualize_as_network <- function(GEMLI_items, cutoff=70, max_edge_width=5, display_orphan=F, include_labels=F, ground_truth=F, highlight_FPs=F, layout_style='fr', cell_type_colors=F) 2 | { 3 | par(mar=c(0,0,2,0)) 4 | if (cell_type_colors) {layout(mat = matrix(c(1, 2, 3, 0), nrow = 2, ncol = 2), heights = c(3, 1), widths = c(3,1))} else {layout(mat = matrix(c(1, 2), nrow = 2, ncol = 1), heights = c(3, 1))} 5 | # title 6 | lineage_predictions_matrix = GEMLI_items[["prediction"]] 7 | if (ground_truth) {lineage_dict = GEMLI_items[['barcodes']]} else {lineage_dict = prediction_to_lineage_information(GEMLI_items, cutoff, output_as_dict=T)$predicted_lineages} 8 | base_colors = rep(c('#a50026','#d73027','#f46d43','#fdae61','#fee090','#e0f3f8','#abd9e9','#74add1','#4575b4','#313695','#40004b','#762a83','#9970ab','#c2a5cf','#e7d4e8','#d9f0d3','#a6dba0','#5aae61','#1b7837','#00441b','#543005','#8c510a','#bf812d','#dfc27d','#f6e8c3','#e0e0e0','#bababa','#878787','#4d4d4d','#1a1a1a'), 100) 9 | if (cell_type_colors) { if (length(GEMLI_items[['cell_type_color']])!=0){ 10 | base_colors = GEMLI_items[['cell_type_color']]$color 11 | } else { 12 | cell.type<-unique(GEMLI_items[['cell_type']]$cell.type) 13 | color<-base_colors[rank(unique(GEMLI_items[['cell_type']]$cell.type))] 14 | GEMLI_items[['cell_type_color']] = data.frame(cell.type, color) 15 | }} else {base_colors = base_colors 16 | } 17 | network_edges = as.matrix(lineage_predictions_matrix) 18 | network_edges[network_edgescutoff)])), col = c("black", "black"), bty = "n", lty=1:1, lwd=c(max(edge_width),(min(edge_width)+ (max_edge_width*0.1))), title = "Confidence", title.adj =0, horiz=F, xpd=TRUE, inset=c(.15,0),ncol=1) 53 | # Vertex_color 54 | if ((ground_truth==T) & 
(cell_type_colors==F) & (include_labels==F)) {legend("topright", legend=c("Color - ground truth","White - no ground truth"), bty = "n", title = "Color ", title.adj =0.5, horiz=F, xpd=TRUE, inset=c(.05, 0),ncol=1)} 55 | if ((ground_truth==F) & (cell_type_colors==F) & (include_labels==F)) {legend("top", legend=c("prediction",""), bty = "n", title = " Color by", title.adj =0.5, horiz=F, xpd=TRUE, inset=c(0, 0),ncol=1)} 56 | if ((ground_truth==T) & (cell_type_colors==F) & (include_labels==T)) {legend("topright", legend=c("Color - ground truth","White - no ground truth","Number - cell ID"), bty = "n", title = "Vertex ", title.adj =0.5, horiz=F, xpd=TRUE, inset=c(.05, 0),ncol=1)} 57 | if ((ground_truth==F) & (cell_type_colors==F) & (include_labels==T)) {legend("top", legend=c("Color - prediction","Number - cell ID"), bty = "n", title = " Vertex", title.adj =0.5, horiz=F, xpd=TRUE, inset=c(0, 0),ncol=1)} 58 | if (cell_type_colors) { 59 | plot(NULL ,xaxt='n',yaxt='n',bty='n',ylab='',xlab='', xlim=0:1, ylim=0:1) 60 | legend("left", legend = GEMLI_items[['cell_type_color']]$cell.type, pch = 16, col = GEMLI_items[['cell_type_color']]$color, title = "Cell type", bty = "o", horiz=F, xpd=TRUE, inset=c(0, 0),ncol=1)} 61 | } 62 | -------------------------------------------------------------------------------- /GEMLI_package_v0/man/DEG_volcano_plot.Rd: -------------------------------------------------------------------------------- 1 | \name{DEG_volcano_plot} 2 | \alias{DEG_volcano_plot} 3 | %- Also NEED an '\alias' for EACH other topic documented here. 4 | \title{ 5 | DEG_volcano_plot 6 | } 7 | \description{ 8 | This function draws a simple volcano plot of the differentially expressed genes (DEG) called using the function 'cell_fate_DEG_calling'. 9 | } 10 | \usage{ 11 | DEG_volcano_plot(GEMLI_items, name1, name2) 12 | } 13 | \arguments{ 14 | \item{ 15 | GEMLI_items}{GEMLI_items is a list of GEMLI inputs and outputs. To run 'DEG_volcano_plot' it should contain a 'DEG' element. 
The 'DEG' element is the output of the 'cell_fate_DEG_calling' function. It is a data frame with the columns 'p_val', 'avg_log2FC', 'pct.1', 'pct.2' and 'p_val_adj'. 16 | } 17 | \item{ 18 | name1}{'name1' is a character vector specifying the first population of cells analysed for DEG calling. It will appear in the title and legend of the volcano plot. It should correspond to the 'ident1' parameter of the 'cell_fate_DEG_calling' function used to generate the 'GEMLI_items' 'DEG' element. 19 | } 20 | \item{ 21 | name2}{'name2' is a character vector specifying the second population of cells analysed for DEG calling. See 'name1' parameter. 22 | } 23 | } 24 | \details{ 25 | %% ~~ If necessary, more details than the description above ~~ 26 | } 27 | \value{ 28 | 'DEG_volcano_plot' plots a volcano plot for the DEG called using the function 'cell_fate_DEG_calling'. 29 | } 30 | \references{ 31 | %% ~put references to the literature/web site here ~ 32 | } 33 | \author{ 34 | } 35 | \note{ 36 | %% ~~further notes~~ 37 | } 38 | 39 | %% ~Make other sections like Warning with \section{Warning }{....} ~ 40 | 41 | \seealso{ 42 | %% ~~objects to See Also as \code{\link{help}}, ~~~ 43 | } 44 | \examples{ 45 | ##---- Should be DIRECTLY executable !! ---- 46 | ##-- ==> Define data, use random, 47 | ##-- or do help(data=index) for the standard data sets. 48 | 49 | ## The function is currently defined as 50 | function (x) 51 | { 52 | } 53 | } 54 | % Add one or more standard keywords, see file 'KEYWORDS' in the 55 | % R documentation directory (show via RShowDoc("KEYWORDS")): 56 | % \keyword{ ~kwd1 } 57 | % \keyword{ ~kwd2 } 58 | % Use only one keyword per line. 59 | % For non-standard keywords, use \concept instead of \keyword: 60 | % \concept{ ~cpt1 } 61 | % \concept{ ~cpt2 } 62 | % Use only one concept per line. 
63 | -------------------------------------------------------------------------------- /GEMLI_package_v0/man/calculate_correlations.Rd: -------------------------------------------------------------------------------- 1 | \name{calculate_correlations} 2 | \alias{calculate_correlations} 3 | %- Also NEED an '\alias' for EACH other topic documented here. 4 | \title{ 5 | calculate_correlations 6 | } 7 | \description{ 8 | This function provides a fast way to calculate Spearman's rank correlation or Pearson's correlation. 9 | } 10 | \usage{ 11 | calculate_correlations(data_matrix, fast=FALSE) 12 | } 13 | \arguments{ 14 | \item{ 15 | data_matrix}{'data_matrix' is a gene expression matrix where rownames are genes (features) and column names are cell IDs (samples). 16 | } 17 | \item{ 18 | fast}{'fast' = FALSE will calculate the Spearman rank correlation; 'fast' = TRUE will use the package HiClimR to calculate a Pearson correlation. The Pearson correlation is faster to compute than the Spearman rank correlation, but the precision of lineage predictions will be slightly reduced when it is used. The default value is FALSE. 19 | } 20 | } 21 | \details{ 22 | %% ~~ If necessary, more details than the description above ~~ 23 | } 24 | \value{ 25 | The output is a cell-by-cell matrix in which each value is the correlation coefficient between two cells: Spearman's rho when 'fast' = FALSE, Pearson's r when 'fast' = TRUE. 26 | } 27 | \references{ 28 | %% ~put references to the literature/web site here ~ 29 | } 30 | \author{ 31 | Marcel Tarbier and Almut Eisele 32 | } 33 | \note{ 34 | %% ~~further notes~~ 35 | } 36 | 37 | %% ~Make other sections like Warning with \section{Warning }{....} ~ 38 | 39 | \seealso{ 40 | %% ~~objects to See Also as \code{\link{help}}, ~~~ 41 | } 42 | \examples{ 43 | ##---- Should be DIRECTLY executable !! ---- 44 | ##-- ==> Define data, use random, 45 | ##-- or do help(data=index) for the standard data sets. 
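## A minimal, self-contained sketch, not the package implementation: it uses
## base R's cor() to illustrate the kind of cell-by-cell Spearman matrix
## described under Value. 'toy_matrix' and 'toy_corr' are hypothetical names
## and the expression values are random.
set.seed(1)
toy_matrix <- matrix(runif(50), nrow = 10, ncol = 5,
                     dimnames = list(paste0("gene", 1:10), paste0("cell", 1:5)))
# cor() correlates columns, so a genes x cells input gives a cells x cells result
toy_corr <- cor(toy_matrix, method = "spearman")
dim(toy_corr)
# Assumption for illustration: the same (1 - r)/2 rescaling that
# 'quantify_clusters_iterative' applies, mapping correlations to distances in [0, 1]
toy_dist <- (1 - toy_corr) / 2
range(toy_dist)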
46 | 47 | ## The function is currently defined as 48 | function (x) 49 | { 50 | } 51 | } 52 | % Add one or more standard keywords, see file 'KEYWORDS' in the 53 | % R documentation directory (show via RShowDoc("KEYWORDS")): 54 | % \keyword{ ~kwd1 } 55 | % \keyword{ ~kwd2 } 56 | % Use only one keyword per line. 57 | % For non-standard keywords, use \concept instead of \keyword: 58 | % \concept{ ~cpt1 } 59 | % \concept{ ~cpt2 } 60 | % Use only one concept per line. 61 | -------------------------------------------------------------------------------- /GEMLI_package_v0/man/cell_fate_DEG_calling.Rd: -------------------------------------------------------------------------------- 1 | \name{cell_fate_DEG_calling} 2 | \alias{cell_fate_DEG_calling} 3 | %- Also NEED an '\alias' for EACH other topic documented here. 4 | \title{ 5 | cell_fate_DEG_calling 6 | } 7 | \description{ 8 | This function calls differentially expressed genes (DEG) between cells of specific cell types that are members of asymmetric cell lineages (containing members of two or more cell types of interest) or of symmetric cell lineages (containing members of only one cell type of interest). DEG calling is performed using Seurat's FindMarkers function. 9 | } 10 | \usage{ 11 | cell_fate_DEG_calling(GEMLI_items, ident1, ident2, min.pct=0.05, logfc.threshold=0.1) 12 | } 13 | \arguments{ 14 | \item{ 15 | GEMLI_items}{GEMLI_items is a list of GEMLI inputs and outputs. To run 'cell_fate_DEG_calling' it should contain a 'gene_expression' as well as a 'cell_fate_analysis' element. The 'gene_expression' element is a quality controlled and normalised gene expression matrix where rownames are genes (features) and column names are cell IDs (cell barcodes). The 'cell_fate_analysis' element is a data frame with the columns 'cell.ID', 'clone.ID', 'cell.type' and 'cell.fate' generated by the 'extract_cell_fate_lineages' function. 
16 | } 17 | \item{ 18 | ident1}{'ident1' specifies the first 'cell.fate' of the GEMLI_items 'cell_fate_analysis' element to be used for DEG calling. It is a character vector which can encompass several cell fates. Cell fates contain the lineage type (sym or asym for symmetric or asymmetric respectively) and the cell type separated by an underscore. Cell fates can for example be 'sym_DCIS' or 'asym_inv_tumor'. 19 | } 20 | \item{ 21 | ident2}{'ident2' specifies the second 'cell.fate' of the GEMLI_items 'cell_fate_analysis' element to be used for DEG calling. See ident1 for the format. 22 | } 23 | \item{ 24 | min.pct}{'min.pct' is the min.pct parameter of Seurat's FindMarkers function. It is the minimum fraction of cells in either of the compared cell populations in which a gene should be expressed in order to be tested. The default value is 0.05. 25 | } 26 | \item{ 27 | logfc.threshold}{'logfc.threshold' is the logfc.threshold parameter of Seurat's FindMarkers function. It limits the DEG calling to genes which show, on average, at least x-fold differences (log-scale) between the two compared cell populations. The default value is 0.1. 28 | } 29 | } 30 | \details{ 31 | %% ~~ If necessary, more details than the description above ~~ 32 | } 33 | \value{ 34 | 'cell_fate_DEG_calling' yields a data frame which is added to the 'GEMLI_items' under the name 'DEG'. The data frame is the output of Seurat's FindMarkers function and contains the columns 'p_val', 'avg_log2FC', 'pct.1', 'pct.2' and 'p_val_adj'. 35 | } 36 | \references{ 37 | %% ~put references to the literature/web site here ~ 38 | } 39 | \author{ 40 | } 41 | \note{ 42 | %% ~~further notes~~ 43 | } 44 | 45 | %% ~Make other sections like Warning with \section{Warning }{....} ~ 46 | 47 | \seealso{ 48 | %% ~~objects to See Also as \code{\link{help}}, ~~~ 49 | } 50 | \examples{ 51 | ##---- Should be DIRECTLY executable !! ---- 52 | ##-- ==> Define data, use random, 53 | ##-- or do help(data=index) for the standard data sets. 
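## Illustrative sketch only, not the package implementation: it shows the
## 'cell.fate' label format described above (lineage type, underscore, cell type)
## on a toy table. 'toy_fates' and its contents are hypothetical; the real
## 'cell_fate_analysis' element is produced by 'extract_cell_fate_lineages'.
toy_fates <- data.frame(
  cell.ID = paste0("cell", 1:6),
  clone.ID = c(1, 1, 2, 2, 3, 3),
  cell.type = c("DCIS", "DCIS", "DCIS", "inv_tumor", "inv_tumor", "inv_tumor"),
  stringsAsFactors = FALSE
)
# Assumption: a lineage counts as symmetric when all of its cells share one cell type
types_per_clone <- tapply(toy_fates$cell.type, toy_fates$clone.ID,
                          function(x) length(unique(x)))
lineage_type <- ifelse(types_per_clone[as.character(toy_fates$clone.ID)] == 1,
                       "sym", "asym")
toy_fates$cell.fate <- paste(lineage_type, toy_fates$cell.type, sep = "_")
# Cells that ident1 = "sym_DCIS" and ident2 = "asym_DCIS" would select for comparison
toy_fates$cell.ID[toy_fates$cell.fate == "sym_DCIS"]
toy_fates$cell.ID[toy_fates$cell.fate == "asym_DCIS"]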
54 | 55 | ## The function is currently defined as 56 | function (x) 57 | { 58 | } 59 | } 60 | % Add one or more standard keywords, see file 'KEYWORDS' in the 61 | % R documentation directory (show via RShowDoc("KEYWORDS")): 62 | % \keyword{ ~kwd1 } 63 | % \keyword{ ~kwd2 } 64 | % Use only one keyword per line. 65 | % For non-standard keywords, use \concept instead of \keyword: 66 | % \concept{ ~cpt1 } 67 | % \concept{ ~cpt2 } 68 | % Use only one concept per line. 69 | -------------------------------------------------------------------------------- /GEMLI_package_v0/man/cell_type_composition_plot.Rd: -------------------------------------------------------------------------------- 1 | \name{cell_type_composition_plot} 2 | \alias{cell_type_composition_plot} 3 | %- Also NEED an '\alias' for EACH other topic documented here. 4 | \title{ 5 | cell_type_composition_plot 6 | } 7 | \description{ 8 | This function generates simple plots of the cell type composition of predicted or ground truth lineages. 9 | } 10 | \usage{ 11 | cell_type_composition_plot(GEMLI_items, ground_truth=F, cell_type_colors=F, type) 12 | } 13 | \arguments{ 14 | \item{ 15 | GEMLI_items}{GEMLI_items is a list of GEMLI inputs and outputs. To run 'cell_type_composition_plot' it should contain a 'barcodes' (composition of ground truth) or 'predicted_lineage_table' (composition of predicted lineages) element. The 'predicted_lineage_table' is the output of the 'prediction_to_lineage_information' function. 16 | } 17 | \item{ 18 | ground_truth}{'ground_truth'==T/TRUE indicates that the composition of ground truth lineages is analyzed. If 'ground_truth'==F, the composition of predicted lineages is analyzed. Default is F. 19 | } 20 | \item{ 21 | cell_type_colors}{'cell_type_colors'==T/TRUE specifies that custom colors for every cell type stored in GEMLI_items 'cell_type_colors' element should be used. Default is F. 22 | } 23 | \item{ 24 | type}{'type' specifies which of three plots is generated. 
Type can be 'bubble', 'upsetR', or 'plain'. type='plain' will output a simple table of the number of lineages for different cell type combinations. type='upsetR' will generate an upsetR plot showing the number of lineages for different cell type combinations. type='bubble' will generate a bubble plot of the cell type composition of individual lineages. This is especially meaningful when analyzing multicellular structures. 25 | } 26 | } 27 | \details{ 28 | %% ~~ If necessary, more details than the description above ~~ 29 | } 30 | \value{ 31 | 'cell_type_composition_plot' yields one of three possible plot types (see 'type') specifying the cell type composition of ground truth or predicted lineages. 32 | } 33 | \references{ 34 | %% ~put references to the literature/web site here ~ 35 | } 36 | \author{ 37 | } 38 | \note{ 39 | %% ~~further notes~~ 40 | } 41 | 42 | %% ~Make other sections like Warning with \section{Warning }{....} ~ 43 | 44 | \seealso{ 45 | %% ~~objects to See Also as \code{\link{help}}, ~~~ 46 | } 47 | \examples{ 48 | ##---- Should be DIRECTLY executable !! ---- 49 | ##-- ==> Define data, use random, 50 | ##-- or do help(data=index) for the standard data sets. 51 | 52 | ## The function is currently defined as 53 | function (x) 54 | { 55 | } 56 | } 57 | % Add one or more standard keywords, see file 'KEYWORDS' in the 58 | % R documentation directory (show via RShowDoc("KEYWORDS")): 59 | % \keyword{ ~kwd1 } 60 | % \keyword{ ~kwd2 } 61 | % Use only one keyword per line. 62 | % For non-standard keywords, use \concept instead of \keyword: 63 | % \concept{ ~cpt1 } 64 | % \concept{ ~cpt2 } 65 | % Use only one concept per line. 
66 | -------------------------------------------------------------------------------- /GEMLI_package_v0/man/cluster_stability_plot.Rd: -------------------------------------------------------------------------------- 1 | \name{cluster_stability_plot} 2 | \alias{cluster_stability_plot} 3 | %- Also NEED an '\alias' for EACH other topic documented here. 4 | \title{ 5 | cluster_stability_plot 6 | } 7 | \description{ 8 | This function calculates a cluster stability index for lineage predictions allowing for different cluster sizes and generates a plot of cluster stability index vs cluster size. For multicellular structures, the cluster size at which the cluster stability index plateaus can be used to estimate the maximal size of the multicellular structures present in the data. This is the cluster size up to which lineage predictions will have a high precision, and the cluster size at which recovery will be maximal. 9 | } 10 | \usage{ 11 | cluster_stability_plot(GEMLI_items) 12 | } 13 | \arguments{ 14 | \item{ 15 | GEMLI_items}{GEMLI_items is a list of GEMLI inputs and outputs. To run 'cluster_stability_plot' it should contain a prediction matrix named 'prediction_multiple_sizes' that is generated and added to the items list by the function 'predict_lineages_multiple_sizes'. 16 | } 17 | } 18 | \details{ 19 | %% ~~ If necessary, more details than the description above ~~ 20 | } 21 | \value{ 22 | 'cluster_stability_plot' yields a plot of cluster stability index vs cluster size based on which the size of multicellular structures present in the single-cell RNA-sequencing dataset can be estimated. 
23 | } 24 | \references{ 25 | %% ~put references to the literature/web site here ~ 26 | } 27 | \author{ 28 | Almut Eisele and Marcel Tarbier 29 | } 30 | \note{ 31 | %% ~~further notes~~ 32 | } 33 | 34 | %% ~Make other sections like Warning with \section{Warning }{....} ~ 35 | 36 | \seealso{ 37 | %% ~~objects to See Also as \code{\link{help}}, ~~~ 38 | } 39 | \examples{ 40 | ##---- Should be DIRECTLY executable !! ---- 41 | ##-- ==> Define data, use random, 42 | ##-- or do help(data=index) for the standard data sets. 43 | 44 | ## The function is currently defined as 45 | function (x) 46 | { 47 | } 48 | } 49 | % Add one or more standard keywords, see file 'KEYWORDS' in the 50 | % R documentation directory (show via RShowDoc("KEYWORDS")): 51 | % \keyword{ ~kwd1 } 52 | % \keyword{ ~kwd2 } 53 | % Use only one keyword per line. 54 | % For non-standard keywords, use \concept instead of \keyword: 55 | % \concept{ ~cpt1 } 56 | % \concept{ ~cpt2 } 57 | % Use only one concept per line. 58 | -------------------------------------------------------------------------------- /GEMLI_package_v0/man/extract_cell_fate_lineages.Rd: -------------------------------------------------------------------------------- 1 | \name{extract_cell_fate_lineages} 2 | \alias{extract_cell_fate_lineages} 3 | %- Also NEED an '\alias' for EACH other topic documented here. 4 | \title{ 5 | extract_cell_fate_lineages 6 | } 7 | \description{ 8 | This function extracts symmetric cell lineages (with all members in one of the considered cell types) and asymmetric cell lineages (with members in two or more of the considered cell types). The function generates the input for the cell_fate_DEG_calling function. 9 | } 10 | \usage{ 11 | extract_cell_fate_lineages(GEMLI_items, selection, unique=TRUE, threshold) 12 | } 13 | \arguments{ 14 | \item{ 15 | GEMLI_items}{GEMLI_items is a list of GEMLI inputs and outputs. To run 'extract_cell_fate_lineages' it should contain a 'predicted_lineage_table' as well as a 'cell_type' table. 
The 'predicted_lineage_table' is generated using the function 'prediction_to_lineage_information'. The 'cell_type' table is a data frame with the columns 'cell.ID' and 'cell.type'. 16 | } 17 | \item{ 18 | selection}{'selection' specifies the cell types to be considered for the extraction of symmetric and asymmetric lineages. It is a character vector of at least two elements naming the cell types to be considered. 19 | } 20 | \item{ 21 | unique}{'unique'=TRUE specifies that extracted lineages should contain only the cell types given in the 'selection' parameter. If 'unique'=FALSE, other cell types, not considered in the lineage selection, can also be present in the extracted lineages. The default value is T/TRUE. 22 | } 23 | \item{ 24 | threshold}{'threshold' specifies the minimal percentage of a given cell type that asymmetric lineages should contain in order to be considered. It is a numeric vector of percentages, one per cell type, given in the same order as the cell types in the 'selection' parameter. Threshold values for all cell types have to be met for an asymmetric lineage to be kept. 25 | } 26 | } 27 | \details{ 28 | %% ~~ If necessary, more details than the description above ~~ 29 | } 30 | \value{ 31 | 'extract_cell_fate_lineages' yields a data frame which is added to the 'GEMLI_items' under the name 'cell_fate_analysis'. The data frame contains the columns 'cell.ID', 'clone.ID', 'cell.type' and 'cell.fate'. The 'cell.fate' column contains the label 'asym' for selected asymmetric lineages and 'sym' for selected symmetric lineages, followed by the cell type of the specific cell, separated by an underscore (e.g. 'sym_DCIS', or 'asym_inv_tumor'). The function generates the input for the function 'cell_fate_DEG_calling'. 
32 | } 33 | \references{ 34 | %% ~put references to the literature/web site here ~ 35 | } 36 | \author{ 37 | } 38 | \note{ 39 | %% ~~further notes~~ 40 | } 41 | 42 | %% ~Make other sections like Warning with \section{Warning }{....} ~ 43 | 44 | \seealso{ 45 | %% ~~objects to See Also as \code{\link{help}}, ~~~ 46 | } 47 | \examples{ 48 | ##---- Should be DIRECTLY executable !! ---- 49 | ##-- ==> Define data, use random, 50 | ##-- or do help(data=index) for the standard data sets. 51 | 52 | ## The function is currently defined as 53 | function (x) 54 | { 55 | } 56 | } 57 | % Add one or more standard keywords, see file 'KEYWORDS' in the 58 | % R documentation directory (show via RShowDoc("KEYWORDS")): 59 | % \keyword{ ~kwd1 } 60 | % \keyword{ ~kwd2 } 61 | % Use only one keyword per line. 62 | % For non-standard keywords, use \concept instead of \keyword: 63 | % \concept{ ~cpt1 } 64 | % \concept{ ~cpt2 } 65 | % Use only one concept per line. 66 | -------------------------------------------------------------------------------- /GEMLI_package_v0/man/memory_gene_calling.Rd: -------------------------------------------------------------------------------- 1 | \name{memory_gene_calling} 2 | \alias{memory_gene_calling} 3 | %- Also NEED an '\alias' for EACH other topic documented here. 4 | \title{ 5 | memory_gene_calling 6 | } 7 | \description{ 8 | This function identifies memory genes (or lineage markers) based on gene expression variability across lineages (either from predictions or from ground truth) from single-cell RNA-sequencing data. 9 | } 10 | \usage{ 11 | memory_gene_calling(GEMLI_items, valid_lineage_sizes=2:5, use_median=T, ground_truth=F) 12 | } 13 | \arguments{ 14 | \item{ 15 | GEMLI_items}{'GEMLI_items' is a list of GEMLI inputs and outputs. To run 'memory_gene_calling' GEMLI_items must contain a 'gene_expression' element. If it is run on barcodes, 'GEMLI_items' needs to contain a 'barcodes' element. 
To run 'memory_gene_calling' on predictions, 'GEMLI_items' needs to contain a 'predicted_lineage_table' element. It is also possible to run 'memory_gene_calling' on symmetric or asymmetric lineages of specific cell types. In this case, 'GEMLI_items' must contain a 'cell_fate_analysis' element. The 'gene_expression' element is a quality controlled and normalized gene expression matrix where rownames are genes (features) and column names are cell IDs (cell barcodes). The 'predicted_lineage_table' is generated using the function 'prediction_to_lineage_information'. The 'barcodes' element is a named vector of lineage ground truth (names=cell.ID, value=clone.ID). The 'cell_fate_analysis' element is generated using the 'extract_cell_fate_lineages' function. 16 | } 17 | \item{ 18 | valid_lineage_sizes}{'valid_lineage_sizes' specifies the range of lineage sizes to be included. Depending on the question to be investigated, it can be beneficial to restrict this to either small or large lineages. The default is small lineages of 2 to 5 cells (2:5). 19 | } 20 | \item{ 21 | use_median}{'use_median' specifies whether the median of lineages should be used rather than the mean. This makes the approach more robust to outliers. Default is 'true'/'T'. 22 | } 23 | \item{ 24 | ground_truth}{'ground_truth' specifies whether to call memory genes on ground truth instead of predictions. Default is 'false'/'F'. 25 | } 26 | \item{ 27 | cell_fate}{'cell_fate', when present, restricts memory gene calling to specific symmetric or asymmetric lineages. It is a vector of 'cell.fate' values, as found in the 'cell_fate_analysis' element of 'GEMLI_items', identifying the lineages to be used for memory gene calling. Cell fates start with 'sym' or 'asym', followed by the cell type, separated by an underscore.
28 | } 29 | } 30 | \details{ 31 | %% ~~ If necessary, more details than the description above ~~ 32 | } 33 | \value{ 34 | 'memory_gene_calling' yields a table of gene names or IDs of potential memory genes (row names), as well as their variability ('var') across lineages and a p-value ('p'). This table is stored in the 'GEMLI_items' list as element 'memory_genes'. 35 | } 36 | \references{ 37 | %% ~put references to the literature/web site here ~ 38 | } 39 | \author{ 40 | Marcel Tarbier and Almut Eisele 41 | } 42 | \note{ 43 | %% ~~further notes~~ 44 | } 45 | 46 | %% ~Make other sections like Warning with \section{Warning }{....} ~ 47 | 48 | \seealso{ 49 | %% ~~objects to See Also as \code{\link{help}}, ~~~ 50 | } 51 | \examples{ 52 | ##---- Should be DIRECTLY executable !! ---- 53 | ##-- ==> Define data, use random, 54 | ##-- or do help(data=index) for the standard data sets. 55 | 56 | ## The function is currently defined as 57 | function (x) 58 | { 59 | } 60 | } 61 | % Add one or more standard keywords, see file 'KEYWORDS' in the 62 | % R documentation directory (show via RShowDoc("KEYWORDS")): 63 | % \keyword{ ~kwd1 } 64 | % \keyword{ ~kwd2 } 65 | % Use only one keyword per line. 66 | % For non-standard keywords, use \concept instead of \keyword: 67 | % \concept{ ~cpt1 } 68 | % \concept{ ~cpt2 } 69 | % Use only one concept per line. 70 | -------------------------------------------------------------------------------- /GEMLI_package_v0/man/potential_markers.Rd: -------------------------------------------------------------------------------- 1 | \name{potential_marker} 2 | \alias{potential_marker} 3 | %- Also NEED an '\alias' for EACH other topic documented here. 4 | \title{ 5 | potential_marker 6 | } 7 | \description{ 8 | This function identifies potential lineage markers based on mean gene expression and gene expression variability from single-cell RNA-sequencing data.
It is part of the 'predict_lineages' function, but can also be called independently on a quality controlled and normalized gene expression matrix where rownames are genes (features) and column names are cell IDs (samples). 9 | } 10 | \usage{ 11 | potential_marker(data_matrix) 12 | } 13 | \arguments{ 14 | \item{ 15 | data_matrix}{'data_matrix' is a quality controlled and normalized gene expression matrix where rownames are genes (features) and column names are cell IDs (samples). 16 | } 17 | } 18 | \details{ 19 | %% ~~ If necessary, more details than the description above ~~ 20 | } 21 | \value{ 22 | 'potential_marker' yields a vector of gene names or IDs of potential lineage marker genes. While genes are selected purely based on gene expression mean and variability, it has been shown that this approach enriches for genes with lineage-specific gene expression profiles. 23 | } 24 | \references{ 25 | %% ~put references to the literature/web site here ~ 26 | } 27 | \author{ 28 | Marcel Tarbier and Almut Eisele 29 | } 30 | \note{ 31 | %% ~~further notes~~ 32 | } 33 | 34 | %% ~Make other sections like Warning with \section{Warning }{....} ~ 35 | 36 | \seealso{ 37 | %% ~~objects to See Also as \code{\link{help}}, ~~~ 38 | } 39 | \examples{ 40 | ##---- Should be DIRECTLY executable !! ---- 41 | ##-- ==> Define data, use random, 42 | ##-- or do help(data=index) for the standard data sets. 43 | 44 | ## The function is currently defined as 45 | function (x) 46 | { 47 | } 48 | } 49 | % Add one or more standard keywords, see file 'KEYWORDS' in the 50 | % R documentation directory (show via RShowDoc("KEYWORDS")): 51 | % \keyword{ ~kwd1 } 52 | % \keyword{ ~kwd2 } 53 | % Use only one keyword per line. 54 | % For non-standard keywords, use \concept instead of \keyword: 55 | % \concept{ ~cpt1 } 56 | % \concept{ ~cpt2 } 57 | % Use only one concept per line.
58 | -------------------------------------------------------------------------------- /GEMLI_package_v0/man/predict_lineages.Rd: -------------------------------------------------------------------------------- 1 | \name{predict_lineages} 2 | \alias{predict_lineages} 3 | %- Also NEED an '\alias' for EACH other topic documented here. 4 | \title{ 5 | predict_lineages 6 | } 7 | \description{ 8 | This function predicts cell lineages from single-cell RNA-sequencing data. It identifies potential lineage markers based on mean gene expression and gene expression variability and uses these markers in a repeated iterative clustering approach. Subsets of these genes are used to cluster cells until the desired cluster size is reached. This clustering is repeated many times for random subsets. The result is a cell by cell matrix that lists how many times each cell pair clustered together, which translates into a confidence score. Cell pairs with high confidence scores are likely to be members of the same lineage. 9 | } 10 | \usage{ 11 | predict_lineages(GEMLI_items, repetitions=100, sample_size=(2/3), desired_cluster_size=c(2,3), fast=FALSE) 12 | } 13 | \arguments{ 14 | \item{ 15 | GEMLI_items}{'GEMLI_items' is a list of GEMLI inputs and outputs. To run 'predict_lineages' it should contain a gene expression matrix named 'gene_expression'. This is a quality controlled and normalized gene expression matrix where rownames are genes (features) and column names are cell IDs (samples). 16 | } 17 | \item{ 18 | repetitions}{'repetitions' specifies how many times the input matrix will be clustered using random subsamples of potential markers. A higher number of iterations leads to more robust results. 10 iterations are considered the minimum, but 100 iterations are strongly recommended (default value). Runtime is linear with regard to the number of iterations.
19 | } 20 | \item{ 21 | sample_size}{'sample_size' is a value between 0 and 1 and specifies the fraction of potential markers that are used in each clustering. Values between 0.5 and 0.67 (default value) are recommended. 22 | } 23 | \item{ 24 | desired_cluster_size}{'desired_cluster_size' specifies the number of cells per cluster to be achieved in each clustering. The input is a vector of values, e.g. c(2,3,4) or (2:4). The 'desired_cluster_size' parameter should generally be small. Values between 2 and 4 are recommended; the default is c(2,3). 25 | } 26 | \item{ 27 | fast}{'fast'=TRUE uses the HiClimR package for calculating correlations. This will make predictions faster but reduce precision. The default value is FALSE. 28 | } 29 | } 30 | \details{ 31 | %% ~~ If necessary, more details than the description above ~~ 32 | } 33 | \value{ 34 | 'predict_lineages' yields a cell by cell matrix containing confidence scores. Cell pairs with high confidence scores are likely to be members of the same lineage. This matrix is added to the 'GEMLI_items' under the name 'prediction'. 35 | } 36 | \references{ 37 | %% ~put references to the literature/web site here ~ 38 | } 39 | \author{ 40 | Marcel Tarbier and Almut Eisele 41 | } 42 | \note{ 43 | %% ~~further notes~~ 44 | } 45 | 46 | %% ~Make other sections like Warning with \section{Warning }{....} ~ 47 | 48 | \seealso{ 49 | %% ~~objects to See Also as \code{\link{help}}, ~~~ 50 | } 51 | \examples{ 52 | ##---- Should be DIRECTLY executable !! ---- 53 | ##-- ==> Define data, use random, 54 | ##-- or do help(data=index) for the standard data sets. 55 | 56 | ## The function is currently defined as 57 | function (x) 58 | { 59 | } 60 | } 61 | % Add one or more standard keywords, see file 'KEYWORDS' in the 62 | % R documentation directory (show via RShowDoc("KEYWORDS")): 63 | % \keyword{ ~kwd1 } 64 | % \keyword{ ~kwd2 } 65 | % Use only one keyword per line.
66 | % For non-standard keywords, use \concept instead of \keyword: 67 | % \concept{ ~cpt1 } 68 | % \concept{ ~cpt2 } 69 | % Use only one concept per line. 70 | -------------------------------------------------------------------------------- /GEMLI_package_v0/man/predict_lineages_multiple_sizes.Rd: -------------------------------------------------------------------------------- 1 | \name{predict_lineages_multiple_sizes} 2 | \alias{predict_lineages_multiple_sizes} 3 | %- Also NEED an '\alias' for EACH other topic documented here. 4 | \title{ 5 | predict_lineages_multiple_sizes 6 | } 7 | \description{ 8 | This function predicts cell lineages from single-cell RNA-sequencing data with sizes ranging from a minimal to a maximal value. Lineages are predicted as in the 'predict_lineages' function, though not for a single desired lineage size but independently for every cluster size between the minimal and the maximal value. For each prediction, the predicted lineage information is generated as in the 'prediction_to_lineage_information' function. The function generates the input for the function 'cluster_stability_plot', which allows estimating the size of multicellular structures present in the single-cell RNA-sequencing data. 9 | } 10 | \usage{ 11 | predict_lineages_multiple_sizes(GEMLI_items, repetitions=10, sample_size=(2/3), minimal_maximal_cluster_size=c(2,50), fast=FALSE, cutoff=5) 12 | } 13 | \arguments{ 14 | \item{ 15 | GEMLI_items}{'GEMLI_items' is a list of GEMLI inputs and outputs. To run 'predict_lineages_multiple_sizes' it should contain a gene expression matrix named 'gene_expression'. This is a quality controlled and normalized gene expression matrix where rownames are genes (features) and column names are cell IDs (samples). 16 | } 17 | \item{ 18 | repetitions}{'repetitions' specifies how many times the input matrix will be clustered using random subsamples of potential markers. A higher number of iterations leads to more robust results.
10 iterations are considered the minimum and are the default for this function, but 100 iterations are strongly recommended. Runtime is linear with regard to the number of iterations. 19 | } 20 | \item{ 21 | sample_size}{'sample_size' is a value between 0 and 1 and specifies the fraction of potential markers that are used in each clustering. Values between 0.5 and 0.67 (default value) are recommended. 22 | } 23 | \item{ 24 | minimal_maximal_cluster_size}{'minimal_maximal_cluster_size' gives the minimal and maximal number of cells per cluster for which independent lineage predictions are run. The input is a vector of two values (minimal, maximal), e.g. c(2,50). The maximal value chosen should be close to or above the maximal expected size of multicellular structures present in the single-cell RNA-sequencing data. The default value is c(2,50). 25 | } 26 | \item{ 27 | fast}{'fast'=TRUE uses the HiClimR package for calculating correlations. This makes predictions faster but reduces precision. The default value is FALSE. 28 | } 29 | \item{ 30 | cutoff}{'cutoff' specifies the confidence score at which a cell pair is considered to be part of the same lineage. High values (e.g. 70-100) provide high precision but lower sensitivity. Low values (e.g. 30-60) provide higher sensitivity but lower precision. The default value is 5. 31 | } 32 | } 33 | \details{ 34 | %% ~~ If necessary, more details than the description above ~~ 35 | } 36 | \value{ 37 | 'predict_lineages_multiple_sizes' yields a cell by lineage size matrix containing lineage IDs. This matrix is added to the 'GEMLI_items' under the name 'prediction_multiple_sizes'.
The function generates the input for the function 'cluster_stability_plot', which allows estimating the size of multicellular structures present in the single-cell RNA-sequencing data. 38 | } 39 | \references{ 40 | %% ~put references to the literature/web site here ~ 41 | } 42 | \author{ 43 | Almut Eisele and Marcel Tarbier 44 | } 45 | \note{ 46 | %% ~~further notes~~ 47 | } 48 | 49 | %% ~Make other sections like Warning with \section{Warning }{....} ~ 50 | 51 | \seealso{ 52 | %% ~~objects to See Also as \code{\link{help}}, ~~~ 53 | } 54 | \examples{ 55 | ##---- Should be DIRECTLY executable !! ---- 56 | ##-- ==> Define data, use random, 57 | ##-- or do help(data=index) for the standard data sets. 58 | 59 | ## The function is currently defined as 60 | function (x) 61 | { 62 | } 63 | } 64 | % Add one or more standard keywords, see file 'KEYWORDS' in the 65 | % R documentation directory (show via RShowDoc("KEYWORDS")): 66 | % \keyword{ ~kwd1 } 67 | % \keyword{ ~kwd2 } 68 | % Use only one keyword per line. 69 | % For non-standard keywords, use \concept instead of \keyword: 70 | % \concept{ ~cpt1 } 71 | % \concept{ ~cpt2 } 72 | % Use only one concept per line. 73 | -------------------------------------------------------------------------------- /GEMLI_package_v0/man/predict_lineages_with_known_markers.Rd: -------------------------------------------------------------------------------- 1 | \name{predict_lineages_with_known_markers} 2 | \alias{predict_lineages_with_known_markers} 3 | %- Also NEED an '\alias' for EACH other topic documented here. 4 | \title{ 5 | predict_lineages_with_known_markers 6 | } 7 | \description{ 8 | This function predicts cell lineages from single-cell RNA-sequencing data when lineage markers are already known, e.g. from a barcoding experiment. Subsets of these genes are used to cluster cells until the desired cluster size is reached. This clustering is repeated many times for random subsets.
The result is a cell by cell matrix that lists how many times each cell pair clustered together, which translates into a confidence score. Cell pairs with high confidence scores are likely to be members of the same lineage. 9 | } 10 | \usage{ 11 | predict_lineages_with_known_markers(GEMLI_items, repetitions=100, sample_size=(2/3), desired_cluster_size=c(2,3), fast=FALSE) 12 | } 13 | \arguments{ 14 | \item{ 15 | GEMLI_items}{'GEMLI_items' is a list of GEMLI inputs and outputs. To run 'predict_lineages_with_known_markers' it should contain a gene expression matrix named 'gene_expression'. This is a quality controlled and normalized gene expression matrix where rownames are genes (features) and column names are cell IDs (samples). In addition, it needs to contain a vector of known marker gene names named 'known_markers'. 16 | } 17 | \item{ 18 | repetitions}{'repetitions' specifies how many times the input matrix will be clustered using random subsamples of the known markers. A higher number of iterations leads to more robust results. 10 iterations are considered the minimum, but 100 iterations are strongly recommended (default value). Runtime is linear with regard to the number of iterations. 19 | } 20 | \item{ 21 | sample_size}{'sample_size' is a value between 0 and 1 and specifies the fraction of known markers that are used in each clustering. Values between 0.5 and 0.67 (default value) are recommended. 22 | } 23 | \item{ 24 | desired_cluster_size}{'desired_cluster_size' specifies the number of cells per cluster to be achieved in each clustering. The input is a vector of values, e.g. c(2,3,4) or (2:4). The 'desired_cluster_size' parameter should generally be small. Values between 2 and 4 are recommended; the default is c(2,3). 25 | } 26 | \item{ 27 | fast}{'fast'=TRUE uses the HiClimR package for calculating correlations. This makes predictions faster but reduces precision. The default value is FALSE.
28 | } 29 | } 30 | \details{ 31 | %% ~~ If necessary, more details than the description above ~~ 32 | } 33 | \value{ 34 | 'predict_lineages_with_known_markers' yields a cell by cell matrix containing confidence scores. Cell pairs with high confidence scores are likely to be members of the same lineage. This matrix is added to the 'GEMLI_items' under the name 'prediction'. 35 | } 36 | \references{ 37 | %% ~put references to the literature/web site here ~ 38 | } 39 | \author{ 40 | Marcel Tarbier and Almut Eisele 41 | } 42 | \note{ 43 | %% ~~further notes~~ 44 | } 45 | 46 | %% ~Make other sections like Warning with \section{Warning }{....} ~ 47 | 48 | \seealso{ 49 | %% ~~objects to See Also as \code{\link{help}}, ~~~ 50 | } 51 | \examples{ 52 | ##---- Should be DIRECTLY executable !! ---- 53 | ##-- ==> Define data, use random, 54 | ##-- or do help(data=index) for the standard data sets. 55 | 56 | ## The function is currently defined as 57 | function (x) 58 | { 59 | } 60 | } 61 | % Add one or more standard keywords, see file 'KEYWORDS' in the 62 | % R documentation directory (show via RShowDoc("KEYWORDS")): 63 | % \keyword{ ~kwd1 } 64 | % \keyword{ ~kwd2 } 65 | % Use only one keyword per line. 66 | % For non-standard keywords, use \concept instead of \keyword: 67 | % \concept{ ~cpt1 } 68 | % \concept{ ~cpt2 } 69 | % Use only one concept per line. 70 | -------------------------------------------------------------------------------- /GEMLI_package_v0/man/prediction_to_lineage_information.Rd: -------------------------------------------------------------------------------- 1 | \name{prediction_to_lineage_information} 2 | \alias{prediction_to_lineage_information} 3 | %- Also NEED an '\alias' for EACH other topic documented here. 4 | \title{ 5 | prediction_to_lineage_information 6 | } 7 | \description{ 8 | This function transforms a cell by cell matrix of confidence scores, as created by the 'predict_lineages' function, into a table and a named vector of predicted lineages.
9 | } 10 | \usage{ 11 | prediction_to_lineage_information(GEMLI_items, cutoff=50) 12 | } 13 | \arguments{ 14 | \item{ 15 | GEMLI_items}{'GEMLI_items' is a list of GEMLI inputs and outputs. To run 'prediction_to_lineage_information' it should contain a prediction matrix named 'prediction' that is generated and added to the items list by the function 'predict_lineages'. 16 | } 17 | \item{ 18 | cutoff}{'cutoff' specifies the confidence score at which a cell pair is considered to be part of the same lineage. High values (e.g. 70-100) provide high precision but lower sensitivity. Low values (e.g. 30-60) provide higher sensitivity but lower precision. The default value is 50. 19 | } 20 | } 21 | \details{ 22 | %% ~~ If necessary, more details than the description above ~~ 23 | } 24 | \value{ 25 | The output is a matrix that lists cell IDs in the first column and the predicted lineage in the second column. This is added to the 'GEMLI_items' under the name 'predicted_lineage_table'. The function also adds the same result as a named vector, with the predicted lineages as values and the cell IDs as names, under the name 'predicted_lineages'. 26 | } 27 | \references{ 28 | %% ~put references to the literature/web site here ~ 29 | } 30 | \author{ 31 | Marcel Tarbier and Almut Eisele 32 | } 33 | \note{ 34 | %% ~~further notes~~ 35 | } 36 | 37 | %% ~Make other sections like Warning with \section{Warning }{....} ~ 38 | 39 | \seealso{ 40 | %% ~~objects to See Also as \code{\link{help}}, ~~~ 41 | } 42 | \examples{ 43 | ##---- Should be DIRECTLY executable !! ---- 44 | ##-- ==> Define data, use random, 45 | ##-- or do help(data=index) for the standard data sets. 46 | 47 | ## The function is currently defined as 48 | function (x) 49 | { 50 | } 51 | } 52 | % Add one or more standard keywords, see file 'KEYWORDS' in the 53 | % R documentation directory (show via RShowDoc("KEYWORDS")): 54 | % \keyword{ ~kwd1 } 55 | % \keyword{ ~kwd2 } 56 | % Use only one keyword per line.
57 | % For non-standard keywords, use \concept instead of \keyword: 58 | % \concept{ ~cpt1 } 59 | % \concept{ ~cpt2 } 60 | % Use only one concept per line. 61 | -------------------------------------------------------------------------------- /GEMLI_package_v0/man/quantify_clusters_iterative.Rd: -------------------------------------------------------------------------------- 1 | \name{quantify_clusters_iterative} 2 | \alias{quantify_clusters_iterative} 3 | %- Also NEED an '\alias' for EACH other topic documented here. 4 | \title{ 5 | quantify_clusters_iterative 6 | } 7 | \description{ 8 | This function clusters the input matrix repeatedly until a desired cluster size is reached. 9 | } 10 | \usage{ 11 | quantify_clusters_iterative(data_matrix, marker_genes, N=2, fast=FALSE) 12 | } 13 | \arguments{ 14 | \item{ 15 | data_matrix}{'data_matrix' is a quality controlled and normalized gene expression matrix where rownames are genes (features) and column names are cell IDs (samples). 16 | } 17 | \item{ 18 | marker_genes}{'marker_genes' is a vector of gene names or IDs of potential lineage marker genes. It is automatically created in the 'predict_lineages' function or can be computed manually using the 'potential_markers' function. 19 | } 20 | \item{ 21 | N}{'N' specifies into how many branches the data matrix is split in each clustering step. Higher numbers speed up the clustering but can negatively impact the result of the prediction. It is highly recommended to keep the default value, 2. 22 | } 23 | \item{ 24 | fast}{'fast'=TRUE uses the HiClimR package for calculating correlations. This makes the function faster but less precise. The default is FALSE. 25 | } 26 | } 27 | \details{ 28 | %% ~~ If necessary, more details than the description above ~~ 29 | } 30 | \value{ 31 | The output is a matrix that indicates which cells (rows) cluster together in each iteration (columns). It is used in the 'predict_lineages' function.
32 | } 33 | \references{ 34 | %% ~put references to the literature/web site here ~ 35 | } 36 | \author{ 37 | Marcel Tarbier and Almut Eisele 38 | } 39 | \note{ 40 | %% ~~further notes~~ 41 | } 42 | 43 | %% ~Make other sections like Warning with \section{Warning }{....} ~ 44 | 45 | \seealso{ 46 | %% ~~objects to See Also as \code{\link{help}}, ~~~ 47 | } 48 | \examples{ 49 | ##---- Should be DIRECTLY executable !! ---- 50 | ##-- ==> Define data, use random, 51 | ##-- or do help(data=index) for the standard data sets. 52 | 53 | ## The function is currently defined as 54 | function (x) 55 | { 56 | } 57 | } 58 | % Add one or more standard keywords, see file 'KEYWORDS' in the 59 | % R documentation directory (show via RShowDoc("KEYWORDS")): 60 | % \keyword{ ~kwd1 } 61 | % \keyword{ ~kwd2 } 62 | % Use only one keyword per line. 63 | % For non-standard keywords, use \concept instead of \keyword: 64 | % \concept{ ~cpt1 } 65 | % \concept{ ~cpt2 } 66 | % Use only one concept per line. 67 | -------------------------------------------------------------------------------- /GEMLI_package_v0/man/suggest_network_trimming_to_size.Rd: -------------------------------------------------------------------------------- 1 | \name{suggest_network_trimming_to_size} 2 | \alias{suggest_network_trimming_to_size} 3 | %- Also NEED an '\alias' for EACH other topic documented here. 4 | \title{ 5 | suggest_network_trimming_to_size 6 | } 7 | \description{ 8 | This function visualizes the lineage prediction as a network and highlights edges that could be removed based on a maximum lineage size cutoff. In lineages that exceed this size, the weakest links (least likely shared lineages) are suggested to be trimmed until the desired maximum size is reached. Trimming suggestions are highlighted in red. 
9 | } 10 | \usage{ 11 | suggest_network_trimming_to_size(GEMLI_items, max_size=4, cutoff=70, max_edge_width=5, display_orphan=F, include_labels=T, ground_truth=F) 12 | } 13 | \arguments{ 14 | \item{ 15 | GEMLI_items}{'GEMLI_items' is a list of GEMLI inputs and outputs. To run 'suggest_network_trimming_to_size' it should contain a prediction matrix named 'prediction' that is generated and added to the items list by the function 'predict_lineages'. It also needs to contain a vector of lineages (values are the lineages and names are the cell IDs) from predictions named 'predicted_lineages'. This vector can be added to the 'GEMLI_items' using the 'prediction_to_lineage_information' function. 16 | } 17 | \item{ 18 | max_size}{'max_size' specifies the maximum size of lineages. 19 | } 20 | \item{ 21 | cutoff}{'cutoff' specifies the confidence score at which a cell pair is considered to be part of the same lineage. High values (e.g. 70-100) provide high precision but lower sensitivity. Low values (e.g. 30-60) provide higher sensitivity but lower precision. Default value is 70. 22 | } 23 | \item{ 24 | max_edge_width}{'max_edge_width' specifies the maximum width of edges in the network visualization. All edge weights above the defined 'cutoff' will be scaled between 0.1*'max_edge_width' and 'max_edge_width'. Default value is 5. 25 | } 26 | \item{ 27 | display_orphan}{'display_orphan' defines whether cells without connections should be displayed. Displaying them commonly makes the network plot less readable. Therefore the suggestion and the default are 'false'/'F'. 28 | } 29 | \item{ 30 | include_labels}{'include_labels' defines whether nodes should be numbered. Cell IDs are not shown for readability. 31 | } 32 | \item{ 33 | ground_truth}{If the 'GEMLI_items' list contains a 'barcodes' vector with orthogonal lineage information, 'ground_truth' can be set to 'true'/'T' to color cells according to their lineage. Default is 'false'/'F'.
34 | } 35 | \item{ 36 | layout_style}{Depending on the number of cells and the size of lineages, different layout styles can improve readability. In 'suggest_network_trimming_to_size', two layout algorithms can be chosen: Fruchterman-Reingold ("fr") and Kamada-Kawai ("kk"). Default is "fr". 37 | } 38 | } 39 | \details{ 40 | %% ~~ If necessary, more details than the description above ~~ 41 | } 42 | \value{ 43 | This function has no output. It is for visualization only. 44 | } 45 | \references{ 46 | %% ~put references to the literature/web site here ~ 47 | } 48 | \author{ 49 | Marcel Tarbier and Almut Eisele 50 | } 51 | \note{ 52 | %% ~~further notes~~ 53 | } 54 | 55 | %% ~Make other sections like Warning with \section{Warning }{....} ~ 56 | 57 | \seealso{ 58 | %% ~~objects to See Also as \code{\link{help}}, ~~~ 59 | } 60 | \examples{ 61 | ##---- Should be DIRECTLY executable !! ---- 62 | ##-- ==> Define data, use random, 63 | ##-- or do help(data=index) for the standard data sets. 64 | 65 | ## The function is currently defined as 66 | function (x) 67 | { 68 | } 69 | } 70 | % Add one or more standard keywords, see file 'KEYWORDS' in the 71 | % R documentation directory (show via RShowDoc("KEYWORDS")): 72 | % \keyword{ ~kwd1 } 73 | % \keyword{ ~kwd2 } 74 | % Use only one keyword per line. 75 | % For non-standard keywords, use \concept instead of \keyword: 76 | % \concept{ ~cpt1 } 77 | % \concept{ ~cpt2 } 78 | % Use only one concept per line. 79 | -------------------------------------------------------------------------------- /GEMLI_package_v0/man/test_lineages.Rd: -------------------------------------------------------------------------------- 1 | \name{test_lineages} 2 | \alias{test_lineages} 3 | %- Also NEED an '\alias' for EACH other topic documented here. 4 | \title{ 5 | test_lineages 6 | } 7 | \description{ 8 | This function tests the results of lineage assignments by comparing them to lineage assignments from cell barcoding.
9 | } 10 | \usage{ 11 | test_lineages(GEMLI_items, valid_fam_sizes=(1:5), max_interval=100, plot_results=F) 12 | } 13 | 14 | \arguments{ 15 | \item{ 16 | GEMLI_items}{'GEMLI_items' is a list of GEMLI inputs and outputs. To run 'test_lineages' it should contain a prediction matrix named 'prediction' that is generated and added to the items list by the function 'predict_lineages'. It also needs to contain a ground truth named 'barcodes', provided as a named vector (values are the lineages and names are the cell IDs). 17 | } 18 | \item{ 19 | valid_fam_sizes}{'valid_fam_sizes' specifies a reasonable range for lineage sizes. 20 | } 21 | \item{ 22 | max_interval}{'max_interval' is the number of repetitions used in the 'predict_lineages' function. The default value is 100. 23 | } 24 | \item{ 25 | plot_results}{'plot_results' specifies whether the results of the test are visualized (plotted). 26 | } 27 | } 28 | \details{ 29 | %% ~~ If necessary, more details than the description above ~~ 30 | } 31 | \value{ 32 | The output is a table that lists the number of false positives (FP) and true positives (TP), the precision (TP/PP, where PP is the number of predicted positives, i.e. the sum of TP and FP), and the sensitivity (TP/P, where P is the number of real positives, i.e. the sum of TP and false negatives, FN). 33 | } 34 | \references{ 35 | %% ~put references to the literature/web site here ~ 36 | } 37 | \author{ 38 | Marcel Tarbier and Almut Eisele 39 | } 40 | \note{ 41 | %% ~~further notes~~ 42 | } 43 | 44 | %% ~Make other sections like Warning with \section{Warning }{....} ~ 45 | 46 | \seealso{ 47 | %% ~~objects to See Also as \code{\link{help}}, ~~~ 48 | } 49 | \examples{ 50 | ##---- Should be DIRECTLY executable !! ---- 51 | ##-- ==> Define data, use random, 52 | ##-- or do help(data=index) for the standard data sets.
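## A minimal sketch of how the precision and sensitivity described in the
## value section relate to TP, FP and FN. This is an illustration only and
## NOT the package's implementation: the toy objects ('toy_prediction',
## 'toy_barcodes'), the cutoff of 50 and the '>=' comparison are assumptions
## made up for this example.
toy_cells <- c("c1", "c2", "c3", "c4")
## Hypothetical ground truth: names are cell IDs, values are lineages.
toy_barcodes <- c(c1 = "A", c2 = "A", c3 = "B", c4 = "B")
## Hypothetical symmetric cell by cell matrix of confidence scores.
toy_prediction <- matrix(0, 4, 4, dimnames = list(toy_cells, toy_cells))
toy_prediction["c1", "c2"] <- toy_prediction["c2", "c1"] <- 90  # related pair, predicted
toy_prediction["c2", "c3"] <- toy_prediction["c3", "c2"] <- 60  # unrelated pair, predicted
toy_prediction["c3", "c4"] <- toy_prediction["c4", "c3"] <- 20  # related pair, missed
## Visit each unordered cell pair once via the upper triangle.
pairs <- which(upper.tri(toy_prediction), arr.ind = TRUE)
predicted <- toy_prediction[pairs] >= 50
related <- unname(toy_barcodes[rownames(toy_prediction)[pairs[, "row"]]] ==
                  toy_barcodes[colnames(toy_prediction)[pairs[, "col"]]])
TP <- sum(predicted & related)
FP <- sum(predicted & !related)
FN <- sum(!predicted & related)
precision <- TP / (TP + FP)    # TP/PP
sensitivity <- TP / (TP + FN)  # TP/P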
53 | 54 | ## The function is currently defined as 55 | function (x) 56 | { 57 | } 58 | } 59 | % Add one or more standard keywords, see file 'KEYWORDS' in the 60 | % R documentation directory (show via RShowDoc("KEYWORDS")): 61 | % \keyword{ ~kwd1 } 62 | % \keyword{ ~kwd2 } 63 | % Use only one keyword per line. 64 | % For non-standard keywords, use \concept instead of \keyword: 65 | % \concept{ ~cpt1 } 66 | % \concept{ ~cpt2 } 67 | % Use only one concept per line. 68 | -------------------------------------------------------------------------------- /GEMLI_package_v0/man/trim_network_to_size.Rd: -------------------------------------------------------------------------------- 1 | \name{trim_network_to_size} 2 | \alias{trim_network_to_size} 3 | %- Also NEED an '\alias' for EACH other topic documented here. 4 | \title{ 5 | trim_network_to_size 6 | } 7 | \description{ 8 | This function trims predicted lineages that exceed a maximum lineage size. In lineages that exceed this size, the weakest links (least likely shared lineages) are trimmed until the desired maximum size is reached. 9 | } 10 | \usage{ 11 | trim_network_to_size(GEMLI_items) 12 | } 13 | \arguments{ 14 | \item{ 15 | GEMLI_items}{'GEMLI_items' is a list of GEMLI inputs and outputs. To run 'trim_network_to_size' it should contain a prediction matrix named 'prediction' that is generated and added to the items list by the function 'predict_lineages'. It also needs to contain a vector of lineages (values are the lineages and names are the cell IDs) from predictions named 'predicted_lineages'. This vector can be added to the 'GEMLI_items' using the 'prediction_to_lineage_information' function. 16 | } 17 | \item{ 18 | max_size}{'max_size' specifies the maximum size of lineages. 19 | } 20 | \item{ 21 | cutoff}{'cutoff' specifies the confidence score at which a cell pair is considered to be part of the same lineage. High values (e.g. 70-100) provide high precision but lower sensitivity. Low values (e.g.
30-60) provide higher sensitivity but lower precision. Default value is 70. 22 | } 23 | } 24 | \details{ 25 | %% ~~ If necessary, more details than the description above ~~ 26 | } 27 | \value{ 28 | This function returns a 'GEMLI_items' list in which the prediction matrix ('prediction'), the prediction table ('predicted_lineage_table') and the prediction vector ('predicted_lineages') have been trimmed to the specified size. 29 | } 30 | \references{ 31 | %% ~put references to the literature/web site here ~ 32 | } 33 | \author{ 34 | Marcel Tarbier and Almut Eisele 35 | } 36 | \note{ 37 | %% ~~further notes~~ 38 | } 39 | 40 | %% ~Make other sections like Warning with \section{Warning }{....} ~ 41 | 42 | \seealso{ 43 | %% ~~objects to See Also as \code{\link{help}}, ~~~ 44 | } 45 | \examples{ 46 | ##---- Should be DIRECTLY executable !! ---- 47 | ##-- ==> Define data, use random, 48 | ##-- or do help(data=index) for the standard data sets. 49 | 50 | ## The function is currently defined as 51 | function (x) 52 | { 53 | } 54 | } 55 | % Add one or more standard keywords, see file 'KEYWORDS' in the 56 | % R documentation directory (show via RShowDoc("KEYWORDS")): 57 | % \keyword{ ~kwd1 } 58 | % \keyword{ ~kwd2 } 59 | % Use only one keyword per line. 60 | % For non-standard keywords, use \concept instead of \keyword: 61 | % \concept{ ~cpt1 } 62 | % \concept{ ~cpt2 } 63 | % Use only one concept per line. 64 | -------------------------------------------------------------------------------- /GEMLI_package_v0/man/visualize_as_network.Rd: -------------------------------------------------------------------------------- 1 | \name{visualize_as_network} 2 | \alias{visualize_as_network} 3 | %- Also NEED an '\alias' for EACH other topic documented here. 4 | \title{ 5 | visualize_as_network 6 | } 7 | \description{ 8 | This function visualizes the lineage prediction as a network.
9 | } 10 | \usage{ 11 | visualize_as_network(GEMLI_items, cutoff=70, display_orphan=F, max_edge_width=5, ground_truth=F, include_labels=F, highlight_FPs=F, layout_style="fr", cell_type_colors=F) 12 | } 13 | \arguments{ 14 | \item{ 15 | GEMLI_items}{'GEMLI_items' is a list of GEMLI inputs and outputs. To run 'visualize_as_network' it should contain a prediction matrix named 'prediction' that is generated and added to the items list by the function 'predict_lineages'. 16 | } 17 | \item{ 18 | cutoff}{'cutoff' specifies the confidence score at which a cell pair is considered to be part of the same lineage. High values (e.g. 70-100) provide high precision but lower sensitivity. Low values (e.g. 30-60) provide higher sensitivity but lower precision. Default value is 70. 19 | } 20 | \item{ 21 | max_edge_width}{'max_edge_width' specifies the maximum width of edges in the network visualization. All edge weights above the defined 'cutoff' will be scaled between 0.1*'max_edge_width' and 'max_edge_width'. Default value is 5. 22 | } 23 | \item{ 24 | display_orphan}{'display_orphan' defines whether cells without connections should be displayed. Displaying them commonly makes the network plot less readable, so the recommended and default value is 'false'/'F'. 25 | } 26 | \item{ 27 | include_labels}{'include_labels' defines whether nodes should be numbered. Cell IDs are not shown for readability. 28 | } 29 | \item{ 30 | ground_truth}{If the 'GEMLI_items' list contains a 'barcodes' vector with orthogonal lineage information, 'ground_truth' can be set to 'true'/'T' to color cells according to their lineage. Default is 'false'/'F'. 31 | } 32 | \item{ 33 | highlight_FPs}{If the 'GEMLI_items' list contains a 'barcodes' vector with orthogonal lineage information and 'ground_truth' is 'true'/'T', connections between cells that are false positives can be highlighted in red by setting 'highlight_FPs' to 'true'/'T'. With 'false'/'F', false predictions are not highlighted. Default is 'false'/'F'.
34 | } 35 | \item{ 36 | layout_style}{Depending on the number of cells and the size of lineages, different layout styles can improve readability. Currently three different network layout algorithms can be chosen: Fruchterman-Reingold ("fr"), Kamada-Kawai ("kk"), and grid ("grid"). Default is "fr". 37 | } 38 | \item{ 39 | cell_type_colors}{'cell_type_colors' can be set to 'true'/'T' to color cells by assigned cell type. For such coloring the GEMLI items list must contain a 'cell_type' element (dataframe with columns 'cell.ID' and 'cell.type'). Specific colors will be assigned to specific cell types if a GEMLI items list element 'cell_type_color' is added (dataframe with columns 'cell.type' and 'color'). If no 'cell_type_color' element is present, random colors will be assigned to each cell type. Default is cell_type_colors = 'F'. 40 | } 41 | } 42 | \details{ 43 | %% ~~ If necessary, more details than the description above ~~ 44 | } 45 | \value{ 46 | WIP 47 | } 48 | \references{ 49 | %% ~put references to the literature/web site here ~ 50 | } 51 | \author{ 52 | Marcel Tarbier and Almut Eisele 53 | } 54 | \note{ 55 | %% ~~further notes~~ 56 | } 57 | 58 | %% ~Make other sections like Warning with \section{Warning }{....} ~ 59 | 60 | \seealso{ 61 | %% ~~objects to See Also as \code{\link{help}}, ~~~ 62 | } 63 | \examples{ 64 | ##---- Should be DIRECTLY executable !! ---- 65 | ##-- ==> Define data, use random, 66 | ##-- or do help(data=index) for the standard data sets. 67 | 68 | ## The function is currently defined as 69 | function (x) 70 | { 71 | } 72 | } 73 | % Add one or more standard keywords, see file 'KEYWORDS' in the 74 | % R documentation directory (show via RShowDoc("KEYWORDS")): 75 | % \keyword{ ~kwd1 } 76 | % \keyword{ ~kwd2 } 77 | % Use only one keyword per line. 78 | % For non-standard keywords, use \concept instead of \keyword: 79 | % \concept{ ~cpt1 } 80 | % \concept{ ~cpt2 } 81 | % Use only one concept per line.
82 | -------------------------------------------------------------------------------- /GEMLI_package_v0/vignettes/Example1_simple_lineage_predictions.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Example1_simple_lineage_predictions" 3 | output: html_vignette 4 | date: "2023-06-02" 5 | vignette: > 6 | %\VignetteIndexEntry{Example1_simple_lineage_predictions} 7 | %\VignetteEngine{knitr::rmarkdown} 8 | %\VignetteEncoding{UTF-8} 9 | 10 | --- 11 | 12 | ```{r, echo = FALSE, message=FALSE} 13 | knitr::opts_chunk$set(message = FALSE, warning = FALSE) 14 | ``` 15 | 16 | 17 | For our first example we'll be looking at mouse embryonic stem cells that have been barcoded and cultured for 48h. We'll be working with a subset of this data for fast processing. In our subset we find 'family sizes' ranging from just two up to five related cells. 18 | 19 | ## Load package and example data 20 | 21 | First we load the package and the example data. 22 | 23 | ```{r, eval=T, echo=T} 24 | library(GEMLI) 25 | library(igraph) 26 | library(HiClimR) 27 | 28 | load('GEMLI_example_data_matrix.RData') 29 | load('GEMLI_example_barcode_information.RData') 30 | 31 | ``` 32 | 33 | ## Create a GEMLI items list 34 | 35 | GEMLI's inputs and outputs are stored in a list of objects with predefined names. To run GEMLI you need at least a quality controlled and normalized gene expression matrix (rows = genes/features, columns = cells/samples). In this example we also provide a ground truth for lineages stemming from a barcoding experiment (values = barcode ID, names = cell IDs).
36 | 37 | ```{r, eval=T, echo=T} 38 | # Making GEMLI list and storing data 39 | GEMLI_items = list() 40 | GEMLI_items[['gene_expression']] = data_matrix 41 | GEMLI_items[['barcodes']] = lineage_dict_bc 42 | 43 | # A brief look at the loaded data 44 | GEMLI_items[['gene_expression']][9:14,1:5] 45 | GEMLI_items[['barcodes']][1:5] 46 | 47 | ``` 48 | 49 | ## Perform lineage predictions 50 | 51 | We can then identify cell lineages through repeated iterative clustering (this may take 2-3min). The predict_lineages function takes our GEMLI_items as input. It outputs a matrix of all cells against all cells with values corresponding to a confidence score that they are part of the same lineage. 52 | 53 | ```{r, eval=T, echo=T} 54 | # Perform lineage predictions 55 | GEMLI_items = predict_lineages(GEMLI_items) 56 | 57 | # A brief look at the result 58 | GEMLI_items[['prediction']][1:5,15:19] 59 | ``` 60 | 61 | ## Test lineage prediction 62 | 63 | Since we have barcoding data for this dataset we can test the predicted lineages against our ground truth. The test_lineages function again takes our GEMLI_items as input. It's important that a prediction has been run first with predict_lineages. It outputs the number of true positive predictions (TP), false positive predictions (FP), as well as precision and sensitivity for a range of confidence score cutoffs. The output can be visualized by setting plot_results to true/T. 64 | 65 | ```{r, eval=T, echo=T, fig.height=6, fig.width=6} 66 | GEMLI_items = test_lineages(GEMLI_items) 67 | 68 | # A brief look at the resulting table 69 | GEMLI_items$testing_results 70 | 71 | # And a run with plotting of the result 72 | GEMLI_items = test_lineages(GEMLI_items, plot_results=T) 73 | ``` 74 | 75 | ## Visualize predictions as network 76 | 77 | We can also investigate our predictions by visualizing them as a network with the visualize_as_network function. Here we need to set a cutoff that defines which predictions we want to consider.
It represents a confidence score: high values yield fewer predictions with high precision, while low values yield more predictions with lower precision. 78 | 79 | ```{r, eval=T, echo=T, fig.height=6, fig.width=6} 80 | 81 | visualize_as_network(GEMLI_items, cutoff=90) 82 | visualize_as_network(GEMLI_items, cutoff=50) 83 | ``` 84 | 85 | 86 | If a ground truth e.g. from barcoding is available we can set ground_truth to true/T and highlight_FPs to true/T to highlight false predictions with red edges. Cells without barcode information will be displayed in white. 87 | 88 | ```{r, eval=T, echo=T, fig.height=6, fig.width=6} 89 | visualize_as_network(GEMLI_items, cutoff=90, ground_truth=T, highlight_FPs=T) 90 | visualize_as_network(GEMLI_items, cutoff=50, ground_truth=T, highlight_FPs=T) 91 | ``` 92 | 93 | ## Extract lineage information 94 | 95 | Now we can extract the lineage information with the prediction_to_lineage_information function. Again we need to set a cutoff that defines which predictions we want to consider. The function outputs both a lineage table and a 'dictionary', a vector that has the lineage number as values and the cell IDs as names.
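
As a small illustration of the 'dictionary' format, the sketch below builds a toy named vector (hypothetical cell IDs and lineage numbers, not taken from the example data) and shows how lineage sizes and lineage members can be read from it with base R.

```r
# Hypothetical 'dictionary' in the format described above:
# lineage number as value, cell ID as name (toy data)
toy_lineages <- c(cellA = 1, cellB = 1, cellC = 2, cellD = 3, cellE = 3, cellF = 3)

# Number of cells per lineage
lineage_sizes <- table(toy_lineages)
lineage_sizes

# Cell IDs that belong to lineage 3
members_of_3 <- names(toy_lineages)[toy_lineages == 3]
members_of_3
```

The same two lines should work on `GEMLI_items$predicted_lineages` once it has been created, assuming it is a plain named vector as described.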
96 | 97 | ```{r, eval=T, echo=T} 98 | GEMLI_items = prediction_to_lineage_information(GEMLI_items, cutoff=50) 99 | 100 | # A brief look at the result 101 | GEMLI_items$predicted_lineage_table[1:5,] 102 | 103 | ``` 104 | 105 | -------------------------------------------------------------------------------- /GEMLI_package_v0/vignettes/Example2_predicting_multicellular_structures.Rmd: -------------------------------------------------------------------------------- 1 | 2 | --- 3 | title: "Example2_predicting_multicellular_structures" 4 | output: html_vignette 5 | date: "2023-06-05" 6 | vignette: > 7 | %\VignetteIndexEntry{Example2_predicting_multicellular_structures} 8 | %\VignetteEngine{knitr::rmarkdown} 9 | %\VignetteEncoding{UTF-8} 10 | 11 | --- 12 | 13 | ```{r, echo = FALSE, message=FALSE} 14 | knitr::opts_chunk$set(message = FALSE, warning = FALSE) 15 | ``` 16 | 17 | In this example we'll be visualizing GEMLI results from a lineage-annotated dataset of murine intestinal crypts. Intestinal crypts originate from one or a few intestinal stem cells, and are composed of a number of different cell types. In the data we find cells from individual crypts that range from just three up to forty-four related cells. The scRNA-seq dataset is derived from Bues et al. 2022 (PMID 35165449) and is publicly available under Gene Expression Omnibus accession number GSE148093. 18 | 19 | 20 | ## Load package and example data 21 | 22 | First we load the example data. Here we already predicted the lineages using GEMLI and therefore do not include a count matrix, but rather start with the predictions right away. 23 | 24 | ```{r, eval=T, echo=T} 25 | library(GEMLI) 26 | library(igraph) 27 | library(HiClimR) 28 | 29 | load('GEMLI_crypts_example_data_matrix.RData') 30 | load('GEMLI_crypts_example_barcode_information.RData') 31 | 32 | ``` 33 | 34 | ## Create a GEMLI items list 35 | 36 | We then create a GEMLI items list.
This list is used to store the data, and create and store the outputs of GEMLI (for details check example one). 37 | 38 | ```{r, eval=T, echo=T} 39 | 40 | GEMLI_items_crypts = list() 41 | GEMLI_items_crypts[['prediction']] = Crypts 42 | GEMLI_items_crypts[['barcodes']] = Crypts_bc_dict 43 | 44 | ``` 45 | 46 | ## Visualize predictions as network 47 | 48 | To visualize large lineages we'll use three different network layout algorithms: Fruchterman-Reingold, Kamada-Kawai, and grid. Each of them has advantages and disadvantages. 49 | 50 | (1) Fruchterman-Reingold displays the cells of individual predicted crypts close together with ample space between crypts. This makes it hard to see connections within individual crypts but gives a good overview of individual structures. 51 | 52 | (2) Kamada-Kawai spaces individual cells well, so we can see individual connections between them. It may, however, happen that two different predicted crypts are partially overlaid, as can be seen for the dark red and bright green lineages on the right side of the plot. 53 | 54 | (3) When the network is laid out as a grid, one generally gets a good overview of the predicted lineages and their connections, but it's hard to see which connection belongs to which cell in the same row. 55 | 56 | ```{r, eval=T, echo=T, fig.height=6, fig.width=6} 57 | 58 | visualize_as_network(GEMLI_items_crypts, cutoff=70, display_orphan=F, max_edge_width=1, ground_truth=T, include_labels=F, layout_style="fr") 59 | visualize_as_network(GEMLI_items_crypts, cutoff=70, display_orphan=F, max_edge_width=1, ground_truth=T, include_labels=F, layout_style="kk") 60 | visualize_as_network(GEMLI_items_crypts, cutoff=70, display_orphan=F, max_edge_width=1, ground_truth=T, include_labels=F, layout_style="grid") 61 | 62 | ``` 63 | 64 | ## Adding cell type information to the GEMLI items list 65 | 66 | The cells of individual intestinal crypts can be assigned to different cell types.
This information can be added to the GEMLI items list as a 'cell_type' slot in the form of a dataframe with columns 'cell.ID' and 'cell.type'. 67 | 68 | ```{r, eval=T, echo=T} 69 | 70 | load('GEMLI_crypts_example_cell_type_annotation.RData') 71 | GEMLI_items_crypts[['cell_type']] = Crypts_annotation 72 | 73 | ``` 74 | 75 | ## Color prediction network visualization by cell type 76 | 77 | The visualization of the lineage predictions can now be colored by the cell type annotation. This makes it possible to see the composition of individual intestinal crypts. 78 | 79 | ```{r, eval=T, echo=T, fig.height=6, fig.width=6} 80 | 81 | visualize_as_network(GEMLI_items_crypts, cutoff=70, max_edge_width=5, display_orphan=F, include_labels=F, ground_truth=T, highlight_FPs=T, layout_style="kk", cell_type_colors=T) 82 | 83 | ``` 84 | 85 | Specific colors can be assigned to specific cell types by adding a dataframe with columns 'cell.type' and 'color' in the GEMLI items list slot 'cell_type_color'. This also makes it possible to highlight just one or two selected cell types.
86 | 87 | ```{r, eval=T, echo=T, fig.height=6, fig.width=6} 88 | 89 | # Adding custom color as cell_type_color element to GEMLI_items 90 | cell.type <- unique(GEMLI_items_crypts[['cell_type']]$cell.type) 91 | color <- c("#5386BD", "skyblue1", "darkgreen", "gold", "red", "darkred", "black") 92 | Cell_type_color <- data.frame(cell.type, color) 93 | GEMLI_items_crypts[['cell_type_color']] = Cell_type_color 94 | 95 | # Make a visualization network 96 | visualize_as_network(GEMLI_items_crypts, cutoff=70, max_edge_width=5, display_orphan=F, include_labels=F, ground_truth=T, highlight_FPs=T, layout_style="kk", cell_type_colors=T) 97 | 98 | ``` 99 | -------------------------------------------------------------------------------- /GEMLI_package_v0/vignettes/Example3_cell_fate_analysis.Rmd: -------------------------------------------------------------------------------- 1 | 2 | --- 3 | title: "Example3_cell_fate_analysis" 4 | output: html_document 5 | date: "2023-06-05" 6 | vignette: > 7 | %\VignetteIndexEntry{Example3_cell_fate_analysis} 8 | %\VignetteEngine{knitr::rmarkdown} 9 | %\VignetteEncoding{UTF-8} 10 | --- 11 | 12 | ```{r, echo = FALSE, message=FALSE} 13 | knitr::opts_chunk$set(message = FALSE, warning = FALSE) 14 | ``` 15 | 16 | For our third example we'll be looking at cell fate decisions in a scRNA-seq dataset of human breast cancer encompassing both ductal carcinoma in situ (DCIS) and invasive tumor (inv_tumor) cells. We'll be working with a subset of this data for fast processing. No ground truth is available. We will study the fate transition from DCIS to invasive tumor cells. The data is derived from a public 10X Genomics dataset associated to the following preprint bioRxiv 2022.10.06.510405; doi: https://doi.org/10.1101/2022.10.06.510405 and downloaded from https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast (April 2023). 17 | 18 | ## Load example data 19 | 20 | First we load the example data. 
We already predicted the lineages and extracted the predicted lineages using GEMLI. Furthermore we load the previously generated cell type information for all cells in the dataset. 21 | 22 | ```{r, eval=T, echo=T} 23 | library(GEMLI) 24 | library(igraph) 25 | library(HiClimR) 26 | library(dplyr) 27 | library(ggplot2) 28 | library(Seurat) 29 | library(ggrepel) 30 | 31 | load('GEMLI_cancer_example_norm_count.RData') 32 | load('GEMLI_cancer_example_predicted_lineages.RData') 33 | load('GEMLI_cancer_example_cell_type_annotation.RData') 34 | 35 | ``` 36 | 37 | 38 | ## Create a GEMLI items list 39 | 40 | We then create a GEMLI items list. This list is used to store the data, and create and store the outputs of GEMLI (for details check example one). 41 | 42 | ```{r, eval=T, echo=T} 43 | 44 | GEMLI_items = list() 45 | GEMLI_items[['gene_expression']] = Cancer_norm_count 46 | GEMLI_items[['predicted_lineage_table']] = Cancer_predicted_lineages 47 | GEMLI_items[['cell_type']] = Cancer_annotation 48 | 49 | ``` 50 | 51 | 52 | ## Extract symmetric and asymmetric cell lineages 53 | 54 | We now extract predicted cell lineages with members in only one cell type (symmetric) or in two or more cell types (asymmetric). To analyze the transition from DCIS to invasive breast cancer we will extract symmetric DCIS lineages, asymmetric lineages containing both DCIS and invasive tumor cells, and symmetric inv_tumor lineages. To exclude lineages that are too asymmetric, we set a threshold so that only asymmetric lineages containing at least 10% of each cell type are extracted. The function output is stored in the 'cell_fate_analysis' item of GEMLI_items. It is a data frame with a column cell.fate whose labels combine sym or asym and the cell type, separated by an underscore. This cell.fate designation makes it possible to subsequently analyze only a specific cell type in asymmetric cell lineages.
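
To make the symmetric/asymmetric distinction concrete, here is a minimal base R sketch on toy data. It is not the implementation of `extract_cell_fate_lineages`; the toy table, the variable names and the exact handling of the 10% threshold are assumptions made for illustration only.

```r
# Hypothetical lineage table with cell type labels (toy data)
toy <- data.frame(
  clone.ID  = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  cell.type = c("DCIS", "DCIS", "DCIS",
                "DCIS", "inv_tumor",
                "inv_tumor", "inv_tumor", "inv_tumor", "inv_tumor")
)

# Percentage of each cell type within each lineage
pct <- prop.table(table(toy$clone.ID, toy$cell.type), margin = 1) * 100

# Assumed rule: asymmetric if both types reach the 10% threshold,
# symmetric if the lineage consists of a single type, NA otherwise
fate <- ifelse(pct[, "DCIS"] >= 10 & pct[, "inv_tumor"] >= 10, "asym",
        ifelse(pct[, "DCIS"] == 100 | pct[, "inv_tumor"] == 100, "sym", NA))
fate
```

In this toy example lineages 1 and 3 contain a single cell type and lineage 2 is an even mix, so only lineage 2 is labelled asymmetric.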
55 | 56 | ```{r, eval=T, echo=T} 57 | 58 | GEMLI_items<-extract_cell_fate_lineages(GEMLI_items, selection=c("inv_tumor", "DCIS"), unique=FALSE, threshold=c(10,10)) 59 | 60 | # A brief look at the result 61 | GEMLI_items[['cell_fate_analysis']][1:10,] 62 | table(GEMLI_items[['cell_fate_analysis']]$cell.fate) 63 | 64 | ``` 65 | 66 | ## Call and visualize DEG for cells in specific lineage types 67 | 68 | Based on the symmetric and asymmetric lineages we extracted, we will now call differentially expressed genes (DEG) for cells of specific cell types in specific lineage types. To analyze the transition from DCIS to invasive breast cancer, we notably call DEG between DCIS cells in asymmetric and symmetric lineages. These are genes specific to DCIS cells at the start of the transition. 69 | 70 | ```{r, eval=T, echo=T} 71 | 72 | GEMLI_items<-cell_fate_DEG_calling(GEMLI_items, ident1="sym_DCIS", ident2="asym_DCIS", min.pct=0.05, logfc.threshold=0.1) 73 | 74 | # A brief look at the result 75 | GEMLI_items[['DEG']][1:10,] 76 | 77 | # Volcano plot of the DEG analysis 78 | DEG_volcano_plot(GEMLI_items, name1="Sym_DCIS", name2="Asym_DCIS") 79 | 80 | ``` 81 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # GEMLI: Gene expression memory for lineage identification 2 | 3 | GEMLI is an R package to predict cell lineages (cells with a common ancestor) from single cell RNA sequencing datasets and to call genes with a high gene expression memory on the predicted cell lineages. It is described in A.S. Eisele*, M. Tarbier*, A.A. Dormann, V. Pelechano, D.M. Suter | "Gene-expression memory-based prediction of cell lineages from scRNA-seq datasets" | Nature Communications 15, 2744 (2024). https://doi.org/10.1038/s41467-024-47158-y 4 | 5 | The approach is based on findings of Phillips et al.
2019 (doi.org/10.1038/s41467-019-09189-8) where it was shown that some genes show varying gene expression across cell lineages that is stable over multiple cell generations. 6 | 7 | ## Installation 8 | Simply run `library(devtools)` and then `install_github("UPSUTER/GEMLI", subdir="GEMLI_package_v0")`. If `devtools` is not installed yet, you can install it with `install.packages("devtools")`. GEMLI is now installed and can be used via `library(GEMLI)`. Dependencies will be installed with the package. 9 | 10 | 11 | ## Development and feedback 12 | We are still working to make GEMLI more intuitive, user-friendly, faster and versatile. Therefore existing functions will still be updated and new functionalities will be added. We'll publish a list of changes for each version for you to keep track. Your feedback is very welcome and will help us make GEMLI even better. What do you like about GEMLI? Something not working? What functions are you missing? Let us know! Contact: marcel.tarbier@scilifelab.se or almut.eisele@epfl.ch 13 | 14 | ## Example 1: small lineages in mouse embryonic stem cells 15 | 16 | For our first example we'll be looking at mouse embryonic stem cells that have been barcoded and cultured for 48h. We'll be working with a subset of this data for fast processing. In our subset we find 'family sizes' ranging from just two up to five related cells. 17 | 18 | ### Load example data 19 | First we load the example data. 20 | 21 | ``` 22 | > load('GEMLI_example_data_matrix.RData') 23 | > load('GEMLI_example_barcode_information.RData') 24 | ``` 25 | 26 | ### Create a GEMLI items list 27 | GEMLI's inputs and outputs are stored in a list of objects with predefined names. To run GEMLI you need at least a quality controlled and normalized gene expression matrix (rows = genes/features, columns = cells/samples). In this example we also provide a ground truth for lineages stemming from a barcoding experiment (values = barcode ID, names = cell IDs).
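
If you want to check your own data against this layout before running GEMLI, a sketch along the following lines may help. The toy matrix, the toy barcode vector and the checks are hypothetical and only mirror the format described above; they are not part of the package.

```r
# Toy stand-ins for the example objects (hypothetical data)
set.seed(1)
toy_matrix <- matrix(rexp(20 * 6), nrow = 20, ncol = 6,
                     dimnames = list(paste0("gene", 1:20), paste0("cell", 1:6)))
# Named vector: values = barcode ID, names = cell IDs (cell6 has no barcode)
toy_barcodes <- c(cell1 = 1, cell2 = 1, cell3 = 2, cell4 = 2, cell5 = 3)

# Basic shape checks: genes in rows, cells in columns, barcoded cells present
stopifnot(is.matrix(toy_matrix), !is.null(colnames(toy_matrix)))
stopifnot(all(names(toy_barcodes) %in% colnames(toy_matrix)))

# Store both objects under the predefined names
toy_items <- list()
toy_items[['gene_expression']] <- toy_matrix
toy_items[['barcodes']] <- toy_barcodes
names(toy_items)
```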
28 | 29 | ``` 30 | > GEMLI_items = list() 31 | > GEMLI_items[['gene_expression']] = data_matrix 32 | > GEMLI_items[['barcodes']] = lineage_dict_bc 33 | > 34 | > GEMLI_items[['gene_expression']][9:14,1:5] 35 | AAACGAACAGGTGTGA-1 AAAGGTAGTTGCTTGA-1 AACCACAAGTTTGTCG-1 AAGCCATGTTCCACGG-1 AAGCGAGGTACGGCAA-1 36 | ENSMUSG00000033845 14.761746 12.9570026 13.240645 8.8596794 12.791617 37 | ENSMUSG00000025903 3.163231 1.2340002 4.878132 0.7383066 0.000000 38 | ENSMUSG00000033813 6.326463 5.5530011 4.181256 8.8596794 5.482122 39 | ENSMUSG00000002459 0.000000 0.0000000 0.000000 0.0000000 0.000000 40 | ENSMUSG00000085623 0.000000 0.0000000 0.000000 0.0000000 0.000000 41 | ENSMUSG00000033793 3.163231 0.6170001 1.393752 2.2149199 5.482122 42 | > 43 | > GEMLI_items[['barcodes']][1:5] 44 | CACAGATAGTGATGGC-1 TATCTTGGTACGGGAT-1 AAACGAACAGGTGTGA-1 AGAGAATAGGTCATAA-1 GAGTGAGTCCAGTACA-1 45 | 2 2 2 7 7 46 | ``` 47 | 48 | ### Perform lineage prediction 49 | We can then identify cell lineages through repeated iterative clustering (this may take 2-3min). The `predict_lineages` function takes our GEMLI_items as input. It outputs a matrix of all cells against all cells with values corresponding to a confidence score that they are part of the same lineage. 50 | 51 | ``` 52 | > GEMLI_items = predict_lineages(GEMLI_items) 53 | > 54 | > GEMLI_items[['prediction']][1:5,15:19] 55 | AGAGAATAGGTCATAA-1 AGAGCAGCAAGTGATA-1 AGATGCTTCAAAGACA-1 AGGATCTGTATCGTTG-1 AGGGAGTAGACGATAT-1 56 | AAACGAACAGGTGTGA-1 0 0 0 0 14 57 | AAAGGTAGTTGCTTGA-1 87 0 0 0 0 58 | AACCACAAGTTTGTCG-1 0 0 0 0 39 59 | AAGCCATGTTCCACGG-1 0 0 0 0 0 60 | AAGCGAGGTACGGCAA-1 0 0 0 0 0 61 | ``` 62 | 63 | ### Test lineage prediction 64 | Since we have barcoding data for this dataset we can test the predicted lineages against our ground truth. 65 | The `test_lineages` function again takes our `GEMLI_items` as input. It's important that a prediction has been run first with `predict_lineages`.
It outputs the number of true positive predictions (TP), false positive predictions (FP), as well as precision and sensitivity for a range of confidence score cutoffs. The output can be visualized by setting `plot_results` to `true`/`T`. 66 | 67 | ``` 68 | > GEMLI_items = test_lineages(GEMLI_items) 69 | > 70 | > GEMLI_items$testing_results 71 | TP FP precision sensitivity 72 | 0 274 24062 0.01125904 1.0000000 73 | 10 104 128 0.44827586 0.3795620 74 | 20 90 82 0.52325581 0.3284672 75 | 30 84 56 0.60000000 0.3065693 76 | 40 80 36 0.68965517 0.2919708 77 | 50 68 22 0.75555556 0.2481752 78 | 60 64 14 0.82051282 0.2335766 79 | 70 62 10 0.86111111 0.2262774 80 | 80 58 2 0.96666667 0.2116788 81 | 90 46 0 1.00000000 0.1678832 82 | 100 34 0 1.00000000 0.1240876 83 | > 84 | > GEMLI_items = test_lineages(GEMLI_items, plot_results=T) 85 | ``` 86 | 87 |
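
The precision and sensitivity columns follow directly from the TP and FP counts. As a worked check, the sketch below recomputes them for three rows of the table above in base R. The number of real positives `P` is taken as 274, the TP count in the row for cutoff 0 where sensitivity is 1; the data frame itself is typed in by hand for illustration.

```r
# TP and FP counts copied from three rows of the table above (cutoffs 10, 50, 90)
results <- data.frame(cutoff = c(10, 50, 90),
                      TP = c(104, 68, 46),
                      FP = c(128, 22, 0))

# Number of real positives: sensitivity is 1 at cutoff 0, so P equals TP there
P <- 274

# precision = TP / (TP + FP), sensitivity = TP / P
results$precision <- results$TP / (results$TP + results$FP)
results$sensitivity <- results$TP / P
print(round(results, 4))
```

The recomputed values agree with the corresponding rows of `GEMLI_items$testing_results` shown above.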

88 | 89 |

90 | 91 | ### Visualize predictions as network 92 | We can also investigate our predictions by visualizing them as a network with the `visualize_as_network` function. Here we need to set a `cutoff` that defines which predictions we want to consider. It represents a confidence score and high values yield fewer predictions with high precision while low values yield more predictions with lower precision. 93 | Network visualization requires the `igraph` library which can be loaded with `library(igraph)`. 94 | 95 | ``` 96 | > visualize_as_network(GEMLI_items, cutoff=90) # top image 97 | > visualize_as_network(GEMLI_items, cutoff=50) # lower image 98 | ``` 99 |
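
To see why a lower cutoff yields more predictions, consider the toy confidence matrix below. The matrix and the helper `pairs_at` are hypothetical, and treating a pair as predicted when its score is at or above the cutoff is an assumption about the comparison used.

```r
# Hypothetical symmetric confidence matrix for four cells (scores from 0 to 100)
conf <- matrix(c( 0, 95, 55,  0,
                 95,  0, 60, 10,
                 55, 60,  0,  0,
                  0, 10,  0,  0), nrow = 4, byrow = TRUE,
               dimnames = list(paste0("cell", 1:4), paste0("cell", 1:4)))

# Count unique cell pairs (upper triangle) with a score at or above the cutoff
pairs_at <- function(m, cutoff) sum(m[upper.tri(m)] >= cutoff)

n_strict  <- pairs_at(conf, 90)  # only the strongest pair remains
n_lenient <- pairs_at(conf, 50)  # weaker pairs are included as well
c(cutoff_90 = n_strict, cutoff_50 = n_lenient)
```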

100 | 101 | 102 |

103 | 104 | If a ground truth e.g. from barcoding is available we can set `ground_truth` to `true`/`T` and `highlight_FPs` to `true`/`T` to highlight false predictions with red edges. Cells without barcode information will be displayed in white. 105 | 106 | ``` 107 | > visualize_as_network(GEMLI_items, cutoff=90, ground_truth=T, highlight_FPs=T) # top image 108 | > visualize_as_network(GEMLI_items, cutoff=50, ground_truth=T, highlight_FPs=T) # lower image 109 | ``` 110 |

111 | 112 | 113 |

114 | 115 | ### Extract lineage information 116 | Now we can extract the lineage information with the `prediction_to_lineage_information` function. Again we need to set a `cutoff` that defines which predictions we want to consider. The function outputs both a lineage table and a 'dictionary', a vector that has the lineage number as values and the cell IDs as names. 117 | 118 | ``` 119 | > GEMLI_items = prediction_to_lineage_information(GEMLI_items, cutoff=50) 120 | > 121 | > GEMLI_items$predicted_lineage_table[1:5,] 122 | cell.ID clone.ID 123 | [1,] "AAACGAACAGGTGTGA-1" "1" 124 | [2,] "AAAGGTAGTTGCTTGA-1" "2" 125 | [3,] "AACCACAAGTTTGTCG-1" "3" 126 | [4,] "AAGCCATGTTCCACGG-1" "4" 127 | [5,] "AAGCGAGGTACGGCAA-1" "5" 128 | > 129 | > GEMLI_items$predicted_lineages[1:5] 130 | AAACGAACAGGTGTGA-1 AAAGGTAGTTGCTTGA-1 AACCACAAGTTTGTCG-1 AAGCCATGTTCCACGG-1 AAGCGAGGTACGGCAA-1 131 | 1 2 3 4 5 132 | ``` 133 | 134 | ### Trim lineages that are too big 135 | 136 | In some applications it may be useful to trim lineages that are too big. This is the case, for instance, if it is known that cells have undergone only a certain number of divisions, which effectively limits the lineage size, or if you are only interested in sister cell pairs. Similarly, if you investigate large lineages you want to avoid lineages being merged due to a few false predictions between otherwise well-interconnected lineages. The `suggest_network_trimming_to_size` function allows you to preview what a trimming to size would look like. It will again show the predicted lineages as networks but highlight all connections that would be trimmed given a certain size restriction (`max_size`). If you are happy with the suggested trimming, you can create a new trimmed `GEMLI_items` list with the `trim_network_to_size` function; in this example we call it `GEMLI_items_post_processed`. You can then again visualize the predictions to see how the changes have affected your predictions.
137 | ``` 138 | > suggest_network_trimming_to_size(GEMLI_items, max_size=2, cutoff=50) # left image 139 | > GEMLI_items_post_processed = trim_network_to_size(GEMLI_items, max_size=2, cutoff=50) 140 | > visualize_as_network(GEMLI_items_post_processed, cutoff=50) # right image 141 | ``` 142 |
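
The trimming idea, removing the weakest links of an oversized lineage until it fits the size limit, can be sketched in base R as follows. This is our own illustrative reading of that description and not the code of `trim_network_to_size`: the helpers `components_of` and `trim_sketch`, the toy matrix, and the choice to treat scores at or above the cutoff as links are all assumptions.

```r
# Label connected components of a logical adjacency matrix (breadth-first search)
components_of <- function(adj) {
  n <- nrow(adj)
  comp <- rep(NA_integer_, n)
  id <- 0L
  for (start in seq_len(n)) {
    if (!is.na(comp[start])) next
    id <- id + 1L
    comp[start] <- id
    queue <- start
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      nb <- which(adj[v, ] & is.na(comp))  # unvisited neighbours of v
      comp[nb] <- id
      queue <- c(queue, nb)
    }
  }
  comp
}

# Remove the weakest link of an oversized component until all components fit
trim_sketch <- function(m, cutoff, max_size) {
  stopifnot(cutoff > 0, max_size >= 1)
  repeat {
    adj <- m >= cutoff
    diag(adj) <- FALSE
    comp <- components_of(adj)
    sizes <- table(comp)
    too_big <- as.integer(names(sizes)[sizes > max_size])
    if (length(too_big) == 0) break
    members <- which(comp == too_big[1])
    sub <- m[members, members]
    # Keep only existing links in the upper triangle as candidates
    sub[!adj[members, members] | lower.tri(sub, diag = TRUE)] <- NA
    weakest <- which(sub == min(sub, na.rm = TRUE), arr.ind = TRUE)[1, ]
    i <- members[weakest[1]]
    j <- members[weakest[2]]
    m[i, j] <- m[j, i] <- 0  # drop the weakest link in both directions
  }
  m
}

# Hypothetical confidence matrix: cells 1-3 form one lineage of size 3 at cutoff 50
conf <- matrix(c( 0, 95, 55,  0,
                 95,  0, 60, 10,
                 55, 60,  0,  0,
                  0, 10,  0,  0), nrow = 4, byrow = TRUE,
               dimnames = list(paste0("cell", 1:4), paste0("cell", 1:4)))
trimmed <- trim_sketch(conf, cutoff = 50, max_size = 2)
trimmed
```

With `max_size = 2` the two weaker links (scores 55 and 60) are removed in turn, leaving only the pair cell1/cell2 connected.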

143 | 144 | 145 |

146 | 147 | 148 | ## Example 2: large lineages in intestinal crypts 149 | 150 | In this example we'll be visualizing GEMLI results from a lineage-annotated dataset of murine intestinal crypts. Intestinal crypts originate from one or a few intestinal stem cells, and are composed of a number of different cell types. In the data we find cells from individual crypts that range from just three up to forty-four related cells. The scRNA-seq dataset is derived from Bues et al. 2022 (PMID 35165449) and is publicly available under Gene Expression Omnibus accession number GSE148093. 151 | 152 | ### Load example data 153 | First we load the example data. Here we already predicted the lineages using GEMLI and therefore do not include a count matrix, but rather start with the predictions right away. 154 | 155 | ``` 156 | > load('GEMLI_crypts_example_data_matrix.RData') 157 | > load('GEMLI_crypts_example_barcode_information.RData') 158 | ``` 159 | 160 | ### Create a GEMLI items list 161 | We then create a GEMLI items list. This list is used to store the data, and create and store the outputs of GEMLI (for details check example one). 162 | 163 | ``` 164 | > GEMLI_items_crypts = list() 165 | > GEMLI_items_crypts[['prediction']] = Crypts 166 | > GEMLI_items_crypts[['barcodes']] = Crypts_bc_dict 167 | ``` 168 | 169 | ### Visualize predictions as network 170 | To visualize large lineages we'll use three different network layout algorithms: Fruchterman-Reingold, Kamada-Kawai, and grid. Each of them has advantages and disadvantages. 171 | 172 | (1) Fruchterman-Reingold displays the cells of individual predicted crypts close together with ample space between crypts. This makes it hard to see connections within individual crypts but gives a good overview of individual structures. 173 | 174 | (2) Kamada-Kawai spaces individual cells well, so we can see individual connections between them.
It may, however, happen that two different predicted crypts are partially overlaid, as can be seen for the dark red and bright green lineages on the right side of the plot. 175 | 176 | (3) When the network is laid out as a grid, one generally gets a good overview of the predicted lineages and their connections, but it's hard to see which connection belongs to which cell in the same row. 177 | 178 | ``` 179 | > visualize_as_network(GEMLI_items_crypts, cutoff=70, display_orphan=F, max_edge_width=1, ground_truth=T, include_labels=F, layout_style="fr") # first image 180 | > visualize_as_network(GEMLI_items_crypts, cutoff=70, display_orphan=F, max_edge_width=1, ground_truth=T, include_labels=F, layout_style="kk") # second image 181 | > visualize_as_network(GEMLI_items_crypts, cutoff=70, display_orphan=F, max_edge_width=1, ground_truth=T, include_labels=F, layout_style="grid") # third image 182 | ``` 183 | 184 |


### Adding cell type information to the GEMLI items list
The cells of individual intestinal crypts can be assigned to different cell types. This information can be added to the GEMLI items list as the 'cell_type' slot, in the form of a data frame with the columns 'cell.ID' and 'cell.type'.

```
> load('GEMLI_crypts_example_cell_type_annotation.RData')
> GEMLI_items_crypts[['cell_type']] = Crypts_annotation
```

### Color prediction network visualization by cell type
The visualization of the lineage predictions can now be colored by the cell type annotation. This shows the cell type composition of individual intestinal crypts.

```
> visualize_as_network(GEMLI_items_crypts, cutoff=70, max_edge_width=5, display_orphan=F, include_labels=F, ground_truth=T, highlight_FPs=T, layout_style="kk", cell_type_colors=T)
```
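For your own data, an annotation in the shape described above can be assembled with base R. The following is a minimal sketch: the barcodes and labels are made up for illustration, and `my_annotation` and `my_items` are hypothetical names. Only the column names 'cell.ID' and 'cell.type' and the slot name 'cell_type' come from the text above; the uniqueness check is a reasonable precaution, not a documented GEMLI requirement.

```r
# Hypothetical barcodes and labels, for illustration only.
my_annotation <- data.frame(
  cell.ID   = c("CELL_1", "CELL_2", "CELL_3", "CELL_4"),
  cell.type = c("Stem", "TA", "Stem", "Goblet"),
  stringsAsFactors = FALSE
)

# Sanity checks before handing the table to GEMLI:
# the expected columns are present and no cell ID appears twice.
stopifnot(all(c("cell.ID", "cell.type") %in% colnames(my_annotation)))
stopifnot(!any(duplicated(my_annotation$cell.ID)))

# Store the annotation in the 'cell_type' slot of a (hypothetical) items list.
my_items <- list()
my_items[["cell_type"]] <- my_annotation
```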


Specific colors can be assigned to specific cell types by adding a data frame with the columns 'cell.type' and 'color' to the GEMLI items list slot 'cell_type_color'. This also makes it possible to highlight just one or two selected cell types.

```
> cell.type <- unique(GEMLI_items_crypts[['cell_type']]$cell.type)
> color <- c("#5386BD", "skyblue1", "darkgreen", "gold", "red", "darkred", "black")
> Cell_type_color <- data.frame(cell.type, color)
> GEMLI_items_crypts[['cell_type_color']] = Cell_type_color
>
> visualize_as_network(GEMLI_items_crypts, cutoff=70, max_edge_width=5, display_orphan=F, include_labels=F, ground_truth=T, highlight_FPs=T, layout_style="kk", cell_type_colors=T)
```
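Because `unique()` returns cell types in order of first appearance, the positional color vector above depends on the row order of the annotation. A minimal sketch of an order-independent alternative uses a named vector. It assumes the annotation contains exactly the seven cell type labels that appear in the composition table of this example (Entero, Goblet, PIC, Paneth, Regstem, Stem, TA). The pairing of colors to types is arbitrary, and `type_colors`, `highlight` and `Highlight_color` are hypothetical names.

```r
# Named vector: cell type label -> color (pairing chosen arbitrarily here).
type_colors <- c(
  Entero = "#5386BD", Goblet = "skyblue1", PIC = "darkgreen",
  Paneth = "gold", Regstem = "red", Stem = "darkred", TA = "black"
)

# Data frame with the two columns expected in the 'cell_type_color' slot.
Cell_type_color <- data.frame(
  cell.type = names(type_colors),
  color     = unname(type_colors),
  stringsAsFactors = FALSE
)

# To highlight only selected cell types, grey out all others.
highlight <- c("Stem", "Paneth")
Highlight_color <- Cell_type_color
Highlight_color$color[!(Highlight_color$cell.type %in% highlight)] <- "grey80"
```

Either data frame can then be stored in `GEMLI_items_crypts[['cell_type_color']]` in the same way as above.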


### Get overview of crypt cell type composition
To get a more quantitative overview of the cell type composition of individual intestinal crypts, we can use the function 'cell_type_composition_plot'. To do so, we first have to run the 'prediction_to_lineage_information' function.

```
> GEMLI_items_crypts = prediction_to_lineage_information(GEMLI_items_crypts, cutoff=50)
> cell_type_composition_plot(GEMLI_items_crypts, cell_type_colors=T, type=c("bubble"))
```


With other values of the 'type' parameter, the same function can output an upsetR plot or a plain table of the number of lineages with each cell type composition.

```
> cell_type_composition_plot(GEMLI_items_crypts, ground_truth=F, cell_type_colors=T, type=c("upsetR"))
> cell_type_composition_plot(GEMLI_items_crypts, ground_truth=F, cell_type_colors=T, type=c("plain"))
combi                                   n
Entero                                  1
Entero__Goblet__PIC__Stem               1
Entero__Goblet__PIC__Stem__TA           1
Entero__PIC__Paneth__Regstem__Stem__TA  1
Entero__Stem__TA                        1
Goblet__PIC__Regstem__Stem__TA          1
Goblet__PIC__Stem__TA                   1
Goblet__Regstem__Stem__TA               1
Goblet__Stem__TA                        2
PIC                                     3
PIC__Stem__TA                           2
Paneth                                  2
Paneth__Regstem__Stem                   1
Paneth__Stem                            1
Regstem__Stem                           1
Stem                                    3
TA                                      1
```
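The 'combi' labels in the plain table join cell type names with a double underscore, so they can be split again for further summaries. The sketch below re-types the table above by hand into a data frame and counts how many lineages contain each cell type. It makes no assumption about the object that `cell_type_composition_plot` returns, and `comp`, `types_per_combi` and `lineages_with_type` are hypothetical names.

```r
# Composition table re-typed by hand from the output above.
comp <- data.frame(
  combi = c("Entero", "Entero__Goblet__PIC__Stem", "Entero__Goblet__PIC__Stem__TA",
            "Entero__PIC__Paneth__Regstem__Stem__TA", "Entero__Stem__TA",
            "Goblet__PIC__Regstem__Stem__TA", "Goblet__PIC__Stem__TA",
            "Goblet__Regstem__Stem__TA", "Goblet__Stem__TA", "PIC",
            "PIC__Stem__TA", "Paneth", "Paneth__Regstem__Stem", "Paneth__Stem",
            "Regstem__Stem", "Stem", "TA"),
  n = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 2, 2, 1, 1, 1, 3, 1),
  stringsAsFactors = FALSE
)

# Split each composition label into its cell types.
types_per_combi <- strsplit(comp$combi, "__", fixed = TRUE)

# For each cell type, sum n over all compositions that contain it.
all_types <- sort(unique(unlist(types_per_combi)))
lineages_with_type <- sapply(all_types, function(ct) {
  sum(comp$n[sapply(types_per_combi, function(x) ct %in% x)])
})

# Total number of lineages and number of lineages with a single cell type.
total_lineages <- sum(comp$n)
single_type    <- sum(comp$n[lengths(types_per_combi) == 1])
```

For the table above this gives 24 lineages in total, 10 of which contain a single cell type, and 17 lineages that contain at least one Stem cell.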


## Example 3: Cell fate decisions in human breast cancer

For our third example we look at cell fate decisions in a scRNA-seq dataset of human breast cancer encompassing both ductal carcinoma in situ (DCIS) and invasive tumor (inv_tumor) cells. We work with a subset of this data for fast processing. No ground truth is available. We will study the fate transition from DCIS to invasive tumor cells. The data is derived from a public 10X Genomics dataset associated with the preprint bioRxiv 2022.10.06.510405 (doi: https://doi.org/10.1101/2022.10.06.510405) and was downloaded from https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast (April 2023).

### Load example data.
First we load the example data. The lineages have already been predicted and extracted using GEMLI. We also load the previously generated cell type information for all cells in the dataset.

```
> load('GEMLI_cancer_example_norm_count.RData')
> load('GEMLI_cancer_example_predicted_lineages.RData')
> load('GEMLI_cancer_example_cell_type_annotation.RData')
```

### Create a GEMLI items list
We then create a GEMLI items list. This list is used to store the data and to create and store the outputs of GEMLI (for details, see Example 1).

```
> GEMLI_items = list()
> GEMLI_items[['gene_expression']] = Cancer_norm_count
> GEMLI_items[['predicted_lineage_table']] = Cancer_predicted_lineages
> GEMLI_items[['cell_type']] = Cancer_annotation
```

### Get an overview of symmetric and asymmetric cell lineages
To get an overview of the symmetric and asymmetric cell lineages present in the data, we can use the function 'cell_type_composition_plot' with the parameter 'type' set to "plain" or "upsetR".

```
> cell_type_composition_plot(GEMLI_items, type=c("plain"))
combi            n
DCIS             250
DCIS__inv_tumor  25
inv_tumor        203
>
> cell_type_composition_plot(GEMLI_items, type=c("upsetR"))
```


### Extract symmetric and asymmetric cell lineages
We now extract predicted cell lineages whose members belong to only one cell type (symmetric) or to two or more cell types (asymmetric). To analyze the transition from DCIS to invasive breast cancer, we extract symmetric DCIS lineages, asymmetric lineages containing both DCIS and invasive tumor cells, and symmetric inv_tumor lineages. To exclude lineages that are too strongly skewed toward one cell type, we set a threshold so that only asymmetric lineages containing at least 10% of each cell type are extracted. The function output is stored in the 'cell_fate_analysis' item of GEMLI_items. It is a data frame with a column cell.fate whose labels combine sym or asym with the cell type, separated by an underscore. This cell.fate designation makes it possible to subsequently analyze only a specific cell type within asymmetric cell lineages.

```
> GEMLI_items<-extract_cell_fate_lineages(GEMLI_items, selection=c("inv_tumor", "DCIS"), unique=FALSE, threshold=c(10,10))
>
> GEMLI_items[['cell_fate_analysis']][1:10,]
cell.ID             clone.ID  cell.type  cell.fate
AAACCCACATCCGTGG-3  2826      DCIS       NA_DCIS
AAACCCATCCTTATAC-4  54        DCIS       asym_DCIS
AAACGAACAACACGTT-2  1466      inv_tumor  sym_inv_tumor
AAAGAACCAACAGCTT-3  726       DCIS       sym_DCIS
AAAGAACGTCGAATGG-2  1467      inv_tumor  sym_inv_tumor
AAAGGATCAGAGTTCT-4  4383      inv_tumor  sym_inv_tumor
AAAGGATTCGCCAATA-4  4385      inv_tumor  sym_inv_tumor
AAAGGGCCAGTCGGAA-3  754       inv_tumor  sym_inv_tumor
AAAGGGCGTAAGAACT-1  10        inv_tumor  sym_inv_tumor
AAAGGGCGTAGTTCCA-1  11        inv_tumor  sym_inv_tumor
>
> table(GEMLI_items[['cell_fate_analysis']]$cell.fate)
asym_DCIS  asym_inv_tumor  NA_DCIS  sym_DCIS  sym_inv_tumor
77         96              225      273       744
```

### Call and visualize DEG for cells in specific lineage types
Based on the symmetric and asymmetric lineages we extracted, we now call differentially expressed genes (DEG) for cells of a specific cell type in specific lineage types. To analyze the transition from DCIS to invasive breast cancer, we notably call DEG between DCIS cells in symmetric and asymmetric lineages. These are genes specific to DCIS cells at the start of the transition.

```
> GEMLI_items<-cell_fate_DEG_calling(GEMLI_items, ident1="sym_DCIS", ident2="asym_DCIS", min.pct=0.05, logfc.threshold=0.1)
>
> GEMLI_items[['DEG']][1:10,]
                 p_val avg_log2FC pct.1 pct.2    p_val_adj
DCAF7     4.308339e-15 -1.0275658 0.985 1.000 1.086520e-10
NRAS      2.630887e-11 -1.2821469 0.516 0.766 6.634834e-07
LINC01999 2.824883e-11 -1.4817245 0.736 0.961 7.124072e-07
CSDE1     7.762498e-10 -0.8055031 0.952 0.974 1.957624e-05
S100A10   3.676826e-09  1.1729996 0.839 0.610 9.272588e-05
MAN1A2    8.900499e-09 -0.8487466 0.868 0.961 2.244617e-04
APPBP2-DT 9.767327e-09 -1.0797417 0.495 0.792 2.463222e-04
PVALB     9.882215e-09 -1.1913536 0.304 0.623 2.492196e-04
CDH2      1.169570e-08 -1.6410588 0.264 0.558 2.949538e-04
CSTA      2.204180e-08 -1.0506112 0.198 0.506 5.558721e-04
>
> DEG_volcano_plot(GEMLI_items, name1="Sym_DCIS", name2="Asym_DCIS")
```
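The DEG table can be filtered further with base R. The sketch below re-types the first five rows shown above into a data frame named `deg`, a hypothetical stand-in for `GEMLI_items[['DEG']]`, and splits the genes by the sign of `avg_log2FC`. It assumes the common convention that a positive value means higher expression in the first group (`ident1`, here sym_DCIS). The cutoffs are arbitrary illustration values, not recommendations.

```r
# First five rows of the DEG table above, re-typed by hand.
deg <- data.frame(
  p_val      = c(4.308339e-15, 2.630887e-11, 2.824883e-11, 7.762498e-10, 3.676826e-09),
  avg_log2FC = c(-1.0275658, -1.2821469, -1.4817245, -0.8055031, 1.1729996),
  pct.1      = c(0.985, 0.516, 0.736, 0.952, 0.839),
  pct.2      = c(1.000, 0.766, 0.961, 0.974, 0.610),
  p_val_adj  = c(1.086520e-10, 6.634834e-07, 7.124072e-07, 1.957624e-05, 9.272588e-05),
  row.names  = c("DCAF7", "NRAS", "LINC01999", "CSDE1", "S100A10")
)

# Illustrative cutoffs (assumptions, adjust to your own analysis).
padj_cutoff   <- 0.05
log2fc_cutoff <- 1

# Keep genes passing both the adjusted p-value and the effect size cutoff.
significant <- deg[deg$p_val_adj < padj_cutoff &
                   abs(deg$avg_log2FC) >= log2fc_cutoff, ]

# Split by direction, under the sign convention stated above.
up_in_ident1 <- rownames(significant)[significant$avg_log2FC > 0]
up_in_ident2 <- rownames(significant)[significant$avg_log2FC < 0]
```

With these five rows and cutoffs, CSDE1 is dropped because its absolute log2 fold change is below 1.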


## Citation
If you use the package, please cite A.S. Eisele*, M. Tarbier*, A.A. Dormann, V. Pelechano, D.M. Suter | "Gene-expression memory-based prediction of cell lineages from scRNA-seq datasets" | Nature Communications 15, 2744 (2024). https://doi.org/10.1038/s41467-024-47158-y

--------------------------------------------------------------------------------