├── Example
    ├── GEMIL_GitHub_testing.png
    ├── GEMLI_GitHub_cancer_lineage_overview_upsetR.png
    ├── GEMLI_GitHub_crypts_lineage_overview_bubble.png
    ├── GEMLI_GitHub_crypts_lineage_overview_upsetR.png
    ├── GEMLI_GitHub_crypts_network_70_cell_type_colors.png
    ├── GEMLI_GitHub_crypts_network_70_custom_cell_type_colors.png
    ├── GEMLI_GitHub_crypts_network_70_fr.png
    ├── GEMLI_GitHub_crypts_network_70_grid.png
    ├── GEMLI_GitHub_crypts_network_70_kk.png
    ├── GEMLI_GitHub_network_50.png
    ├── GEMLI_GitHub_network_50_GT.png
    ├── GEMLI_GitHub_network_50_GT_ST.png
    ├── GEMLI_GitHub_network_50_trim.png
    ├── GEMLI_GitHub_network_50_trim_GT.png
    ├── GEMLI_GitHub_network_90.png
    ├── GEMLI_GitHub_network_90_GT.png
    ├── GEMLI_GitHub_volcano_asym_inv_tumor_asym_DCIS.png
    ├── GEMLI_GitHub_volcano_sym_DCIS_asym_DCIS.png
    ├── GEMLI_cancer_example_cell_type_annotation.RData
    ├── GEMLI_cancer_example_norm_count.RData
    ├── GEMLI_cancer_example_predicted_lineages.RData
    ├── GEMLI_crypts_example_barcode_information.RData
    ├── GEMLI_crypts_example_cell_type_annotation.RData
    ├── GEMLI_crypts_example_data_matrix.RData
    ├── GEMLI_example_barcode_information.RData
    ├── GEMLI_example_data_matrix.RData
    └── Scheme_Phillips.png
├── GEMLI_package_v0
    ├── DESCRIPTION
    ├── GEMLI.Rproj
    ├── NAMESPACE
    ├── R
    │   ├── DEG_volcano_plot.R
    │   ├── calculate_correlations.R
    │   ├── cell_fate_DEG_calling.R
    │   ├── cell_type_composition_plot.R
    │   ├── cluster_stability_plot.R
    │   ├── extract_cell_fate_lineages.R
    │   ├── memory_gene_calling.R
    │   ├── potential_markers.R
    │   ├── predict_lineages.R
    │   ├── predict_lineages_multiple_sizes.R
    │   ├── predict_lineages_with_known_markers.R
    │   ├── prediction_to_lineage_information.R
    │   ├── quantify_clusters_iterative.R
    │   ├── suggest_network_trimming_to_size.R
    │   ├── test_lineages.R
    │   ├── trim_network_to_size.R
    │   └── visualize_as_network.R
    ├── man
    │   ├── DEG_volcano_plot.Rd
    │   ├── calculate_correlations.Rd
    │   ├── cell_fate_DEG_calling.Rd
    │   ├── cell_type_composition_plot.Rd
    │   ├── cluster_stability_plot.Rd
    │   ├── extract_cell_fate_lineages.Rd
    │   ├── memory_gene_calling.Rd
    │   ├── potential_markers.Rd
    │   ├── predict_lineages.Rd
    │   ├── predict_lineages_multiple_sizes.Rd
    │   ├── predict_lineages_with_known_markers.Rd
    │   ├── prediction_to_lineage_information.Rd
    │   ├── quantify_clusters_iterative.Rd
    │   ├── suggest_network_trimming_to_size.Rd
    │   ├── test_lineages.Rd
    │   ├── trim_network_to_size.Rd
    │   └── visualize_as_network.Rd
    └── vignettes
    │   ├── Example1_simple_lineage_predictions.Rmd
    │   ├── Example2_predicting_multicellular_structures.Rmd
    │   └── Example3_cell_fate_analysis.Rmd
└── README.md


/Example/GEMIL_GitHub_testing.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMIL_GitHub_testing.png


--------------------------------------------------------------------------------
/Example/GEMLI_GitHub_cancer_lineage_overview_upsetR.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_cancer_lineage_overview_upsetR.png


--------------------------------------------------------------------------------
/Example/GEMLI_GitHub_crypts_lineage_overview_bubble.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_crypts_lineage_overview_bubble.png


--------------------------------------------------------------------------------
/Example/GEMLI_GitHub_crypts_lineage_overview_upsetR.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_crypts_lineage_overview_upsetR.png


--------------------------------------------------------------------------------
/Example/GEMLI_GitHub_crypts_network_70_cell_type_colors.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_crypts_network_70_cell_type_colors.png


--------------------------------------------------------------------------------
/Example/GEMLI_GitHub_crypts_network_70_custom_cell_type_colors.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_crypts_network_70_custom_cell_type_colors.png


--------------------------------------------------------------------------------
/Example/GEMLI_GitHub_crypts_network_70_fr.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_crypts_network_70_fr.png


--------------------------------------------------------------------------------
/Example/GEMLI_GitHub_crypts_network_70_grid.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_crypts_network_70_grid.png


--------------------------------------------------------------------------------
/Example/GEMLI_GitHub_crypts_network_70_kk.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_crypts_network_70_kk.png


--------------------------------------------------------------------------------
/Example/GEMLI_GitHub_network_50.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_network_50.png


--------------------------------------------------------------------------------
/Example/GEMLI_GitHub_network_50_GT.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_network_50_GT.png


--------------------------------------------------------------------------------
/Example/GEMLI_GitHub_network_50_GT_ST.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_network_50_GT_ST.png


--------------------------------------------------------------------------------
/Example/GEMLI_GitHub_network_50_trim.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_network_50_trim.png


--------------------------------------------------------------------------------
/Example/GEMLI_GitHub_network_50_trim_GT.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_network_50_trim_GT.png


--------------------------------------------------------------------------------
/Example/GEMLI_GitHub_network_90.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_network_90.png


--------------------------------------------------------------------------------
/Example/GEMLI_GitHub_network_90_GT.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_network_90_GT.png


--------------------------------------------------------------------------------
/Example/GEMLI_GitHub_volcano_asym_inv_tumor_asym_DCIS.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_volcano_asym_inv_tumor_asym_DCIS.png


--------------------------------------------------------------------------------
/Example/GEMLI_GitHub_volcano_sym_DCIS_asym_DCIS.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_GitHub_volcano_sym_DCIS_asym_DCIS.png


--------------------------------------------------------------------------------
/Example/GEMLI_cancer_example_cell_type_annotation.RData:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_cancer_example_cell_type_annotation.RData


--------------------------------------------------------------------------------
/Example/GEMLI_cancer_example_norm_count.RData:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_cancer_example_norm_count.RData


--------------------------------------------------------------------------------
/Example/GEMLI_cancer_example_predicted_lineages.RData:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_cancer_example_predicted_lineages.RData


--------------------------------------------------------------------------------
/Example/GEMLI_crypts_example_barcode_information.RData:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_crypts_example_barcode_information.RData


--------------------------------------------------------------------------------
/Example/GEMLI_crypts_example_cell_type_annotation.RData:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_crypts_example_cell_type_annotation.RData


--------------------------------------------------------------------------------
/Example/GEMLI_crypts_example_data_matrix.RData:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_crypts_example_data_matrix.RData


--------------------------------------------------------------------------------
/Example/GEMLI_example_barcode_information.RData:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_example_barcode_information.RData


--------------------------------------------------------------------------------
/Example/GEMLI_example_data_matrix.RData:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/GEMLI_example_data_matrix.RData


--------------------------------------------------------------------------------
/Example/Scheme_Phillips.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UPSUTER/GEMLI/fe4eb91d16aadec3875ef266940b2703b052a30a/Example/Scheme_Phillips.png


--------------------------------------------------------------------------------
/GEMLI_package_v0/DESCRIPTION:
--------------------------------------------------------------------------------
 1 | Package: GEMLI
 2 | Type: Package
 3 | Title: Gene expression memory based lineage inference
 4 | Version: 0.1.0
 5 | Author: Who wrote it
 6 | Maintainer: The package maintainer <yourself@somewhere.net>
 7 | Description: Uses general characteristics of genes with lineage-specific expression 
 8 |     to predict cell lineages in scRNA-seq datasets. 
 9 | URL: https://github.com/UPSUTER/GEMLI
10 | Imports: 
11 |     igraph (>= 1.4.3),
12 |     HiClimR (>= 2.2.1),
13 |     dplyr (>= 1.1.2),
14 |     Seurat (>= 4.3.0),
15 |     ggplot2 (>= 3.4.2),
16 |     ggrepel (>= 0.9.3),
17 |     clustree (>= 0.5.0),
18 |     UpSetR (>= 1.4.0),
19 |     reshape (>= 0.8.9),
20 |     tidyr (>= 1.3.0)
21 | License: What license is it under?
22 | Encoding: UTF-8
23 | LazyData: true
24 | 


--------------------------------------------------------------------------------
/GEMLI_package_v0/GEMLI.Rproj:
--------------------------------------------------------------------------------
 1 | Version: 1.0
 2 | 
 3 | RestoreWorkspace: Default
 4 | SaveWorkspace: Default
 5 | AlwaysSaveHistory: Default
 6 | 
 7 | EnableCodeIndexing: Yes
 8 | UseSpacesForTab: Yes
 9 | NumSpacesForTab: 2
10 | Encoding: UTF-8
11 | 
12 | RnwWeave: Sweave
13 | LaTeX: pdfLaTeX
14 | 
15 | AutoAppendNewline: Yes
16 | StripTrailingWhitespace: Yes
17 | 
18 | BuildType: Package
19 | PackageUseDevtools: Yes
20 | PackageInstallArgs: --no-multiarch --with-keep.source
21 | 


--------------------------------------------------------------------------------
/GEMLI_package_v0/NAMESPACE:
--------------------------------------------------------------------------------
1 | exportPattern("^[[:alpha:]]+")
2 | 
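
The `exportPattern` directive above exports every top-level object whose name starts with a letter. A minimal sketch of which names the pattern matches; `.internal_helper` is a hypothetical name used only for illustration.

```r
# Names as they might appear in a package; the dot-prefixed one is hypothetical.
object_names <- c("predict_lineages", "DEG_volcano_plot", ".internal_helper")

# exportPattern("^[[:alpha:]]+") exports names matching this regular expression.
exported <- object_names[grepl("^[[:alpha:]]+", object_names)]
print(exported)
```

Names that begin with a dot do not match, so a dot prefix is one way to keep a helper unexported under this pattern.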


--------------------------------------------------------------------------------
/GEMLI_package_v0/R/DEG_volcano_plot.R:
--------------------------------------------------------------------------------
 1 | DEG_volcano_plot<-function(GEMLI_items, name1, name2){
 2 | DEG<-GEMLI_items[['DEG']]
 3 | DEG$change = ifelse(DEG$p_val_adj <= 0.05 & abs(DEG$avg_log2FC) >= 0.5, ifelse(DEG$avg_log2FC > 0, name1, name2), 'Stable')
 4 | DEG$label=rownames(DEG)
 5 | plt<-ggplot(data = DEG, aes(x = avg_log2FC , y = -log10(p_val_adj), colour=change, label=label)) +
 6 |   geom_point(alpha=0.4, size=3.5)+
 7 |   xlim(c(-4.5, 4.5)) +
 8 |   scale_color_manual(values=c("#5386BD", "darkred","grey"))+
 9 |   geom_vline(xintercept=c(-0.5,0.5),lty=4,col="black",lwd=0.8) +
10 |   geom_hline(yintercept = 1.301,lty=4,col="black",lwd=0.8)+
11 |   labs(x="log2(fold change)",y="-log10 (adjusted p-value)", title=paste0("DEG"," ",name1," vs ",name2)) +
12 |   theme_bw()+
13 |   theme(plot.title = element_text(hjust = 0.5),
14 |         legend.position="right",
15 |         legend.title = element_blank()) +
16 |   geom_text_repel(data = subset(DEG, avg_log2FC >= 0.5 | avg_log2FC <= -0.5), aes(label = label), max.overlaps = 15)
17 | suppressWarnings(print(plt))
18 | }
19 | 
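
The thresholds used by `DEG_volcano_plot` can be illustrated without plotting. This is a minimal sketch on a made-up table: the gene names, values, and the labels `group1`/`group2` are hypothetical stand-ins for `name1`/`name2`.

```r
# Toy differential-expression table; gene names and values are made up.
deg <- data.frame(
  avg_log2FC = c(1.2, -0.9, 0.2, 0.8),
  p_val_adj  = c(0.001, 0.01, 0.0001, 0.2),
  row.names  = c("geneA", "geneB", "geneC", "geneD")
)

# A gene is assigned to a group only if it is significant AND changes enough.
significant <- deg$p_val_adj <= 0.05 & abs(deg$avg_log2FC) >= 0.5
deg$change <- ifelse(significant,
                     ifelse(deg$avg_log2FC > 0, "group1", "group2"),
                     "Stable")
print(deg)

# The horizontal guide line in the plot sits at -log10(0.05).
p_line <- round(-log10(0.05), 3)
print(p_line)
```

A gene with a tiny adjusted p-value but a small fold change (`geneC`) stays "Stable", as does a gene with a large fold change that is not significant (`geneD`).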


--------------------------------------------------------------------------------
/GEMLI_package_v0/R/calculate_correlations.R:
--------------------------------------------------------------------------------
 1 | calculate_correlations <- function(data, fast = FALSE)
 2 | {
 3 |   tmp = t(apply(data, 1, rank)); tmp = as.matrix(tmp); tmp = tmp - rowMeans(tmp); tmp = tmp / sqrt(rowSums(tmp^2))
 4 |   if(fast == TRUE){
 5 |   r = fastCor(t(tmp), nSplit = 10, upperTri = TRUE, optBLAS = TRUE, verbose=FALSE)}
 6 |   else{
 7 |   r = tcrossprod(tmp)}
 8 |   diag(r) <- 0
 9 |   return(r)
10 | }
11 | 
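
The non-fast branch of `calculate_correlations` ranks each row, centres it, scales it to unit length, and takes the cross-product, which amounts to a Spearman correlation between rows. A minimal sketch checking this against `cor()`; treating rows as cells is an assumption made for the toy data, and the values are random.

```r
set.seed(1)
# Toy matrix: 4 rows (assumed to be cells) by 6 columns (genes); values are random.
x <- matrix(rnorm(24), nrow = 4,
            dimnames = list(paste0("cell", 1:4), paste0("gene", 1:6)))

# Rank each row, centre it, and scale it to unit length.
ranked <- t(apply(x, 1, rank))
ranked <- ranked - rowMeans(ranked)
ranked <- ranked / sqrt(rowSums(ranked^2))

# The cross-product of the normalised rows is the row-wise Spearman correlation.
r_manual <- tcrossprod(ranked)
r_builtin <- cor(t(x), method = "spearman")
print(all.equal(r_manual, r_builtin, check.attributes = FALSE))
```

The package function additionally sets the diagonal to 0, so a cell is never reported as its own best match.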


--------------------------------------------------------------------------------
/GEMLI_package_v0/R/cell_fate_DEG_calling.R:
--------------------------------------------------------------------------------
 1 | cell_fate_DEG_calling<-function(GEMLI_items, ident1, ident2, min.pct=0.05, logfc.threshold=0.1)
 2 | {
 3 |   GEMLI_Seurat<-CreateSeuratObject(GEMLI_items[['gene_expression']], project = "SeuratProject", assay = "RNA")
 4 |   Metadata<-GEMLI_items[['cell_fate_analysis']];Metadata$ident<-NA
 5 |   Metadata$ident[Metadata$cell.fate %in% ident1]<-"ident1"
 6 |   Metadata$ident[Metadata$cell.fate%in%ident2]<-"ident2"
 7 |   Meta<-as.data.frame(Metadata[,c(5)]); rownames(Meta)<-Metadata$cell.ID; colnames(Meta)<-c("cell.fate")
 8 |   GEMLI_Seurat<-AddMetaData(GEMLI_Seurat, Meta, col.name = NULL); DefaultAssay(object = GEMLI_Seurat) <- "RNA"; Idents(GEMLI_Seurat) <- GEMLI_Seurat$cell.fate
 9 |   DEG <- FindMarkers(object = GEMLI_Seurat, ident.1 = "ident1", ident.2 = "ident2", min.pct =min.pct, logfc.threshold = logfc.threshold)
10 |   GEMLI_items[['DEG']]<-DEG
11 |   return(GEMLI_items)
12 | }
13 | 


--------------------------------------------------------------------------------
/GEMLI_package_v0/R/cell_type_composition_plot.R:
--------------------------------------------------------------------------------
 1 | cell_type_composition_plot <- function(GEMLI_items, ground_truth=F, cell_type_colors=F, type, intersections=NULL)
 2 | {
 3 |   base_colors = rep(c('#a50026','#d73027','#f46d43','#fdae61','#fee090','#e0f3f8','#abd9e9','#74add1','#4575b4','#313695','#40004b','#762a83','#9970ab','#c2a5cf','#e7d4e8','#d9f0d3','#a6dba0','#5aae61','#1b7837','#00441b','#543005','#8c510a','#bf812d','#dfc27d','#f6e8c3','#e0e0e0','#bababa','#878787','#4d4d4d','#1a1a1a'), 100)
 4 |   if (cell_type_colors==F){
 5 |     cell.type<-unique(GEMLI_items[['cell_type']]$cell.type)
 6 |     color<-base_colors[rank(unique(GEMLI_items[['cell_type']]$cell.type))]
 7 |     GEMLI_items[['cell_type_color']] = data.frame(cell.type, color)} 
 8 |   
 9 |  if (ground_truth){cell.ID<-names(GEMLI_items[['barcodes']]); clone.ID<-unname(GEMLI_items[['barcodes']]); GT<-as.data.frame(cbind(clone.ID,cell.ID));
10 |   Lookup<-merge(GT, GEMLI_items[['cell_type']], by="cell.ID", all=TRUE)} else {
11 |     Lookup<-merge(as.data.frame(GEMLI_items[['predicted_lineage_table']]), GEMLI_items[['cell_type']], by="cell.ID", all=TRUE)
12 |     }
13 |   
14 |   if (type == "bubble"){
15 |     Lookup <- Lookup %>% group_by(clone.ID, cell.type) %>% summarise(cnt = n()) %>% mutate(freq = round(cnt / sum(cnt), 3)); Lookup <- reshape::cast(Lookup, clone.ID~cell.type, value="freq"); 
16 |     base_colors = GEMLI_items[['cell_type_color']]$color[match(colnames(Lookup[,2:length(Lookup)]),GEMLI_items[['cell_type_color']]$cell.type)]
17 |     p<-Lookup %>% gather(cell.type, percentage, -clone.ID)%>% ggplot(group=cell.type) + geom_point(aes(x = cell.type, y = clone.ID, size = percentage, col= cell.type))+ theme_classic()+ scale_colour_manual(values = base_colors)}
18 |   
19 |   if (type == "upsetR"){
20 |     Lookup_list <- split(Lookup$clone.ID, Lookup$cell.type)
21 |     p<-upset(fromList(Lookup_list), order.by = "freq", nsets = length(Lookup_list), 
22 |           sets.x.label = "Lineages in cell type", mainbar.y.label = "Number of lineages",
23 |           nintersects = NA, intersections = intersections, point.size=5, mb.ratio = c(0.5, 0.5), text.scale = 2,
24 |           set_size.show = TRUE, set_size.numbers_size = 7, set_size.scale_max = length(unique(Lookup$clone.ID)))}
25 | 
26 |   if (type == "plain"){
27 |     Lookup<-unique(Lookup[,-c(1)])
28 |     p<-Lookup %>% group_by(clone.ID) %>% arrange(clone.ID, cell.type) %>% summarize(combi = paste0(cell.type, collapse = "__"), .groups = "drop") %>% count(combi)}
29 |   
30 |   return(p)
31 |   }
32 | 
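
The "bubble" branch of `cell_type_composition_plot` sizes each point by the fraction of a lineage's cells that belong to a cell type. A minimal base-R sketch of that fraction, using `table()` and `prop.table()` in place of the dplyr/reshape pipeline; the lineage IDs and cell types are made up.

```r
# Toy lookup table; lineage IDs and cell types are made up.
lookup <- data.frame(
  clone.ID  = c(1, 1, 1, 2, 2),
  cell.type = c("A", "A", "B", "B", "B")
)

# Count cells per lineage and cell type, then convert each row to fractions.
counts <- table(lookup$clone.ID, lookup$cell.type)
freq <- round(prop.table(counts, margin = 1), 3)
print(freq)
```

Each row sums to 1 (up to rounding), so bubble sizes are comparable across lineages of different sizes.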


--------------------------------------------------------------------------------
/GEMLI_package_v0/R/cluster_stability_plot.R:
--------------------------------------------------------------------------------
1 | cluster_stability_plot <- function(GEMLI_items) # check
2 | {
3 |   data_matrix<-GEMLI_items[['prediction_multiple_sizes']]
4 |   clustree<-clustree(data_matrix, prefix = "K")
5 |   clustree<-clustree[["data"]][which(clustree[["data"]]$size != 1),]
6 |   plot(clustree$K, clustree$sc3_stability, xlab="lineage size", ylab="clustree_stability_index")
7 | }
8 | 


--------------------------------------------------------------------------------
/GEMLI_package_v0/R/extract_cell_fate_lineages.R:
--------------------------------------------------------------------------------
 1 | extract_cell_fate_lineages<- function(GEMLI, selection, unique=FALSE, threshold)
 2 | {
 3 |   Lookup<-merge(as.data.frame(GEMLI[['predicted_lineage_table']]), GEMLI[['cell_type']], by="cell.ID", all=TRUE)
 4 |   if (unique){ 
 5 |     Lookup$cell.fate<-NA
 6 |     Lookup<-Lookup %>% group_by(clone.ID) %>% mutate(cell.fate=case_when((n_distinct(cell.type)==length(selection)& all(cell.type %in% selection)& is.na(clone.ID)==FALSE)~ "asym", (n_distinct(cell.type)==1& all(cell.type %in% selection) & n_distinct(cell.ID)>1& is.na(clone.ID)==FALSE)~"sym"))
 7 |   } else {
 8 |     Lookup2<-Lookup[Lookup$cell.type %in% selection, ]
 9 |     Lookup2<-Lookup2 %>% group_by(clone.ID) %>% mutate(cell.fate=case_when((n_distinct(cell.type)==length(selection)& all(cell.type %in% selection)& is.na(clone.ID)==FALSE)~ "asym", (n_distinct(cell.type)==1& all(cell.type %in% selection) & n_distinct(cell.ID)>1& is.na(clone.ID)==FALSE)~"sym"))
10 |     Lookup2<-Lookup2[,c(1,4)]; Lookup<-merge(Lookup, Lookup2, by=c("cell.ID" ), all=TRUE)
11 |   }
12 |   # filter by threshold
13 |   Lookup <- Lookup %>% group_by(cell.fate, clone.ID) %>% mutate(cnt = n()); Lookup<-Lookup%>% group_by(cell.fate, clone.ID, cell.type) %>% mutate(per= (n()/cnt)*100) 
14 |   for(i in 1:length(selection)){Lookup <- Lookup %>% group_by(cell.fate, clone.ID) %>% mutate(cell.fate=case_when(((cell.fate=="asym") & (cell.type==selection[i] ) & (per < threshold[i]))~"filtered", TRUE~cell.fate))}
15 |   Lookup<-Lookup %>% group_by(clone.ID) %>% mutate(cell.fate=case_when(any(cell.fate=="filtered")~NA, TRUE~cell.fate))
16 |   Lookup$cell.fate <- paste(Lookup$cell.fate,Lookup$cell.type,sep = "_")
17 |   GEMLI[['cell_fate_analysis']]<-Lookup[,1:4]
18 |   return(GEMLI)
19 | } 
20 | 
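
The labelling rule in `extract_cell_fate_lineages` (with `unique=TRUE`) calls a lineage "asym" when its cells cover every selected cell type and nothing else, and "sym" when it has more than one cell, all of a single selected type. A minimal base-R sketch of that rule; the helper `classify` and the toy table are hypothetical, and the threshold filtering step is left out.

```r
# Toy lineage table; IDs and cell types are made up for illustration.
lookup <- data.frame(
  cell.ID   = paste0("c", 1:6),
  clone.ID  = c(1, 1, 2, 2, 3, 3),
  cell.type = c("A", "A", "A", "B", "B", "C")
)
selection <- c("A", "B")

# Classify one lineage from the cell types of its members (hypothetical helper).
classify <- function(types) {
  if (!all(types %in% selection)) return(NA_character_)
  n_types <- length(unique(types))
  if (n_types == length(selection)) return("asym")
  if (n_types == 1 && length(types) > 1) return("sym")
  NA_character_
}

fate <- vapply(split(lookup$cell.type, lookup$clone.ID), classify, character(1))
print(fate)
```

Lineage 3 contains a cell type outside `selection`, so it receives no label.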


--------------------------------------------------------------------------------
/GEMLI_package_v0/R/memory_gene_calling.R:
--------------------------------------------------------------------------------
 1 | memory_gene_calling <- function(GEMLI_items, valid_lineage_sizes=(2:5), use_median=T, ground_truth=F, cell_fate)
 2 | {
 3 |   markers_by_cvsq_of_lineage_means <- function(data_matrix, lineage_dict, valid_lineage_sizes=(2:5), use_median=T)
 4 |   {
 5 |     cv_sq <- function(data_matrix)
 6 |     {
 7 |       sd = apply(data_matrix, 1, sd,  na.rm = TRUE)
 8 |       mean = apply(data_matrix, 1, mean,  na.rm = TRUE)
 9 |       noise = (sd/mean)**2
10 |       return(noise)
11 |     }
12 |     lineage_dict_filt = lineage_dict[intersect(colnames(data_matrix), names(lineage_dict))]
13 |     valid_lineage_dict = lineage_dict_filt[as.character(lineage_dict_filt) %in% names(table(lineage_dict_filt))[table(lineage_dict_filt) %in% valid_lineage_sizes]]
14 |     lineage_center = matrix(NA, ncol=length(unique(valid_lineage_dict)), nrow=nrow(data_matrix)); colnames(lineage_center) = unique(valid_lineage_dict); rownames(lineage_center) = rownames(data_matrix)
15 |     for (lineage in as.character(unique(valid_lineage_dict)))
16 |     {
17 |       if (use_median){lineage_center[,lineage] = apply(data_matrix[,names(valid_lineage_dict)[valid_lineage_dict==lineage]], 1, quantile, probs = 0.5,  na.rm = TRUE)}
18 |       else {lineage_center[,lineage] = rowMeans(data_matrix[,names(valid_lineage_dict)[valid_lineage_dict==lineage]])}
19 |     }
20 |     x = rowMeans(lineage_center); y = cv_sq(lineage_center); x = log2(x); y = log2(y)
21 |     filter = is.na(x) | is.na(y) | is.infinite(x) | is.infinite(y) | (x==0); x=x[!filter]; y=y[!filter]; loess_means = loess(y ~ x, span=0.75, control=loess.control(surface="direct"))
22 |     filter = names(which(!filter))
23 |     lineage_center_variation = loess_means$residuals
24 |     return(sort(lineage_center_variation[filter], decreasing=T))
25 |   }
26 |   data_matrix = GEMLI_items[['gene_expression']]
27 |   
28 |    if (ground_truth) {lineage_dict = GEMLI_items[['barcodes']]} else {
29 |     if (length(GEMLI_items[['predicted_lineages']])>0){lineage_dict = GEMLI_items[['predicted_lineages']]} else {
30 |       lineage_dict = GEMLI_items[['predicted_lineage_table']]$clone.ID; names(lineage_dict) = GEMLI_items[['predicted_lineage_table']]$cell.ID}}
31 |   if (hasArg(cell_fate)){match<-GEMLI_items[['cell_fate_analysis']][GEMLI_items[['cell_fate_analysis']]$cell.fate ==cell_fate,]
32 |     lineage_dict=lineage_dict[names(lineage_dict)%in% match$cell.ID]}
33 |   
34 |   lineage_center_variation = markers_by_cvsq_of_lineage_means(data_matrix, lineage_dict, valid_lineage_sizes=valid_lineage_sizes, use_median=use_median)
35 | 
36 |   data_matrix_control = matrix(NA, ncol=20, nrow=nrow(data_matrix)); rownames(data_matrix_control) = rownames(data_matrix)
37 |   for (i in c(1:20))
38 |   {
39 |     lineage_dict_sampled = lineage_dict; names(lineage_dict_sampled) = sample(names(lineage_dict))
40 |     tmp = markers_by_cvsq_of_lineage_means(as.matrix(data_matrix), lineage_dict_sampled, valid_lineage_sizes=valid_lineage_sizes, use_median=use_median)
41 |     data_matrix_control[names(tmp),i] = tmp
42 |   }
43 |   markers_pvalue = rowSums(data_matrix_control[intersect(rownames(data_matrix_control), names(lineage_center_variation)),]>lineage_center_variation[intersect(rownames(data_matrix_control), names(lineage_center_variation))], na.rm=T)/20
44 | 
45 |   shared_genes = intersect(names(lineage_center_variation), names(markers_pvalue))
46 |   marker_table = data.frame(cbind(lineage_center_variation[shared_genes], markers_pvalue[shared_genes])); rownames(marker_table) = shared_genes; colnames(marker_table) = c("var","p")
47 |   marker_table = marker_table[with(marker_table, order(p, -var)),]
48 | 
49 |   GEMLI_items[["memory_genes"]] = marker_table
50 |   return(GEMLI_items)
51 | }
52 | 
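
`memory_gene_calling` derives its `p` column by comparing each gene's observed statistic with 20 runs in which cell labels are shuffled. A minimal sketch of that empirical p-value; the gene names and numbers are made up, and random draws stand in for the shuffled-label statistics.

```r
set.seed(42)
# Observed statistic per gene and 20 stand-in control runs; numbers are made up.
observed <- c(geneA = 2.5, geneB = 0.1)
controls <- matrix(rnorm(2 * 20), nrow = 2,
                   dimnames = list(names(observed), NULL))

# Fraction of control runs whose statistic exceeds the observed one.
# Row i of `controls` is compared with observed[i] through column-wise recycling.
p_empirical <- rowSums(controls > observed, na.rm = TRUE) / ncol(controls)
print(p_empirical)
```

With 20 controls the p-value can only take multiples of 0.05, so the smallest nonzero value is 0.05.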


--------------------------------------------------------------------------------
/GEMLI_package_v0/R/potential_markers.R:
--------------------------------------------------------------------------------
 1 | potential_markers <- function(data_matrix) # check
 2 | {
 3 |   means = rowMeans(data_matrix)
 4 |   variation = (apply(data_matrix, 1, sd,  na.rm=T) / apply(data_matrix, 1, mean,  na.rm=T))**2
 5 |   filter = names(which(!(is.na(means) | is.na(variation) | is.infinite(means) | is.infinite(variation) | (means==0)))); x=means[filter]; y=variation[filter]
 6 |   linear_fit = lm(log2(y) ~ log2(x))
 7 |   variation_residuals = residuals(linear_fit)
 8 |   means = means[filter]
 9 |   mean_quantiles = quantile(means, seq(0.01,1,0.01))
10 |   variation_quantiles = quantile(variation_residuals, seq(0.01,1,0.01))
11 |   memory_genes = names(which((means>=mean_quantiles[98]) | (means>=mean_quantiles[90] & variation_residuals>=variation_quantiles[40]) | (means>=mean_quantiles[80] & variation_residuals>=variation_quantiles[80]) |(means>=mean_quantiles[60] & variation_residuals>=variation_quantiles[90])))
12 |   return(memory_genes)
13 | }
14 | 
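
`potential_markers` regresses log2(CV²) on log2(mean) and uses the residuals as a measure of variability beyond what the mean predicts. A minimal sketch on random data, simplified to a single residual cutoff (the function itself combines several mean and residual quantile conditions); the matrix and the 90% cutoff are assumptions for illustration.

```r
set.seed(7)
# Toy matrix: 50 genes (rows) by 30 cells (columns) of positive random values.
expr <- matrix(rgamma(50 * 30, shape = 2, rate = 1), nrow = 50,
               dimnames = list(paste0("gene", 1:50), NULL))

gene_mean <- rowMeans(expr)
cv_sq <- (apply(expr, 1, sd) / gene_mean)^2

# Regress log2(CV^2) on log2(mean); residuals measure excess variability.
fit <- lm(log2(cv_sq) ~ log2(gene_mean))
excess <- unname(residuals(fit))

# Genes in the top 10% of residuals are more variable than their mean predicts.
top_variable <- rownames(expr)[excess >= quantile(excess, 0.9)]
print(top_variable)
```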


--------------------------------------------------------------------------------
/GEMLI_package_v0/R/predict_lineages.R:
--------------------------------------------------------------------------------
 1 | predict_lineages <- function(GEMLI_items, repetitions=100, sample_size=(2/3), desired_cluster_size=c(2,3), fast=FALSE) # check
 2 | {
 3 |   data_matrix = GEMLI_items[['gene_expression']]
 4 |   marker_genes = potential_markers(data_matrix)
 5 |   results = data.matrix(matrix(0, nrow=ncol(data_matrix), ncol=ncol(data_matrix))); rownames(results) = colnames(data_matrix); colnames(results) = colnames(data_matrix)
 6 |   for (i in seq(1,repetitions))
 7 |   {
 8 |     marker_genes_sample = sample(intersect(marker_genes, rownames(data_matrix)), round(length(intersect(marker_genes, rownames(data_matrix)))*sample_size,0))
 9 |     cell_clusters = quantify_clusters_iterative(data_matrix, marker_genes_sample, N=2, fast)
10 |     cell_clusters_unique_name = cell_clusters; for (colname in 1:ncol(cell_clusters)){cell_clusters_unique_name[!is.na(cell_clusters_unique_name[,colname]),colname] = paste0(colname,'_',cell_clusters_unique_name[!is.na(cell_clusters_unique_name[,colname]),colname])}
11 |     clustersize_dict = table(cell_clusters_unique_name)
12 |     smallest_clusters = names(clustersize_dict)[clustersize_dict %in% desired_cluster_size]
13 |     best_prediction = data.matrix(matrix(F, nrow=ncol(data_matrix), ncol=ncol(data_matrix))); rownames(best_prediction) = colnames(data_matrix); colnames(best_prediction) = colnames(data_matrix)
14 |     for (cluster in smallest_clusters){cells_in_cluster = rownames(best_prediction)[rowSums(cell_clusters_unique_name==cluster, na.rm=T)>0]; best_prediction[cells_in_cluster,cells_in_cluster] <- T}
15 |     diag(best_prediction) = F
16 |     results = results + best_prediction
17 |   }
18 |   GEMLI_items[["prediction"]] = results
19 |   return(GEMLI_items)
20 | }
21 | 
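
In `predict_lineages`, every repetition adds a logical matrix marking the cell pairs that landed in the same small cluster, so the returned matrix counts how often each pair co-clustered. A minimal sketch of that accumulation; the list `runs` and its labels are hypothetical, fixed vectors replace the clustering done by `quantify_clusters_iterative`, and each run is reduced to a single column of labels.

```r
cells <- paste0("cell", 1:5)
# Hypothetical cluster labels from three repetitions (NA = not in a small cluster).
runs <- list(
  c(1, 1, 2, 2, NA),
  c(1, 1, NA, 2, 2),
  c(1, 1, 2, 2, NA)
)

# Confidence matrix: how often each pair of cells shared a cluster.
confidence <- matrix(0, length(cells), length(cells),
                     dimnames = list(cells, cells))
for (labels in runs) {
  same_cluster <- outer(labels, labels, "==")
  same_cluster[is.na(same_cluster)] <- FALSE  # unassigned cells pair with nobody
  diag(same_cluster) <- FALSE                 # ignore self-pairs
  confidence <- confidence + same_cluster
}
print(confidence)
```

Pairs that co-cluster in every repetition (here `cell1` and `cell2`) reach the maximum count, which is what a later cutoff on the matrix selects for.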


--------------------------------------------------------------------------------
/GEMLI_package_v0/R/predict_lineages_multiple_sizes.R:
--------------------------------------------------------------------------------
 1 | predict_lineages_multiple_sizes <- function(GEMLI_items, repetitions=10, sample_size=(2/3), minimal_maximal_cluster_size=c(2,50), cutoff=5) # check
 2 | {
 3 |   # build one vector of allowed cluster sizes, 1:m, for every maximal size m from 2 up to minimal_maximal_cluster_size[2]
 4 |   # then generate the lineage prediction for each of these vectors and combine the results into one table
 5 |   desired_sizes<-list()
 6 |   for (m in 2:minimal_maximal_cluster_size[2]){desired_sizes[[m-1]]<-c(1:m)}
 7 |   GEMLI_prediction_list<-list()
 8 |   for (j in 1:length(desired_sizes)) {
 9 |   progress((100*j)/length(desired_sizes))
10 |   sub_desired_cluster_size<-desired_sizes[[j]]
11 |   data_matrix = GEMLI_items[['gene_expression']]
12 |   marker_genes = potential_markers(data_matrix)
13 |   results = data.matrix(matrix(0, nrow=ncol(data_matrix), ncol=ncol(data_matrix))); rownames(results) = colnames(data_matrix); colnames(results) = colnames(data_matrix)
14 |   for (i in seq(1,repetitions))
15 |   {
16 |     marker_genes_sample = sample(intersect(marker_genes, rownames(data_matrix)), round(length(intersect(marker_genes, rownames(data_matrix)))*sample_size,0))
17 |     cell_clusters = quantify_clusters_iterative(data_matrix, marker_genes_sample, N=2)
18 |     cell_clusters_unique_name = cell_clusters; for (colname in 1:ncol(cell_clusters)){cell_clusters_unique_name[!is.na(cell_clusters_unique_name[,colname]),colname] = paste0(colname,'_',cell_clusters_unique_name[!is.na(cell_clusters_unique_name[,colname]),colname])}
19 |     clustersize_dict = table(cell_clusters_unique_name)
20 |     
21 |     # This is where the desired_cluster_size comes into play
22 |     smallest_clusters = names(clustersize_dict)[clustersize_dict %in% sub_desired_cluster_size]
23 |     best_prediction = data.matrix(matrix(F, nrow=ncol(data_matrix), ncol=ncol(data_matrix))); rownames(best_prediction) = colnames(data_matrix); colnames(best_prediction) = colnames(data_matrix)
24 |     for (cluster in smallest_clusters){cells_in_cluster = rownames(best_prediction)[rowSums(cell_clusters_unique_name==cluster, na.rm=T)>0]; best_prediction[cells_in_cluster,cells_in_cluster] <- T}
25 |     diag(best_prediction) = F
26 |     results = results + best_prediction
27 |   }
28 |   # Transform the prediction matrix into a lineage info
29 |   network = (results >= cutoff)
30 |   network_edges = as.matrix(network)
31 |   network_graph = igraph::graph.adjacency(network_edges, mode="undirected", weighted=NULL)
32 |   prediction_dict = igraph::clusters(network_graph)$membership
33 |   family_table = cbind(names(prediction_dict), as.vector(prediction_dict))
34 |   colnames(family_table) = c("cell.ID", "clone.ID")
35 |   GEMLI_prediction_list[[j]] = family_table
36 |   }
37 |   
38 |   # combine in one dataframe that can be used for cluster stability calculation
39 |   for(u in 1:length(GEMLI_prediction_list)){colnames(GEMLI_prediction_list[[u]]) <- c("cell.ID",paste0("K", u)) }
40 |   GEMLI_prediction_multiple_sizes_output <- Reduce(function(x,y)merge(x,y,by="cell.ID"), GEMLI_prediction_list)
41 |   GEMLI_items[['prediction_multiple_sizes']]<-GEMLI_prediction_multiple_sizes_output
42 |   return(GEMLI_items)
43 | }
44 | 


--------------------------------------------------------------------------------
/GEMLI_package_v0/R/predict_lineages_with_known_markers.R:
--------------------------------------------------------------------------------
 1 | predict_lineages_with_known_markers <- function(GEMLI_items, repetitions=100, sample_size=(2/3), desired_cluster_size=c(2,3), fast=FALSE)
 2 | {
 3 |   norm_data = GEMLI_items[['gene_expression']]
 4 |   marker_genes = GEMLI_items[['known_markers']]
 5 |   results = data.matrix(matrix(0, nrow=ncol(norm_data), ncol=ncol(norm_data))); rownames(results) = colnames(norm_data); colnames(results) = colnames(norm_data)
 6 |   for (i in seq(1,repetitions))
 7 |   {
 8 |     marker_genes_sample = sample(intersect(marker_genes, rownames(norm_data)), round(length(intersect(marker_genes, rownames(norm_data)))*sample_size,0))
 9 |     cell_clusters = quantify_clusters_iterative(norm_data, marker_genes_sample, N=2, fast=fast)
10 |     cell_clusters_unique_name = cell_clusters; for (colname in 1:ncol(cell_clusters)){cell_clusters_unique_name[!is.na(cell_clusters_unique_name[,colname]),colname] = paste0(colname,'_',cell_clusters_unique_name[!is.na(cell_clusters_unique_name[,colname]),colname])}
11 |     clustersize_dict = table(cell_clusters_unique_name)
12 |     smallest_clusters = names(clustersize_dict)[clustersize_dict %in% desired_cluster_size]
13 |     best_prediction = data.matrix(matrix(F, nrow=ncol(norm_data), ncol=ncol(norm_data))); rownames(best_prediction) = colnames(norm_data); colnames(best_prediction) = colnames(norm_data)
14 |     for (cluster in smallest_clusters){cells_in_cluster = rownames(best_prediction)[rowSums(cell_clusters_unique_name==cluster, na.rm=T)>0]; best_prediction[cells_in_cluster,cells_in_cluster] <- T}
15 |     diag(best_prediction) = F
16 |     results = results + best_prediction
17 |   }
18 |   GEMLI_items[["prediction"]] = results
19 |   return(GEMLI_items)
20 | }
21 | 


--------------------------------------------------------------------------------
/GEMLI_package_v0/R/prediction_to_lineage_information.R:
--------------------------------------------------------------------------------
 1 | library(igraph)
 2 | prediction_to_lineage_information <- function(GEMLI_items, cutoff, output_as_dict=T) # check
 3 | {
 4 |   lineage_predictions_matrix = GEMLI_items[["prediction"]]
 5 |   network = (lineage_predictions_matrix >= cutoff)
 6 |   network_edges = as.matrix(network)
 7 |   network_graph = igraph::graph.adjacency(network_edges, mode="undirected", weighted=NULL)
 8 |   family_dict = igraph::clusters(network_graph)$membership
 9 |   GEMLI_items[["predicted_lineages"]] = family_dict
10 |   family_table = cbind(names(family_dict), as.vector(family_dict)); colnames(family_table) = c("cell.ID", "clone.ID")
11 |   GEMLI_items[["predicted_lineage_table"]] = family_table
12 |   return(GEMLI_items)
13 | }
14 | 


--------------------------------------------------------------------------------
/GEMLI_package_v0/R/quantify_clusters_iterative.R:
--------------------------------------------------------------------------------
 1 | quantify_clusters_iterative = function(data_matrix, marker_genes, N=2, fast=FALSE)
 2 | {
 3 |   iterate = T; i = 2
 4 |   genes = intersect(marker_genes, rownames(data_matrix)[rowMeans(data_matrix)>0])
 5 |   data_matrix = data_matrix[genes,]
 6 |   corr_expr_raw = calculate_correlations(t(data_matrix), fast=fast); corr_expr = (1 - corr_expr_raw)/2 # map correlation in [-1,1] to a distance in [0,1]
 7 |   cell_clusters = data.matrix(matrix(0, nrow=ncol(data_matrix), ncol=1)); rownames(cell_clusters) = colnames(data_matrix)
 8 |   cell_clusters[,1] = rep(1, ncol(data_matrix))
 9 |   while (iterate)
10 |   {
11 |     cell_clusters = cbind(cell_clusters, rep(0,nrow(cell_clusters)))
12 |     for (cluster in setdiff(unique(cell_clusters[,(i-1)]),0))
13 |     {
14 |       cells_in_cluster = rownames(cell_clusters)[cell_clusters[,(i-1)]==cluster]
15 |       if (length(cells_in_cluster) >= 4) # this line ends the sub clustering # min of desired cluster size
16 |       {
17 |         correlation = mean((corr_expr_raw[cells_in_cluster,cells_in_cluster])[lower.tri(corr_expr_raw[cells_in_cluster,cells_in_cluster], diag=F)])
18 |         corr_expr_subset = corr_expr[cells_in_cluster,cells_in_cluster]
19 |         clustering = cutree(hclust(as.dist(corr_expr_subset), method = "ward.D2"), k=N)
20 |         cell_clusters[names(clustering),i] = as.vector(clustering) + max(c(0, cell_clusters[,i]), na.rm=T)
21 |       }
22 |       else {cell_clusters[cells_in_cluster,i] = 0}
23 |     }
24 |     if (sum(cell_clusters[,i], na.rm=T)==0) {iterate = F}
25 |     i = i+1
26 |   }
27 |   cell_clusters[cell_clusters==0] = NA
28 |   return(cell_clusters)
29 | }
30 | 


--------------------------------------------------------------------------------
/GEMLI_package_v0/R/suggest_network_trimming_to_size.R:
--------------------------------------------------------------------------------
 1 | suggest_network_trimming_to_size <- function(GEMLI_items, max_size=4, cutoff=70, max_edge_width=5, display_orphan=F, include_labels=F, ground_truth=F, layout_style="fr")
 2 | {
 3 |   lineage_predictions_matrix_original = GEMLI_items[["prediction"]]
 4 |   predicted_lineages_original = GEMLI_items[["predicted_lineages"]]
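 5 |   GEMLI_items_in_trimming = GEMLI_items; lineage_predictions_matrix = lineage_predictions_matrix_original # fallback used by the visualization if no lineage exceeds max_size and the loop below never runs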
 6 |   predicted_lineages = prediction_to_lineage_information(GEMLI_items, cutoff, output_as_dict=T)$predicted_lineages
 7 |   if (ground_truth) {lineage_dict = GEMLI_items[['barcodes']]} else {lineage_dict = prediction_to_lineage_information(GEMLI_items, cutoff, output_as_dict=T)$predicted_lineages}
 8 |   while (sum(table(GEMLI_items_in_trimming$predicted_lineages)>max_size)!=0)
 9 |   { predicted_lineages = GEMLI_items_in_trimming$predicted_lineages; 
10 |     lineage_predictions_matrix = GEMLI_items_in_trimming$prediction
11 |     oversized_families = names(table(predicted_lineages))[table(predicted_lineages)>max_size]
12 |     for (oversized_family in oversized_families)
13 |     {
14 |       oversized_familiy_scores = lineage_predictions_matrix[names(which(predicted_lineages==oversized_family)), names(which(predicted_lineages==oversized_family))]
15 |       weakest_link = min(oversized_familiy_scores[oversized_familiy_scores!=0])
16 |       oversized_familiy_scores[oversized_familiy_scores==weakest_link] <- 0
17 |       lineage_predictions_matrix[names(which(predicted_lineages==oversized_family)), names(which(predicted_lineages==oversized_family))] = oversized_familiy_scores
18 |     }
19 |     GEMLI_items_in_trimming$prediction = lineage_predictions_matrix
20 |     GEMLI_items_in_trimming$predicted_lineages = prediction_to_lineage_information(GEMLI_items_in_trimming, cutoff, output_as_dict=T)$predicted_lineages
21 |   }
22 |   # visualization
23 |   par(mar=c(0,0,3.5,0))
24 |   layout(mat = matrix(c(1, 2), nrow = 2, ncol = 1), heights = c(3, 1))
25 |   base_colors = rep(c('#a50026','#d73027','#f46d43','#fdae61','#fee090','#e0f3f8','#abd9e9','#74add1','#4575b4','#313695','#40004b','#762a83','#9970ab','#c2a5cf','#e7d4e8','#d9f0d3','#a6dba0','#5aae61','#1b7837','#00441b','#543005','#8c510a','#bf812d','#dfc27d','#f6e8c3','#e0e0e0','#bababa','#878787','#4d4d4d','#1a1a1a'), 100)
26 |   network_edges = as.matrix(lineage_predictions_matrix_original)
27 |   network_edges[lineage_predictions_matrix[rownames(network_edges),rownames(network_edges)]==0] <- ((-1) * network_edges[lineage_predictions_matrix[rownames(network_edges),rownames(network_edges)]==0])
28 |   network_edges[abs(network_edges)<cutoff] <- 0
29 |   if (!display_orphan) {network_edges = network_edges[rowSums(network_edges)!=0,colSums(network_edges)!=0]}
30 |   
31 |   vertex_color = base_colors[rank(as.numeric(lineage_dict[rownames(network_edges)]), na.last="keep", ties.method="min")]; vertex_color[is.na(vertex_color)] = "white"
32 |     network_edges = network_edges/max(network_edges)
33 |     network_graph = igraph::graph.adjacency(network_edges, mode="undirected", weighted=T)
34 |     network_graph = igraph::set.vertex.attribute(network_graph, "name", value=(1:ncol(network_edges)))
35 |     
36 |     edge_color = igraph::edge_attr(network_graph, "weight") < 0; edge_color[edge_color==T] <- 'red'; edge_color[edge_color==F] <- 'grey'
37 |       igraph::edge_attr(network_graph, "weight") = abs(igraph::edge_attr(network_graph, "weight"))
38 |       edge_width = igraph::edge_attr(network_graph, "weight")
39 |       edge_width = edge_width - min(edge_width); edge_width = edge_width / max(edge_width) # rescale to a 0-1 fraction of the maximal present value
40 |       edge_width = edge_width * (max_edge_width*0.9); edge_width = edge_width + (max_edge_width*0.1)
41 |       if (!include_labels) {network_graph = igraph::set.vertex.attribute(network_graph, "name", value=rep("",ncol(network_edges)))}
42 |       if (layout_style=='fr') {igraph::plot.igraph(network_graph, vertex.size=3, vertex.label.cex=0.5, layout=layout.fruchterman.reingold(network_graph), vertex.color=vertex_color, vertex.label.color="black", vertex.label.family="Arial", edge.color=edge_color, rescale=TRUE, edge.width=edge_width, vertex.label.dist=0.5, vertex.label.degree=pi*1.5)}
43 |       if (layout_style=='kk') {igraph::plot.igraph(network_graph, vertex.size=3, vertex.label.cex=0.5, layout=layout.kamada.kawai(network_graph), vertex.color=vertex_color, vertex.label.color="black", vertex.label.family="Arial", edge.color=edge_color, rescale=TRUE, edge.width=edge_width, vertex.label.dist=0.5, vertex.label.degree=pi*1.5)}
44 |       # Title
45 |       title(main = paste0("Prediction at confidence level ", cutoff, "\nsuggested trimming to size ", max_size, "\n"))
46 |       # Edge_color_and_width
47 |       plot(NULL ,xaxt='n',yaxt='n',bty='n',ylab='',xlab='', xlim=0:1, ylim=0:1)
48 |       legend("top", legend = c("Keep", "Trim"), col = c("grey", "red"), bty = "n", lty=1:1, lwd=c(max(edge_width), max(edge_width)), title = "Suggested", title.adj =0, horiz=F, xpd=TRUE, inset=c(0,0),ncol=1)
49 |       legend("topleft", legend = c(max(lineage_predictions_matrix_original), min(lineage_predictions_matrix_original[which(lineage_predictions_matrix_original>cutoff)])), col = c("black", "black"), bty = "n", lty=1:1, lwd=c(max(edge_width),(min(edge_width)+ (max_edge_width*0.1))), title = "Confidence", title.adj =0, horiz=F, xpd=TRUE, inset=c(.15,0),ncol=1)
50 |       # Vertex_color
51 |       if (include_labels==T) {legend("topright", legend=c("Color - prediction","Number - cell ID"), bty = "n", title = "Vertex ", title.adj =0.5, horiz=F, xpd=TRUE, inset=c(.05, 0),ncol=1)} else
52 |         {legend("topright", legend=c("Color - prediction"), bty = "n", title = "Vertex ", title.adj =0.5, horiz=F, xpd=TRUE, inset=c(.05, 0),ncol=1)}
53 | }
54 | 


--------------------------------------------------------------------------------
/GEMLI_package_v0/R/test_lineages.R:
--------------------------------------------------------------------------------
 1 | test_lineages <- function(GEMLI_items, valid_fam_sizes=(1:5), max_interval=100, plot_results=F)
 2 | {
 3 |   lineage_predictions_matrix = GEMLI_items[['prediction']]
 4 |   lineage_dict_bc = GEMLI_items[['barcodes']]
 5 |   valid_family_dict = lineage_dict_bc[as.character(lineage_dict_bc) %in% names(table(lineage_dict_bc))[table(lineage_dict_bc) %in% valid_fam_sizes]]
 6 |   cell_with_annotation = intersect(rownames(lineage_predictions_matrix), names(valid_family_dict))
 7 |   family_dict_filt = valid_family_dict[cell_with_annotation]
 8 |   real_family_matrix = outer(family_dict_filt[cell_with_annotation], family_dict_filt[cell_with_annotation], FUN='=='); diag(real_family_matrix) = F
 9 |   results_repeated_annotated = lineage_predictions_matrix[cell_with_annotation, cell_with_annotation]
10 |   if (is.na(max_interval)) {intervals = unique(round(seq(0,1,0.1)*max(results_repeated_annotated),0))} else {intervals = unique(round(seq(0,1,0.1)*max_interval,0))}
11 |   output_matrix = matrix(NA, ncol=4, nrow=length(intervals)); colnames(output_matrix) = c('precision','TP','FP','sensitivity'); rownames(output_matrix) = intervals
12 |   for (interval in intervals){output_matrix[as.character(interval),1:3] = c(sum(real_family_matrix & (results_repeated_annotated>=interval)) / sum(results_repeated_annotated>=interval), sum(real_family_matrix & (results_repeated_annotated>=interval)), sum((!real_family_matrix) & (results_repeated_annotated>=interval)))}
13 |   # replace this with an apply function
14 |   output_matrix[,"sensitivity"] = output_matrix[,"TP"]/output_matrix["0","TP"]
15 |   output_matrix = output_matrix[,c('TP','FP','precision','sensitivity')]
16 |   if (plot_results)
17 |   {
18 |     par(mar=c(4.5, 4.5, 3.5, 4.5)); plot(as.numeric(rownames(output_matrix)), output_matrix[,"precision"], type="o", pch=16, lwd=3, col="darkred", xlab="confidence level", ylab="precision (red)", log="", main="testing lineage prediction", ylim=c(0,1)); par(new=T); plot(as.numeric(rownames(output_matrix)), output_matrix[,"sensitivity"], type="o", pch=16, lwd=3, axes=F, bty="n", xlab="", ylab="", col="grey", log="", ylim=c(0,1)); axis(side=4, at=pretty(range(output_matrix[,"sensitivity"]))); mtext("sensitivity (grey)", side=4, line=3)
19 |   }
20 |   GEMLI_items[['testing_results']] = output_matrix
21 |   return(GEMLI_items)
22 | }
23 | 


--------------------------------------------------------------------------------
/GEMLI_package_v0/R/trim_network_to_size.R:
--------------------------------------------------------------------------------
 1 | trim_network_to_size <- function(GEMLI_items, max_size=4, cutoff=70)
 2 | {
 3 |   GEMLI_items_in_trimming = GEMLI_items
 4 |   while (sum(table(GEMLI_items_in_trimming$predicted_lineages)>max_size)!=0)
 5 |   {
 6 |     predicted_lineages = GEMLI_items_in_trimming$predicted_lineages; lineage_predictions_matrix = GEMLI_items_in_trimming$prediction
 7 |     oversized_families = names(table(predicted_lineages))[table(predicted_lineages)>max_size]
 8 |     for (oversized_family in oversized_families)
 9 |     {
10 |       oversized_familiy_scores = lineage_predictions_matrix[names(which(predicted_lineages==oversized_family)), names(which(predicted_lineages==oversized_family))]
11 |       weakest_link = min(oversized_familiy_scores[oversized_familiy_scores!=0])
12 |       oversized_familiy_scores[oversized_familiy_scores==weakest_link] <- 0
13 |       lineage_predictions_matrix[names(which(predicted_lineages==oversized_family)), names(which(predicted_lineages==oversized_family))] = oversized_familiy_scores
14 |     }
15 |     GEMLI_items_in_trimming$prediction = lineage_predictions_matrix
16 |     GEMLI_items_in_trimming = prediction_to_lineage_information(GEMLI_items_in_trimming, cutoff)
17 |   }
18 |   return(GEMLI_items_in_trimming)
19 | }
20 | 


--------------------------------------------------------------------------------
/GEMLI_package_v0/R/visualize_as_network.R:
--------------------------------------------------------------------------------
 1 | visualize_as_network <- function(GEMLI_items, cutoff=70, max_edge_width=5, display_orphan=F, include_labels=F, ground_truth=F, highlight_FPs=F, layout_style='fr', cell_type_colors=F)
 2 | {
 3 |   par(mar=c(0,0,2,0))
 4 |   if (cell_type_colors) {layout(mat = matrix(c(1, 2, 3, 0), nrow = 2, ncol = 2), heights = c(3, 1), widths = c(3,1))} else {layout(mat = matrix(c(1, 2), nrow = 2, ncol = 1), heights = c(3, 1))}
 5 |   # title
 6 |   lineage_predictions_matrix = GEMLI_items[["prediction"]]
 7 |   if (ground_truth) {lineage_dict = GEMLI_items[['barcodes']]} else {lineage_dict = prediction_to_lineage_information(GEMLI_items, cutoff, output_as_dict=T)$predicted_lineages}
 8 |   base_colors = rep(c('#a50026','#d73027','#f46d43','#fdae61','#fee090','#e0f3f8','#abd9e9','#74add1','#4575b4','#313695','#40004b','#762a83','#9970ab','#c2a5cf','#e7d4e8','#d9f0d3','#a6dba0','#5aae61','#1b7837','#00441b','#543005','#8c510a','#bf812d','#dfc27d','#f6e8c3','#e0e0e0','#bababa','#878787','#4d4d4d','#1a1a1a'), 100)
 9 |   if (cell_type_colors) { if (length(GEMLI_items[['cell_type_color']])!=0){
10 |     base_colors = GEMLI_items[['cell_type_color']]$color 
11 |   } else {
12 |     cell.type<-unique(GEMLI_items[['cell_type']]$cell.type)
13 |     color<-base_colors[rank(unique(GEMLI_items[['cell_type']]$cell.type))]
14 |     GEMLI_items[['cell_type_color']] = data.frame(cell.type, color)
15 |   }} else {base_colors = base_colors
16 |   }
17 |   network_edges = as.matrix(lineage_predictions_matrix)
18 |   network_edges[network_edges<cutoff] <- 0
19 |   if (!display_orphan) {network_edges = network_edges[rowSums(network_edges)!=0,colSums(network_edges)!=0]}
20 |   if ((ground_truth==T) & (highlight_FPs==T))
21 |   { real_matrix = outer(lineage_dict[rownames(network_edges)], lineage_dict[rownames(network_edges)], "=="); real_matrix[is.na(real_matrix)] <- T
22 |     rownames(real_matrix) = rownames(network_edges); colnames(real_matrix) = rownames(network_edges)
23 |     network_edges[real_matrix[rownames(network_edges),rownames(network_edges)]==F] <- ((-1) * network_edges[real_matrix[rownames(network_edges),rownames(network_edges)]==F])
24 |   }
25 |   if (layout_style=='grid')
26 |   { lineage_dict_filt = lineage_dict[rownames(network_edges)]
27 |     family_order = names(sort(table(lineage_dict_filt), decreasing=T))
28 |     cell_order = c(); for (family in family_order){cell_order = c(cell_order, names(lineage_dict_filt[lineage_dict_filt==family]))}
29 |     network_edges = network_edges[cell_order, cell_order]
30 |   }
31 |   # Here is assignment of colors to vertex
32 |   if (cell_type_colors) {vertex_color = GEMLI_items[['cell_type_color']]$color[match(GEMLI_items[['cell_type']]$cell.type[match(names(lineage_dict[rownames(network_edges)]), GEMLI_items[['cell_type']]$cell.ID)], GEMLI_items[['cell_type_color']]$cell.type)]} else
33 |   {vertex_color = base_colors[rank(as.numeric(lineage_dict[rownames(network_edges)]), na.last="keep", ties.method="min")]}
34 |   vertex_color[is.na(vertex_color)] = "white"
35 |   network_edges = network_edges/max(network_edges)
36 |   network_graph = igraph::graph.adjacency(network_edges, mode="undirected", weighted=T)
37 |   network_graph = igraph::set.vertex.attribute(network_graph, "name", value=(1:ncol(network_edges)))
38 |   edge_color = igraph::edge_attr(network_graph, "weight") < 0; edge_color[edge_color==T] <- 'red'; edge_color[edge_color==F] <- 'grey'
39 |     igraph::edge_attr(network_graph, "weight") = abs(igraph::edge_attr(network_graph, "weight"))
40 |     edge_width = igraph::edge_attr(network_graph, "weight");
41 |     edge_width = edge_width - min(edge_width); edge_width = edge_width / max(edge_width) # get it to a 0-1 percentage value of the maximal present value
42 |     edge_width = edge_width * (max_edge_width*0.9); edge_width = edge_width + (max_edge_width*0.1); 
43 |     if (!include_labels) {network_graph = igraph::set.vertex.attribute(network_graph, "name", value=rep("",ncol(network_edges)))}
44 |     if (layout_style=='fr') {plot.igraph(network_graph, vertex.size=3, vertex.label.cex=0.5, vertex.color=vertex_color, vertex.label.color="black", vertex.label.family="Arial", edge.color=edge_color, layout=layout.fruchterman.reingold(network_graph), rescale=TRUE, edge.width=edge_width, vertex.label.dist=0.5, vertex.label.degree=pi*1.5)}
45 |     if (layout_style=='kk') {plot.igraph(network_graph, vertex.size=3, vertex.label.cex=0.5, vertex.color=vertex_color, vertex.label.color="black", vertex.label.family="Arial", edge.color=edge_color, layout=layout.kamada.kawai(network_graph), rescale=TRUE, edge.width=edge_width, vertex.label.dist=0.5, vertex.label.degree=pi*1.5)}
46 |     if (layout_style=='grid') {plot.igraph(network_graph, vertex.size=3, vertex.label.cex=0.5, vertex.color=vertex_color, vertex.label.color="black", vertex.label.family="Arial", edge.color=edge_color, layout=layout.grid(network_graph), rescale=TRUE, edge.width=edge_width, vertex.label.dist=0.5, vertex.label.degree=pi*1.5)}
47 |     # Title
48 |     title(main = paste0("Prediction at confidence level ", cutoff, "\n "))
49 |     # Edge_color_and_width
50 |     plot(NULL ,xaxt='n',yaxt='n',bty='n',ylab='',xlab='', xlim=0:1, ylim=0:1)
51 |     if ((ground_truth==T) & (highlight_FPs==T)) {legend("top", legend = c("Correct", "False"), col = c("grey", "red"), bty = "n", lty=1:1, lwd=c(max(edge_width), max(edge_width)), title = "Prediction", title.adj =0, horiz=F, xpd=TRUE, inset=c(0,0),ncol=1)}
52 |     legend("topleft", legend = c(max(lineage_predictions_matrix), min(lineage_predictions_matrix[which(lineage_predictions_matrix>cutoff)])), col = c("black", "black"), bty = "n", lty=1:1, lwd=c(max(edge_width),(min(edge_width)+ (max_edge_width*0.1))), title = "Confidence", title.adj =0, horiz=F, xpd=TRUE, inset=c(.15,0),ncol=1)
53 |     # Vertex_color
54 |     if ((ground_truth==T) & (cell_type_colors==F) & (include_labels==F)) {legend("topright", legend=c("Color - ground truth","White - no ground truth"), bty = "n", title = "Color ", title.adj =0.5, horiz=F, xpd=TRUE, inset=c(.05, 0),ncol=1)}
55 |     if ((ground_truth==F) & (cell_type_colors==F) & (include_labels==F)) {legend("top", legend=c("prediction",""), bty = "n", title = "  Color by", title.adj =0.5, horiz=F, xpd=TRUE, inset=c(0, 0),ncol=1)}
56 |     if ((ground_truth==T) & (cell_type_colors==F) & (include_labels==T)) {legend("topright", legend=c("Color - ground truth","White - no ground truth","Number - cell ID"), bty = "n", title = "Vertex ", title.adj =0.5, horiz=F, xpd=TRUE, inset=c(.05, 0),ncol=1)}
57 |     if ((ground_truth==F) & (cell_type_colors==F) & (include_labels==T)) {legend("top", legend=c("Color - prediction","Number - cell ID"), bty = "n", title = "  Vertex", title.adj =0.5, horiz=F, xpd=TRUE, inset=c(0, 0),ncol=1)}
58 |     if (cell_type_colors) {
59 |       plot(NULL ,xaxt='n',yaxt='n',bty='n',ylab='',xlab='', xlim=0:1, ylim=0:1)
60 |       legend("left", legend = GEMLI_items[['cell_type_color']]$cell.type, pch = 16, col = GEMLI_items[['cell_type_color']]$color, title = "Cell type", bty = "o", horiz=F, xpd=TRUE, inset=c(0, 0),ncol=1)}
61 |      }
62 | 


--------------------------------------------------------------------------------
/GEMLI_package_v0/man/DEG_volcano_plot.Rd:
--------------------------------------------------------------------------------
 1 | \name{DEG_volcano_plot}
 2 | \alias{DEG_volcano_plot}
 3 | %- Also NEED an '\alias' for EACH other topic documented here.
 4 | \title{
 5 | DEG_volcano_plot
 6 | }
 7 | \description{
 8 | This function plots a simple volcano plot for the differentially expressed genes (DEG) called using the function 'cell_fate_DEG_calling'.
 9 | }
10 | \usage{
11 | DEG_volcano_plot(GEMLI_items, name1, name2)
12 | }
13 | \arguments{
14 |   \item{
15 |   GEMLI_items}{GEMLI_items is a list of GEMLI inputs and outputs. To run 'DEG_volcano_plot' it should contain a 'DEG' element. The 'DEG' element is the output of the 'cell_fate_DEG_calling' function. It is a data frame with the columns 'p_val', 'avg_log2FC', 'pct.1', 'pct.2' and 'p_val_adj'.
16 |   }
17 |   \item{
18 |   name1}{'name1' is a character vector specifying the first population of cells analysed for DEG calling. It will appear in the title and legend of the volcano plot. It should correspond to the 'ident1' parameter of the 'cell_fate_DEG_calling' function used to generate the 'GEMLI_items' 'DEG' element.
19 |   }
20 |   \item{
21 |   name2}{'name2' is a character vector specifying the second population of cells analysed for DEG calling. See 'name1' parameter.
22 |   }
23 | }
24 | \details{
25 | %%  ~~ If necessary, more details than the description above ~~
26 | }
27 | \value{
28 | 'DEG_volcano_plot' plots a volcano plot for the DEG called using the function 'cell_fate_DEG_calling'. 
29 | }
30 | \references{
31 | %% ~put references to the literature/web site here ~
32 | }
33 | \author{
34 | }
35 | \note{
36 | %%  ~~further notes~~
37 | }
38 | 
39 | %% ~Make other sections like Warning with \section{Warning }{....} ~
40 | 
41 | \seealso{
42 | %% ~~objects to See Also as \code{\link{help}}, ~~~
43 | }
44 | \examples{
45 | ##---- Should be DIRECTLY executable !! ----
46 | ##-- ==>  Define data, use random,
47 | ##--	or do  help(data=index)  for the standard data sets.
48 | 
49 | ## The function is currently defined as
50 | function (x)
51 | {
52 |   }
53 | }
54 | % Add one or more standard keywords, see file 'KEYWORDS' in the
55 | % R documentation directory (show via RShowDoc("KEYWORDS")):
56 | % \keyword{ ~kwd1 }
57 | % \keyword{ ~kwd2 }
58 | % Use only one keyword per line.
59 | % For non-standard keywords, use \concept instead of \keyword:
60 | % \concept{ ~cpt1 }
61 | % \concept{ ~cpt2 }
62 | % Use only one concept per line.
63 | 


--------------------------------------------------------------------------------
/GEMLI_package_v0/man/calculate_correlations.Rd:
--------------------------------------------------------------------------------
 1 | \name{calculate_correlations}
 2 | \alias{calculate_correlations}
 3 | %- Also NEED an '\alias' for EACH other topic documented here.
 4 | \title{
 5 | calculate_correlations
 6 | }
 7 | \description{
 8 | This function provides a fast way to calculate Spearman's rank correlation or Pearson's correlation.
 9 | }
10 | \usage{
11 | calculate_correlations(data_matrix, fast=FALSE)
12 | }
13 | \arguments{
14 |   \item{
15 |   data_matrix}{'data_matrix' is a gene expression matrix where rownames are genes (features) and column names are cell IDs (samples).
16 |   }
17 |   \item{
18 |   fast}{'fast' = FALSE will calculate the Spearman rank correlation; 'fast' = TRUE will use the package HiClimR to calculate a Pearson correlation. The Pearson correlation is faster to calculate than the Spearman rank correlation, but the precision of lineage predictions will be slightly reduced. The default value is FALSE.
19 | }
20 | }
21 | \details{
22 | %%  ~~ If necessary, more details than the description above ~~
23 | }
24 | \value{
25 | The output is a cell-by-cell matrix in which each value is a correlation coefficient: Spearman's rho by default, or Pearson's r if 'fast' = TRUE.
26 | }
27 | \references{
28 | %% ~put references to the literature/web site here ~
29 | }
30 | \author{
31 | Marcel Tarbier and Almut Eisele
32 | }
33 | \note{
34 | %%  ~~further notes~~
35 | }
36 | 
37 | %% ~Make other sections like Warning with \section{Warning }{....} ~
38 | 
39 | \seealso{
40 | %% ~~objects to See Also as \code{\link{help}}, ~~~
41 | }
42 | \examples{
43 | ##---- Should be DIRECTLY executable !! ----
44 | ##-- ==>  Define data, use random,
45 | ##--	or do  help(data=index)  for the standard data sets.
46 | 
47 | ## The function is currently defined as
48 | function (x)
49 | {
50 |   }
51 | }
52 | % Add one or more standard keywords, see file 'KEYWORDS' in the
53 | % R documentation directory (show via RShowDoc("KEYWORDS")):
54 | % \keyword{ ~kwd1 }
55 | % \keyword{ ~kwd2 }
56 | % Use only one keyword per line.
57 | % For non-standard keywords, use \concept instead of \keyword:
58 | % \concept{ ~cpt1 }
59 | % \concept{ ~cpt2 }
60 | % Use only one concept per line.
61 | 


--------------------------------------------------------------------------------
/GEMLI_package_v0/man/cell_fate_DEG_calling.Rd:
--------------------------------------------------------------------------------
 1 | \name{cell_fate_DEG_calling}
 2 | \alias{cell_fate_DEG_calling}
 3 | %- Also NEED an '\alias' for EACH other topic documented here.
 4 | \title{
 5 | cell_fate_DEG_calling
 6 | }
 7 | \description{
 8 | This function calls differentially expressed genes (DEG) between cells of specific cell types that are members of asymmetric cell lineages (containing members of two or more cell types of interest) or of symmetric cell lineages (containing members of only one cell type of interest). DEG calling is performed using Seurat's FindMarkers function.
 9 | }
10 | \usage{
11 | cell_fate_DEG_calling(GEMLI_items, ident1, ident2, min.pct=0.05, logfc.threshold=0.1)
12 | }
13 | \arguments{
14 |   \item{
15 |   GEMLI_items}{GEMLI_items is a list of GEMLI inputs and outputs. To run 'cell_fate_DEG_calling' it should contain a 'gene_expression' as well as a 'cell_fate_analysis' element. The 'gene_expression' element is a quality-controlled and normalised gene expression matrix where rownames are genes (features) and column names are cell IDs (cell barcodes). The 'cell_fate_analysis' element is a data frame with the columns 'cell.ID', 'clone.ID', 'cell.type' and 'cell.fate' generated by the 'extract_cell_fate_lineages' function.
16 |   }
17 |   \item{
18 |   ident1}{'ident1' specifies the first 'cell.fate' of the GEMLI_items 'cell_fate_analysis' element to be used for DEG calling. It is a character vector which can encompass several cell fates. Cell fates contain the lineage type (sym or asym for symmetric or asymmetric respectively) and the cell type separated by an underscore. Cell fates can for example be 'sym_DCIS' or 'asym_inv_tumor'.
19 |   }
20 |   \item{
21 |   ident2}{'ident2' specifies the second 'cell.fate' of the GEMLI_items 'cell_fate_analysis' element to be used for DEG calling. See ident1 for the format.
22 |   }
23 |   \item{
24 |   min.pct}{'min.pct' is the min.pct parameter of Seurat's FindMarkers function. It is the minimum fraction of cells in either of the compared cell populations in which a gene should be expressed in order to be tested. The default value is 0.05.
25 |   }
26 |   \item{
27 |   logfc.threshold}{'logfc.threshold' is the logfc.threshold parameter of Seurat's FindMarkers function. It limits the DEG calling to genes which show, on average, at least an x-fold difference (log scale) between the two compared cell populations. The default value is 0.1.
28 |   }
29 | }
30 | \details{
31 | %%  ~~ If necessary, more details than the description above ~~
32 | }
33 | \value{
34 | 'cell_fate_DEG_calling' yields a data frame which is added to the 'GEMLI_items' under the name 'DEG'. The data frame is the output of Seurat's FindMarkers function and contains the columns 'p_val', 'avg_log2FC', 'pct.1', 'pct.2' and 'p_val_adj'.
35 | }
36 | \references{
37 | %% ~put references to the literature/web site here ~
38 | }
39 | \author{
40 | }
41 | \note{
42 | %%  ~~further notes~~
43 | }
44 | 
45 | %% ~Make other sections like Warning with \section{Warning }{....} ~
46 | 
47 | \seealso{
48 | %% ~~objects to See Also as \code{\link{help}}, ~~~
49 | }
50 | \examples{
51 | ##---- Should be DIRECTLY executable !! ----
52 | ##-- ==>  Define data, use random,
53 | ##--	or do  help(data=index)  for the standard data sets.
54 | 
55 | ## The function is currently defined as
56 | function (x)
57 | {
58 |   }
59 | }
60 | % Add one or more standard keywords, see file 'KEYWORDS' in the
61 | % R documentation directory (show via RShowDoc("KEYWORDS")):
62 | % \keyword{ ~kwd1 }
63 | % \keyword{ ~kwd2 }
64 | % Use only one keyword per line.
65 | % For non-standard keywords, use \concept instead of \keyword:
66 | % \concept{ ~cpt1 }
67 | % \concept{ ~cpt2 }
68 | % Use only one concept per line.
69 | 


--------------------------------------------------------------------------------
/GEMLI_package_v0/man/cell_type_composition_plot.Rd:
--------------------------------------------------------------------------------
 1 | \name{cell_type_composition_plot}
 2 | \alias{cell_type_composition_plot}
 3 | %- Also NEED an '\alias' for EACH other topic documented here.
 4 | \title{
 5 | cell_type_composition_plot
 6 | }
 7 | \description{
 8 | This function generates simple plots of the cell type composition of predicted or ground truth lineages.
 9 | }
10 | \usage{
11 | cell_type_composition_plot(GEMLI_items, ground_truth=F, cell_type_colors=F, type)
12 | }
13 | \arguments{
14 |   \item{
15 |   GEMLI_items}{GEMLI_items is a list of GEMLI inputs and outputs. To run 'cell_type_composition_plot' it should contain a 'barcodes' element (composition of ground truth) or a 'predicted_lineage_table' element (composition of predicted lineages). The 'predicted_lineage_table' is the output of the 'prediction_to_lineage_information' function.
16 |   }
17 |   \item{
18 |   ground_truth}{'ground_truth'==T/TRUE indicates that the composition of ground truth lineages is analyzed. If 'ground_truth'==F, the composition of predicted lineages is analyzed. Default is F.
19 |   }
20 |   \item{
21 |   cell_type_colors}{'cell_type_colors'==T/TRUE specifies that the custom colors for every cell type stored in the 'cell_type_colors' element of GEMLI_items should be used. Default is F.
22 |   }
23 |   \item{
24 |   type}{'type' specifies which of three plots is generated. Type can be 'bubble', 'upsetR', or 'plain'. type='plain' will output a simple table of the number of lineages for different cell type combinations. type='upsetR' will generate an upsetR plot showing the number of lineages for different cell type combinations. type='bubble' will generate a bubble plot of the cell type composition of individual lineages. This is especially meaningful when analyzing multicellular structures. 
25 |   }
26 | }
27 | \details{
28 | %%  ~~ If necessary, more details than the description above ~~
29 | }
30 | \value{
31 | 'cell_type_composition_plot' yields one of three possible plot types (see 'type') specifying the cell type composition of ground truth or predicted lineages.
32 | }
33 | \references{
34 | %% ~put references to the literature/web site here ~
35 | }
36 | \author{
37 | }
38 | \note{
39 | %%  ~~further notes~~
40 | }
41 | 
42 | %% ~Make other sections like Warning with \section{Warning }{....} ~
43 | 
44 | \seealso{
45 | %% ~~objects to See Also as \code{\link{help}}, ~~~
46 | }
47 | \examples{
48 | ##---- Should be DIRECTLY executable !! ----
49 | ##-- ==>  Define data, use random,
50 | ##--	or do  help(data=index)  for the standard data sets.
51 | 
52 | ## The function is currently defined as
53 | function (x)
54 | {
55 |   }
56 | }
57 | % Add one or more standard keywords, see file 'KEYWORDS' in the
58 | % R documentation directory (show via RShowDoc("KEYWORDS")):
59 | % \keyword{ ~kwd1 }
60 | % \keyword{ ~kwd2 }
61 | % Use only one keyword per line.
62 | % For non-standard keywords, use \concept instead of \keyword:
63 | % \concept{ ~cpt1 }
64 | % \concept{ ~cpt2 }
65 | % Use only one concept per line.
66 | 
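As a rough illustration of what the type='plain' output described in cell_type_composition_plot.Rd summarizes, the following base-R sketch counts lineages per cell type combination. It does not call or reproduce the package code; the toy objects `barcodes` and `cell_types` and all their values are made up for this example.

```r
# Toy ground truth (hypothetical values): names are cell IDs, values are lineage IDs.
barcodes <- c(cell1 = "A", cell2 = "A", cell3 = "B",
              cell4 = "B", cell5 = "C", cell6 = "C")

# Toy cell type annotation (hypothetical values), also named by cell ID.
cell_types <- c(cell1 = "stem", cell2 = "stem", cell3 = "stem",
                cell4 = "enterocyte", cell5 = "enterocyte", cell6 = "stem")

# For each lineage, collapse its sorted unique cell types into one label.
combo_per_lineage <- tapply(
  cell_types[names(barcodes)],
  barcodes,
  function(x) paste(sort(unique(x)), collapse = "+")
)

# Number of lineages observed for each cell type combination.
combo_counts <- table(combo_per_lineage)
print(combo_counts)
```

With real data, the documented call would be cell_type_composition_plot(GEMLI_items, ground_truth=T, type='plain') on a GEMLI_items list holding the 'barcodes' element.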


--------------------------------------------------------------------------------
/GEMLI_package_v0/man/cluster_stability_plot.Rd:
--------------------------------------------------------------------------------
 1 | \name{cluster_stability_plot}
 2 | \alias{cluster_stability_plot}
 3 | %- Also NEED an '\alias' for EACH other topic documented here.
 4 | \title{
 5 | cluster_stability_plot
 6 | }
 7 | \description{
 8 | This function calculates a cluster stability index for lineage predictions made with different allowed cluster sizes and plots the cluster stability index against cluster size. For multicellular structures, the cluster size at which the cluster stability index plateaus provides an estimate of the maximal size of the multicellular structures present in the data. Lineage predictions have a high precision up to this cluster size, and recovery is maximal at this cluster size.
 9 | }
10 | \usage{
11 | cluster_stability_plot(GEMLI_items)
12 | }
13 | \arguments{
14 |   \item{
15 |   GEMLI_items}{GEMLI_items is a list of GEMLI inputs and outputs. To run 'cluster_stability_plot' it should contain a prediction matrix named 'prediction_multiple_sizes' that is generated and added to the items list by the function 'predict_lineages_multiple_sizes'.
16 |   }
17 | }
18 | \details{
19 | %%  ~~ If necessary, more details than the description above ~~
20 | }
21 | \value{
22 | 'cluster_stability_plot' yields a plot of cluster stability index vs cluster size based on which the size of multicellular structures present in the single-cell RNA-sequencing dataset can be estimated.
23 | }
24 | \references{
25 | %% ~put references to the literature/web site here ~
26 | }
27 | \author{
28 | Almut Eisele and Marcel Tarbier
29 | }
30 | \note{
31 | %%  ~~further notes~~
32 | }
33 | 
34 | %% ~Make other sections like Warning with \section{Warning }{....} ~
35 | 
36 | \seealso{
37 | %% ~~objects to See Also as \code{\link{help}}, ~~~
38 | }
39 | \examples{
40 | ##---- Should be DIRECTLY executable !! ----
41 | ##-- ==>  Define data, use random,
42 | ##--	or do  help(data=index)  for the standard data sets.
43 | 
44 | ## The function is currently defined as
45 | function (x)
46 | {
47 |   }
48 | }
49 | % Add one or more standard keywords, see file 'KEYWORDS' in the
50 | % R documentation directory (show via RShowDoc("KEYWORDS")):
51 | % \keyword{ ~kwd1 }
52 | % \keyword{ ~kwd2 }
53 | % Use only one keyword per line.
54 | % For non-standard keywords, use \concept instead of \keyword:
55 | % \concept{ ~cpt1 }
56 | % \concept{ ~cpt2 }
57 | % Use only one concept per line.
58 | 


--------------------------------------------------------------------------------
/GEMLI_package_v0/man/extract_cell_fate_lineages.Rd:
--------------------------------------------------------------------------------
 1 | \name{extract_cell_fate_lineages}
 2 | \alias{extract_cell_fate_lineages}
 3 | %- Also NEED an '\alias' for EACH other topic documented here.
 4 | \title{
 5 | extract_cell_fate_lineages
 6 | }
 7 | \description{
 8 | This function extracts symmetric cell lineages (with all members in one of the considered cell types) and asymmetric cell lineages (with members in two or more of the considered cell types). The function generates the input for the 'cell_fate_DEG_calling' function.
 9 | }
10 | \usage{
11 | extract_cell_fate_lineages(GEMLI_items, selection, unique=TRUE, threshold)
12 | }
13 | \arguments{
14 |   \item{
15 |   GEMLI_items}{GEMLI_items is a list of GEMLI inputs and outputs. To run 'extract_cell_fate_lineages' it should contain a 'predicted_lineage_table' as well as a 'cell_type' table. The 'predicted_lineage_table' is generated using the function 'prediction_to_lineage_information'. The 'cell_type' table is a data frame with the columns 'cell.ID' and 'celltype'.
16 |   }
17 |   \item{
18 |   selection}{'selection' specifies the cell types to be considered for the extraction of symmetric and asymmetric lineages. It is a character vector naming at least two cell types.
19 |   }
20 |   \item{
21 |   unique}{'unique'=TRUE specifies that extracted lineages should contain only the cell types given in the 'selection' parameter. If 'unique'=FALSE, other cell types that are not part of the selection can also be present in the extracted lineages. The default value is T/TRUE.
22 |   }
23 |   \item{
24 |   threshold}{'threshold' specifies the minimal percentage of each cell type that an asymmetric lineage must contain in order to be considered. It is a numeric vector of percentages, one per cell type, given in the same order as the cell types in the 'selection' parameter. The threshold values for all cell types have to be met for an asymmetric lineage to be kept.
25 |   }
26 | }
27 | \details{
28 | %%  ~~ If necessary, more details than the description above ~~
29 | }
30 | \value{
31 | 'extract_cell_fate_lineages' yields a data frame which is added to the 'GEMLI_items' under the name 'cell_fate_analysis'. The data frame contains the columns 'cell.ID', 'clone.ID', 'cell.type' and 'cell.fate'. The 'cell.fate' column contains the label 'asym' for selected asymmetric lineages and 'sym' for selected symmetric lineages, followed by the cell type of the specific cell, separated by an underscore (e.g. 'sym_DCIS' or 'asym_inv_tumor'). The function generates the input for the function 'cell_fate_DEG_calling'.
32 | }
33 | \references{
34 | %% ~put references to the literature/web site here ~
35 | }
36 | \author{
37 | }
38 | \note{
39 | %%  ~~further notes~~
40 | }
41 | 
42 | %% ~Make other sections like Warning with \section{Warning }{....} ~
43 | 
44 | \seealso{
45 | %% ~~objects to See Also as \code{\link{help}}, ~~~
46 | }
47 | \examples{
48 | ##---- Should be DIRECTLY executable !! ----
49 | ##-- ==>  Define data, use random,
50 | ##--	or do  help(data=index)  for the standard data sets.
51 | 
52 | ## The function is currently defined as
53 | function (x)
54 | {
55 |   }
56 | }
57 | % Add one or more standard keywords, see file 'KEYWORDS' in the
58 | % R documentation directory (show via RShowDoc("KEYWORDS")):
59 | % \keyword{ ~kwd1 }
60 | % \keyword{ ~kwd2 }
61 | % Use only one keyword per line.
62 | % For non-standard keywords, use \concept instead of \keyword:
63 | % \concept{ ~cpt1 }
64 | % \concept{ ~cpt2 }
65 | % Use only one concept per line.
66 | 
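The 'cell.fate' labels described in extract_cell_fate_lineages.Rd can be illustrated with a small base-R sketch. This is not the package implementation: it assumes a toy lineage table with made-up values, mimics only the 'unique'=TRUE behaviour, and leaves out the 'threshold' filter entirely.

```r
# Toy lineage table (hypothetical values) with the documented column names.
lineages <- data.frame(
  cell.ID   = paste0("cell", 1:7),
  clone.ID  = c(1, 1, 2, 2, 2, 3, 3),
  cell.type = c("DCIS", "DCIS", "DCIS", "inv_tumor", "inv_tumor",
                "inv_tumor", "stroma"),
  stringsAsFactors = FALSE
)
selection <- c("DCIS", "inv_tumor")

# Mimic unique = TRUE: keep lineages made up entirely of selected cell types.
only_selected <- tapply(lineages$cell.type %in% selection, lineages$clone.ID, all)
keep_row <- as.vector(only_selected[as.character(lineages$clone.ID)])
kept <- lineages[keep_row, ]

# A lineage is symmetric if all of its cells share one cell type.
n_types <- tapply(kept$cell.type, kept$clone.ID, function(x) length(unique(x)))
is_sym <- as.vector(n_types[as.character(kept$clone.ID)]) == 1

# Label each cell as <lineage type>_<cell type>, e.g. 'sym_DCIS'.
kept$cell.fate <- paste(ifelse(is_sym, "sym", "asym"), kept$cell.type, sep = "_")
print(kept)
```

Lineage 3 is dropped because it contains a cell type outside the selection; lineage 1 is symmetric and lineage 2 asymmetric.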


--------------------------------------------------------------------------------
/GEMLI_package_v0/man/memory_gene_calling.Rd:
--------------------------------------------------------------------------------
 1 | \name{memory_gene_calling}
 2 | \alias{memory_gene_calling}
 3 | %- Also NEED an '\alias' for EACH other topic documented here.
 4 | \title{
 5 | memory_gene_calling
 6 | }
 7 | \description{
 8 | This function identifies memory genes (or lineage markers) based on gene expression variability across lineages (either from predictions or from ground truth) from single-cell RNA-sequencing data.
 9 | }
10 | \usage{
11 | memory_gene_calling(GEMLI_items, valid_lineage_sizes=2:5, use_median=T, ground_truth=F)
12 | }
13 | \arguments{
14 |   \item{
15 |   GEMLI_items}{'GEMLI_items' is a list of GEMLI inputs and outputs. To run 'memory_gene_calling', 'GEMLI_items' must contain a 'gene_expression' element. To run it on barcodes, 'GEMLI_items' needs to contain a 'barcodes' element. To run it on predictions, 'GEMLI_items' needs to contain a 'predicted_lineage_table' element. It is also possible to run 'memory_gene_calling' on symmetric or asymmetric lineages of specific cell types. In this case, 'GEMLI_items' must contain a 'cell_fate_analysis' element. The 'gene_expression' element is a quality controlled and normalized gene expression matrix where rownames are genes (features) and column names are cell IDs (cell barcodes). The 'predicted_lineage_table' is generated using the function 'prediction_to_lineage_information'. The 'barcodes' element is a named vector of lineage ground truth (names=cell.ID, value=clone.ID). The 'cell_fate_analysis' element is generated using the 'extract_cell_fate_lineages' function.
16 |   }
17 |   \item{
18 |   valid_lineage_sizes}{'valid_lineage_sizes' specifies the range of lineage sizes to be included. Depending on the question to be investigated, it can be beneficial to restrict this to either small or large lineages. The default is small lineages of 2 to 5 cells (2:5).
19 |   }
20 |   \item{
21 |   use_median}{'use_median' specifies whether the median of lineages should be used rather than the mean. This makes the approach more robust to outliers. Default is T/TRUE.
22 |   }
23 |   \item{
24 |   ground_truth}{'ground_truth' specifies whether to call memory genes on ground truth instead of predictions. Default is F/FALSE.
25 |   }
26 |   \item{
27 |   cell_fate}{'cell_fate', when present, specifies to call memory genes on specific symmetric or asymmetric lineages. It is a vector of the 'cell.fate' in the 'cell_fate_analysis' GEMLI_items element of the lineages to be used for memory gene calling. Cell fates start with sym or asym, followed by the cell type, separated by an underscore. 
28 |   }
29 | }
30 | \details{
31 | %%  ~~ If necessary, more details than the description above ~~
32 | }
33 | \value{
34 | 'memory_gene_calling' yields a table of gene names or IDs of potential memory genes (row names), as well as their variability ('var') across lineages and a p-value ('p'). This table is stored in the GEMLI_items list as element 'memory_genes'.
35 | }
36 | \references{
37 | %% ~put references to the literature/web site here ~
38 | }
39 | \author{
40 | Marcel Tarbier and Almut Eisele
41 | }
42 | \note{
43 | %%  ~~further notes~~
44 | }
45 | 
46 | %% ~Make other sections like Warning with \section{Warning }{....} ~
47 | 
48 | \seealso{
49 | %% ~~objects to See Also as \code{\link{help}}, ~~~
50 | }
51 | \examples{
52 | ##---- Should be DIRECTLY executable !! ----
53 | ##-- ==>  Define data, use random,
54 | ##--	or do  help(data=index)  for the standard data sets.
55 | 
56 | ## The function is currently defined as
57 | function (x)
58 | {
59 |   }
60 | }
61 | % Add one or more standard keywords, see file 'KEYWORDS' in the
62 | % R documentation directory (show via RShowDoc("KEYWORDS")):
63 | % \keyword{ ~kwd1 }
64 | % \keyword{ ~kwd2 }
65 | % Use only one keyword per line.
66 | % For non-standard keywords, use \concept instead of \keyword:
67 | % \concept{ ~cpt1 }
68 | % \concept{ ~cpt2 }
69 | % Use only one concept per line.
70 | 
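To make the roles of 'valid_lineage_sizes' and 'use_median' in memory_gene_calling.Rd more concrete, here is a minimal base-R sketch. It assumes that variability is measured as the variance of per-lineage medians, which is an assumption made for illustration only; the package's actual statistic and its p-value calculation are not reproduced here, and all data values are random toy data.

```r
set.seed(7)
cells <- paste0("cell", 1:9)

# Toy expression matrix (hypothetical): genes in rows, cells in columns.
gene_expression <- matrix(
  rnorm(4 * 9, mean = 5), nrow = 4,
  dimnames = list(paste0("gene", 1:4), cells)
)

# Toy ground truth (hypothetical): names = cell IDs, values = lineage IDs.
barcodes <- setNames(c("A", "A", "B", "B", "B", "C", "D", "D", "D"), cells)

# Keep only lineages whose size falls into the allowed range.
valid_lineage_sizes <- 2:5
sizes <- table(barcodes)
valid <- names(sizes)[sizes %in% valid_lineage_sizes]

# Per-lineage median expression of every gene (use_median = TRUE analogue).
lineage_summary <- sapply(valid, function(l) {
  members <- names(barcodes)[barcodes == l]
  apply(gene_expression[, members, drop = FALSE], 1, median)
})

# Assumed variability measure: variance of the lineage medians per gene.
across_lineage_var <- apply(lineage_summary, 1, var)
print(sort(across_lineage_var, decreasing = TRUE))
```

Lineage C has a single cell and is therefore excluded by the size filter.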


--------------------------------------------------------------------------------
/GEMLI_package_v0/man/potential_markers.Rd:
--------------------------------------------------------------------------------
 1 | \name{potential_markers}
 2 | \alias{potential_markers}
 3 | %- Also NEED an '\alias' for EACH other topic documented here.
 4 | \title{
 5 | potential_markers
 6 | }
 7 | \description{
 8 | This function identifies potential lineage markers based on mean gene expression and gene expression variability from single-cell RNA-sequencing data. It is part of the 'predict_lineages' function, but can also be called independently on a quality controlled and normalized gene expression matrix where rownames are genes (features) and column names are cell IDs (samples).
 9 | }
10 | \usage{
11 | potential_markers(data_matrix)
12 | }
13 | \arguments{
14 |   \item{
15 |   data_matrix}{'data_matrix' is a quality controlled and normalized gene expression matrix where rownames are genes (features) and column names are cell IDs (samples).
16 |   }
17 | }
18 | \details{
19 | %%  ~~ If necessary, more details than the description above ~~
20 | }
21 | \value{
22 | 'potential_markers' yields a vector of gene names or IDs of potential lineage marker genes. While genes are selected purely based on gene expression mean and variability, it has been shown that this approach enriches for genes with lineage specific gene expression profiles.
23 | }
24 | \references{
25 | %% ~put references to the literature/web site here ~
26 | }
27 | \author{
28 | Marcel Tarbier and Almut Eisele
29 | }
30 | \note{
31 | %%  ~~further notes~~
32 | }
33 | 
34 | %% ~Make other sections like Warning with \section{Warning }{....} ~
35 | 
36 | \seealso{
37 | %% ~~objects to See Also as \code{\link{help}}, ~~~
38 | }
39 | \examples{
40 | ##---- Should be DIRECTLY executable !! ----
41 | ##-- ==>  Define data, use random,
42 | ##--	or do  help(data=index)  for the standard data sets.
43 | 
44 | ## The function is currently defined as
45 | function (x)
46 | {
47 |   }
48 | }
49 | % Add one or more standard keywords, see file 'KEYWORDS' in the
50 | % R documentation directory (show via RShowDoc("KEYWORDS")):
51 | % \keyword{ ~kwd1 }
52 | % \keyword{ ~kwd2 }
53 | % Use only one keyword per line.
54 | % For non-standard keywords, use \concept instead of \keyword:
55 | % \concept{ ~cpt1 }
56 | % \concept{ ~cpt2 }
57 | % Use only one concept per line.
58 | 
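The idea of selecting candidate markers from expression mean and variability, as described in potential_markers.Rd, can be sketched in base R as follows. The specific criteria used here (mean at or above the median, then the ten highest coefficients of variation) are hypothetical choices for illustration and are not taken from the package.

```r
set.seed(42)

# Toy count-like matrix (hypothetical): genes in rows, cells in columns.
data_matrix <- matrix(
  rpois(50 * 12, lambda = 5), nrow = 50,
  dimnames = list(paste0("gene", 1:50), paste0("cell", 1:12))
)

# Per-gene mean expression and coefficient of variation.
gene_mean <- rowMeans(data_matrix)
gene_cv <- apply(data_matrix, 1, sd) / gene_mean

# Hypothetical filter: sufficiently expressed genes only.
expressed <- gene_mean >= median(gene_mean)

# Hypothetical ranking: the ten most variable genes among the expressed ones.
candidates <- names(sort(gene_cv[expressed], decreasing = TRUE))[1:10]
print(candidates)
```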


--------------------------------------------------------------------------------
/GEMLI_package_v0/man/predict_lineages.Rd:
--------------------------------------------------------------------------------
 1 | \name{predict_lineages}
 2 | \alias{predict_lineages}
 3 | %- Also NEED an '\alias' for EACH other topic documented here.
 4 | \title{
 5 | predict_lineages
 6 | }
 7 | \description{
 8 | This function predicts cell lineages from single-cell RNA-sequencing data. It identifies potential lineage markers based on mean gene expression and gene expression variability and uses these markers in a repeated iterative clustering approach. Subsets of these genes are used to cluster cells until the desired cluster size is reached. This clustering is repeated many times for random subsets. The result is a cell by cell matrix that lists how many times each cell pair clustered together, which translates into a confidence score. Cell pairs with high confidence scores are likely to be members of the same lineage.
 9 | }
10 | \usage{
11 | predict_lineages(GEMLI_items, repetitions=100, sample_size=(2/3), desired_cluster_size=c(2,3), fast=FALSE)
12 | }
13 | \arguments{
14 |   \item{
15 |   GEMLI_items}{GEMLI_items is a list of GEMLI inputs and outputs. To run 'predict_lineages' it should contain a gene expression matrix named 'gene_expression'. This is a quality controlled and normalized gene expression matrix where rownames are genes (features) and column names are cell IDs (samples).
16 |   }
17 |   \item{
18 |   repetitions}{'repetitions' specifies how many times the input matrix will be clustered using random subsamples of potential markers. A higher number of iterations leads to more robust results. 10 iterations is considered to be the minimum, but 100 iterations are strongly recommended (default value). Runtime is linear with regard to the number of iterations.
19 |   }
20 |   \item{
21 |   sample_size}{'sample_size' is a value between 0 and 1 and specifies the fraction of potential markers that are used in each clustering. Values between 0.5 and 0.67 (default value) are recommended.
22 |   }
23 |   \item{
24 |   desired_cluster_size}{'desired_cluster_size' specifies the number of cells per cluster to be achieved in each clustering. The input is a vector of values, e.g. c(2,3,4) or 2:4. The desired_cluster_size parameter should generally be small. Values between 2 and 4 are recommended. The default value is c(2,3).
25 |   }
26 |   \item{
27 |   fast}{'fast' =TRUE uses the HiClimR package for calculating correlations. This will make predictions faster but reduce precision. The default value is FALSE.
28 |   }
29 | }
30 | \details{
31 | %%  ~~ If necessary, more details than the description above ~~
32 | }
33 | \value{
34 | 'predict_lineages' yields a cell by cell matrix containing confidence scores. Cell pairs with high confidence scores are likely to be members of the same lineage. This matrix is added to the 'GEMLI_items' under the name 'prediction'.
35 | }
36 | \references{
37 | %% ~put references to the literature/web site here ~
38 | }
39 | \author{
40 | Marcel Tarbier and Almut Eisele
41 | }
42 | \note{
43 | %%  ~~further notes~~
44 | }
45 | 
46 | %% ~Make other sections like Warning with \section{Warning }{....} ~
47 | 
48 | \seealso{
49 | %% ~~objects to See Also as \code{\link{help}}, ~~~
50 | }
51 | \examples{
52 | ##---- Should be DIRECTLY executable !! ----
53 | ##-- ==>  Define data, use random,
54 | ##--	or do  help(data=index)  for the standard data sets.
55 | 
56 | ## The function is currently defined as
57 | function (x)
58 | {
59 |   }
60 | }
61 | % Add one or more standard keywords, see file 'KEYWORDS' in the
62 | % R documentation directory (show via RShowDoc("KEYWORDS")):
63 | % \keyword{ ~kwd1 }
64 | % \keyword{ ~kwd2 }
65 | % Use only one keyword per line.
66 | % For non-standard keywords, use \concept instead of \keyword:
67 | % \concept{ ~cpt1 }
68 | % \concept{ ~cpt2 }
69 | % Use only one concept per line.
70 | 
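The confidence score described in predict_lineages.Rd counts how often a cell pair ends up in the same cluster across repetitions. The base-R sketch below illustrates only that counting idea. It swaps in a single hierarchical clustering cut per repetition as a simplified stand-in for the package's iterative clustering, uses all genes instead of selected markers, and runs on random toy data; the names `expr` and `co_cluster` are hypothetical.

```r
set.seed(1)
n_genes <- 30
n_cells <- 8

# Toy expression matrix (hypothetical): genes in rows, cells in columns.
expr <- matrix(
  rnorm(n_genes * n_cells), nrow = n_genes,
  dimnames = list(paste0("gene", 1:n_genes), paste0("cell", 1:n_cells))
)

repetitions <- 20
sample_size <- 2 / 3
co_cluster <- matrix(0, n_cells, n_cells,
                     dimnames = list(colnames(expr), colnames(expr)))

for (i in seq_len(repetitions)) {
  # Random subset of genes for this repetition.
  genes <- sample(rownames(expr), size = round(sample_size * n_genes))
  # Correlation-based distance between cells.
  d <- as.dist(1 - cor(expr[genes, ]))
  # Stand-in clustering: cut the tree into 4 clusters (about 2 cells each).
  cl <- cutree(hclust(d, method = "average"), k = 4)
  # Add 1 for every cell pair that shares a cluster in this repetition.
  co_cluster <- co_cluster + outer(cl, cl, "==")
}
diag(co_cluster) <- 0
print(co_cluster)
```

Each entry of `co_cluster` lies between 0 and `repetitions`, which is why the number of repetitions determines the scale on which a cutoff is chosen.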


--------------------------------------------------------------------------------
/GEMLI_package_v0/man/predict_lineages_multiple_sizes.Rd:
--------------------------------------------------------------------------------
 1 | \name{predict_lineages_multiple_sizes}
 2 | \alias{predict_lineages_multiple_sizes}
 3 | %- Also NEED an '\alias' for EACH other topic documented here.
 4 | \title{
 5 | predict_lineages_multiple_sizes
 6 | }
 7 | \description{
 8 | This function predicts cell lineages from single-cell RNA-sequencing data with cluster sizes ranging from a minimal to a maximal value. Lineages are predicted as in the 'predict_lineages' function, but independently for every cluster size between the minimal and the maximal value rather than for a single desired lineage size. For each prediction, the predicted lineage information is generated as in the 'prediction_to_lineage_information' function. The function generates the input for the function 'cluster_stability_plot', which can be used to estimate the size of multicellular structures present in the single-cell RNA-sequencing data.
 9 |   }
10 | \usage{
11 | predict_lineages_multiple_sizes(GEMLI_items, repetitions=10, sample_size=(2/3), minimal_maximal_cluster_size=c(2,50), fast=FALSE, cutoff=5)
12 | }
13 | \arguments{
14 |   \item{
15 |   GEMLI_items}{GEMLI_items is a list of GEMLI inputs and outputs. To run 'predict_lineages_multiple_sizes' it should contain a gene expression matrix named 'gene_expression'. This is a quality controlled and normalized gene expression matrix where rownames are genes (features) and column names are cell IDs (samples).
16 |   }
17 |   \item{
18 |   repetitions}{'repetitions' specifies how many times the input matrix will be clustered using random subsamples of potential markers. A higher number of iterations leads to more robust results. 10 iterations (the default value for this function) is considered to be the minimum, but 100 iterations are strongly recommended. Runtime is linear with regard to the number of iterations.
19 |   }
20 |   \item{
21 |   sample_size}{'sample_size' is a value between 0 and 1 and specifies the fraction of potential markers that are used in each clustering. Values between 0.5 and 0.67 (default value) are recommended.
22 |   }
23 |   \item{
24 |   minimal_maximal_cluster_size}{'minimal_maximal_cluster_size' gives the minimal and maximal number of cells per cluster for which independent lineage predictions are run. The input is a vector of two values (minimal, maximal), e.g. c(2,50). The maximal value should be close to or above the maximal expected size of multicellular structures present in the single-cell RNA-sequencing data. The default value is c(2,50).
25 |   }
26 |   \item{
27 |   fast}{'fast' =TRUE uses the HiClimR package for calculating correlations. This makes predictions faster but reduces precision. The default value is FALSE.
28 |   }
29 |   \item{
30 |   cutoff}{'cutoff' specifies the confidence score at which a cell pair is considered to be part of the same lineage. High values (e.g. 70-100) provide high precision but lower sensitivity. Low values (e.g. 30-60) provide higher sensitivity but lower precision. The default value is 5.
31 |   }
32 | }
33 | \details{
34 | %%  ~~ If necessary, more details than the description above ~~
35 | }
36 | \value{
37 | 'predict_lineages_multiple_sizes' yields a cell by lineage size matrix containing lineage IDs. This matrix is added to the 'GEMLI_items' under the name 'prediction_multiple_sizes'. The function generates the input for the function 'cluster_stability_plot', which can be used to estimate the size of multicellular structures present in the single-cell RNA-sequencing data.
38 | }
39 | \references{
40 | %% ~put references to the literature/web site here ~
41 | }
42 | \author{
43 | Almut Eisele and Marcel Tarbier
44 | }
45 | \note{
46 | %%  ~~further notes~~
47 | }
48 | 
49 | %% ~Make other sections like Warning with \section{Warning }{....} ~
50 | 
51 | \seealso{
52 | %% ~~objects to See Also as \code{\link{help}}, ~~~
53 | }
54 | \examples{
55 | ##---- Should be DIRECTLY executable !! ----
56 | ##-- ==>  Define data, use random,
57 | ##--	or do  help(data=index)  for the standard data sets.
58 | 
59 | ## The function is currently defined as
60 | function (x)
61 | {
62 |   }
63 | }
64 | % Add one or more standard keywords, see file 'KEYWORDS' in the
65 | % R documentation directory (show via RShowDoc("KEYWORDS")):
66 | % \keyword{ ~kwd1 }
67 | % \keyword{ ~kwd2 }
68 | % Use only one keyword per line.
69 | % For non-standard keywords, use \concept instead of \keyword:
70 | % \concept{ ~cpt1 }
71 | % \concept{ ~cpt2 }
72 | % Use only one concept per line.
73 | 


--------------------------------------------------------------------------------
/GEMLI_package_v0/man/predict_lineages_with_known_markers.Rd:
--------------------------------------------------------------------------------
 1 | \name{predict_lineages_with_known_markers}
 2 | \alias{predict_lineages_with_known_markers}
 3 | %- Also NEED an '\alias' for EACH other topic documented here.
 4 | \title{
 5 | predict_lineages_with_known_markers
 6 | }
 7 | \description{
 8 | This function predicts cell lineages from single-cell RNA-sequencing data when lineage markers are already known, e.g. from a barcoding experiment. Subsets of these genes are used to cluster cells until the desired cluster size is reached. This clustering is repeated many times for random subsets. The result is a cell by cell matrix that lists how many times each cell pair clustered together, which translates into a confidence score. Cell pairs with high confidence scores are likely to be members of the same lineage.
 9 | }
10 | \usage{
11 | predict_lineages_with_known_markers(GEMLI_items, repetitions=100, sample_size=(2/3), desired_cluster_size=c(2,3), fast=FALSE)
12 | }
13 | \arguments{
14 |   \item{
15 |   GEMLI_items}{GEMLI_items is a list of GEMLI inputs and outputs. To run 'predict_lineages_with_known_markers' it should contain a gene expression matrix named 'gene_expression'. This is a quality controlled and normalized gene expression matrix where rownames are genes (features) and column names are cell IDs (samples). In addition, it needs to contain a vector of known marker gene names named 'known_markers'.
16 |   }
17 |   \item{
18 |   repetitions}{'repetitions' specifies how many times the input matrix will be clustered using random subsamples of potential markers. A higher number of iterations leads to more robust results. 10 iterations is considered to be the minimum, but 100 iterations are strongly recommended (default value). Runtime is linear with regard to the number of iterations.
19 |   }
20 |   \item{
21 |   sample_size}{'sample_size' is a value between 0 and 1 and specifies the fraction of potential markers that are used in each clustering. Values between 0.5 and 0.67 (default value) are recommended.
22 |   }
23 |   \item{
24 |   desired_cluster_size}{'desired_cluster_size' specifies the number of cells per cluster to be achieved in each clustering. The input is a vector of values, e.g. c(2,3,4) or 2:4. The desired_cluster_size parameter should generally be small. Values between 2 and 4 are recommended. The default value is c(2,3).
25 |   }
26 |   \item{
27 |   fast}{'fast' =TRUE uses the HiClimR package for calculating correlations. This makes predictions faster but reduces precision. The default value is FALSE.
28 |   }
29 | }
30 | \details{
31 | %%  ~~ If necessary, more details than the description above ~~
32 | }
33 | \value{
34 | 'predict_lineages_with_known_markers' yields a cell by cell matrix containing confidence scores. Cell pairs with high confidence scores are likely to be members of the same lineage. This matrix is added to the 'GEMLI_items' under the name 'prediction'.
35 | }
36 | \references{
37 | %% ~put references to the literature/web site here ~
38 | }
39 | \author{
40 | Marcel Tarbier and Almut Eisele
41 | }
42 | \note{
43 | %%  ~~further notes~~
44 | }
45 | 
46 | %% ~Make other sections like Warning with \section{Warning }{....} ~
47 | 
48 | \seealso{
49 | %% ~~objects to See Also as \code{\link{help}}, ~~~
50 | }
51 | \examples{
52 | ##---- Should be DIRECTLY executable !! ----
53 | ##-- ==>  Define data, use random,
54 | ##--	or do  help(data=index)  for the standard data sets.
55 | 
56 | ## The function is currently defined as
57 | function (x)
58 | {
59 |   }
60 | }
61 | % Add one or more standard keywords, see file 'KEYWORDS' in the
62 | % R documentation directory (show via RShowDoc("KEYWORDS")):
63 | % \keyword{ ~kwd1 }
64 | % \keyword{ ~kwd2 }
65 | % Use only one keyword per line.
66 | % For non-standard keywords, use \concept instead of \keyword:
67 | % \concept{ ~cpt1 }
68 | % \concept{ ~cpt2 }
69 | % Use only one concept per line.
70 | 


--------------------------------------------------------------------------------
/GEMLI_package_v0/man/prediction_to_lineage_information.Rd:
--------------------------------------------------------------------------------
 1 | \name{prediction_to_lineage_information}
 2 | \alias{prediction_to_lineage_information}
 3 | %- Also NEED an '\alias' for EACH other topic documented here.
 4 | \title{
 5 | prediction_to_lineage_information
 6 | }
 7 | \description{
 8 | This function transforms a cell by cell matrix of confidence scores, as created by the 'predict_lineages' function, into a table and a vector of predicted lineages.
 9 | }
10 | \usage{
11 | prediction_to_lineage_information(GEMLI_items, cutoff=50)
12 | }
13 | \arguments{
14 |   \item{
15 |   GEMLI_items}{'GEMLI_items' is a list of GEMLI inputs and outputs. To run 'prediction_to_lineage_information' it should contain a prediction matrix named 'prediction' that is generated and added to the items list by the function 'predict_lineages'.
16 |   }
17 |   \item{
18 |   cutoff}{'cutoff' specifies the confidence score at which a cell pair is considered to be part of the same lineage. High values (e.g. 70-100) provide high precision but lower sensitivity. Low values (e.g. 30-60) provide higher sensitivity but lower precision. The default value is 50.
19 |   }
20 | }
21 | \details{
22 | %%  ~~ If necessary, more details than the description above ~~
23 | }
24 | \value{
25 | The output is a matrix that lists cell IDs in the first column and the predicted lineage in the second column. This is added to the 'GEMLI_items' under the name 'predicted_lineage_table'. It also generates and adds the result as a vector that contains the predicted lineage as values and the cell IDs as names under the name 'predicted_lineages'.
26 | }
27 | \references{
28 | %% ~put references to the literature/web site here ~
29 | }
30 | \author{
31 | Marcel Tarbier and Almut Eisele
32 | }
33 | \note{
34 | %%  ~~further notes~~
35 | }
36 | 
37 | %% ~Make other sections like Warning with \section{Warning }{....} ~
38 | 
39 | \seealso{
40 | %% ~~objects to See Also as \code{\link{help}}, ~~~
41 | }
42 | \examples{
43 | ##---- Should be DIRECTLY executable !! ----
44 | ##-- ==>  Define data, use random,
45 | ##--	or do  help(data=index)  for the standard data sets.
46 | 
47 | ## The function is currently defined as
48 | function (x)
49 | {
50 |   }
51 | }
52 | % Add one or more standard keywords, see file 'KEYWORDS' in the
53 | % R documentation directory (show via RShowDoc("KEYWORDS")):
54 | % \keyword{ ~kwd1 }
55 | % \keyword{ ~kwd2 }
56 | % Use only one keyword per line.
57 | % For non-standard keywords, use \concept instead of \keyword:
58 | % \concept{ ~cpt1 }
59 | % \concept{ ~cpt2 }
60 | % Use only one concept per line.
61 | 
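One way to picture how a cutoff turns a cell by cell confidence matrix into lineage labels, as described in prediction_to_lineage_information.Rd, is to link all cell pairs at or above the cutoff and treat connected groups as lineages. The base-R sketch below does this on a hand-made matrix. Whether the package uses this exact grouping rule, how it treats scores equal to the cutoff, and how it handles unlinked cells are not stated in the documentation, so these are assumptions; the column names of the toy table are also chosen for illustration.

```r
cells <- paste0("cell", 1:5)

# Hand-made symmetric confidence matrix (hypothetical values).
confidence <- matrix(0, 5, 5, dimnames = list(cells, cells))
confidence["cell1", "cell2"] <- confidence["cell2", "cell1"] <- 80
confidence["cell2", "cell3"] <- confidence["cell3", "cell2"] <- 55
confidence["cell4", "cell5"] <- confidence["cell5", "cell4"] <- 30

cutoff <- 50
adjacent <- confidence >= cutoff  # assumed rule: link pairs at or above the cutoff

# Label propagation: every cell repeatedly adopts the smallest label among
# itself and its linked neighbours until no label changes.
lineage <- seq_along(cells)
repeat {
  updated <- lineage
  for (i in seq_along(cells)) {
    neighbours <- which(adjacent[i, ])
    updated[i] <- min(lineage[c(i, neighbours)])
  }
  if (identical(updated, lineage)) break
  lineage <- updated
}

# Table form (cell ID, lineage) and named-vector form of the same result.
predicted_lineage_table <- data.frame(cell.ID = cells, clone.ID = lineage)
predicted_lineages <- setNames(lineage, cells)
print(predicted_lineage_table)
```

Here cell1, cell2 and cell3 form one lineage through the chain of links, while cell4 and cell5 stay separate because their score of 30 is below the cutoff. Lowering the cutoff to 30 would join them, which mirrors the sensitivity and precision trade-off described for 'cutoff'.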


--------------------------------------------------------------------------------
/GEMLI_package_v0/man/quantify_clusters_iterative.Rd:
--------------------------------------------------------------------------------
 1 | \name{quantify_clusters_iterative}
 2 | \alias{quantify_clusters_iterative}
 3 | %- Also NEED an '\alias' for EACH other topic documented here.
 4 | \title{
 5 | quantify_clusters_iterative
 6 | }
 7 | \description{
 8 | This function clusters the input matrix repeatedly until a desired cluster size is reached.
 9 | }
10 | \usage{
11 | quantify_clusters_iterative(data_matrix, marker_genes, N=2, fast=FALSE)
12 | }
13 | \arguments{
14 |   \item{
15 |   data_matrix}{'data_matrix' is a quality controlled and normalized gene expression matrix where rownames are genes (features) and column names are cell IDs (samples).
16 |   }
17 |   \item{
18 |   marker_genes}{'marker_genes' is a vector of gene names or IDs of potential lineage marker genes. It is automatically created in the 'predict_lineages' function or can be computed manually using the 'potential_markers' function.
19 |   }
20 |   \item{
21 |   N}{'N' describes in how many branches the data matrix is split in each clustering step. Higher numbers speed up the clustering but can negatively impact the result of the prediction. It is highly recommended to keep the default value, 2.
22 |   }
23 |   \item{
24 |   fast}{'fast' =TRUE uses the HiClimR package for calculating correlations. This makes the function faster but less precise. The default is FALSE.
25 |   }
26 | }
27 | \details{
28 | %%  ~~ If necessary, more details than the description above ~~
29 | }
30 | \value{
31 | The output is a matrix that indicates which cells (rows) cluster together in each iteration (columns). It is used in the 'predict_lineages' function.
32 | }
33 | \references{
34 | %% ~put references to the literature/web site here ~
35 | }
36 | \author{
37 | Marcel Tarbier and Almut Eisele
38 | }
39 | \note{
40 | %%  ~~further notes~~
41 | }
42 | 
43 | %% ~Make other sections like Warning with \section{Warning }{....} ~
44 | 
45 | \seealso{
46 | %% ~~objects to See Also as \code{\link{help}}, ~~~
47 | }
48 | \examples{
49 | ##---- Should be DIRECTLY executable !! ----
50 | ##-- ==>  Define data, use random,
51 | ##--	or do  help(data=index)  for the standard data sets.
52 | 
53 | ## The function is currently defined as
54 | function (x)
55 | {
56 |   }
57 | }
58 | % Add one or more standard keywords, see file 'KEYWORDS' in the
59 | % R documentation directory (show via RShowDoc("KEYWORDS")):
60 | % \keyword{ ~kwd1 }
61 | % \keyword{ ~kwd2 }
62 | % Use only one keyword per line.
63 | % For non-standard keywords, use \concept instead of \keyword:
64 | % \concept{ ~cpt1 }
65 | % \concept{ ~cpt2 }
66 | % Use only one concept per line.
67 | 


--------------------------------------------------------------------------------
/GEMLI_package_v0/man/suggest_network_trimming_to_size.Rd:
--------------------------------------------------------------------------------
 1 | \name{suggest_network_trimming_to_size}
 2 | \alias{suggest_network_trimming_to_size}
 3 | %- Also NEED an '\alias' for EACH other topic documented here.
 4 | \title{
 5 | suggest_network_trimming_to_size
 6 | }
 7 | \description{
 8 | This function visualizes the lineage prediction as a network and highlights edges that could be removed based on a maximum lineage size cutoff. In lineages that exceed this size, the weakest links (least likely shared lineages) are suggested to be trimmed until the desired maximum size is reached. Trimming suggestions are highlighted in red.
 9 | }
10 | \usage{
11 | suggest_network_trimming_to_size(GEMLI_items) # max_size=4, cutoff=70, max_edge_width=5, display_orphan=F, include_labels=T, ground_truth=F)
12 | }
13 | \arguments{
14 |   \item{
15 |   GEMLI_items}{'GEMLI_items' is a list of GEMLI inputs and outputs. To run 'suggest_network_trimming_to_size' it should contain a prediction matrix named 'prediction' that is generated and added to the items list by the function 'predict_lineages'. It also needs to contain a vector of predicted lineages named 'predicted_lineages' (values are the lineages and names are the cell IDs). This vector can be added to the 'GEMLI_items' using the 'prediction_to_lineage_information' function.
16 |   }
17 |   \item{
18 |   max_size}{'max_size' specifies maximum size of lineages.
19 |   }
20 |   \item{
21 |   cutoff}{'cutoff' specifies the confidence score at which a cell pair is considered to be part of the same lineage. High values (e.g. 70-100) provide high precision but lower sensitivity. Low values (e.g. 30-60) provide higher sensitivity but lower precision. Default value is 70.
22 |   }
23 |   \item{
24 |   max_edge_width}{'max_edge_width' specifies the maximum width of edges in the network visualization. All edge weights above the defined 'cutoff' will be scaled between 0.1*'max_edge_width' and 'max_edge_width'. Default value is 5.
25 |   }
26 |   \item{
27 |   display_orphan}{'display_orphan' defines whether cells without connections should be displayed. Displaying them commonly makes the network plot less readable. Therefore both the recommendation and the default are 'false'/'F'.
28 |   }
29 |   \item{
30 |   include_labels}{'include_labels' defines whether nodes should be numbered. Cell IDs are not shown for readability.
31 |   }
32 |   \item{
33 |   ground_truth}{If the 'GEMLI_items' list contains a 'barcodes' vector with orthogonal lineage information 'ground_truth' can be set 'true'/'T' to color cells according to their lineage. Default is 'false'/'F'.
34 |   }
35 |    \item{
36 |   layout_style}{Depending on the number of cells and the size of lineages, different layout styles can improve readability. In 'suggest_network_trimming_to_size' two layout algorithms can be chosen: Fruchterman-Reingold ("fr") and Kamada-Kawai ("kk"). Default is "fr".
37 |   }
38 | }
39 | \details{
40 | %%  ~~ If necessary, more details than the description above ~~
41 | }
42 | \value{
43 | This function has no output. It is for visualization only.
44 | }
45 | \references{
46 | %% ~put references to the literature/web site here ~
47 | }
48 | \author{
49 | Marcel Tarbier and Almut Eisele
50 | }
51 | \note{
52 | %%  ~~further notes~~
53 | }
54 | 
55 | %% ~Make other sections like Warning with \section{Warning }{....} ~
56 | 
57 | \seealso{
58 | %% ~~objects to See Also as \code{\link{help}}, ~~~
59 | }
60 | \examples{
61 | ##---- Should be DIRECTLY executable !! ----
62 | ##-- ==>  Define data, use random,
63 | ##--	or do  help(data=index)  for the standard data sets.
64 | 
65 | ## The function is currently defined as
66 | function (x)
67 | {
68 |   }
69 | }
70 | % Add one or more standard keywords, see file 'KEYWORDS' in the
71 | % R documentation directory (show via RShowDoc("KEYWORDS")):
72 | % \keyword{ ~kwd1 }
73 | % \keyword{ ~kwd2 }
74 | % Use only one keyword per line.
75 | % For non-standard keywords, use \concept instead of \keyword:
76 | % \concept{ ~cpt1 }
77 | % \concept{ ~cpt2 }
78 | % Use only one concept per line.
79 | 


--------------------------------------------------------------------------------
/GEMLI_package_v0/man/test_lineages.Rd:
--------------------------------------------------------------------------------
 1 | \name{test_lineages}
 2 | \alias{test_lineages}
 3 | %- Also NEED an '\alias' for EACH other topic documented here.
 4 | \title{
 5 | test_lineages
 6 | }
 7 | \description{
 8 | This function tests the results of lineage assignments by comparing it to lineage assignments from cell barcoding.
 9 | }
10 | \usage{
11 | test_lineages(GEMLI_items, valid_fam_sizes=(1:5), max_interval=100,
12 |               plot_results=F)
13 | }
14 | \arguments{
15 |   \item{
16 |   GEMLI_items}{'GEMLI_items' is a list of GEMLI inputs and outputs. To run 'test_lineages' it should contain a prediction matrix named 'prediction' that is generated and added to the items list by the function 'predict_lineages'. It also needs to contain a ground truth named 'barcodes', provided as a named vector (values are the lineages and names are the cell IDs).
17 |   }
18 |   \item{
19 |   valid_fam_sizes}{'valid_fam_sizes' specifies a reasonable range for lineage sizes.
20 |   }
21 |   \item{
22 |   max_interval}{'max_interval' is the number of repetitions used in the 'predict_lineages' function. The default value is 100.
23 |   }
24 |   \item{
25 |   plot_results}{'plot_results' specifies whether the results of the test are visualized (plotted).
26 |   }
27 | }
28 | \details{
29 | %%  ~~ If necessary, more details than the description above ~~
30 | }
31 | \value{
32 | The output is a table that lists the number of false positives (FP) and true positives (TP), the precision (TP/PP, where PP is the number of predicted positives which is the sum of TP and FP), and the sensitivity (TP/P, where P is the number of real positives which is the sum of TP and FN - false negatives).
33 | }
34 | \references{
35 | %% ~put references to the literature/web site here ~
36 | }
37 | \author{
38 | Marcel Tarbier and Almut Eisele
39 | }
40 | \note{
41 | %%  ~~further notes~~
42 | }
43 | 
44 | %% ~Make other sections like Warning with \section{Warning }{....} ~
45 | 
46 | \seealso{
47 | %% ~~objects to See Also as \code{\link{help}}, ~~~
48 | }
49 | \examples{
50 | ##---- Should be DIRECTLY executable !! ----
51 | ##-- ==>  Define data, use random,
52 | ##--	or do  help(data=index)  for the standard data sets.
53 | 
54 | ## The function is currently defined as
55 | function (x)
56 | {
57 |   }
58 | }
59 | % Add one or more standard keywords, see file 'KEYWORDS' in the
60 | % R documentation directory (show via RShowDoc("KEYWORDS")):
61 | % \keyword{ ~kwd1 }
62 | % \keyword{ ~kwd2 }
63 | % Use only one keyword per line.
64 | % For non-standard keywords, use \concept instead of \keyword:
65 | % \concept{ ~cpt1 }
66 | % \concept{ ~cpt2 }
67 | % Use only one concept per line.
68 | 


--------------------------------------------------------------------------------
/GEMLI_package_v0/man/trim_network_to_size.Rd:
--------------------------------------------------------------------------------
 1 | \name{trim_network_to_size}
 2 | \alias{trim_network_to_size}
 3 | %- Also NEED an '\alias' for EACH other topic documented here.
 4 | \title{
 5 | trim_network_to_size
 6 | }
 7 | \description{
 8 | This function trims predicted lineages that exceed a maximum lineage size. In lineages that exceed this size, the weakest links (least likely shared lineages) are trimmed until the desired maximum size is reached.
 9 | }
10 | \usage{
11 | trim_network_to_size(GEMLI_items)
12 | }
13 | \arguments{
14 |   \item{
15 |   GEMLI_items}{'GEMLI_items' is a list of GEMLI inputs and outputs. To run 'trim_network_to_size' it should contain a prediction matrix named 'prediction' that is generated and added to the items list by the function 'predict_lineages'. It also needs to contain a vector of predicted lineages named 'predicted_lineages' (values are the lineages and names are the cell IDs). This vector can be added to the 'GEMLI_items' using the 'prediction_to_lineage_information' function.
16 |   }
17 |   \item{
18 |   max_size}{'max_size' specifies maximum size of lineages.
19 |   }
20 |   \item{
21 |   cutoff}{'cutoff' specifies the confidence score at which a cell pair is considered to be part of the same lineage. High values (e.g. 70-100) provide high precision but lower sensitivity. Low values (e.g. 30-60) provide higher sensitivity but lower precision. Default value is 70.
22 |   }
23 | }
24 | \details{
25 | %%  ~~ If necessary, more details than the description above ~~
26 | }
27 | \value{
28 | This function returns a 'GEMLI_items' list in which the prediction matrix ('prediction'), the prediction table ('predicted_lineage_table') and the prediction vector ('predicted_lineages') have been trimmed to the specified size.
29 | }
30 | \references{
31 | %% ~put references to the literature/web site here ~
32 | }
33 | \author{
34 | Marcel Tarbier and Almut Eisele
35 | }
36 | \note{
37 | %%  ~~further notes~~
38 | }
39 | 
40 | %% ~Make other sections like Warning with \section{Warning }{....} ~
41 | 
42 | \seealso{
43 | %% ~~objects to See Also as \code{\link{help}}, ~~~
44 | }
45 | \examples{
46 | ##---- Should be DIRECTLY executable !! ----
47 | ##-- ==>  Define data, use random,
48 | ##--	or do  help(data=index)  for the standard data sets.
49 | 
50 | ## The function is currently defined as
51 | function (x)
52 | {
53 |   }
54 | }
55 | % Add one or more standard keywords, see file 'KEYWORDS' in the
56 | % R documentation directory (show via RShowDoc("KEYWORDS")):
57 | % \keyword{ ~kwd1 }
58 | % \keyword{ ~kwd2 }
59 | % Use only one keyword per line.
60 | % For non-standard keywords, use \concept instead of \keyword:
61 | % \concept{ ~cpt1 }
62 | % \concept{ ~cpt2 }
63 | % Use only one concept per line.
64 | 


--------------------------------------------------------------------------------
/GEMLI_package_v0/man/visualize_as_network.Rd:
--------------------------------------------------------------------------------
 1 | \name{visualize_as_network}
 2 | \alias{visualize_as_network}
 3 | %- Also NEED an '\alias' for EACH other topic documented here.
 4 | \title{
 5 | visualize_as_network
 6 | }
 7 | \description{
 8 | This function visualizes the lineage prediction as a network.
 9 | }
10 | \usage{
11 | visualize_as_network(GEMLI_items, cutoff=70, display_orphan=F, max_edge_width=5, ground_truth=F, include_labels=F, highlight_FPs=F, layout_style="fr", cell_type_colors=F)
12 | }
13 | \arguments{
14 |   \item{
15 |   GEMLI_items}{'GEMLI_items' is a list of GEMLI inputs and outputs. To run 'visualize_as_network' it should contain a prediction matrix named 'prediction' that is generated and added to the items list by the function 'predict_lineages'.
16 |   }
17 |   \item{
18 |   cutoff}{'cutoff' specifies the confidence score at which a cell pair is considered to be part of the same lineage. High values (e.g. 70-100) provide high precision but lower sensitivity. Low values (e.g. 30-60) provide higher sensitivity but lower precision. Default value is 70.
19 |   }
20 |   \item{
21 |   max_edge_width}{'max_edge_width' specifies the maximum width of edges in the network visualization. All edge weights above the defined 'cutoff' will be scaled between 0.1*'max_edge_width' and 'max_edge_width'. Default value is 5.
22 |   }
23 |   \item{
24 |   display_orphan}{'display_orphan' defines whether cells without connections should be displayed. Displaying them commonly makes the network plot less readable. Therefore both the recommendation and the default are 'false'/'F'.
25 |   }
26 |   \item{
27 |   include_labels}{'include_labels' defines whether nodes should be numbered. Cell IDs are not shown for readability.
28 |   }
29 |   \item{
30 |   ground_truth}{If the 'GEMLI_items' list contains a 'barcodes' vector with orthogonal lineage information 'ground_truth' can be set 'true'/'T' to color cells according to their lineage. Default is 'false'/'F'.
31 |   }
32 |   \item{
33 |   highlight_FPs}{If the 'GEMLI_items' list contains a 'barcodes' vector with orthogonal lineage information and 'ground_truth' is 'true'/'T', connections between cells that are false positives will be highlighted in red. It can be set to 'false'/'F' to not highlight false predictions. Default is 'false'/'F'.
34 |   }
35 |   \item{
36 |   layout_style}{Depending on the number of cells and the size of lineages, different layout styles can improve readability. Currently three different network layout algorithms can be chosen: Fruchterman-Reingold ("fr"), Kamada-Kawai ("kk"), and grid ("grid"). Default is "fr".
37 |   }
38 |     \item{
39 |   cell_type_colors}{'cell_type_colors' can be set 'true'/'T' to color cells by assigned cell type. For such coloring the 'GEMLI_items' list must contain a 'cell_type' element (dataframe with columns 'cell.ID' and 'cell.type'). Specific colors will be assigned to specific cell types if a 'GEMLI_items' list element 'cell_type_color' is added (dataframe with columns 'cell.type' and 'color'). If no 'cell_type_color' element is present, random colors will be assigned to each cell type. Default is 'false'/'F'.
40 |   }
41 | }
42 | \details{
43 | %%  ~~ If necessary, more details than the description above ~~
44 | }
45 | \value{
46 | WIP
47 | }
48 | \references{
49 | %% ~put references to the literature/web site here ~
50 | }
51 | \author{
52 | Marcel Tarbier and Almut Eisele
53 | }
54 | \note{
55 | %%  ~~further notes~~
56 | }
57 | 
58 | %% ~Make other sections like Warning with \section{Warning }{....} ~
59 | 
60 | \seealso{
61 | %% ~~objects to See Also as \code{\link{help}}, ~~~
62 | }
63 | \examples{
64 | ##---- Should be DIRECTLY executable !! ----
65 | ##-- ==>  Define data, use random,
66 | ##--	or do  help(data=index)  for the standard data sets.
67 | 
68 | ## The function is currently defined as
69 | function (x)
70 | {
71 |   }
72 | }
73 | % Add one or more standard keywords, see file 'KEYWORDS' in the
74 | % R documentation directory (show via RShowDoc("KEYWORDS")):
75 | % \keyword{ ~kwd1 }
76 | % \keyword{ ~kwd2 }
77 | % Use only one keyword per line.
78 | % For non-standard keywords, use \concept instead of \keyword:
79 | % \concept{ ~cpt1 }
80 | % \concept{ ~cpt2 }
81 | % Use only one concept per line.
82 | 


--------------------------------------------------------------------------------
/GEMLI_package_v0/vignettes/Example1_simple_lineage_predictions.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: "Example1_simple_lineage_predictions"
  3 | output: html_vignette
  4 | date: "2023-06-02"
  5 | vignette: >
  6 |   %\VignetteIndexEntry{Example1_simple_lineage_predictions}
  7 |   %\VignetteEngine{knitr::rmarkdown}
  8 |   %\VignetteEncoding{UTF-8}
  9 | 
 10 | ---
 11 | 
 12 | ```{r, echo = FALSE, message=FALSE}
 13 | knitr::opts_chunk$set(message = FALSE, warning = FALSE)
 14 | ```
 15 | 
 16 | 
 17 | For our first example we'll be looking at mouse embryonic stem cells that have been barcoded and cultured for 48h. We'll be working with a subset of this data for fast processing. In our subset we find 'family sizes' ranging from just two up to five related cells.
 18 | 
 19 | ## Load package and example data
 20 | 
 21 | First we load the example data.
 22 | 
 23 | ```{r, eval=T, echo=T}
 24 | library(GEMLI)
 25 | library(igraph)
 26 | library(HiClimR)
 27 | 
 28 | load('GEMLI_example_data_matrix.RData')
 29 | load('GEMLI_example_barcode_information.RData')
 30 | 
 31 | ```
 32 | 
 33 | ## Create a GEMLI items list
 34 | 
 35 | GEMLI's inputs and outputs are stored in a list of objects with predefined names. To run GEMLI you need at least a quality controlled and normalized gene expression matrix (rows = genes/features, columns = cells/samples). In this example we also provide a ground truth for lineages stemming from a barcoding experiment (values = barcode ID, names = cell IDs).
 36 | 
 37 | ```{r, eval=T, echo=T}
 38 | # Making GEMLI list and storing data
 39 | GEMLI_items = list()
 40 | GEMLI_items[['gene_expression']] = data_matrix
 41 | GEMLI_items[['barcodes']] = lineage_dict_bc
 42 | 
 43 | # A brief look at the loaded data
 44 | GEMLI_items[['gene_expression']][9:14,1:5]
 45 | GEMLI_items[['barcodes']][1:5]
 46 | 
 47 | ```
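
The expected layout of the two inputs can be illustrated with a small stand-in. This is only a sketch: the object names `toy_matrix`, `toy_barcodes` and `toy_items`, and all values in them, are hypothetical and are not part of the example data.

```r
# Toy stand-ins for the example objects (hypothetical values, not GEMLI data)
set.seed(1)
toy_matrix <- matrix(rpois(6 * 4, lambda = 5), nrow = 6, ncol = 4,
                     dimnames = list(paste0("gene", 1:6), paste0("cell", 1:4)))

# Named vector: values are barcode IDs, names are cell IDs
toy_barcodes <- c(cell1 = "bc_A", cell2 = "bc_A", cell3 = "bc_B", cell4 = "bc_B")

toy_items <- list()
toy_items[['gene_expression']] <- toy_matrix
toy_items[['barcodes']] <- toy_barcodes

# Sanity checks: rows are genes, columns are cells, and every barcoded cell
# appears as a column of the expression matrix
stopifnot(is.matrix(toy_items[['gene_expression']]))
stopifnot(all(names(toy_items[['barcodes']]) %in%
              colnames(toy_items[['gene_expression']])))
```

Checks of this kind can help catch a transposed matrix or mismatched cell IDs before running a prediction.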
 48 | 
 49 | ## Perform lineage predictions
 50 | 
 51 | We can then identify cell lineages through repeated iterative clustering (this may take 2-3min). The predict_lineages function takes our GEMLI_items as input. It outputs a matrix of all cells against all cells with values corresponding to a confidence score that they are part of the same lineage.
 52 | 
 53 | ```{r, eval=T, echo=T}
 54 | # Perform lineage predictions
 55 | GEMLI_items = predict_lineages(GEMLI_items)
 56 | 
 57 | # A brief look at the result
 58 | GEMLI_items[['prediction']][1:5,15:19]
 59 | ```
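
To give an intuition for what a confidence score from repeated clustering can look like, here is a deliberately simplified sketch in base R. It is not GEMLI's implementation: the toy data, the random gene subsets, the single two-way split per repetition, and all object names are assumptions made for illustration only.

```r
set.seed(42)

# Hypothetical toy matrix: 20 genes x 8 cells; cells 1-4 and cells 5-8
# each share an underlying expression pattern plus noise
base_a <- rnorm(20)
base_b <- rnorm(20)
toy <- cbind(sapply(1:4, function(i) base_a + rnorm(20, sd = 0.3)),
             sapply(1:4, function(i) base_b + rnorm(20, sd = 0.3)))
dimnames(toy) <- list(paste0("gene", 1:20), paste0("cell", 1:8))

repetitions <- 50
co_clustered <- matrix(0, 8, 8, dimnames = list(colnames(toy), colnames(toy)))

for (r in seq_len(repetitions)) {
  genes <- sample(rownames(toy), 10)            # random gene subset (assumption)
  d <- as.dist(1 - cor(toy[genes, ]))           # correlation distance between cells
  groups <- cutree(hclust(d, method = "average"), k = 2)
  # Count cell pairs that ended up in the same cluster in this repetition
  co_clustered <- co_clustered + outer(groups, groups, "==")
}

# Share of repetitions (in percent) in which two cells clustered together
confidence <- 100 * co_clustered / repetitions
round(confidence[1:3, c(1, 2, 5)])
```

In this sketch, cell pairs that cluster together in most repetitions receive a score close to 100, which mirrors how the values in the 'prediction' matrix are read.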
 60 | 
 61 | ## Test lineage prediction
 62 | 
 63 | Since we have barcoding data for this dataset we can test the predicted lineages against our ground truth. The test_lineages function again takes our GEMLI_items as input. It's important that a prediction has been run first with predict_lineages. It outputs the number of true positive predictions (TP), false positive predictions (FP), as well as precision and sensitivity for various confidence intervals. The output can be visualized by setting plot_results to true/T.
 64 | 
 65 | ```{r, eval=T, echo=T, fig.height=6, fig.width=6}
 66 | GEMLI_items = test_lineages(GEMLI_items)
 67 | 
 68 | # A brief look at the resulting table
 69 | GEMLI_items$testing_results
 70 | 
 71 | # And a run with plotting of the result
 72 | GEMLI_items = test_lineages(GEMLI_items, plot_results=T)
 73 | ```
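
The metrics in the table follow the usual definitions: precision is TP/(TP+FP) and sensitivity is TP/(TP+FN). The sketch below computes them on a hypothetical four-cell confidence matrix. It assumes a cell pair counts as predicted when its score is at or above the cutoff; the exact comparison used inside test_lineages may differ, and all names and values here are illustrative.

```r
# Hypothetical symmetric confidence matrix for four cells (scores 0-100)
cells <- paste0("cell", 1:4)
toy_prediction <- matrix(0, 4, 4, dimnames = list(cells, cells))
toy_prediction["cell1", "cell2"] <- 90   # true pair, high confidence
toy_prediction["cell3", "cell4"] <- 60   # true pair, moderate confidence
toy_prediction["cell1", "cell3"] <- 55   # false pair
toy_prediction <- toy_prediction + t(toy_prediction)

toy_barcodes <- c(cell1 = "bc_A", cell2 = "bc_A", cell3 = "bc_B", cell4 = "bc_B")

# Ground truth: TRUE where two cells carry the same barcode
same_lineage <- outer(toy_barcodes, toy_barcodes, "==")
pairs <- upper.tri(toy_prediction)       # count every cell pair once

score_pairs <- function(cutoff) {
  predicted <- toy_prediction >= cutoff  # assumption: inclusive cutoff
  TP <- sum(predicted[pairs] & same_lineage[pairs])
  FP <- sum(predicted[pairs] & !same_lineage[pairs])
  FN <- sum(!predicted[pairs] & same_lineage[pairs])
  c(cutoff = cutoff, TP = TP, FP = FP,
    precision = TP / (TP + FP), sensitivity = TP / (TP + FN))
}

results <- t(sapply(c(50, 80), score_pairs))
print(results)
```

Raising the cutoff from 50 to 80 removes the false pair (precision goes up) but also drops one true pair (sensitivity goes down), which is the trade-off described for the 'cutoff' parameter.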
 74 | 
 75 | ## Visualize predictions as network
 76 | 
 77 | We can also investigate our predictions by visualizing them as a network with the visualize_as_network function. Here we need to set a cutoff that defines which predictions we want to consider. It represents a confidence score and high values yield fewer predictions with high precision while low values yield more predictions with lower precision.
 78 | 
 79 | ```{r, eval=T, echo=T, fig.height=6, fig.width=6}
 80 | 
 81 | visualize_as_network(GEMLI_items, cutoff=90)
 82 | visualize_as_network(GEMLI_items, cutoff=50)
 83 | ```
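
To see which cell pairs survive a given cutoff before plotting, the confidence matrix can be turned into an edge list. The helper `edges_at_cutoff` below is hypothetical and not part of GEMLI; it assumes a symmetric matrix with cell IDs as row and column names and treats the cutoff as inclusive.

```r
# Hypothetical symmetric confidence matrix for four cells
cells <- paste0("cell", 1:4)
toy_prediction <- matrix(0, 4, 4, dimnames = list(cells, cells))
toy_prediction["cell1", "cell2"] <- 95
toy_prediction["cell2", "cell3"] <- 60
toy_prediction["cell3", "cell4"] <- 40
toy_prediction <- toy_prediction + t(toy_prediction)

# List every cell pair (once) whose confidence is at or above the cutoff
edges_at_cutoff <- function(prediction, cutoff) {
  keep <- which(prediction >= cutoff & upper.tri(prediction), arr.ind = TRUE)
  data.frame(from = rownames(prediction)[keep[, "row"]],
             to = colnames(prediction)[keep[, "col"]],
             confidence = prediction[keep],
             stringsAsFactors = FALSE)
}

edges_at_cutoff(toy_prediction, cutoff = 90)  # only the strongest link remains
edges_at_cutoff(toy_prediction, cutoff = 50)  # a weaker link is added
```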
 84 | 
 85 | 
 86 | If a ground truth, e.g. from barcoding, is available we can set ground_truth to true/T to highlight false predictions with red edges. Cells without barcode information will be displayed in white.
 87 | 
 88 | ```{r, eval=T, echo=T, fig.height=6, fig.width=6}
 89 | visualize_as_network(GEMLI_items, cutoff=90, ground_truth=T)
 90 | visualize_as_network(GEMLI_items, cutoff=50, ground_truth=T)
 91 | ```
 92 | 
 93 | ## Extract lineage information
 94 | 
 95 | Now we can extract the lineage information with the prediction_to_lineage_information function. Again we need to set a cutoff that defines which predictions we want to consider. The function outputs both a lineage table and a 'dictionary', a vector that has the lineage number as values and the cell IDs as names.
 96 | 
 97 | ```{r, eval=T, echo=T}
 98 | GEMLI_items = prediction_to_lineage_information(GEMLI_items, cutoff=50)
 99 | 
100 | # A brief look at the result
101 | GEMLI_items$predicted_lineage_table[1:5,]
102 | 
103 | ```
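
One way to picture the step from a confidence matrix to a lineage 'dictionary' is to group cells that are linked, directly or indirectly, by scores at or above the cutoff. The sketch below does this with a simple breadth-first search in base R. It rests on the assumption that lineages correspond to connected components and that unconnected cells are left out; the function `lineages_from_prediction` is hypothetical and may not match what prediction_to_lineage_information does internally.

```r
# Hypothetical symmetric confidence matrix for six cells
cells <- paste0("cell", 1:6)
toy_prediction <- matrix(0, 6, 6, dimnames = list(cells, cells))
toy_prediction["cell1", "cell2"] <- 95
toy_prediction["cell2", "cell3"] <- 60
toy_prediction["cell4", "cell5"] <- 80
toy_prediction <- toy_prediction + t(toy_prediction)   # cell6 has no links

lineages_from_prediction <- function(prediction, cutoff) {
  adjacent <- prediction >= cutoff
  diag(adjacent) <- FALSE
  lineage <- setNames(rep(NA_integer_, nrow(prediction)), rownames(prediction))
  next_id <- 0L
  for (start in rownames(prediction)) {
    # Skip cells already assigned and cells without any link (orphans)
    if (!is.na(lineage[start]) || !any(adjacent[start, ])) next
    next_id <- next_id + 1L
    queue <- start
    while (length(queue) > 0) {
      cell <- queue[1]
      queue <- queue[-1]
      if (!is.na(lineage[cell])) next
      lineage[cell] <- next_id
      neighbours <- colnames(prediction)[adjacent[cell, ]]
      queue <- c(queue, neighbours[is.na(lineage[neighbours])])
    }
  }
  lineage[!is.na(lineage)]   # values = lineage number, names = cell IDs
}

toy_lineages <- lineages_from_prediction(toy_prediction, cutoff = 50)
toy_lineages

# The same information as a two-column table
data.frame(cell.ID = names(toy_lineages), lineage = unname(toy_lineages))
```

With a cutoff of 50, cells 1 to 3 form one lineage and cells 4 and 5 another; with a cutoff of 90 only cells 1 and 2 remain linked.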
104 | 
105 | 


--------------------------------------------------------------------------------
/GEMLI_package_v0/vignettes/Example2_predicting_multicellular_structures.Rmd:
--------------------------------------------------------------------------------
 1 | 
 2 | ---
 3 | title: "Example2_predicting_multicellular_structures"
 4 | output: html_vignette
 5 | date: "2023-06-05"
 6 | vignette: >
 7 |   %\VignetteIndexEntry{Example2_predicting_multicellular_structures}
 8 |   %\VignetteEngine{knitr::rmarkdown}
 9 |   %\VignetteEncoding{UTF-8}
10 | 
11 | ---
12 | 
13 | ```{r, echo = FALSE, message=FALSE}
14 | knitr::opts_chunk$set(message = FALSE, warning = FALSE)
15 | ```
16 | 
17 | In this example we'll be visualizing GEMLI results from a lineage-annotated dataset of murine intestinal crypts. Intestinal crypts originate from one or a few intestinal stem cells, and are composed of a number of different cell types. In the data we find cells from individual crypts that range from just three up to forty-four related cells. The scRNA-seq dataset is derived from Bues et al. 2022 (PMID 35165449) and is publicly available under Gene Expression Omnibus accession number GSE148093.
18 | 
19 | 
20 | ## Load package and example data
21 | 
22 | First we load the example data. Here we already predicted the lineages using GEMLI and therefore do not include a count matrix, but rather start with the predictions right away.
23 | 
24 | ```{r, eval=T, echo=T}
25 | library(GEMLI)
26 | library(igraph)
27 | library(HiClimR)
28 | 
29 | load('GEMLI_crypts_example_data_matrix.RData')
30 | load('GEMLI_crypts_example_barcode_information.RData')
31 | 
32 | ```
33 | 
34 | ## Create a GEMLI items list
35 | 
36 | We then create a GEMLI items list. This list is used to store the data, and create and store the outputs of GEMLI (for details check example one).
37 | 
38 | ```{r, eval=T, echo=T}
39 | 
40 | GEMLI_items_crypts = list()
41 | GEMLI_items_crypts[['prediction']] = Crypts
42 | GEMLI_items_crypts[['barcodes']] = Crypts_bc_dict
43 | 
44 | ```
45 | 
46 | ## Visualize predictions as network
47 | 
48 | To visualize large lineages we'll use three different network layout algorithms: Fruchterman-Reingold, Kamada-Kawai, and grid. Each of them has advantages and disadvantages.
49 | 
50 | (1) Fruchterman-Reingold displays the cells of individual predicted crypts close together with ample space between crypts. This makes it hard to see connections within individual crypts but gives a good overview of individual structures.
51 | 
52 | (2) Kamada-Kawai spaces individual cells well, so we can see individual connections between them. It may, however, happen that two different predicted crypts are partially overlaid, as can be seen for the dark red and bright green lineages on the right side of the plot.
53 | 
54 | (3) When the network is laid out as a grid, one generally gets a good overview of the predicted lineages and their connections, but it's hard to see which connection belongs to which cell in the same row.
55 | 
56 | ```{r, eval=T, echo=T, fig.height=6, fig.width=6}
57 | 
58 | visualize_as_network(GEMLI_items_crypts, cutoff=70, display_orphan=F, max_edge_width=1, ground_truth=T, include_labels=F, layout_style="fr")
59 | visualize_as_network(GEMLI_items_crypts, cutoff=70, display_orphan=F, max_edge_width=1, ground_truth=T, include_labels=F, layout_style="kk")
60 | visualize_as_network(GEMLI_items_crypts, cutoff=70, display_orphan=F, max_edge_width=1, ground_truth=T, include_labels=F, layout_style="grid")
61 | 
62 | ```
63 | 
64 | ## Adding cell type information to the GEMLI items list
65 | 
66 | The cells of individual intestinal crypts can be assigned to different cell types. This information can be added to the GEMLI items list as 'cell_type' slot in the form of a dataframe with column 'cell.ID' and 'cell.type'.
67 | 
68 | ```{r, eval=T, echo=T}
69 | 
70 | load('GEMLI_crypts_example_cell_type_annotation.RData')
71 | GEMLI_items_crypts[['cell_type']] = Crypts_annotation
72 | 
73 | ```
74 | 
75 | ## Color prediction network visualization by cell type
76 | 
77 | The visualization of the lineage predictions can now be colored by the cell type annotation. This makes it possible to see the composition of individual intestinal crypts.
78 | 
79 | ```{r, eval=T, echo=T, fig.height=6, fig.width=6}
80 | 
81 | visualize_as_network(GEMLI_items_crypts, cutoff=70, max_edge_width=5, display_orphan=F, include_labels=F, ground_truth=T, highlight_FPs=T, layout_style="kk", cell_type_colors=T)
82 | 
83 | ```
84 | 
85 | Specific colors can be assigned to specific cell types by adding a dataframe with columns 'cell.type' and 'color' in the GEMLI items list slot 'cell_type_color'. This also makes it possible to highlight just one or two selected cell types.
86 | 
87 | ```{r, eval=T, echo=T, fig.height=6, fig.width=6}
88 | 
89 | # Adding custom color as cell_type_color element to GEMLI_items
90 | cell.type <- unique(GEMLI_items_crypts[['cell_type']]$cell.type)
91 | color <- c("#5386BD", "skyblue1", "darkgreen", "gold", "red", "darkred", "black")
92 | Cell_type_color <- data.frame(cell.type, color)
93 | GEMLI_items_crypts[['cell_type_color']] = Cell_type_color
94 | 
95 | # Make a visualization network
96 | visualize_as_network(GEMLI_items_crypts, cutoff=70, max_edge_width=5, display_orphan=F, include_labels=F, ground_truth=T, highlight_FPs=T, layout_style="kk", cell_type_colors=T)
97 | 
98 | ```
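
Highlighting only selected cell types can be done by giving every other cell type a neutral color. The sketch below builds such a 'cell_type_color' dataframe in base R. The annotation, the cell type names and the chosen colors are hypothetical and only illustrate the expected two-column layout.

```r
# Hypothetical cell type annotation with columns 'cell.ID' and 'cell.type'
toy_cell_type <- data.frame(
  cell.ID = paste0("cell", 1:6),
  cell.type = c("Stem", "Stem", "Paneth", "Goblet", "Enterocyte", "Paneth"),
  stringsAsFactors = FALSE
)

# Cell types to emphasise and their colors; everything else becomes grey
highlight <- c(Paneth = "red")
cell.type <- unique(toy_cell_type$cell.type)
color <- ifelse(cell.type %in% names(highlight), highlight[cell.type], "grey80")

toy_cell_type_color <- data.frame(cell.type, color, stringsAsFactors = FALSE)
toy_cell_type_color
```

A dataframe of this shape could then be stored in the 'cell_type_color' slot in the same way as `Cell_type_color` above.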
99 | 


--------------------------------------------------------------------------------
/GEMLI_package_v0/vignettes/Example3_cell_fate_analysis.Rmd:
--------------------------------------------------------------------------------
 1 | 
 2 | ---
 3 | title: "Example3_cell_fate_analysis"
 4 | output: html_document
 5 | date: "2023-06-05"
 6 | vignette: >
 7 |   %\VignetteIndexEntry{Example3_cell_fate_analysis}
 8 |   %\VignetteEngine{knitr::rmarkdown}
 9 |   %\VignetteEncoding{UTF-8}
10 | ---
11 | 
12 | ```{r, echo = FALSE, message=FALSE}
13 | knitr::opts_chunk$set(message = FALSE, warning = FALSE)
14 | ```
15 | 
16 | For our third example we'll be looking at cell fate decisions in a scRNA-seq dataset of human breast cancer encompassing both ductal carcinoma in situ (DCIS) and invasive tumor (inv_tumor) cells. We'll be working with a subset of this data for fast processing. No ground truth is available. We will study the fate transition from DCIS to invasive tumor cells. The data is derived from a public 10X Genomics dataset associated with the preprint bioRxiv 2022.10.06.510405 (doi: https://doi.org/10.1101/2022.10.06.510405) and was downloaded from https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast (April 2023).
17 | 
18 | ## Load example data
19 | 
20 | First we load the example data. We already predicted the lineages and extracted the predicted lineages using GEMLI. Furthermore we load the previously generated cell type information for all cells in the dataset.
21 | 
22 | ```{r, eval=T, echo=T}
23 | library(GEMLI)
24 | library(igraph)
25 | library(HiClimR)
26 | library(dplyr)
27 | library(ggplot2)
28 | library(Seurat)
29 | library(ggrepel)
30 | 
31 | load('GEMLI_cancer_example_norm_count.RData')
32 | load('GEMLI_cancer_example_predicted_lineages.RData')
33 | load('GEMLI_cancer_example_cell_type_annotation.RData')
34 | 
35 | ```
36 | 
37 | 
38 | ## Create a GEMLI items list
39 | 
40 | We then create a GEMLI items list. This list is used to store the data, and create and store the outputs of GEMLI (for details check example one).
41 | 
42 | ```{r, eval=T, echo=T}
43 | 
44 | GEMLI_items = list()
45 | GEMLI_items[['gene_expression']] = Cancer_norm_count
46 | GEMLI_items[['predicted_lineage_table']] = Cancer_predicted_lineages
47 | GEMLI_items[['cell_type']] = Cancer_annotation
48 | 
49 | ```
50 | 
51 | 
52 | ## Extract symmetric and asymmetric cell lineages
53 | 
54 | We now extract predicted cell lineages with members in only one cell type (symmetric) or in two or more cell types (asymmetric). To analyze the transition from DCIS to invasive breast cancer we will extract symmetric DCIS lineages, asymmetric lineages containing both DCIS and invasive tumor cells, and symmetric inv_tumor lineages. To exclude lineages that are too strongly skewed towards one cell type, we set a threshold so that asymmetric lineages must contain at least 10% of each cell type. The function output is stored in the GEMLI_items 'cell_fate_analysis' item. It is a data frame with a column cell.fate that holds the label sym or asym and the cell type, separated by an underscore. This cell.fate designation makes it possible to subsequently analyze only a specific cell type in asymmetric cell lineages.
55 | 
56 | ```{r, eval=T, echo=T}
57 | 
58 | GEMLI_items<-extract_cell_fate_lineages(GEMLI_items, selection=c("inv_tumor", "DCIS"), unique=FALSE, threshold=c(10,10))
59 | 
60 | # A brief look at the result
61 | GEMLI_items[['cell_fate_analysis']][1:10,] 
62 | table(GEMLI_items[['cell_fate_analysis']]$cell.fate)
63 | 
64 | ```
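
The symmetric/asymmetric labelling with a 10% threshold can be pictured with a small base R sketch. It is an illustration under stated assumptions, not the code of extract_cell_fate_lineages: the toy table, the column name `clone.ID`, and the helper `classify_lineage` are hypothetical, and lineages are assumed to contain only the two selected cell types.

```r
# Hypothetical lineage table: three predicted lineages with annotated cell types
toy_lineages <- data.frame(
  cell.ID   = paste0("cell", 1:9),
  clone.ID  = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  cell.type = c("DCIS", "DCIS", "DCIS",
                "DCIS", "inv_tumor",
                "inv_tumor", "inv_tumor", "inv_tumor", "inv_tumor"),
  stringsAsFactors = FALSE
)

# Label one lineage from the cell types of its members
classify_lineage <- function(types, selection = c("inv_tumor", "DCIS"),
                             threshold = c(10, 10)) {
  pct <- 100 * table(factor(types, levels = selection)) / length(types)
  if (sum(pct > 0) == 1) return("sym")       # only one cell type present
  if (all(pct >= threshold)) return("asym")  # each type reaches its threshold
  NA_character_                              # too unbalanced: excluded
}

fate <- sapply(split(toy_lineages$cell.type, toy_lineages$clone.ID),
               classify_lineage)
lineage_fate <- unname(fate[as.character(toy_lineages$clone.ID)])

# Combine lineage label and cell type, e.g. "asym_DCIS"
toy_lineages$cell.fate <- ifelse(is.na(lineage_fate), NA,
                                 paste(lineage_fate, toy_lineages$cell.type,
                                       sep = "_"))
table(toy_lineages$cell.fate)
```

In this toy table, lineage 2 contains 50% of each cell type and is labelled asymmetric, while a lineage with 19 DCIS cells and a single invasive tumor cell (5%) would fall below the threshold and be excluded.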
65 | 
66 | ## Call and visualize DEG for cells in specific lineage types
67 | 
68 | Based on the symmetric and asymmetric lineages we extracted, we will now call differentially expressed genes (DEG) for cells of specific cell types in specific lineage types. To analyze the transition from DCIS to invasive breast cancer, we call DEG between DCIS cells in symmetric lineages and DCIS cells in asymmetric lineages. These are genes specific to DCIS cells at the start of the transition.
69 | 
70 | ```{r, eval=T, echo=T}
71 | 
72 | GEMLI_items<-cell_fate_DEG_calling(GEMLI_items, ident1="sym_DCIS", ident2="asym_DCIS", min.pct=0.05, logfc.threshold=0.1)
73 | 
74 | # A brief look at the result
75 | GEMLI_items[['DEG']][1:10,] 
76 | 
77 | # Volcano plot of the DEG analysis
78 | DEG_volcano_plot(GEMLI_items, name1="Sym_DCIS", name2="Asym_DCIS")
79 | 
80 | ```
81 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # GEMLI: Gene expression memory for lineage identification
  2 | 
  3 | GEMLI is an R package to predict cell lineages (cells with a common ancestor) from single cell RNA sequencing datasets and to call genes with a high gene expression memory on the predicted cell lineages. It is described in A.S. Eisele*, M. Tarbier*, A.A. Dormann, V. Pelechano, D.M. Suter | "Gene-expression memory-based prediction of cell lineages from scRNA-seq datasets" | Nature Communications 15, 2744 (2024). https://doi.org/10.1038/s41467-024-47158-y
  4 | 
  5 | The approach is based on findings of Phillips et al. 2019 (doi.org/10.1038/s41467-019-09189-8) where it was shown that some genes show varying gene expression across cell lineages that is stable over multiple cell generations.
  6 | 
  7 | ## Installation
  8 | Run `library(devtools)` and then `install_github("UPSUTER/GEMLI", subdir="GEMLI_package_v0")`. If `devtools` is not installed yet, you can install it with `install.packages("devtools")`. GEMLI is then installed and can be loaded with `library(GEMLI)`. Dependencies will be installed with the package.
  9 | 
 10 | 
 11 | ## Development and feedback
 12 | We are still working to make GEMLI more intuitive, user-friendly, faster and more versatile. Existing functions will therefore still be updated and new functionality will be added. We'll publish a list of changes for each version so you can keep track. Your feedback is very welcome and will help us make GEMLI even better. What do you like about GEMLI? Is something not working? Which functions are you missing? Let us know! Contact: marcel.tarbier@scilifelab.se or almut.eisele@epfl.ch
 13 | 
 14 | ## Example 1: small lineages in mouse embryonic stem cells
 15 | 
 16 | For our first example we'll be looking at mouse embryonic stem cells that have been barcoded and cultured for 48h. We'll be working with a subset of this data for fast processing. In our subset we find 'family sizes' ranging from just two up to five related cells. 
 17 | 
 18 | ### Load example data
 19 | First we load the example data.
 20 | 
 21 | ```
 22 | > load('GEMLI_example_data_matrix.RData')
 23 | > load('GEMLI_example_barcode_information.RData')
 24 | ```
 25 | 
 26 | ### Create a GEMLI items list
 27 | GEMLI's inputs and outputs are stored in a list of objects with predefined names. To run GEMLI you need at least a quality-controlled and normalized gene expression matrix (rows = genes/features, columns = cells/samples). In this example we also provide a ground truth for lineages stemming from a barcoding experiment (values = barcode ID, names = cell IDs).
 28 | 
 29 | ```
 30 | > GEMLI_items = list()
 31 | > GEMLI_items[['gene_expression']] = data_matrix
 32 | > GEMLI_items[['barcodes']] = lineage_dict_bc
 33 | >
 34 | > GEMLI_items[['gene_expression']][9:14,1:5]
 35 |                    AAACGAACAGGTGTGA-1 AAAGGTAGTTGCTTGA-1 AACCACAAGTTTGTCG-1 AAGCCATGTTCCACGG-1 AAGCGAGGTACGGCAA-1
 36 | ENSMUSG00000033845          14.761746         12.9570026          13.240645          8.8596794          12.791617
 37 | ENSMUSG00000025903           3.163231          1.2340002           4.878132          0.7383066           0.000000
 38 | ENSMUSG00000033813           6.326463          5.5530011           4.181256          8.8596794           5.482122
 39 | ENSMUSG00000002459           0.000000          0.0000000           0.000000          0.0000000           0.000000
 40 | ENSMUSG00000085623           0.000000          0.0000000           0.000000          0.0000000           0.000000
 41 | ENSMUSG00000033793           3.163231          0.6170001           1.393752          2.2149199           5.482122
 42 | >
 43 | > GEMLI_items[['barcodes']][1:5]
 44 | CACAGATAGTGATGGC-1 TATCTTGGTACGGGAT-1 AAACGAACAGGTGTGA-1 AGAGAATAGGTCATAA-1 GAGTGAGTCCAGTACA-1
 45 |                  2                  2                  2                  7                  7
 46 | ```
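
Before running a prediction it can help to check that the two inputs are shaped as described above. The following is a minimal base R sketch on made-up data; the object names `expr` and `barcodes` and all cell and gene IDs are hypothetical and not part of GEMLI.

```r
# Toy expression matrix: rows = genes, columns = cells (hypothetical IDs)
set.seed(1)
expr <- matrix(runif(12), nrow = 3,
               dimnames = list(c("gene_1", "gene_2", "gene_3"),
                               c("cell_A", "cell_B", "cell_C", "cell_D")))

# Toy barcode dictionary: barcode ID as values, cell IDs as names
barcodes <- c(cell_A = 2, cell_B = 2, cell_D = 7, cell_E = 7)

# Cells present in both inputs, and cells lacking barcode information
shared     <- intersect(colnames(expr), names(barcodes))
missing_bc <- setdiff(colnames(expr), names(barcodes))

print(shared)
print(missing_bc)
```

Cells without a barcode are not an error: they simply have no ground truth to be tested against.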
 47 | 
 48 | ### Perform lineage prediction
 49 | We can then identify cell lineages through repeated iterative clustering (this may take 2-3min). The `predict_lineages` function takes our GEMLI_items as input. It outputs a matrix of all cells against all cells with values corresponding to a confidence score that they are part of the same lineage. 
 50 | 
 51 | ```
 52 | > GEMLI_items = predict_lineages(GEMLI_items)
 53 | >
 54 | > GEMLI_items[['prediction']][1:5,15:19]
 55 |                    AGAGAATAGGTCATAA-1 AGAGCAGCAAGTGATA-1 AGATGCTTCAAAGACA-1 AGGATCTGTATCGTTG-1 AGGGAGTAGACGATAT-1
 56 | AAACGAACAGGTGTGA-1                  0                  0                  0                  0                 14
 57 | AAAGGTAGTTGCTTGA-1                 87                  0                  0                  0                  0
 58 | AACCACAAGTTTGTCG-1                  0                  0                  0                  0                 39
 59 | AAGCCATGTTCCACGG-1                  0                  0                  0                  0                  0
 60 | AAGCGAGGTACGGCAA-1                  0                  0                  0                  0                  0
 61 | ```
 62 | 
 63 | ### Test lineage prediction
 64 | Since we have barcoding data for this dataset we can test the predicted lineages against our ground truth.
 65 | The `test_lineages` function again takes our `GEMLI_items` as input. It's important that a prediction has been run first with `predict_lineages`. It outputs the number of true positive predictions (TP), false positive predictions (FP), as well as precision and sensitivity for a range of confidence score cutoffs (the row names of the output). The output can be visualized by setting `plot_results` to `TRUE`/`T`.
 66 | 
 67 | ```
 68 | > GEMLI_items = test_lineages(GEMLI_items)
 69 | >
 70 | > GEMLI_items$testing_results
 71 |      TP    FP  precision sensitivity
 72 | 0   274 24062 0.01125904   1.0000000
 73 | 10  104   128 0.44827586   0.3795620
 74 | 20   90    82 0.52325581   0.3284672
 75 | 30   84    56 0.60000000   0.3065693
 76 | 40   80    36 0.68965517   0.2919708
 77 | 50   68    22 0.75555556   0.2481752
 78 | 60   64    14 0.82051282   0.2335766
 79 | 70   62    10 0.86111111   0.2262774
 80 | 80   58     2 0.96666667   0.2116788
 81 | 90   46     0 1.00000000   0.1678832
 82 | 100  34     0 1.00000000   0.1240876
 83 | >
 84 | > GEMLI_items = test_lineages(GEMLI_items, plot_results=T)
 85 | ```
 86 | 
 87 | <p align="center">
 88 |   <img width="500" height="500" src="https://github.com/UPSUTER/GEMLI/blob/main/Example/GEMIL_GitHub_testing.png">
 89 | </p>
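
The precision and sensitivity columns can be recomputed from the TP and FP counts. The sketch below does this in base R for three rows copied from the table above. It assumes that the TP count at cutoff 0 (274) equals the total number of true pairs in the ground truth; this is our inference from the sensitivity being 1 at cutoff 0, not something taken from the package source.

```r
# TP and FP counts copied from three rows of the testing results above
res <- data.frame(cutoff = c(10, 50, 90),
                  TP     = c(104, 68, 46),
                  FP     = c(128, 22, 0))

# Assumption: TP at cutoff 0 corresponds to all true pairs in the ground truth
total_true <- 274

res$precision   <- res$TP / (res$TP + res$FP)  # share of predictions that are correct
res$sensitivity <- res$TP / total_true         # share of true pairs that are recovered
print(res)
```

Raising the cutoff trades sensitivity for precision, which is why the choice of cutoff matters in the following steps.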
 90 | 
 91 | ### Visualize predictions as network
 92 | We can also investigate our predictions by visualizing them as a network with the `visualize_as_network` function. Here we need to set a `cutoff` that defines which predictions we want to consider. It represents a confidence score and high values yield fewer predictions with high precision while low values yield more predictions with lower precision.
 93 | Network visualization requires the `igraph` library which can be loaded with `library(igraph)`.
 94 | 
 95 | ```
 96 | > visualize_as_network(GEMLI_items, cutoff=90) # top image
 97 | > visualize_as_network(GEMLI_items, cutoff=50) # lower image
 98 | ```
 99 | <p float="left">
100 |   <img width="430" height="330" src="https://github.com/UPSUTER/GEMLI/blob/main/Example/GEMLI_GitHub_network_90.png">
101 |   <img width="430" height="330" src="https://github.com/UPSUTER/GEMLI/blob/main/Example/GEMLI_GitHub_network_50.png">
102 | </p>
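
To illustrate what the cutoff does, the sketch below counts how many cell pairs survive at two cutoffs in a small made-up confidence matrix. The matrix and the helper `pairs_at_cutoff` are hypothetical, and we assume a pair is kept when its score is greater than or equal to the cutoff; the exact comparison used by `visualize_as_network` may differ.

```r
# Toy symmetric confidence matrix for four hypothetical cells (not GEMLI output)
cells <- c("cell_A", "cell_B", "cell_C", "cell_D")
conf <- matrix(c( 0, 95, 60,  0,
                 95,  0, 55,  0,
                 60, 55,  0, 20,
                  0,  0, 20,  0),
               nrow = 4, byrow = TRUE, dimnames = list(cells, cells))

# Count each pair once (upper triangle); assumption: keep pairs with score >= cutoff
pairs_at_cutoff <- function(m, cutoff) sum(m[upper.tri(m)] >= cutoff)

n90 <- pairs_at_cutoff(conf, 90)
n50 <- pairs_at_cutoff(conf, 50)
print(c(cutoff_90 = n90, cutoff_50 = n50))
```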
103 | 
104 | If a ground truth, e.g. from barcoding, is available, we can set `ground_truth` to `TRUE`/`T` and `highlight_FPs` to `TRUE`/`T` to highlight false predictions with red edges. Cells without barcode information will be displayed in white.
105 | 
106 | ```
107 | > visualize_as_network(GEMLI_items, cutoff=90, ground_truth=T, highlight_FPs=T) # top image
108 | > visualize_as_network(GEMLI_items, cutoff=50, ground_truth=T, highlight_FPs=T) # lower image
109 | ```
110 | <p float="left">
111 |   <img width="430" height="330" src="https://github.com/UPSUTER/GEMLI/blob/main/Example/GEMLI_GitHub_network_90_GT.png">
112 |   <img width="430" height="330" src="https://github.com/UPSUTER/GEMLI/blob/main/Example/GEMLI_GitHub_network_50_GT.png">
113 | </p>
114 | 
115 | ### Extract lineage information
116 | Now we can extract the lineage information with the `prediction_to_lineage_information` function. Again we need to set a `cutoff` that defines which predictions we want to consider. The function outputs both a lineage table and a 'dictionary', a vector that has the lineage number as values and the cell IDs as names.
117 | 
118 | ```
119 | > GEMLI_items = prediction_to_lineage_information(GEMLI_items, cutoff=50)
120 | >
121 | > GEMLI_items$predicted_lineage_table[1:5,]
122 |      cell.ID              clone.ID
123 | [1,] "AAACGAACAGGTGTGA-1" "1"
124 | [2,] "AAAGGTAGTTGCTTGA-1" "2"
125 | [3,] "AACCACAAGTTTGTCG-1" "3"
126 | [4,] "AAGCCATGTTCCACGG-1" "4"
127 | [5,] "AAGCGAGGTACGGCAA-1" "5"
128 | >
129 | > GEMLI_items$predicted_lineages[1:5]
130 | AAACGAACAGGTGTGA-1 AAAGGTAGTTGCTTGA-1 AACCACAAGTTTGTCG-1 AAGCCATGTTCCACGG-1 AAGCGAGGTACGGCAA-1
131 |                  1                  2                  3                  4                  5
132 | ```
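
Because the 'dictionary' is a plain named vector, lineage sizes and members can be summarized with base R. The sketch below uses a made-up vector of the same shape (lineage number as values, cell IDs as names); the object names are hypothetical.

```r
# Hypothetical lineage dictionary: lineage number as values, cell IDs as names
lineages <- c(cell_A = 1, cell_B = 1, cell_C = 2,
              cell_D = 3, cell_E = 3, cell_F = 3)

# Number of cells per lineage
lineage_sizes <- table(lineages)

# Lineages with at least two cells, and their members
multi   <- names(lineage_sizes)[as.vector(lineage_sizes) >= 2]
members <- split(names(lineages), lineages)[multi]

print(lineage_sizes)
print(members)
```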
133 | 
134 | ### Trim lineages that are too big
135 | 
136 | In some applications it may be useful to trim lineages that are too big, for instance if cells are known to have undergone only a certain number of divisions, which effectively limits the lineage size, or if you are only interested in sister cell pairs. Similarly, if you investigate large lineages you want to avoid lineages being merged due to a few false predictions between otherwise well-interconnected lineages. The `suggest_network_trimming_to_size` function allows you to preview what trimming to size would look like. It will again show the predicted lineages as networks but highlight all connections that would be trimmed given a certain size restriction (`max_size`). If you are happy with the suggested trimming, you can create a new trimmed `GEMLI_items` list with `trim_network_to_size`; in this example we called it `GEMLI_items_post_processed`. You can then visualize the predictions again to see how the trimming has affected them.
137 | ```
138 | > suggest_network_trimming_to_size(GEMLI_items, max_size=2, cutoff=50) # left image
139 | > GEMLI_items_post_processed = trim_network_to_size(GEMLI_items, max_size=2, cutoff=50)
140 | > visualize_as_network(GEMLI_items_post_processed, cutoff=50) # right image
141 | ```
142 | <p float="left">
143 |   <img width="330" height="330" src="https://github.com/UPSUTER/GEMLI/blob/main/Example/GEMLI_GitHub_network_50_GT_ST.png">
144 |   <img width="330" height="330" src="https://github.com/UPSUTER/GEMLI/blob/main/Example/GEMLI_GitHub_network_50_trim.png">
145 | </p>
146 | 
147 | 
148 | ## Example 2: large lineages in intestinal crypts
149 | 
150 | In this example we'll be visualizing GEMLI results from a lineage-annotated dataset of murine intestinal crypts. Intestinal crypts originate from one or a few intestinal stem cells, and are composed of a number of different cell types. In the data we find cells from individual crypts that range from just three up to forty-four related cells. The scRNA-seq dataset is derived from Bues et al. 2022 (PMID 35165449) and is publicly available under Gene Expression Omnibus accession number GSE148093.
151 | 
152 | ### Load example data
153 | First we load the example data. Here we already predicted the lineages using GEMLI and therefore do not include a count matrix, but rather start with the predictions right away.
154 | 
155 | ```
156 | > load('GEMLI_crypts_example_data_matrix.RData')
157 | > load('GEMLI_crypts_example_barcode_information.RData')
158 | ```
159 | 
160 | ### Create a GEMLI items list
161 | We then create a GEMLI items list. This list is used to store the data, and create and store the outputs of GEMLI (for details check example one).
162 | 
163 | ```
164 | > GEMLI_items_crypts = list()
165 | > GEMLI_items_crypts[['prediction']] = Crypts
166 | > GEMLI_items_crypts[['barcodes']] = Crypts_bc_dict
167 | ```
168 | 
169 | ### Visualize predictions as network
170 | To visualize large lineages we'll use three different network layout algorithms: Fruchterman-Reingold, Kamada-Kawai, and grid. Each of them has advantages and disadvantages.
171 | 
172 | (1) Fruchterman-Reingold displays the cells of individual predicted crypts close together with ample space between crypts. This makes it hard to see connections within individual crypts but gives a good overview of individual structures.
173 | 
174 | (2) Kamada-Kawai spaces individual cells well, so we can see individual connections between them. It may, however, happen that two different predicted crypts partially overlap, as can be seen for the dark red and bright green lineages on the right side of the plot.
175 | 
176 | (3) When the network is laid out as a grid, one generally gets a good overview of the predicted lineages and their connections, but it's hard to see which connection belongs to which cell in the same row.
177 | 
178 | ```
179 | > visualize_as_network(GEMLI_items_crypts, cutoff=70, display_orphan=F, max_edge_width=1, ground_truth=T, include_labels=F, layout_style="fr") # first image
180 | > visualize_as_network(GEMLI_items_crypts, cutoff=70, display_orphan=F, max_edge_width=1, ground_truth=T, include_labels=F, layout_style="kk") # second image
181 | > visualize_as_network(GEMLI_items_crypts, cutoff=70, display_orphan=F, max_edge_width=1, ground_truth=T, include_labels=F, layout_style="grid") # third image
182 | ```
183 | 
184 | <p float="left">
185 |   <img width="430" height="330" src="https://github.com/UPSUTER/GEMLI/blob/main/Example/GEMLI_GitHub_crypts_network_70_fr.png">
186 |   <img width="430" height="330" src="https://github.com/UPSUTER/GEMLI/blob/main/Example/GEMLI_GitHub_crypts_network_70_kk.png">
187 |   <img width="430" height="330" src="https://github.com/UPSUTER/GEMLI/blob/main/Example/GEMLI_GitHub_crypts_network_70_grid.png">
188 | </p>
189 | 
190 | ### Adding cell type information to the GEMLI items list
191 | The cells of individual intestinal crypts can be assigned to different cell types. This information can be added to the GEMLI items list as the 'cell_type' slot in the form of a data frame with columns 'cell.ID' and 'cell.type'.
192 | 
193 | ```
194 | > load('GEMLI_crypts_example_cell_type_annotation.RData')
195 | > GEMLI_items_crypts[['cell_type']] = Crypts_annotation
196 | ```
197 | 
198 | ### Color prediction network visualization by cell type
199 | The visualization of the lineage predictions can now be colored by the cell type annotation. This shows the composition of individual intestinal crypts.
200 | 
201 | ```
202 | > visualize_as_network(GEMLI_items_crypts, cutoff=70, max_edge_width=5, display_orphan=F, include_labels=F, ground_truth=T, highlight_FPs=T, layout_style="kk", cell_type_colors=T)
203 | 
204 | ```
205 | <p float="left">
206 |   <img width="430" height="300" src="https://github.com/UPSUTER/GEMLI/blob/main/Example/GEMLI_GitHub_crypts_network_70_cell_type_colors.png">
207 | </p>
208 | 
209 | Specific colors can be assigned to specific cell types by adding a data frame with columns 'cell.type' and 'color' to the GEMLI items list slot 'cell_type_color'. This also makes it possible to highlight just one or two selected cell types.
210 | 
211 | ```
212 | > cell.type <- unique(GEMLI_items_crypts[['cell_type']]$cell.type)
213 | > color <- c("#5386BD", "skyblue1", "darkgreen", "gold", "red", "darkred", "black")
214 | > Cell_type_color <- data.frame(cell.type, color)
215 | > GEMLI_items_crypts[['cell_type_color']] = Cell_type_color
216 | >
217 | > visualize_as_network(GEMLI_items_crypts, cutoff=70, max_edge_width=5, display_orphan=F, include_labels=F, ground_truth=T, highlight_FPs=T, layout_style="kk", cell_type_colors=T)
218 | 
219 | ```
220 | 
221 | <p float="left">
222 |   <img width="430" height="300" src="https://github.com/UPSUTER/GEMLI/blob/main/Example/GEMLI_GitHub_crypts_network_70_custom_cell_type_colors.png">
223 | </p>
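
To highlight only selected cell types, the color data frame can be built so that all other types share a neutral color. The sketch below is base R only; the cell type labels are typed in by hand for illustration (in practice they would come from `unique(GEMLI_items_crypts[['cell_type']]$cell.type)`), and the choice of highlighted types and colors is arbitrary.

```r
# Hypothetical vector of cell type labels (hand-typed for this sketch)
cell.type <- c("Entero", "Goblet", "PIC", "Paneth", "Regstem", "Stem", "TA")

# Cell types to highlight; everything else is drawn in light grey
highlight <- c("Stem", "Paneth")
color <- ifelse(cell.type %in% highlight, "red", "grey80")

# Data frame with the two columns expected in the 'cell_type_color' slot
Cell_type_color <- data.frame(cell.type, color)
print(Cell_type_color)
```

The resulting data frame can then be stored in `GEMLI_items_crypts[['cell_type_color']]` in the same way as in the example above.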
224 | 
225 | ### Get overview of crypt cell type composition
226 | To get an even more quantitative overview of the cell type composition of individual intestinal crypts, we can use the function 'cell_type_composition_plot'. To do so we have to run the 'prediction_to_lineage_information' function first.
227 | 
228 | ```
229 | > GEMLI_items_crypts = prediction_to_lineage_information(GEMLI_items_crypts, cutoff=50)
230 | > cell_type_composition_plot(GEMLI_items_crypts, cell_type_colors=T, type=c("bubble"))
231 | 
232 | ```
233 | <p align="left">
234 |   <img width="200" height="350" src="https://github.com/UPSUTER/GEMLI/blob/main/Example/GEMLI_GitHub_crypts_lineage_overview_bubble.png">
235 | </p>
236 | 
237 | 
238 | With other values of the 'type' parameter, the same function outputs an upsetR plot or a plain table of the number of lineages with each cell type composition.
239 | 
240 | ```
241 | > cell_type_composition_plot(GEMLI_items_crypts, ground_truth=F, cell_type_colors=T, type=c("upsetR")) 
242 | > cell_type_composition_plot(GEMLI_items_crypts, ground_truth=F, cell_type_colors=T, type=c("plain"))
243 | combi                                      n
244 | Entero                                     1
245 | Entero__Goblet__PIC__Stem                  1
246 | Entero__Goblet__PIC__Stem__TA              1
247 | Entero__PIC__Paneth__Regstem__Stem__TA     1
248 | Entero__Stem__TA                           1
249 | Goblet__PIC__Regstem__Stem__TA             1
250 | Goblet__PIC__Stem__TA                      1
251 | Goblet__Regstem__Stem__TA                  1
252 | Goblet__Stem__TA                           2
253 | PIC                                        3
254 | PIC__Stem__TA                              2
255 | Paneth                                     2
256 | Paneth__Regstem__Stem                      1
257 | Paneth__Stem                               1
258 | Regstem__Stem                              1
259 | Stem                                       3
260 | TA                                         1
261 | ```
262 | <p align="center">
263 |   <img width="400" height="300" src="https://github.com/UPSUTER/GEMLI/blob/main/Example/GEMLI_GitHub_crypts_lineage_overview_upsetR.png">
264 | </p>
265 | 
266 | 
267 | 
268 | ## Example 3: Cell fate decisions in human breast cancer
269 | 
270 | For our third example we'll be looking at cell fate decisions in a scRNA-seq dataset of human breast cancer encompassing both ductal carcinoma in situ (DCIS) and invasive tumor (inv_tumor) cells. We'll be working with a subset of this data for fast processing. No ground truth is available. We will study the fate transition from DCIS to invasive tumor cells. The data is derived from a public 10X Genomics dataset associated with the preprint bioRxiv 2022.10.06.510405 (https://doi.org/10.1101/2022.10.06.510405) and was downloaded from https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast (April 2023).
271 | 
272 | ### Load example data
273 | First we load the example data. We already predicted the lineages and extracted the predicted lineages using GEMLI. Furthermore we load the previously generated cell type information for all cells in the dataset. 
274 | 
275 | ```
276 | > load('GEMLI_cancer_example_norm_count.RData')
277 | > load('GEMLI_cancer_example_predicted_lineages.RData')
278 | > load('GEMLI_cancer_example_cell_type_annotation.RData')
279 | ```
280 | 
281 | ### Create a GEMLI items list
282 | We then create a GEMLI items list. This list is used to store the data, and create and store the outputs of GEMLI (for details check example one).
283 | 
284 | ```
285 | > GEMLI_items = list()
286 | > GEMLI_items[['gene_expression']] = Cancer_norm_count
287 | > GEMLI_items[['predicted_lineage_table']] = Cancer_predicted_lineages
288 | > GEMLI_items[['cell_type']] = Cancer_annotation
289 | ```
290 | 
291 | ### Get an overview of symmetric and asymmetric cell lineages
292 | To get an overview of the symmetric and asymmetric cell lineages present in the data, we can use the function 'cell_type_composition_plot' with the parameter 'type' set to "plain" or "upsetR".
293 | 
294 | ```
295 | > cell_type_composition_plot(GEMLI_items, type=c("plain"))
296 | combi               n
297 | DCIS              250
298 | DCIS__inv_tumor    25
299 | inv_tumor         203
300 | >
301 | > cell_type_composition_plot(GEMLI_items, type=c("upsetR"))
302 | ```
303 | 
304 | <p align="center">
305 |   <img width="450" height="200" src="https://github.com/UPSUTER/GEMLI/blob/main/Example/GEMLI_GitHub_cancer_lineage_overview_upsetR.png">
306 | </p>
307 | 
308 | ### Extract symmetric and asymmetric cell lineages
309 | We now extract predicted cell lineages with members in only one cell type (symmetric) or in two or more cell types (asymmetric). To analyze the transition from DCIS to invasive breast cancer we will extract symmetric DCIS lineages, asymmetric lineages containing both DCIS and invasive tumor cells, and symmetric inv_tumor lineages. To exclude lineages with excessive asymmetry, we set a threshold so that only asymmetric lineages containing at least 10% of each cell type are extracted. The function output is stored in the 'cell_fate_analysis' item of GEMLI_items. It is a data frame with a column cell.fate that holds the label sym or asym and the cell type, separated by an underscore. This cell.fate designation makes it possible to subsequently analyze only a specific cell type in asymmetric cell lineages.
310 | 
311 | ```
312 | > GEMLI_items<-extract_cell_fate_lineages(GEMLI_items, selection=c("inv_tumor", "DCIS"), unique=FALSE, threshold=c(10,10))
313 | >
314 | > GEMLI_items[['cell_fate_analysis']][1:10,] 
315 |   cell.ID            clone.ID cell.type cell.fate       
316 |  AAACCCACATCCGTGG-3     2826 DCIS      NA_DCIS      
317 |  AAACCCATCCTTATAC-4       54 DCIS      asym_DCIS    
318 |  AAACGAACAACACGTT-2     1466 inv_tumor sym_inv_tumor
319 |  AAAGAACCAACAGCTT-3      726 DCIS      sym_DCIS     
320 |  AAAGAACGTCGAATGG-2     1467 inv_tumor sym_inv_tumor
321 |  AAAGGATCAGAGTTCT-4     4383 inv_tumor sym_inv_tumor
322 |  AAAGGATTCGCCAATA-4     4385 inv_tumor sym_inv_tumor
323 |  AAAGGGCCAGTCGGAA-3      754 inv_tumor sym_inv_tumor
324 |  AAAGGGCGTAAGAACT-1       10 inv_tumor sym_inv_tumor
325 |  AAAGGGCGTAGTTCCA-1       11 inv_tumor sym_inv_tumor
326 | >
327 | > table(GEMLI_items[['cell_fate_analysis']]$cell.fate)
328 |      asym_DCIS asym_inv_tumor        NA_DCIS       sym_DCIS  sym_inv_tumor 
329 |             77             96            225            273            744
330 | ```
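
The labelling logic described above can be sketched in base R on a toy lineage table. This is an illustration of the idea only, not the implementation of `extract_cell_fate_lineages`: the data, the helper `classify` and the variable `min_pct` are hypothetical, and we assume a lineage that is neither symmetric nor within the 10% threshold gets an `NA` label.

```r
# Toy lineage table: lineage 1 is pure DCIS, lineage 2 is a balanced mix,
# lineage 3 contains only 1 inv_tumor cell among 12 cells (about 8%)
lin <- data.frame(
  clone.ID  = c(rep(1, 3), rep(2, 2), rep(3, 12)),
  cell.type = c(rep("DCIS", 3), "DCIS", "inv_tumor",
                rep("DCIS", 11), "inv_tumor")
)
min_pct <- 10  # minimum percentage of each cell type in an asymmetric lineage

# Percentage of each cell type within each lineage
pct <- prop.table(table(lin$clone.ID, lin$cell.type), margin = 1) * 100

# Label one lineage from its row of percentages
classify <- function(p) {
  present <- p[p > 0]
  if (length(present) == 1) {
    "sym"
  } else if (all(present >= min_pct)) {
    "asym"
  } else {
    NA_character_
  }
}
fate <- apply(pct, 1, classify)

# Combine lineage label and cell type, separated by an underscore
lin$cell.fate <- paste(fate[as.character(lin$clone.ID)], lin$cell.type, sep = "_")
print(table(lin$cell.fate))
```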
331 | 
332 | ### Call and visualize DEG for cells in specific lineage types
333 | Based on the symmetric and asymmetric lineages we extracted, we will now call differentially expressed genes (DEG) for cells of specific cell types in specific lineage types. To analyze the transition from DCIS to invasive breast cancer, we notably call DEG between DCIS cells in symmetric and asymmetric lineages. These are genes specific to DCIS cells at the start of the transition.
334 | 
335 | ```
336 | > GEMLI_items<-cell_fate_DEG_calling(GEMLI_items, ident1="sym_DCIS", ident2="asym_DCIS", min.pct=0.05, logfc.threshold=0.1)
337 | >
338 | > GEMLI_items[['DEG']][1:10,] 
339 |                  p_val avg_log2FC pct.1 pct.2    p_val_adj
340 | DCAF7     4.308339e-15 -1.0275658 0.985 1.000 1.086520e-10
341 | NRAS      2.630887e-11 -1.2821469 0.516 0.766 6.634834e-07
342 | LINC01999 2.824883e-11 -1.4817245 0.736 0.961 7.124072e-07
343 | CSDE1     7.762498e-10 -0.8055031 0.952 0.974 1.957624e-05
344 | S100A10   3.676826e-09  1.1729996 0.839 0.610 9.272588e-05
345 | MAN1A2    8.900499e-09 -0.8487466 0.868 0.961 2.244617e-04
346 | APPBP2-DT 9.767327e-09 -1.0797417 0.495 0.792 2.463222e-04
347 | PVALB     9.882215e-09 -1.1913536 0.304 0.623 2.492196e-04
348 | CDH2      1.169570e-08 -1.6410588 0.264 0.558 2.949538e-04
349 | CSTA      2.204180e-08 -1.0506112 0.198 0.506 5.558721e-04
350 | >
351 | > DEG_volcano_plot(GEMLI_items, name1="Sym_DCIS", name2="Asym_DCIS")
352 | ```
353 | 
354 | <p align="center">
355 |   <img width="500" height="430" src="https://github.com/UPSUTER/GEMLI/blob/main/Example/GEMLI_GitHub_volcano_sym_DCIS_asym_DCIS.png">
356 | </p>
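
The DEG table can be filtered like any data frame. The sketch below uses four rows copied from the output above and splits genes by direction. The thresholds (adjusted p-value below 0.05, absolute log2 fold change above 1) are arbitrary choices for illustration, and we assume that a positive `avg_log2FC` means higher expression in `ident1` (here sym_DCIS), as in Seurat's `FindMarkers`, whose output the column names resemble.

```r
# Four rows copied from the DEG output above
deg <- data.frame(
  avg_log2FC = c(-1.0275658, -1.2821469, 1.1729996, -0.8055031),
  p_val_adj  = c(1.086520e-10, 6.634834e-07, 9.272588e-05, 1.957624e-05),
  row.names  = c("DCAF7", "NRAS", "S100A10", "CSDE1")
)

# Hypothetical thresholds; assumption: positive log2FC = higher in ident1 (sym_DCIS)
up_in_sym  <- rownames(deg)[deg$p_val_adj < 0.05 & deg$avg_log2FC >  1]
up_in_asym <- rownames(deg)[deg$p_val_adj < 0.05 & deg$avg_log2FC < -1]

print(up_in_sym)
print(up_in_asym)
```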
357 | 
358 | 
359 | 
360 | ## Citation
361 | If you use the package, please cite A.S. Eisele*, M. Tarbier*, A.A. Dormann, V. Pelechano, D.M. Suter | "Gene-expression memory-based prediction of cell lineages from scRNA-seq datasets" | Nature Communications 15, 2744 (2024). https://doi.org/10.1038/s41467-024-47158-y
362 | 


--------------------------------------------------------------------------------