├── .Rbuildignore
├── .github
│   ├── README.html
│   └── README.md
├── .gitignore
├── DESCRIPTION
├── NAMESPACE
├── NEWS
├── R
│   ├── contrasting_functions.r
│   ├── data.r
│   ├── group_labelling_functions.r
│   ├── loading_helper_functions.r
│   └── plotting_functions.R
├── data
│   ├── de_table.demo_query.rda
│   ├── de_table.demo_ref.rda
│   ├── demo_cell_info_table.rda
│   ├── demo_counts_matrix.rda
│   ├── demo_gene_info_table.rda
│   ├── demo_microarray_expr.rda
│   ├── demo_microarray_sample_sheet.rda
│   ├── demo_query_se.rda
│   └── demo_ref_se.rda
├── inst
│   └── extdata
│       ├── demo_microarray_expression.tab
│       ├── demo_microarray_info.tab
│       ├── larger_doco_examples.rdata
│       ├── sim_cr_dataset
│       │   ├── analysis
│       │   │   └── clustering
│       │   │       └── kmeans_4_clusters
│       │   │           └── clusters.csv
│       │   └── filtered_gene_bc_matrices
│       │       └── GRCh38
│       │           ├── barcodes.tsv
│       │           ├── genes.tsv
│       │           └── matrix.mtx
│       ├── sim_query_cell_info.tab
│       ├── sim_query_counts.tab
│       ├── sim_query_gene_info.tab
│       ├── sim_ref_cell_info.tab
│       ├── sim_ref_counts.tab
│       └── sim_ref_gene_info.tab
├── man
│   ├── contrast_each_group_to_the_rest.Rd
│   ├── contrast_each_group_to_the_rest_for_norm_ma_with_limma.Rd
│   ├── contrast_the_group_to_the_rest.Rd
│   ├── contrast_the_group_to_the_rest_with_limma_for_microarray.Rd
│   ├── convert_se_gene_ids.Rd
│   ├── de_table.demo_query.Rd
│   ├── de_table.demo_ref.Rd
│   ├── demo_cell_info_table.Rd
│   ├── demo_counts_matrix.Rd
│   ├── demo_gene_info_table.Rd
│   ├── demo_microarray_expr.Rd
│   ├── demo_microarray_sample_sheet.Rd
│   ├── demo_query_se.Rd
│   ├── demo_ref_se.Rd
│   ├── find_within_match_differences.Rd
│   ├── get_counts_index.Rd
│   ├── get_inner_or_outer_ci.Rd
│   ├── get_limma_top_table_with_ci.Rd
│   ├── get_matched_stepped_mwtest_res_table.Rd
│   ├── get_ranking_and_test_results.Rd
│   ├── get_rankstat_table.Rd
│   ├── get_reciprocal_matches.Rd
│   ├── get_stepped_pvals_str.Rd
│   ├── get_the_up_genes_for_all_possible_groups.Rd
│   ├── get_the_up_genes_for_group.Rd
│   ├── get_vs_random_pval.Rd
│   ├── load_dataset_10Xdata.Rd
│   ├── load_se_from_tables.Rd
│   ├── make_ranking_violin_plot.Rd
│   ├── make_ref_similarity_names.Rd
│   ├── make_ref_similarity_names_for_group.Rd
│   ├── run_pair_test_stats.Rd
│   ├── subset_cells_by_group.Rd
│   ├── subset_se_cells_for_group_test.Rd
│   └── trim_small_groups_and_low_expression_genes.Rd
├── tests
│   ├── testthat.R
│   └── testthat
│       ├── test-contrasting_functions.R
│       └── test-loading_helper_functions.R
└── vignettes
    ├── celaref.bib
    ├── celaref_doco.Rmd
    └── images
        ├── pbmc4k_cloupe_kmeans7.png
        ├── violin_plot_example.png
        └── workflow_diagram.png

/.Rbuildignore:
--------------------------------------------------------------------------------
1 | ^.*\.Rproj$
2 | ^\.Rproj\.user$
3 |
--------------------------------------------------------------------------------
/.github/README.md:
--------------------------------------------------------------------------------
1 | # celaref
2 |
3 |
4 | ### Function
5 |
6 | Single cell RNA sequencing (scRNAseq) has made it possible to examine the
7 | cellular heterogeneity within a tissue or sample, and observe changes and
8 | characteristics in specific cell types. To do this, we need to group the cells
9 | into clusters and figure out what they are.
10 |
11 | The celaref (*ce*ll *la*belling by *ref*erence) package aims to streamline the cell-type identification step, by
12 | suggesting cluster labels on the basis of similarity to an already-characterised
13 | reference dataset - whether that's from a similar experiment performed
14 | previously in the same lab, or from a public dataset from a similar sample.
15 |
16 | ### Input
17 |
18 | To look for cluster similarities celaref needs:
19 |
20 | * The query dataset :
21 | - a table of read counts per cell per gene
22 | - a list of which cells belong in which cluster
23 |
24 | * A reference dataset:
25 | - a table of read counts per cell per gene
26 | - a list of which cells belong in which *annotated* cluster
27 |
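
As a minimal sketch of what these inputs can look like in R, the block below builds a toy counts matrix and a matching cell table. The gene names, cell names and cluster assignments are made up for illustration; the `cell_sample` and `group` column names follow the convention described in the `load_se_from_tables` documentation.

```r
# Toy inputs only: gene names, cell names and clusters are invented.
# Counts: one row per gene, one column per cell.
counts <- matrix(
  c(5L, 0L, 2L, 7L,
    0L, 3L, 1L, 0L,
    9L, 4L, 0L, 6L),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC"),
                  c("cell1", "cell2", "cell3", "cell4"))
)

# Cell table: one row per cell, listing the cluster it belongs to.
cell_info <- data.frame(
  cell_sample = c("cell1", "cell2", "cell3", "cell4"),
  group       = c("cluster_1", "cluster_1", "cluster_2", "cluster_2"),
  stringsAsFactors = FALSE
)

# Every cell in the table should be a column of the counts matrix.
stopifnot(all(cell_info$cell_sample %in% colnames(counts)))

# With celaref installed, these could then be combined (not run here):
# query_se <- celaref::load_se_from_tables(counts_matrix   = counts,
#                                          cell_info_table = cell_info)
```

A reference dataset would take the same shape, with `group` holding the annotated cluster names.
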
28 | ### Output
29 |
30 |
31 |
32 | 
33 |
34 |
35 | Query Group | Short Label | pval |
36 | ------------|------------------------------------|---------|
37 | cluster_1 |cluster_1:astrocytes_ependymal |2.98e-23 |
38 | cluster_2 |cluster_2:endothelial-mural |8.44e-10 |
39 | cluster_3 |cluster_3:no_similarity |NA |
40 | cluster_4 |cluster_4:microglia |2.71e-19 |
41 | cluster_5 |cluster_5:pyramidal SS\|interneurons|3.49e-10 |
42 | cluster_6 |cluster_6:oligodendrocytes |2.15e-28 |
43 |
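
The short labels in the table above can be picked apart for downstream use. The sketch below assumes, from the table alone, that a label has the form `<query group>:<reference group(s)>` and that `|` separates multiple reference groups; the variable names are illustrative and are not part of the celaref API.

```r
# Three rows copied from the example table above.
labels <- data.frame(
  query_group = c("cluster_1", "cluster_3", "cluster_5"),
  short_label = c("cluster_1:astrocytes_ependymal",
                  "cluster_3:no_similarity",
                  "cluster_5:pyramidal SS|interneurons"),
  pval = c(2.98e-23, NA, 3.49e-10),
  stringsAsFactors = FALSE
)

# Drop the "<query group>:" prefix (assumed format, see above).
ref_part <- sub("^[^:]*:", "", labels$short_label)

# Split on a literal "|" to get one vector of reference groups per cluster.
ref_groups <- strsplit(ref_part, "|", fixed = TRUE)
n_ref <- lengths(ref_groups)

# Flag clusters that were given a reference label at all.
has_match <- ref_part != "no_similarity"
```

Here `n_ref` counts the reference groups named in each label, and `has_match` separates labelled clusters from those reported as `no_similarity`.
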
44 |
45 |
46 |
47 | This is a comparison of brain scRNAseq data from :
48 |
49 | * Zeisel, A., Manchado, A. B. M., Codeluppi, S., Lonnerberg, P., La Manno, G., Jureus, A., … Linnarsson, S. (2015). *Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq.* Science, 347(6226), 1138–42. http://doi.org/10.1126/science.aaa1934
50 | * Darmanis, S., Sloan, S. A., Zhang, Y., Enge, M., Caneda, C., Shuer, L. M., … Quake, S. R. (2015). *A survey of human brain transcriptome diversity at the single cell level.* Proceedings of the National Academy of Sciences, 112(23), 201507125. http://doi.org/10.1073/pnas.1507125112
51 |
52 |
53 | ### More information?
54 |
55 | Full details in the vignette [html](http://bioinformatics.erc.monash.edu/home/sarah.williams/projects/cell_groupings/doco/celaref_doco.html) - method description, manual and example analyses.
56 |
57 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .Rproj.user
2 | .Rhistory
3 | .RData
4 | inst/doc
5 | github
6 | vignettes/celaref_doco_files
7 | vignettes/celaref_doco_cache
8 | vignettes/celaref_doco.html
9 | github/README.html
10 |
--------------------------------------------------------------------------------
/DESCRIPTION:
--------------------------------------------------------------------------------
1 | Package: celaref
2 | Title: Single-cell RNAseq cell cluster labelling by reference
3 | Version: 1.3.0
4 | Authors@R: person("Sarah", "Williams", email = "sarah.williams1@monash.edu", role = c("aut", "cre"))
5 | Description: After the clustering step of a single-cell RNAseq experiment, this
6 | package aims to suggest labels/cell types for the clusters, on the basis of
7 | similarity to a reference dataset. It requires a table of read counts per
8 | cell per gene, and a list of the cells belonging to each of the clusters,
9 | (for both test and reference data).
10 | Depends: R (>= 3.5.0),
11 | SummarizedExperiment
12 | Imports:
13 | MAST,
14 | ggplot2,
15 | Matrix,
16 | dplyr,
17 | magrittr,
18 | stats,
19 | utils,
20 | rlang,
21 | BiocGenerics,
22 | S4Vectors,
23 | readr,
24 | tibble,
25 | DelayedArray
26 | Suggests:
27 | limma,
28 | parallel,
29 | knitr,
30 | rmarkdown,
31 | ExperimentHub,
32 | testthat
33 | biocViews: SingleCell
34 | VignetteBuilder: knitr
35 | License: GPL-3
36 | Encoding: UTF-8
37 | LazyData: true
38 | RoxygenNote: 6.1.1
39 |
--------------------------------------------------------------------------------
/NAMESPACE:
--------------------------------------------------------------------------------
1 | # Generated by roxygen2: do not edit by hand
2 |
3 | export(contrast_each_group_to_the_rest)
4 | export(contrast_each_group_to_the_rest_for_norm_ma_with_limma)
5 | export(convert_se_gene_ids)
6 | export(get_rankstat_table)
7 | export(get_the_up_genes_for_all_possible_groups)
8 | export(get_the_up_genes_for_group)
9 | export(load_dataset_10Xdata)
10 | export(load_se_from_files)
11 | export(load_se_from_tables)
12 | export(make_ranking_violin_plot)
13 | export(make_ref_similarity_names)
14 | export(make_ref_similarity_names_using_marked)
15 | export(subset_cells_by_group)
16 | export(trim_small_groups_and_low_expression_genes)
17 | import(MAST)
18 | import(SummarizedExperiment)
19 | importFrom(dplyr,n)
20 | importFrom(magrittr,"%>%")
21 | importFrom(rlang,.data)
22 |
--------------------------------------------------------------------------------
/NEWS:
--------------------------------------------------------------------------------
1 | CHANGES IN VERSION 1.3.x
2 | -----------------------
3 |
4 | UPDATES
5 |
6 | o Added factors_to_rm option.
7 |
8 |
9 | CHANGES IN VERSION 1.2.0
10 | -----------------------
11 |
12 | UPDATES
13 |
14 | o Support for passing hdf5-backed SummarizedExperiment objects
15 | (internal conversion to sparse)
16 | o Explicitly handle multiple assay() in the summarizedExperiment. Must have
17 | a named 'counts' assay, or, just the one unnamed assay.
18 | o Same as 1.1.10
19 |
20 |
21 | BUG FIXES
22 |
23 | o Subsampling per group should actually be used now.
24 |
25 |
26 | CHANGES IN VERSION 1.1.8
27 | -----------------------
28 |
29 | UPDATES
30 |
31 | o Updated vignette with large data handling.
32 | o Doco updates
33 | o make_ranking_violin_plot can pass parameters (e.g. rankmetrics) with ...
34 |
35 |
36 | CHANGES IN VERSION 1.1.4
37 | -----------------------
38 |
39 | UPDATES
40 |
41 | o Methods/options to subset large datasets.
42 |
43 |
44 | CHANGES IN VERSION 1.1.3
45 | -----------------------
46 |
47 | UPDATES
48 |
49 | o None. Version bump for build only.
50 |
51 |
52 | CHANGES IN VERSION 1.1.2
53 | -----------------------
54 |
55 | UPDATES
56 |
57 | o Testing tests
58 |
59 |
60 |
61 | CHANGES IN VERSION 1.1.1
62 | -----------------------
63 |
64 | BUG FIXES
65 |
66 | o Internal SingleCellAssay coercion bugfix. Should no longer require
67 | library(MAST) call to work.
68 |
69 |
70 | UPDATES
71 |
72 | o Partying hard with unit tests woo
73 |
74 |
75 | CHANGES IN VERSION 1.1.0
76 | -----------------------
77 |
78 | UPDATES
79 |
80 | o Option to change the gene ranking metric. Maybe useful for similar cell
81 | types like PBMCs.
82 | o Use sparse matrices to support larger datasets in less RAM.
83 |
84 |
85 | CHANGES IN VERSION 1.0.1
86 | -----------------------
87 |
88 | UPDATES
89 |
90 | o First bioconductor version
91 |
92 |
93 | CHANGES IN VERSION 0.99.1
94 | -----------------------
95 |
96 | BUG FIXES
97 |
98 | o Code style.
99 | o Do not attempt to multithread on windows (suggest multithread on linux).
100 |
101 |
102 | CHANGES IN VERSION 0.99.0
103 | -----------------------
104 |
105 | NEW FEATURES
106 |
107 | o Initial Version.
108 |
--------------------------------------------------------------------------------
/R/data.r:
--------------------------------------------------------------------------------
1 | #' Demo query de table
2 | #'
3 | #' Small example dataset that is the output of
4 | #' \link{contrast_each_group_to_the_rest}. It contains the results
5 | #' of each group compared to the rest of the sample (ie within sample
6 | #' differential expression)
7 | #'
8 | #' @return An example de_table from
9 | #' \link{contrast_each_group_to_the_rest} (for demo query dataset)
10 | "de_table.demo_query"
11 |
12 |
13 | #' Demo ref de table
14 | #'
15 | #' Small example dataset that is the output of
16 | #' \link{contrast_each_group_to_the_rest}. It contains the results
17 | #' of each group compared to the rest of the sample (ie within sample
18 | #' differential expression)
19 | #'
20 | #' @return An example de_table from
21 | #' \link{contrast_each_group_to_the_rest} (for demo ref dataset)
22 | "de_table.demo_ref"
23 |
24 |
25 | #' Demo cell info table
26 | #'
27 | #' Sample sheet table listing each cell, its assigned cluster/group, and
28 | #' any other information that might be interesting (replicate, individual etc.)
29 | #'
30 | #' @return An example cell info table
31 | "demo_cell_info_table"
32 |
33 |
34 |
35 | #' Demo count matrix
36 | #'
37 | #' Counts matrix for a small, demo example dataset. Raw counts of
38 | #' reads per gene (row) per cell (column).
39 | #' @return An example counts matrix.
40 | "demo_counts_matrix"
41 |
42 |
43 | #' Demo gene info table
44 | #'
45 | #' Extra table of gene-level information for the demo example dataset.
46 | #' Can contain anything as long as there's a unique gene id.
47 | #' @return An example table of genes.
48 | "demo_gene_info_table"
49 |
50 | #' Demo microarray expression table
51 | #'
52 | #' Microarray-style expression table for the demo example dataset.
53 | #' Rows are genes, columns are samples, as per counts matrix.
54 | #' @return An example table of (fake) microarray data.
55 | "demo_microarray_expr"
56 |
57 |
58 | #' Demo microarray sample sheet table
59 | #'
60 | #' Microarray sample sheet table for the demo example dataset.
61 | #' Contains array identifiers, their group and any other information that could
62 | #' be useful.
63 | #' @return An example microarray sample sheet
64 | "demo_microarray_sample_sheet"
65 |
66 |
67 | #' Demo query se (summarizedExperiment)
68 | #'
69 | #' A summarisedExperiment object loaded from demo info tables, for a query set.
70 | #' @return An example summarised experiment (for demo query dataset)
71 | "demo_query_se"
72 |
73 | #' Demo reference se (summarizedExperiment)
74 | #'
75 | #' A summarisedExperiment object loaded from demo info tables, for a reference
76 | #' set.
77 | #' @return An example summarised experiment (for demo reference dataset)
78 | "demo_ref_se"
79 |
80 |
81 |
82 |
83 |
--------------------------------------------------------------------------------
/R/loading_helper_functions.r:
--------------------------------------------------------------------------------
1 | #' load_se_from_tables
2 | #'
3 | #' Create a SummarizedExperiment object (dataset_se) from a count matrix, cell
4 | #' information and optionally gene information.
5 | #'
6 | #' This function makes a SummarizedExperiment object in a form that
7 | #' should work for celaref functions. Specifically, that means it will have an
8 | #' 'ID' field for genes (view with \code{rowData(dataset_se)}), and both
9 | #' 'cell_sample' and 'group' fields for cells (view with
10 | #' \code{colData(dataset_se)}). See parameters for detail.
11 | #' Additionally, the counts will be an integer matrix (not a
12 | #' sparse matrix), and the \emph{group} field (but not \emph{cell_sample}
13 | #' or \emph{ID}) will be a factor.
14 | #'
15 | #' Note that data will be subsetted to cells present in both the counts matrix
16 | #' and cell info, this is handy for loading subsets of cells.
17 | #' However, if \bold{gene_info_file} is defined, all genes must match exactly.
18 | #'
19 | #' The \code{load_se_from_files} form of this function will run the same
20 | #' checks, but will read everything from files in one go. The
21 | #' \code{load_se_from_tables}
22 | #' form is perhaps more useful when the annotations need to be modified (e.g.
23 | #' programmatically adding a different gene identifier, renaming groups,
24 | #' removing unwanted samples).
25 | #'
26 | #' Note that the SummarizedExperiment object can also be created without using
27 | #' these functions; it just needs the \emph{cell_sample}, \emph{ID} and
28 | #' \emph{group} fields as described above. Sometimes it is easier
29 | #' to add these to an existing \emph{SummarizedExperiment} from upstream
30 | #' analyses.
31 | #'
32 | #'
33 | #' @param counts_matrix A tab-separated matrix of read counts for each gene
34 | #' (row) and each cell (column). Columns and rows should be named.
35 | #'
36 | #' @param cell_info_table Table of cell information.
37 | #' If there is a column labelled
38 | #' \emph{cell_sample}, that will be used as the unique cell identifiers.
39 | #' If not, the first column is assumed to be cell identifiers, and will be
40 | #' copied to a new field labelled \emph{cell_sample}.
41 | #' Similarly - the clusters of these cells should be listed in one column -
42 | #' which can be called 'group' (case-sensitive) or specified with
43 | #' \bold{group_col_name}. \emph{Minimal data format: }
44 | #'
45 | #' @param gene_info_table Optional table of gene information. If there is a
46 | #' column labelled
47 | #' \emph{ID}, that will be used as the gene identifiers (they must be unique!).
48 | #' If not, the first column is assumed to be a gene identifier, and will be
49 | #' copied to a
50 | #' new field labelled \emph{ID}. Must match all rownames in
51 | #' \bold{counts_matrix}.
52 | #' If omitted, ID will be generated from the rownames of counts_matrix.
53 | #' Default=NA
54 | #'
55 | #' @param group_col_name Name of the column in \bold{cell_info_table}
56 | #' containing
57 | #' the cluster/group that each cell belongs to. Case-sensitive. Default='group'
58 | #'
59 | #' @param cell_col_name Name of the column in \bold{cell_info_table} containing
60 | #' a cell id. Ignored if \emph{cell_sample} column is already present.
61 | #' If omitted, (and no \emph{cell_sample} column) will use first column.
62 | #' Case-sensitive. Default=NA
63 | #'
64 | #' @return A SummarisedExperiment object containing the count data, cell info
65 | #' and gene info.
66 | #'
67 | #' @examples
68 | #'
69 | #' # From data frames (or a matrix for counts) :
70 | #' demo_se <- load_se_from_tables(counts_matrix=demo_counts_matrix,
71 | #' cell_info_table=demo_cell_info_table)
72 | #' demo_se <- load_se_from_tables(counts_matrix=demo_counts_matrix,
73 | #' cell_info_table=demo_cell_info_table,
74 | #' gene_info_table=demo_gene_info_table)
75 | #'
76 | #' # Or from data files :
77 | #' counts_filepath <- system.file("extdata", "sim_query_counts.tab", package = "celaref")
78 | #' cell_info_filepath <- system.file("extdata", "sim_query_cell_info.tab", package = "celaref")
79 | #' gene_info_filepath <- system.file("extdata", "sim_query_gene_info.tab", package = "celaref")
80 | #'
81 | #' demo_se <- load_se_from_files(counts_file=counts_filepath, cell_info_file=cell_info_filepath)
82 | #' demo_se <- load_se_from_files(counts_file=counts_filepath, cell_info_file=cell_info_filepath,
83 | #' gene_info_file=gene_info_filepath )
84 | #'
85 | #' @seealso \href{https://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html}{SummarizedExperiment} For general doco on the SummarizedExperiment objects.
86 | #'
87 | #' @family Data loading functions
88 | #'
89 | #' @import SummarizedExperiment
90 | #'
91 | #' @export
92 | load_se_from_tables <- function(
93 | counts_matrix, cell_info_table, gene_info_table = NA,
94 | group_col_name="group", cell_col_name=NA
95 | ) {
96 |
97 | cell_info_table <- data.frame(cell_info_table, stringsAsFactors = FALSE)
98 |
99 | # If there's no cell_sample, and no cell_col_name, make the first
100 | # one 'cell_sample', else use cell_col_name.
101 | if (! "cell_sample" %in% colnames(cell_info_table)) {
102 | if (is.na(cell_col_name)) {
103 | cell_info_table <- cbind.data.frame(cell_sample=cell_info_table[,1],
104 | cell_info_table,
105 | stringsAsFactors = FALSE )
106 | }
107 | else {
108 | stopifnot(cell_col_name %in% colnames(cell_info_table))
109 | cell_info_table <- cbind.data.frame(
110 | cell_sample=cell_info_table[,cell_col_name],
111 | cell_info_table,
112 | stringsAsFactors = FALSE )
113 | }
114 | }
115 |
116 | # Check for 'group' anything from group_col_name will be copied into group
117 | if (! group_col_name %in% colnames(cell_info_table)) {
118 | stop( "Couldn't find group/cluster column '", group_col_name ,
119 | "' in cell_info_table" )
120 | }
121 | if (group_col_name != "group") {
122 | cell_info_table$group <- cell_info_table[,group_col_name]
123 | }
124 |
125 |
126 | # Only keep common cells, match order
127 | cells <- intersect(cell_info_table$cell_sample, colnames(counts_matrix))
128 | if (length(cells) <= 1) {
129 | stop("Couldn't find cells in common between counts matrix ",
130 | "(col names) and cell_info_file (cell_sample column, ",
131 | "first col or specified as cell_col_name)")
132 | }
133 | if ( length(cells) != nrow(cell_info_table)
134 | || length(cells) != ncol(counts_matrix) ) {
135 | message("Not all cells were listed in both ",
136 | "counts matrix and cell_info_file. ",
137 | "Is this expected? Keeping the ", length(cells), " in common")
138 | }
139 | cell_info_table<-cell_info_table[match(cells, cell_info_table$cell_sample),]
140 | counts_matrix <-counts_matrix[,cells]
141 |
142 |
143 | # NB factorising group after removal of unmatched cells
144 | cell_info_table$group <- factor(cell_info_table$group)
145 |
146 |
147 |
148 |
149 | # Make summarised experiment.
150 | # With or without gene info file.
151 | dataset_se <- NA
152 | if (all(is.na(gene_info_table))) {
153 | dataset_se <- SummarizedExperiment(
154 | assays = S4Vectors::SimpleList(counts=counts_matrix),
155 | colData=base::as.data.frame(cell_info_table))
156 | rowData(dataset_se)$ID <- rownames(assay(dataset_se,'counts'))
157 | }
158 | else {
159 | gene_info_table <- data.frame(gene_info_table, stringsAsFactors = FALSE)
160 | # If there's no ID col, make the first 'ID'
161 | if (! "ID" %in% colnames(gene_info_table)) {
162 | gene_info_table <- cbind.data.frame(
163 | "ID"=as.character(gene_info_table[,1]),
164 | gene_info_table,
165 | stringsAsFactors=FALSE)
166 | }
167 |
168 | # Cells might not, but genes should be matching.
169 | genes <- intersect(gene_info_table$ID, rownames(counts_matrix))
170 | num_genes <- length(genes)
171 | if ( num_genes != nrow(gene_info_table)
172 | || num_genes != nrow(counts_matrix) ) {
173 | stop( "Gene IDs did not match between ID field of ",
174 | "gene_info_file (or first column), and row names of ",
175 | "counts matrix")
176 | }
177 |
178 | # Create a summarised experiment object.
179 | dataset_se <- SummarizedExperiment(
180 | assays = S4Vectors::SimpleList(counts=counts_matrix),
181 | colData=S4Vectors::DataFrame(cell_info_table),
182 | rowData=S4Vectors::DataFrame(gene_info_table))
183 | }
184 |
185 | return(dataset_se)
186 | }
187 |
188 |
189 |
190 |
191 | #' load_se_from_files
192 | #'
193 | #' \code{load_se_from_files} is a wrapper for \code{load_se_from_tables} that
194 | #' will read in tables from specified files.
195 | #'
196 | #' @param counts_file A tab-separated file of a matrix of read counts. As per
197 | #' \bold{counts_matrix}. First column should be gene ID, and top row cell ids.
198 | #'
199 | #' @param cell_info_file Tab-separated text file of cell information, as per
200 | #' \bold{cell_info_table}. Columns must have names.
201 | #'
202 | #' @param gene_info_file Optional tab-separated text file of gene information,
203 | #' as per \bold{gene_info_table}. Columns must have names. Default=NA
204 | #'
205 | #' @family Data loading functions
206 | #'
207 | #' @describeIn load_se_from_tables To read from files
208 | #'
209 | #' @import SummarizedExperiment
210 | #'
211 | #' @export
212 | load_se_from_files <- function(
213 | counts_file, cell_info_file, gene_info_file = NA, group_col_name="group",
214 | cell_col_name=NA
215 | ) {
216 |
217 | counts_matrix <- as.matrix(utils::read.table(
218 | counts_file, row.names=1, header=TRUE, sep = "\t",
219 | stringsAsFactors = FALSE, check.names=FALSE ))
220 |
221 | cell_info_table <- utils::read.table(cell_info_file, header=TRUE,
222 | sep = "\t", stringsAsFactors = FALSE )
223 |
224 | # Read gene Info table, if specified
225 | gene_info_table <- NA
226 | if (! is.na(gene_info_file)) {
227 | gene_info_table <- utils::read.table(gene_info_file,
228 | header=TRUE,
229 | sep = "\t",
230 | stringsAsFactors = FALSE )
231 | }
232 |
233 | return(load_se_from_tables(counts_matrix, cell_info_table, gene_info_table,
234 | group_col_name, cell_col_name = cell_col_name) )
235 | }
236 |
237 |
238 |
239 |
240 |
241 |
242 |
243 |
244 |
245 | #' load_dataset_10Xdata
246 | #'
247 | #' Convenience function to create a SummarizedExperiment object (dataset_se)
248 | #' from the output of a 10X cell ranger pipeline run.
249 | #'
250 | #'
251 | #' This function makes a SummarizedExperiment object in a form that
252 | #' should work for celaref functions. Specifically, that means it will have an
253 | #' 'ID' field for genes (view with \code{rowData(dataset_se)}), and both
254 | #' 'cell_sample' and 'group' fields for cells (view with
255 | #' \code{colData(dataset_se)}). See parameters for detail.
256 | #' Additionally, the counts will be an integer matrix (not a
257 | #' sparse matrix), and the \emph{group} field (but not \emph{cell_sample}
258 | #' or \emph{ID}) will be a factor.
259 | #'
260 | #' The clustering information can be read from whichever cluster is specified,
261 | #' usually there will be several choices.
262 | #'
263 | #' This function is designed to work with output of version 2.0.1 of the
264 | #' cellRanger pipeline, and may not work with others (will not work for 1.x).
265 | #'
266 | #' @param dataset_path Path to the directory of 10X data, as generated by the
267 | #' cellRanger pipeline (versions 2.1.0 and 2.0.1). The directory should have
268 | #' subdirectories \emph{analysis}, \emph{filtered_gene_bc_matrices} and
269 | #' \emph{raw_gene_bc_matrices} (only the first 2 are read).
270 | #' @param dataset_genome The genome that the reads were aligned against,
271 | #' e.g. GRCh38. Check for this as a directory name under the
272 | #' \emph{filtered_gene_bc_matrices} subdirectory if unsure.
273 | #' @param clustering_set The 10X cellRanger pipeline produces several
274 | #' different cluster definitions per dataset. Specify which one to use e.g.
275 | #' kmeans_10_clusters. Find them as directory names under
276 | #' \emph{analysis/clustering/}
277 | #' @param gene_id_cols_10X Vector of the names of the columns in the gene
278 | #' description file (\emph{filtered_gene_bc_matrices/GRCh38/genes.tsv}). The
279 | #' first element of this will become the ID.
280 | #' Default = c("ensembl_ID","GeneSymbol")
281 | #' @param id_to_use Column from \bold{gene_id_cols_10X} that defines the gene
282 | #' identifier to use as 'ID' in the returned SummarisedExperiment object.
283 | #' Many-to-one relationships between the assumed unique first element of
284 | #' \bold{gene_id_cols_10X} and \bold{id_to_use} will be handled gracefully by
285 | #' \code{\link{convert_se_gene_ids}}.
286 | #' Defaults to first element of \bold{gene_id_cols_10X}
287 | #'
288 | #' @return A SummarisedExperiment object containing the count data, cell info
289 | #' and gene info.
290 | #'
291 | #' @examples
292 | #' example_10X_dir <- system.file("extdata", "sim_cr_dataset", package = "celaref")
293 | #' dataset_se <- load_dataset_10Xdata(example_10X_dir, dataset_genome="GRCh38",
294 | #' clustering_set="kmeans_4_clusters", gene_id_cols_10X=c("gene"))
295 | #'
296 | #' \dontrun{
297 | #' dataset_se <- load_dataset_10Xdata('~/path/to/data/10X_pbmc4k',
298 | #' dataset_genome="GRCh38",
299 | #' clustering_set="kmeans_7_clusters")
300 | #' }
301 | #'
302 | #' @seealso \href{https://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html}{SummarizedExperiment}
303 | #' For general doco on the SummarizedExperiment objects.
304 | #' @seealso \code{\link{convert_se_gene_ids}} describes method for
305 | #' converting IDs.
306 | #'
307 | #' @family Data loading functions
308 | #'
309 | #' @import SummarizedExperiment
310 | #'
311 | #' @export
312 | load_dataset_10Xdata <- function(
313 | dataset_path, dataset_genome, clustering_set,
314 | gene_id_cols_10X =c("ensembl_ID","GeneSymbol"),
315 | id_to_use = gene_id_cols_10X[1]
316 | ) {
317 |
318 | matrix_file <- file.path(dataset_path,"filtered_gene_bc_matrices",
319 | dataset_genome,"matrix.mtx")
320 | cells_file <- file.path(dataset_path,"filtered_gene_bc_matrices",
321 | dataset_genome,"barcodes.tsv")
322 | genes_file <- file.path(dataset_path,"filtered_gene_bc_matrices",
323 | dataset_genome,"genes.tsv")
324 |
325 | #.../10X_pbmc5pExpr/analysis/clustering/kmeans_5_clusters/clusters.csv
326 | clustering_file <- file.path(dataset_path,"analysis","clustering",
327 | clustering_set,"clusters.csv")
328 | clustering_table <- readr::read_csv(clustering_file, col_types=readr::cols())
329 | colnames(clustering_table) <- c('cell_sample', 'group')
330 | clustering_table$group <- factor(clustering_table$group)
331 |
332 | # gene info
333 | # Start with the first id (assumed uniq!), but change to specified after.
334 | genes_table <- readr::read_tsv(genes_file,
335 | col_names = gene_id_cols_10X,
336 | col_types = readr::cols())
337 | genes_table$ID <- dplyr::pull(genes_table, gene_id_cols_10X[1])
338 |
339 |
340 | # Note there are no labels in the matrix file, just rows and columns
341 | filtered_matrix <- as.matrix(Matrix::readMM(matrix_file)) # from Matrix
342 | storage.mode(filtered_matrix ) <- "integer"
343 | order_of_cells <- scan(cells_file, what=character())
344 | colnames(filtered_matrix) <- order_of_cells
345 | rownames(filtered_matrix) <- genes_table$ID
346 |
347 | # Create a summarised experiment object.
348 | dataset_se <- SummarizedExperiment(
349 | assays = S4Vectors::SimpleList(counts=filtered_matrix),
350 | colData=clustering_table,
351 | rowData=genes_table)
352 |
353 | # Optionally change id (handles m:1)
354 | rowData(dataset_se)$total_count <-
355 | Matrix::rowSums(assay(dataset_se,'counts'))
356 | if (id_to_use != gene_id_cols_10X[1] ) {
357 | dataset_se <- convert_se_gene_ids(dataset_se,
358 | new_id=id_to_use,
359 | eval_col='total_count')
360 | }
361 |
362 | return(dataset_se)
363 | }
364 |
365 |
366 |
367 |
368 |
369 |
370 |
371 |
372 |
373 |
374 |
375 |
376 | #' convert_se_gene_ids
377 | #'
378 | #' Change the gene IDs in the supplied dataset_se object to some other id
379 | #' already present in the gene info (as seen with \code{rowData()})
380 | #'
381 | #' @param dataset_se Summarised experiment object containing count data. Also
382 | #' requires 'ID' and 'group' to be set within the cell information
383 | #' (see \code{colData()})
384 | #' @param new_id A column within the feature information (view
385 | #' \code{rowData(dataset_se)}) of the \bold{dataset_se}, which will become
386 | #' the new ID column. Non-uniqueness of this column is handled gracefully!
387 | #' Any \emph{NAs} will be dropped.
388 | #' @param eval_col Which column to use to break ties of duplicate
389 | #' \bold{new_id}. Must be a column within the feature information (view
390 | #' \code{rowData(dataset_se)}) of the \bold{dataset_se}. Total reads per gene
391 | #' feature is a good choice.
392 | #' @param find_max If false, this will choose the minimal \bold{eval_col}
393 | #' instead of max. Default = TRUE
394 | #'
395 | #' @return A modified dataset_se - ID will now be \bold{new_id}, and unique.
396 | #' It will have fewer genes if old ID to new ID was not a 1:1 mapping.
397 | #' The selected genes will be according to the eval col max (or min).
398 | #' \emph{should} pick the alphabetical first on ties, but could change.
399 | #'
400 | #' @examples
401 | #'
402 | #' # The demo dataset doesn't have other names, so make some up
403 | #' # (don't do this)
404 | #' dataset_se <- demo_ref_se
405 | #' rowData(dataset_se)$dummyname <- toupper(rowData(dataset_se)$ID)
406 | #'
407 | #' # If not already present, define a column to evaluate,
408 | #' # typically total reads/gene.
409 | #' rowData(dataset_se)$total_count <- rowSums(assay(dataset_se))
410 | #'
411 | #' dataset_se <- convert_se_gene_ids(dataset_se, new_id='dummyname', eval_col='total_count')
412 | #'
413 | #' @seealso \href{https://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html}{SummarizedExperiment}
414 | #' For general doco on the SummarizedExperiment objects.
415 | #' @seealso \code{\link{load_se_from_files}} For reading data from flat
416 | #' files (not 10X cellRanger output)
417 | #'
418 | #' @import SummarizedExperiment
419 | #' @importFrom magrittr %>%
420 | #' @importFrom rlang .data
421 | #' @export
422 | convert_se_gene_ids <- function(dataset_se, new_id, eval_col, find_max=TRUE) {
423 |
424 | old_id = "ID"
425 | if (! all(c(old_id, new_id, eval_col) %in% colnames(rowData(dataset_se)))) {
426 | stop("Can't find all of ", paste(c(old_id, new_id, eval_col), collapse=", "),
427 | " in rowData(dataset_se) colnames")
428 | }
429 |
430 | row_data_df <- BiocGenerics::as.data.frame(rowData(dataset_se))[,c(old_id,new_id, eval_col)]
431 | colnames(row_data_df) <- c("old_lab", "new_lab", "eval_lab")
432 | row_data_df <- row_data_df[!is.na(row_data_df$new_lab),]
433 | if (find_max) {
434 | row_data_df <- row_data_df %>%
435 | dplyr::arrange(.data$new_lab,
436 | dplyr::desc(.data$eval_lab),
437 | .data$old_lab)
438 | }else { #min
439 | row_data_df <- row_data_df %>%
440 | dplyr::arrange(.data$new_lab,
441 | .data$eval_lab,
442 | .data$old_lab)
443 | }
444 | row_data_unique <- row_data_df %>%
445 | dplyr::group_by(.data$new_lab) %>%
446 | dplyr::slice(1)
447 |
448 | # Subset to just those representative old ids, and give the unique new id
449 | dataset_se<- dataset_se[ row_data_unique$old_lab , ]
450 | rowData(dataset_se)$ID <- rowData(dataset_se)[[new_id]] # overwrite the ID
451 | rownames(dataset_se) <- rowData(dataset_se)[["ID"]]
452 |
453 | return(dataset_se)
454 | }
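
The selection rule used by convert_se_gene_ids can be sketched on a plain data frame. This is a minimal base R illustration, not the package's code path (which uses dplyr); the table `gene_info` and all its values are made up for the example.

```r
# Toy gene table (hypothetical values): two old IDs map to the same new ID
# "GENEA", and one gene has no new ID at all.
gene_info <- data.frame(
    ID          = c("g1", "g2", "g3", "g4"),
    new_id      = c("GENEA", "GENEA", "GENEB", NA),
    total_count = c(10, 50, 7, 99),
    stringsAsFactors = FALSE
)

# Drop genes without a new ID, as convert_se_gene_ids() does.
gene_info <- gene_info[!is.na(gene_info$new_id), ]

# Sort by new ID, then by descending evaluation column (the find_max=TRUE
# case), then by old ID so that ties resolve alphabetically.
ordered <- gene_info[order(gene_info$new_id,
                           -gene_info$total_count,
                           gene_info$ID), ]

# Keep the first row per new ID: that row names the representative old ID.
representatives <- ordered[!duplicated(ordered$new_id), ]
print(representatives)
```

Here g2 is kept for GENEA because it has the larger total_count, and g4 is dropped because it has no new ID. For the find_max=FALSE case the sort on the evaluation column would be ascending instead.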
455 |
456 |
457 |
458 |
459 |
460 |
461 |
462 | #' trim_small_groups_and_low_expression_genes
463 | #'
464 | #' Filter and return a SummarizedExperiment object (dataset_se) by several
465 | #' metrics:
466 | #' \itemize{
467 | #' \item Cells with at least \bold{min_lib_size} total reads.
468 | #' \item Genes expressed in at least \bold{min_detected_by_min_samples}
469 | #' cells, at a threshold of \bold{min_reads_in_sample} per cell.
470 | #' \item Remove entire groups (clusters) of cells where there are fewer than
471 | #' \bold{min_group_membership} cells in that group.
472 | #' }
473 | #'
474 | #' If it hasn't been done already, it is highly recommended to use this
475 | #' function to filter out genes with no/low total counts
476 | #' (especially in single cell data,
477 | #' there can be many) - without expression they are not useful and may reduce
478 | #' statistical power.
479 | #'
480 | #' Likewise, very small groups (<5 cells) are unlikely to give useful
481 | #' results with this method. And cells with abnormally small library sizes may
482 | #' not be desirable.
483 | #'
484 | #'
485 | #' Of course 'reasonable' thresholds for filtering cells/genes are subjective.
486 | #' Defaults are moderately sensible starting points.
487 | #'
488 | #' @param dataset_se Summarised experiment object containing count data. Also
489 | #' requires 'ID' and 'group' to be set within the cell information
490 | #' (see \code{colData()})
491 | #' @param min_lib_size Minimum library size. Cells with fewer than this many
492 | #' reads are removed. Default = 1000
493 | #' @param min_reads_in_sample Require this many reads to consider a gene
494 | #' detected in a sample. Default = 1
495 | #' @param min_detected_by_min_samples Keep genes detected in this many
496 | #' samples. May change with experiment size. Default = 5
497 | #' @param min_group_membership Throw out groups/clusters with fewer than this
498 | #' many cells. May change with experiment size. Default = 5
499 | #'
500 | #' @return A filtered dataset_se, ready for use.
501 | #'
502 | #' @examples
503 | #'
504 | #' demo_query_se.trimmed <-
505 | #' trim_small_groups_and_low_expression_genes(demo_query_se)
506 | #' demo_ref_se.trimmed <-
507 | #' trim_small_groups_and_low_expression_genes(demo_ref_se,
508 | #' min_group_membership = 10)
509 | #'
510 | #' @import SummarizedExperiment
511 | #'
512 | #' @export
513 | trim_small_groups_and_low_expression_genes <- function(
514 | dataset_se, min_lib_size=1000, min_group_membership=5,
515 | min_reads_in_sample=1, min_detected_by_min_samples=5
516 | ) {
517 |
518 | counts_index <- get_counts_index(n_assays=length(assays(dataset_se)),
519 | assay_names = names(assays(dataset_se)))
520 |
521 | ## Filter by min lib size, num samples detected in
522 | # Use different rowSums for hdf5-backed SCE.
523 | if (is(assay(dataset_se, counts_index) , "DelayedMatrix")) {
524 | samples_per_gene <- DelayedArray::rowSums(assay(dataset_se, counts_index) >= min_reads_in_sample)
525 | dataset_se <- dataset_se[,DelayedArray::colSums(assay(dataset_se, counts_index))>=min_lib_size ]
526 |
527 | } else {
528 | samples_per_gene <- Matrix::rowSums(assay(dataset_se, counts_index) >= min_reads_in_sample)
529 | dataset_se <- dataset_se[,Matrix::colSums(assay(dataset_se, counts_index))>=min_lib_size ]
530 | }
531 | dataset_se <- dataset_se[ samples_per_gene >= min_detected_by_min_samples, ]
532 |
533 |
534 | ## Less than a certain number of cells in a group,
535 | # discard the group, and its cells.
536 | # NB: also removes 'NA' group entries.
537 | cell_group_sizes <- table(dataset_se$group)
538 | groups_to_keep <- names(cell_group_sizes)[cell_group_sizes >= min_group_membership]
539 | dataset_se <- dataset_se[,dataset_se$group %in% groups_to_keep]
540 | dataset_se$group <- droplevels(dataset_se$group)
541 |
542 | return(dataset_se)
543 | }
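
The order of the filtering steps above can be followed on a small dense matrix. This is a sketch under assumptions: it uses a plain base R matrix and a separate `group` factor rather than a SummarizedExperiment, and the counts and thresholds are invented so that each filter removes something.

```r
# Toy counts: 3 genes x 6 cells (hypothetical values).
counts <- matrix(
    c(600, 700, 800, 900, 650, 5,   # geneA
      500, 400, 300, 200, 450, 0,   # geneB
        0,   0,   1,   0,   0, 0),  # geneC: detected in a single cell
    nrow = 3, byrow = TRUE,
    dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:6))
)
group <- factor(c("g1", "g1", "g1", "g2", "g1", "g2"))

# Thresholds chosen for this toy example, not the function defaults.
min_lib_size                <- 1000
min_reads_in_sample         <- 1
min_detected_by_min_samples <- 2
min_group_membership        <- 2

# 1. Count detecting cells per gene BEFORE removing any cells,
#    mirroring the order used in the function above.
samples_per_gene <- rowSums(counts >= min_reads_in_sample)

# 2. Drop cells with a small library size.
keep_cells <- colSums(counts) >= min_lib_size
counts <- counts[, keep_cells, drop = FALSE]
group  <- group[keep_cells]

# 3. Drop genes detected in too few cells.
counts <- counts[samples_per_gene >= min_detected_by_min_samples, , drop = FALSE]

# 4. Drop groups that are now too small, and their cells.
group_sizes    <- table(group)
groups_to_keep <- names(group_sizes)[group_sizes >= min_group_membership]
keep_grouped   <- group %in% groups_to_keep
counts <- counts[, keep_grouped, drop = FALSE]
group  <- droplevels(group[keep_grouped])

print(dim(counts))
print(levels(group))
```

In this toy case cell6 fails the library size filter, geneC fails the detection filter, and group g2 is then left with one cell and is removed along with cell4.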
544 |
545 |
546 |
547 |
548 |
--------------------------------------------------------------------------------
/R/plotting_functions.R:
--------------------------------------------------------------------------------
1 |
2 | #' make_ranking_violin_plot
3 | #'
4 | #' Plot a panel of violin plots showing the distribution of the 'top' genes of
5 | #' each query group, across the reference dataset.
6 | #'
7 | #' In the plot output, each panel corresponds to a different group/cluster in
8 | #' the query experiment. The x-axis has the groups in the reference dataset.
9 | #' The y-axis is the rescaled rank of each 'top' gene from the query group,
10 | #' within each reference group.
11 | #'
12 | #' Only the 'top' genes for each query group are plotted, forming the violin
13 | #' plots - each individual gene is shown as a tickmark. Some groups have few
14 | #' top genes, and so their uncertainty can be seen on this plot.
15 | #'
16 | #' The thick black lines represent the median gene rescaled ranking for each
17 | #' query group / reference group combination. Having this fall above the dotted
18 | #' median threshold marker is a quick indication of potential similarity.
19 | #' A complete lack of similarity would have a median rank around 0.5. Median
20 | #' rankings much less than 0.5 are common though (an 'anti-cell-groupA'
21 | #' signature), because genes overrepresented in one group in an experiment,
22 | #' are likely to be relatively 'underrepresented' in the other groups.
23 | #' Taken to an
24 | #' extreme, if there are only two reference groups, they'll be complete
25 | #' opposites.
26 | #'
27 | #' Input can be either the precomputed \emph{de_table.marked} object for the
28 | #' comparison, OR both \emph{de_table.test} and \emph{de_table.ref}
29 | #' differential expression results to compare from
30 | #' \code{\link{contrast_each_group_to_the_rest}}
31 | #'
32 | #' @param de_table.marked The output of
33 | #' \code{\link{get_the_up_genes_for_all_possible_groups}}
34 | #' for the contrast of interest.
35 | #' @param de_table.test A differential expression table of the
36 | #' query experiment,
37 | #' as generated from \code{\link{contrast_each_group_to_the_rest}}
38 | #' @param de_table.ref A differential expression table of the
39 | #' reference dataset,
40 | #' as generated from \code{\link{contrast_each_group_to_the_rest}}
41 | #' @param log10trans Plot on a log scale? Useful for distinguishing multiple
42 | #' similar, yet distinct cell types that bunch at the top of the plot. Default=FALSE.
43 | #'
44 | #' @param ... Further options to be passed to
45 | #' \code{\link{get_the_up_genes_for_all_possible_groups}},
46 | #' e.g. rankmetric
47 | #'
48 | #' @return A ggplot object.
49 | #'
50 | #' @examples
51 | #'
52 | #' # Make input
53 | #' # de_table.demo_query <- contrast_each_group_to_the_rest(demo_query_se, "demo_query")
54 | #' # de_table.demo_ref <- contrast_each_group_to_the_rest(demo_ref_se, "demo_ref")
55 | #'
56 | #' # This:
57 | #' make_ranking_violin_plot(de_table.test=de_table.demo_query,
58 | #' de_table.ref=de_table.demo_ref )
59 | #'
60 | #' # Is equivalent to this:
61 | #' de_table.marked.query_vs_ref <-
62 | #' get_the_up_genes_for_all_possible_groups( de_table.test=de_table.demo_query,
63 | #' de_table.ref=de_table.demo_ref)
64 | #' make_ranking_violin_plot(de_table.marked.query_vs_ref)
65 | #'
66 | #'
67 | #' @seealso \code{\link{get_the_up_genes_for_all_possible_groups}} To make
68 | #' the input data.
69 | #'
70 | #'
71 | #' @export
72 | make_ranking_violin_plot <- function(
73 | de_table.marked=NA, de_table.test=NA, de_table.ref=NA, log10trans=FALSE ,
74 | ... ) {
75 |
76 | defined_de_table.marked <- any(! is.na(de_table.marked))
77 | defined_de_table.test <- any(! is.na(de_table.test))
78 | defined_de_table.ref <- any(! is.na(de_table.ref) )
79 |
80 | if ( !defined_de_table.marked
81 | & defined_de_table.test
82 | & defined_de_table.ref ) {
83 | de_table.marked <- get_the_up_genes_for_all_possible_groups(de_table.test,
84 | de_table.ref,
85 | ... )
86 |
87 | } else if (!( defined_de_table.marked
88 | & !defined_de_table.test
89 | & !defined_de_table.ref )) {
90 | stop("Specify either 'de_table.marked' or both de_table.test ",
91 | "AND de_table.ref (naming parameters)")
92 | } #Else, de_table.marked provided, continue
93 |
94 |
95 | if (log10trans) {
96 | #happily, it'll never be 0
97 | de_table.marked$rescaled_rank <- log10(de_table.marked$rescaled_rank)
98 | }
99 |
100 | p <- ggplot2::ggplot(de_table.marked,
101 | ggplot2::aes_string(y='rescaled_rank',
102 | x='group',
103 | fill='group')) +
104 | ggplot2::geom_violin(ggplot2::aes_string(colour='group')) +
105 | ggplot2::geom_point(alpha=0.5, size=3, pch='-', show.legend = FALSE) +
106 | ggplot2::scale_y_reverse() +
107 | ggplot2::ylab("Test geneset rank in reference cluster") +
108 | ggplot2::xlab("") +
109 | ggplot2::stat_summary(fun.y = stats::median,
110 | fun.ymin = stats::median,
111 | fun.ymax = stats::median,
112 | geom = "crossbar",
113 | col="black",
114 | show.legend = FALSE) +
115 | ggplot2::theme_bw() +
116 | ggplot2::theme(panel.grid.major = ggplot2::element_blank(),
117 | panel.grid.minor = ggplot2::element_blank()) +
118 | ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 90,
119 | hjust = 1,
120 | vjust=0.5)) +
121 | ggplot2::facet_wrap(~test_group)
122 |
123 | return (p)
124 | }
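
The black crossbars in the plot are per-panel, per-reference-group medians of rescaled_rank. A minimal base R sketch of that summary follows; the column names test_group, group and rescaled_rank are taken from the plotting code above, while the table `de_table.marked.toy` and its values are invented for illustration.

```r
# Toy stand-in for a de_table.marked object (hypothetical values):
# one query group, two reference groups, three 'top' genes each.
de_table.marked.toy <- data.frame(
    test_group    = rep("query_A", 6),
    group         = rep(c("ref_X", "ref_Y"), each = 3),
    rescaled_rank = c(0.02, 0.05, 0.10, 0.40, 0.55, 0.70)
)

# Median rescaled rank for each query group / reference group combination,
# i.e. the value each crossbar is drawn at.
median_ranks <- aggregate(rescaled_rank ~ test_group + group,
                          data = de_table.marked.toy,
                          FUN  = median)
print(median_ranks)
```

In this toy table ref_X has a median near the top of the ranking, which would suggest similarity to query_A, while ref_Y sits around 0.5, the value expected when there is no similarity.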
125 |
126 |
127 |
128 |
129 |
--------------------------------------------------------------------------------
/data/de_table.demo_query.rda:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashBioinformaticsPlatform/celaref/585f2fb96f8d382803cebea3c6fc7adefa8d2054/data/de_table.demo_query.rda
--------------------------------------------------------------------------------
/data/de_table.demo_ref.rda:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashBioinformaticsPlatform/celaref/585f2fb96f8d382803cebea3c6fc7adefa8d2054/data/de_table.demo_ref.rda
--------------------------------------------------------------------------------
/data/demo_cell_info_table.rda:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashBioinformaticsPlatform/celaref/585f2fb96f8d382803cebea3c6fc7adefa8d2054/data/demo_cell_info_table.rda
--------------------------------------------------------------------------------
/data/demo_counts_matrix.rda:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashBioinformaticsPlatform/celaref/585f2fb96f8d382803cebea3c6fc7adefa8d2054/data/demo_counts_matrix.rda
--------------------------------------------------------------------------------
/data/demo_gene_info_table.rda:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashBioinformaticsPlatform/celaref/585f2fb96f8d382803cebea3c6fc7adefa8d2054/data/demo_gene_info_table.rda
--------------------------------------------------------------------------------
/data/demo_microarray_expr.rda:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashBioinformaticsPlatform/celaref/585f2fb96f8d382803cebea3c6fc7adefa8d2054/data/demo_microarray_expr.rda
--------------------------------------------------------------------------------
/data/demo_microarray_sample_sheet.rda:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashBioinformaticsPlatform/celaref/585f2fb96f8d382803cebea3c6fc7adefa8d2054/data/demo_microarray_sample_sheet.rda
--------------------------------------------------------------------------------
/data/demo_query_se.rda:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashBioinformaticsPlatform/celaref/585f2fb96f8d382803cebea3c6fc7adefa8d2054/data/demo_query_se.rda
--------------------------------------------------------------------------------
/data/demo_ref_se.rda:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashBioinformaticsPlatform/celaref/585f2fb96f8d382803cebea3c6fc7adefa8d2054/data/demo_ref_se.rda
--------------------------------------------------------------------------------
/inst/extdata/demo_microarray_info.tab:
--------------------------------------------------------------------------------
1 | cell_sample group
2 | array1 Dunno
3 | array10 Dunno
4 | array12 Dunno
5 | array24 Dunno
6 | array26 Dunno
7 | array14 Exciting
8 | array17 Exciting
9 | array19 Exciting
10 | array38 Exciting
11 | array41 Exciting
12 | array4 Mystery celltype
13 | array8 Mystery celltype
14 | array13 Mystery celltype
15 | array15 Mystery celltype
16 | array18 Mystery celltype
17 | array9 Weird subtype
18 | array42 Weird subtype
19 | array82 Weird subtype
20 | array95 Weird subtype
21 | array128 Weird subtype
22 |
--------------------------------------------------------------------------------
/inst/extdata/larger_doco_examples.rdata:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashBioinformaticsPlatform/celaref/585f2fb96f8d382803cebea3c6fc7adefa8d2054/inst/extdata/larger_doco_examples.rdata
--------------------------------------------------------------------------------
/inst/extdata/sim_cr_dataset/analysis/clustering/kmeans_4_clusters/clusters.csv:
--------------------------------------------------------------------------------
1 | Barcode,Cluster
2 | Cell2,c4
3 | Cell3,c2
4 | Cell5,c1
5 | Cell6,c3
6 | Cell7,c4
7 | Cell11,c2
8 | Cell16,c4
9 | Cell20,c2
10 | Cell22,c2
11 | Cell23,c3
12 | Cell27,c2
13 | Cell30,c2
14 | Cell31,c3
15 | Cell32,c2
16 | Cell33,c3
17 | Cell34,c3
18 | Cell35,c3
19 | Cell36,c4
20 | Cell39,c4
21 | Cell40,c2
22 | Cell43,c2
23 | Cell48,c3
24 | Cell49,c2
25 | Cell50,c3
26 | Cell56,c3
27 | Cell61,c4
28 | Cell62,c3
29 | Cell63,c4
30 | Cell65,c2
31 | Cell66,c4
32 | Cell69,c3
33 | Cell71,c2
34 | Cell72,c2
35 | Cell73,c1
36 | Cell74,c2
37 | Cell76,c2
38 | Cell80,c3
39 | Cell83,c4
40 | Cell84,c3
41 | Cell87,c2
42 | Cell89,c1
43 | Cell91,c2
44 | Cell96,c4
45 | Cell97,c2
46 | Cell100,c4
47 | Cell101,c4
48 | Cell102,c3
49 | Cell103,c1
50 | Cell104,c3
51 | Cell106,c2
52 | Cell107,c4
53 | Cell111,c3
54 | Cell112,c4
55 | Cell114,c2
56 | Cell116,c4
57 | Cell119,c3
58 | Cell121,c4
59 | Cell122,c4
60 | Cell123,c3
61 | Cell124,c4
62 | Cell125,c4
63 | Cell126,c4
64 | Cell127,c4
65 | Cell129,c1
66 | Cell130,c3
67 | Cell131,c3
68 | Cell132,c2
69 | Cell133,c4
70 | Cell136,c2
71 | Cell137,c4
72 | Cell138,c1
73 | Cell139,c4
74 | Cell143,c4
75 | Cell147,c4
76 | Cell148,c3
77 | Cell149,c2
78 | Cell150,c3
79 | Cell152,c4
80 | Cell154,c2
81 | Cell155,c3
82 | Cell156,c4
83 | Cell158,c3
84 | Cell159,c2
85 | Cell161,c3
86 | Cell163,c2
87 | Cell164,c3
88 | Cell167,c2
89 | Cell170,c4
90 | Cell173,c4
91 | Cell176,c3
92 | Cell177,c4
93 | Cell180,c4
94 | Cell184,c4
95 | Cell187,c2
96 | Cell188,c3
97 | Cell191,c4
98 | Cell193,c3
99 | Cell194,c4
100 | Cell197,c2
101 | Cell199,c4
102 |
--------------------------------------------------------------------------------
/inst/extdata/sim_cr_dataset/filtered_gene_bc_matrices/GRCh38/barcodes.tsv:
--------------------------------------------------------------------------------
1 | Cell2
2 | Cell3
3 | Cell5
4 | Cell6
5 | Cell7
6 | Cell11
7 | Cell16
8 | Cell20
9 | Cell22
10 | Cell23
11 | Cell27
12 | Cell30
13 | Cell31
14 | Cell32
15 | Cell33
16 | Cell34
17 | Cell35
18 | Cell36
19 | Cell39
20 | Cell40
21 | Cell43
22 | Cell48
23 | Cell49
24 | Cell50
25 | Cell56
26 | Cell61
27 | Cell62
28 | Cell63
29 | Cell65
30 | Cell66
31 | Cell69
32 | Cell71
33 | Cell72
34 | Cell73
35 | Cell74
36 | Cell76
37 | Cell80
38 | Cell83
39 | Cell84
40 | Cell87
41 | Cell89
42 | Cell91
43 | Cell96
44 | Cell97
45 | Cell100
46 | Cell101
47 | Cell102
48 | Cell103
49 | Cell104
50 | Cell106
51 | Cell107
52 | Cell111
53 | Cell112
54 | Cell114
55 | Cell116
56 | Cell119
57 | Cell121
58 | Cell122
59 | Cell123
60 | Cell124
61 | Cell125
62 | Cell126
63 | Cell127
64 | Cell129
65 | Cell130
66 | Cell131
67 | Cell132
68 | Cell133
69 | Cell136
70 | Cell137
71 | Cell138
72 | Cell139
73 | Cell143
74 | Cell147
75 | Cell148
76 | Cell149
77 | Cell150
78 | Cell152
79 | Cell154
80 | Cell155
81 | Cell156
82 | Cell158
83 | Cell159
84 | Cell161
85 | Cell163
86 | Cell164
87 | Cell167
88 | Cell170
89 | Cell173
90 | Cell176
91 | Cell177
92 | Cell180
93 | Cell184
94 | Cell187
95 | Cell188
96 | Cell191
97 | Cell193
98 | Cell194
99 | Cell197
100 | Cell199
101 |
--------------------------------------------------------------------------------
/inst/extdata/sim_cr_dataset/filtered_gene_bc_matrices/GRCh38/genes.tsv:
--------------------------------------------------------------------------------
1 | Gene1
2 | Gene2
3 | Gene3
4 | Gene4
5 | Gene5
6 | Gene6
7 | Gene7
8 | Gene8
9 | Gene9
10 | Gene10
11 | Gene11
12 | Gene12
13 | Gene13
14 | Gene14
15 | Gene15
16 | Gene16
17 | Gene17
18 | Gene18
19 | Gene19
20 | Gene20
21 | Gene21
22 | Gene22
23 | Gene23
24 | Gene24
25 | Gene25
26 | Gene26
27 | Gene27
28 | Gene28
29 | Gene29
30 | Gene30
31 | Gene31
32 | Gene32
33 | Gene33
34 | Gene34
35 | Gene35
36 | Gene36
37 | Gene37
38 | Gene38
39 | Gene39
40 | Gene40
41 | Gene41
42 | Gene42
43 | Gene43
44 | Gene44
45 | Gene45
46 | Gene46
47 | Gene47
48 | Gene48
49 | Gene49
50 | Gene50
51 | Gene51
52 | Gene52
53 | Gene53
54 | Gene54
55 | Gene55
56 | Gene56
57 | Gene57
58 | Gene58
59 | Gene59
60 | Gene60
61 | Gene61
62 | Gene62
63 | Gene63
64 | Gene64
65 | Gene65
66 | Gene66
67 | Gene67
68 | Gene68
69 | Gene69
70 | Gene70
71 | Gene71
72 | Gene72
73 | Gene73
74 | Gene74
75 | Gene75
76 | Gene76
77 | Gene77
78 | Gene78
79 | Gene79
80 | Gene80
81 | Gene81
82 | Gene82
83 | Gene83
84 | Gene84
85 | Gene85
86 | Gene86
87 | Gene87
88 | Gene88
89 | Gene89
90 | Gene90
91 | Gene91
92 | Gene92
93 | Gene93
94 | Gene94
95 | Gene95
96 | Gene96
97 | Gene97
98 | Gene98
99 | Gene99
100 | Gene100
101 | Gene101
102 | Gene102
103 | Gene103
104 | Gene104
105 | Gene105
106 | Gene106
107 | Gene107
108 | Gene108
109 | Gene109
110 | Gene110
111 | Gene111
112 | Gene112
113 | Gene113
114 | Gene114
115 | Gene115
116 | Gene116
117 | Gene117
118 | Gene118
119 | Gene119
120 | Gene120
121 | Gene121
122 | Gene122
123 | Gene123
124 | Gene124
125 | Gene125
126 | Gene126
127 | Gene127
128 | Gene128
129 | Gene129
130 | Gene130
131 | Gene131
132 | Gene132
133 | Gene133
134 | Gene134
135 | Gene135
136 | Gene136
137 | Gene137
138 | Gene138
139 | Gene139
140 | Gene140
141 | Gene141
142 | Gene142
143 | Gene143
144 | Gene144
145 | Gene145
146 | Gene146
147 | Gene147
148 | Gene148
149 | Gene149
150 | Gene150
151 | Gene151
152 | Gene152
153 | Gene153
154 | Gene154
155 | Gene155
156 | Gene156
157 | Gene157
158 | Gene158
159 | Gene159
160 | Gene160
161 | Gene161
162 | Gene162
163 | Gene163
164 | Gene164
165 | Gene165
166 | Gene166
167 | Gene167
168 | Gene168
169 | Gene169
170 | Gene170
171 | Gene171
172 | Gene172
173 | Gene173
174 | Gene174
175 | Gene175
176 | Gene176
177 | Gene177
178 | Gene178
179 | Gene179
180 | Gene180
181 | Gene181
182 | Gene182
183 | Gene183
184 | Gene184
185 | Gene185
186 | Gene186
187 | Gene187
188 | Gene188
189 | Gene189
190 | Gene190
191 | Gene191
192 | Gene192
193 | Gene193
194 | Gene194
195 | Gene195
196 | Gene196
197 | Gene197
198 | Gene198
199 | Gene199
200 | Gene200
201 |
--------------------------------------------------------------------------------
/inst/extdata/sim_query_cell_info.tab:
--------------------------------------------------------------------------------
1 | Cell Batch group ExpLibSize
2 | Cell2 Batch1 Group4 55045.211935979336
3 | Cell3 Batch1 Group2 46942.962765139586
4 | Cell5 Batch1 Group1 72692.13311775407
5 | Cell6 Batch1 Group3 50267.15714478282
6 | Cell7 Batch1 Group4 71431.82077986447
7 | Cell11 Batch1 Group2 57801.51179249036
8 | Cell16 Batch1 Group4 60782.525139232435
9 | Cell20 Batch1 Group2 32291.376395933472
10 | Cell22 Batch1 Group2 64176.6589613515
11 | Cell23 Batch1 Group3 59747.499047330995
12 | Cell27 Batch1 Group2 60470.09152848558
13 | Cell30 Batch1 Group2 69802.51829486103
14 | Cell31 Batch1 Group3 55998.58265270814
15 | Cell32 Batch1 Group2 64108.98003048688
16 | Cell33 Batch1 Group3 35474.646469347615
17 | Cell34 Batch1 Group3 60575.56962875652
18 | Cell35 Batch1 Group3 51377.9749089028
19 | Cell36 Batch1 Group4 62404.34849785896
20 | Cell39 Batch1 Group4 54364.24118877251
21 | Cell40 Batch1 Group2 60255.227829446245
22 | Cell43 Batch1 Group2 44600.34473848163
23 | Cell48 Batch1 Group3 66989.11596906635
24 | Cell49 Batch1 Group2 49643.92670026899
25 | Cell50 Batch1 Group3 76592.1406419619
26 | Cell56 Batch1 Group3 98457.3787529401
27 | Cell61 Batch1 Group4 73917.23161412794
28 | Cell62 Batch1 Group3 55466.48636108551
29 | Cell63 Batch1 Group4 46823.27775549405
30 | Cell65 Batch1 Group2 51513.85532245833
31 | Cell66 Batch1 Group4 54505.22979637784
32 | Cell69 Batch1 Group3 74491.16758845026
33 | Cell71 Batch1 Group2 51169.56853776723
34 | Cell72 Batch1 Group2 84853.52779481266
35 | Cell73 Batch1 Group1 47008.062171698446
36 | Cell74 Batch1 Group2 57964.54317140159
37 | Cell76 Batch1 Group2 63575.42053761479
38 | Cell80 Batch1 Group3 57186.93516037964
39 | Cell83 Batch1 Group4 44021.57951072861
40 | Cell84 Batch1 Group3 60670.07180588897
41 | Cell87 Batch1 Group2 66248.06747901038
42 | Cell89 Batch1 Group1 79171.18267921747
43 | Cell91 Batch1 Group2 85936.28365559668
44 | Cell96 Batch1 Group4 52966.215119169225
45 | Cell97 Batch1 Group2 63829.30664853533
46 | Cell100 Batch1 Group4 59095.981328852904
47 | Cell101 Batch1 Group4 59795.92531779838
48 | Cell102 Batch1 Group3 90898.4676206882
49 | Cell103 Batch1 Group1 55420.939959992604
50 | Cell104 Batch1 Group3 69590.99062192626
51 | Cell106 Batch1 Group2 55855.15819670576
52 | Cell107 Batch1 Group4 75738.78220679189
53 | Cell111 Batch1 Group3 56032.77076099252
54 | Cell112 Batch1 Group4 60325.32160538211
55 | Cell114 Batch1 Group2 57698.416843254374
56 | Cell116 Batch1 Group4 48027.30256714334
57 | Cell119 Batch1 Group3 59859.33104285597
58 | Cell121 Batch1 Group4 55903.31429105042
59 | Cell122 Batch1 Group4 45783.67785044417
60 | Cell123 Batch1 Group3 48668.03566329688
61 | Cell124 Batch1 Group4 53512.53003792753
62 | Cell125 Batch1 Group4 53652.21569609339
63 | Cell126 Batch1 Group4 57462.08564837251
64 | Cell127 Batch1 Group4 47253.62682017155
65 | Cell129 Batch1 Group1 53390.57536562024
66 | Cell130 Batch1 Group3 69888.33576605862
67 | Cell131 Batch1 Group3 50933.28208504171
68 | Cell132 Batch1 Group2 57873.64943427228
69 | Cell133 Batch1 Group4 59831.67218172945
70 | Cell136 Batch1 Group2 77171.4079794374
71 | Cell137 Batch1 Group4 95548.54981559514
72 | Cell138 Batch1 Group1 56602.141116393555
73 | Cell139 Batch1 Group4 56601.3159577096
74 | Cell143 Batch1 Group4 59909.04617733183
75 | Cell147 Batch1 Group4 78264.66497132627
76 | Cell148 Batch1 Group3 75516.81549977609
77 | Cell149 Batch1 Group2 65085.63854658878
78 | Cell150 Batch1 Group3 50787.36548481734
79 | Cell152 Batch1 Group4 63987.997020185954
80 | Cell154 Batch1 Group2 60181.518371716695
81 | Cell155 Batch1 Group3 55504.249641624956
82 | Cell156 Batch1 Group4 94403.59117901337
83 | Cell158 Batch1 Group3 69381.95947033467
84 | Cell159 Batch1 Group2 56042.544621374
85 | Cell161 Batch1 Group3 52104.30868933979
86 | Cell163 Batch1 Group2 43188.91094666741
87 | Cell164 Batch1 Group3 75463.09414633608
88 | Cell167 Batch1 Group2 59186.99978757687
89 | Cell170 Batch1 Group4 71418.01533778882
90 | Cell173 Batch1 Group4 53250.77141610082
91 | Cell176 Batch1 Group3 58644.36674368383
92 | Cell177 Batch1 Group4 48901.83438692313
93 | Cell180 Batch1 Group4 85635.94617540664
94 | Cell184 Batch1 Group4 45614.84304096037
95 | Cell187 Batch1 Group2 53423.054389264944
96 | Cell188 Batch1 Group3 64087.436031051926
97 | Cell191 Batch1 Group4 63001.0418395676
98 | Cell193 Batch1 Group3 59702.21760197293
99 | Cell194 Batch1 Group4 69362.5647529825
100 | Cell197 Batch1 Group2 63211.8496261272
101 | Cell199 Batch1 Group4 49670.03482735444
102 | Cell201 Batch1 Group4 61760.324055229874
103 | Cell203 Batch1 Group4 55610.85456721159
104 | Cell204 Batch1 Group4 54099.61390827727
105 | Cell207 Batch1 Group4 74498.04712152222
106 | Cell208 Batch1 Group2 45217.554856646595
107 | Cell210 Batch1 Group2 79239.6468347486
108 | Cell212 Batch1 Group3 53610.78804963595
109 | Cell214 Batch1 Group4 63823.4218287253
110 | Cell215 Batch1 Group3 69466.79167030232
111 | Cell216 Batch1 Group4 90569.66327381074
112 | Cell217 Batch1 Group3 77167.35346164237
113 | Cell219 Batch1 Group3 53251.50839422345
114 | Cell220 Batch1 Group2 58481.37700236844
115 | Cell221 Batch1 Group4 55439.75632970351
116 | Cell223 Batch1 Group3 84559.29206558104
117 | Cell225 Batch1 Group4 58217.26279283668
118 | Cell226 Batch1 Group4 67051.11240341056
119 | Cell228 Batch1 Group1 76564.73024682916
120 | Cell232 Batch1 Group4 38691.538758825904
121 | Cell233 Batch1 Group4 53280.2808677595
122 | Cell234 Batch1 Group2 48173.52500200363
123 | Cell236 Batch1 Group4 72524.4729245371
124 | Cell237 Batch1 Group4 64174.92680381625
125 | Cell238 Batch1 Group4 54816.70556623328
126 | Cell239 Batch1 Group2 55990.728826870036
127 | Cell246 Batch1 Group2 54321.82176169253
128 | Cell247 Batch1 Group2 53360.17465025401
129 | Cell251 Batch1 Group3 46695.29400759923
130 | Cell255 Batch1 Group4 55656.354014515775
131 | Cell256 Batch1 Group3 71952.2459361064
132 | Cell257 Batch1 Group4 81893.50918336774
133 | Cell259 Batch1 Group3 78186.42435296281
134 | Cell260 Batch1 Group4 59869.213068357465
135 | Cell261 Batch1 Group2 74398.95104287741
136 | Cell262 Batch1 Group2 62349.11870707226
137 | Cell264 Batch1 Group2 57504.926294828096
138 | Cell265 Batch1 Group4 60832.27057959668
139 | Cell266 Batch1 Group3 45972.20651528205
140 | Cell269 Batch1 Group2 48448.50013740261
141 | Cell271 Batch1 Group1 56007.57369606149
142 | Cell273 Batch1 Group4 61799.83974912702
143 | Cell274 Batch1 Group3 64634.64664309145
144 | Cell275 Batch1 Group2 51232.623421348115
145 | Cell276 Batch1 Group2 78361.07311185921
146 | Cell277 Batch1 Group4 80830.34291836408
147 | Cell280 Batch1 Group4 67289.1328441728
148 | Cell281 Batch1 Group3 69457.79339855688
149 | Cell283 Batch1 Group4 49448.93383046807
150 | Cell286 Batch1 Group1 51534.88742390086
151 | Cell289 Batch1 Group4 47101.43206249451
152 | Cell291 Batch1 Group4 61647.86726580255
153 | Cell293 Batch1 Group3 58056.78573732392
154 | Cell294 Batch1 Group3 55102.676990906766
155 | Cell295 Batch1 Group4 54475.22518722605
156 | Cell296 Batch1 Group4 44364.39562863301
157 | Cell301 Batch1 Group3 46311.135368258416
158 | Cell302 Batch1 Group4 90744.13913059059
159 | Cell303 Batch1 Group4 55630.879403561725
160 | Cell304 Batch1 Group4 74750.16256368313
161 | Cell306 Batch1 Group2 77460.2380963579
162 | Cell308 Batch1 Group3 75377.2923583555
163 | Cell312 Batch1 Group4 39728.35599603908
164 | Cell317 Batch1 Group2 54947.605965535186
165 | Cell318 Batch1 Group3 60146.32204119423
166 | Cell319 Batch1 Group4 83756.79312788113
167 | Cell322 Batch1 Group4 72666.94045254748
168 | Cell327 Batch1 Group4 58871.142675962226
169 | Cell328 Batch1 Group4 53486.849877579974
170 | Cell332 Batch1 Group3 61433.48853632637
171 | Cell333 Batch1 Group1 58100.61612447167
172 | Cell335 Batch1 Group3 66839.12495747086
173 | Cell337 Batch1 Group4 65401.01185146557
174 | Cell340 Batch1 Group4 39425.533670533776
175 | Cell342 Batch1 Group4 68096.42132102429
176 | Cell343 Batch1 Group4 45095.93081737535
177 | Cell344 Batch1 Group4 66612.03042149784
178 | Cell345 Batch1 Group4 66665.70355166619
179 | Cell346 Batch1 Group4 67671.66083623093
180 | Cell348 Batch1 Group4 84599.48862710161
181 | Cell349 Batch1 Group4 61530.03742538303
182 | Cell351 Batch1 Group4 38535.057433454815
183 | Cell353 Batch1 Group2 75237.76359998228
184 | Cell354 Batch1 Group4 67650.68102592663
185 | Cell356 Batch1 Group2 45820.03577410458
186 | Cell357 Batch1 Group2 79261.91611565062
187 | Cell358 Batch1 Group3 64836.975633354654
188 | Cell360 Batch1 Group3 51805.79877815144
189 | Cell363 Batch1 Group4 50772.38991840925
190 | Cell365 Batch1 Group2 43331.86653196263
191 | Cell366 Batch1 Group2 64636.06226633208
192 | Cell367 Batch1 Group4 80303.48134486075
193 | Cell369 Batch1 Group4 76462.0969793205
194 | Cell371 Batch1 Group2 58218.55534893042
195 | Cell372 Batch1 Group4 69895.79492766099
196 | Cell377 Batch1 Group2 75972.72254440194
197 | Cell378 Batch1 Group4 67390.23969831594
198 | Cell380 Batch1 Group4 106641.73501079918
199 | Cell381 Batch1 Group3 53767.87223004134
200 | Cell383 Batch1 Group3 61774.49750877812
201 | Cell385 Batch1 Group4 60777.03251179295
202 | Cell386 Batch1 Group4 64811.5244158795
203 | Cell392 Batch1 Group4 108499.42301691597
204 | Cell393 Batch1 Group4 75905.72036316429
205 | Cell396 Batch1 Group3 77607.99259447245
206 | Cell404 Batch1 Group2 67568.70056143394
207 | Cell406 Batch1 Group3 49161.92808659108
208 | Cell407 Batch1 Group2 49078.77320243258
209 | Cell408 Batch1 Group4 69089.20159516943
210 | Cell412 Batch1 Group4 95465.68748465856
211 | Cell413 Batch1 Group2 73702.00184719308
212 | Cell414 Batch1 Group2 71394.02567102032
213 | Cell416 Batch1 Group4 81893.11190268018
214 | Cell419 Batch1 Group2 110513.15104214237
215 | Cell420 Batch1 Group4 70522.19271647074
216 | Cell423 Batch1 Group2 58937.64447575753
217 | Cell427 Batch1 Group4 52361.39307076591
218 | Cell429 Batch1 Group2 55180.82222002167
219 | Cell430 Batch1 Group4 70021.82788295683
220 | Cell434 Batch1 Group1 50686.938732602095
221 | Cell435 Batch1 Group2 47065.35656147533
222 | Cell437 Batch1 Group2 56480.13497027942
223 | Cell441 Batch1 Group2 76451.06969737713
224 | Cell444 Batch1 Group4 59937.6313876502
225 | Cell446 Batch1 Group4 62068.95576994645
226 | Cell447 Batch1 Group4 83231.64776864702
227 | Cell448 Batch1 Group4 56051.78479698383
228 | Cell449 Batch1 Group2 52595.8645832927
229 | Cell450 Batch1 Group3 54523.07506508324
230 | Cell452 Batch1 Group1 88912.54012495978
231 | Cell453 Batch1 Group4 47975.37568170933
232 | Cell454 Batch1 Group3 67365.12391083316
233 | Cell458 Batch1 Group4 70581.01015564842
234 | Cell459 Batch1 Group4 69205.01571501343
235 | Cell463 Batch1 Group2 64115.49510965184
236 | Cell464 Batch1 Group4 45090.425997240454
237 | Cell467 Batch1 Group3 51791.76436126253
238 | Cell468 Batch1 Group3 65987.1684275486
239 | Cell469 Batch1 Group3 63497.285946210024
240 | Cell472 Batch1 Group4 47497.64494193834
241 | Cell475 Batch1 Group4 58466.27279262591
242 | Cell479 Batch1 Group3 55160.73373239489
243 | Cell480 Batch1 Group4 65683.64657740944
244 | Cell482 Batch1 Group2 51124.56103667991
245 | Cell486 Batch1 Group3 54496.917493266104
246 | Cell489 Batch1 Group3 73877.84763763017
247 | Cell491 Batch1 Group4 57635.529324289324
248 | Cell495 Batch1 Group4 61390.957814298774
249 | Cell497 Batch1 Group2 52490.66031446426
250 | Cell500 Batch1 Group4 64452.66943504912
251 | Cell502 Batch1 Group4 59427.28297858763
252 | Cell503 Batch1 Group2 71988.13416816102
253 | Cell505 Batch1 Group3 70213.06423993556
254 | Cell509 Batch1 Group2 77470.57359426001
255 | Cell511 Batch1 Group4 82077.149146208
256 | Cell512 Batch1 Group3 56735.65896134725
257 | Cell513 Batch1 Group2 56929.9412186641
258 | Cell515 Batch1 Group3 60528.19275150474
259 | Cell516 Batch1 Group2 44130.96417960521
260 | Cell519 Batch1 Group1 65004.64479156216
261 | Cell522 Batch1 Group3 73051.72183489446
262 | Cell525 Batch1 Group3 83381.01151682546
263 | Cell526 Batch1 Group4 78239.89183872836
264 | Cell530 Batch1 Group3 92857.29314682864
265 | Cell531 Batch1 Group4 64524.440590123755
266 | Cell533 Batch1 Group2 52616.81595754013
267 | Cell534 Batch1 Group4 66837.57564223155
268 | Cell535 Batch1 Group3 68305.10656084202
269 | Cell538 Batch1 Group3 52953.32014207346
270 | Cell539 Batch1 Group3 56548.681933774926
271 | Cell540 Batch1 Group4 45324.30694147125
272 | Cell541 Batch1 Group2 57636.35312419474
273 | Cell544 Batch1 Group2 68469.48418404363
274 | Cell546 Batch1 Group1 50850.45859297336
275 | Cell548 Batch1 Group4 81738.98575687404
276 | Cell551 Batch1 Group4 73565.03679525343
277 | Cell553 Batch1 Group4 58857.092554393195
278 | Cell555 Batch1 Group4 72065.33637405386
279 | Cell556 Batch1 Group3 73333.44656516777
280 | Cell557 Batch1 Group2 78669.73070688167
281 | Cell559 Batch1 Group3 80824.8793849364
282 | Cell560 Batch1 Group3 48006.1976665016
283 | Cell564 Batch1 Group3 59489.81729474708
284 | Cell565 Batch1 Group3 47819.619988074584
285 | Cell567 Batch1 Group4 63290.14850777825
286 | Cell568 Batch1 Group3 62052.18322117015
287 | Cell569 Batch1 Group3 88119.50214326676
288 | Cell572 Batch1 Group3 77014.44501942628
289 | Cell573 Batch1 Group4 83428.89361342315
290 | Cell574 Batch1 Group3 45953.19043948141
291 | Cell576 Batch1 Group2 61950.01382429024
292 | Cell577 Batch1 Group4 57322.9614609041
293 | Cell578 Batch1 Group4 48886.57995672078
294 | Cell581 Batch1 Group2 49748.23779418759
295 | Cell587 Batch1 Group3 62210.62459097196
296 | Cell589 Batch1 Group4 72614.55310010164
297 | Cell592 Batch1 Group4 45335.531676595725
298 | Cell597 Batch1 Group2 61611.073947419725
299 | Cell598 Batch1 Group4 59535.945353196126
300 | Cell600 Batch1 Group3 51968.955654015925
301 | Cell601 Batch1 Group4 56073.94340189981
302 | Cell604 Batch1 Group2 57888.138813698395
303 | Cell606 Batch1 Group4 58233.19098839832
304 | Cell607 Batch1 Group2 42447.481661548176
305 | Cell608 Batch1 Group4 65330.72452582818
306 | Cell610 Batch1 Group3 94173.20045108836
307 | Cell613 Batch1 Group3 73212.51102571294
308 | Cell615 Batch1 Group3 63305.74806568626
309 | Cell617 Batch1 Group4 41103.05759277336
310 | Cell623 Batch1 Group1 46872.24452304599
311 | Cell625 Batch1 Group3 59769.01527724249
312 | Cell630 Batch1 Group2 47735.62398141603
313 | Cell635 Batch1 Group4 50328.34791033202
314 | Cell636 Batch1 Group2 53645.0813665348
315 | Cell637 Batch1 Group2 33066.639198324694
316 | Cell640 Batch1 Group2 70787.7539691832
317 | Cell642 Batch1 Group2 67478.32266456228
318 | Cell644 Batch1 Group3 37342.0804662592
319 | Cell645 Batch1 Group4 64732.0231081017
320 | Cell648 Batch1 Group4 70687.07618562752
321 | Cell649 Batch1 Group4 62117.709615804466
322 | Cell651 Batch1 Group4 66762.34875113204
323 | Cell653 Batch1 Group3 51337.98052996924
324 | Cell654 Batch1 Group4 57174.79404452918
325 | Cell655 Batch1 Group4 52219.20595934258
326 | Cell656 Batch1 Group4 65032.75449480318
327 | Cell657 Batch1 Group1 56481.82375922139
328 | Cell658 Batch1 Group2 77642.16706770551
329 | Cell663 Batch1 Group4 44475.65957829542
330 | Cell666 Batch1 Group2 54300.99028714421
331 | Cell667 Batch1 Group4 52202.42552486367
332 | Cell669 Batch1 Group2 63381.59168839421
333 | Cell671 Batch1 Group3 68583.56220842896
334 | Cell672 Batch1 Group2 66072.45716488647
335 | Cell675 Batch1 Group4 43541.45031715768
336 | Cell679 Batch1 Group2 53196.216005959795
337 | Cell680 Batch1 Group4 56839.063347097224
338 | Cell681 Batch1 Group2 49420.4555093929
339 | Cell682 Batch1 Group3 64654.22505009
340 | Cell683 Batch1 Group4 54296.256376663674
341 | Cell684 Batch1 Group3 48325.1824836285
342 | Cell687 Batch1 Group3 73337.32420781028
343 | Cell689 Batch1 Group3 64357.10570963703
344 | Cell691 Batch1 Group4 60182.7172869254
345 | Cell692 Batch1 Group4 66259.09419562017
346 | Cell695 Batch1 Group4 49012.67028671377
347 | Cell696 Batch1 Group4 60763.59646618964
348 | Cell697 Batch1 Group2 57538.088270904795
349 | Cell701 Batch1 Group2 45188.24825655201
350 | Cell704 Batch1 Group3 40370.86441903496
351 | Cell708 Batch1 Group4 67620.48993825342
352 | Cell710 Batch1 Group4 67519.48603742913
353 | Cell713 Batch1 Group4 74080.3374718697
354 | Cell714 Batch1 Group3 55757.673763471845
355 | Cell715 Batch1 Group1 62097.84979431549
356 | Cell717 Batch1 Group4 55436.42114769647
357 | Cell718 Batch1 Group4 44566.189825468195
358 | Cell719 Batch1 Group4 55892.958365087725
359 | Cell721 Batch1 Group2 64303.51464051373
360 | Cell722 Batch1 Group4 53719.070337791214
361 | Cell724 Batch1 Group4 65997.28103442215
362 | Cell734 Batch1 Group4 58931.651174542116
363 | Cell735 Batch1 Group4 44891.869352932314
364 | Cell740 Batch1 Group4 69768.58923555358
365 | Cell743 Batch1 Group3 59375.048972886725
366 | Cell744 Batch1 Group4 72255.56405499107
367 | Cell745 Batch1 Group4 77042.51042511035
368 | Cell753 Batch1 Group2 41460.7757666524
369 | Cell756 Batch1 Group1 46849.941934094
370 | Cell757 Batch1 Group2 66244.40127587176
371 | Cell760 Batch1 Group2 61143.684854501145
372 | Cell761 Batch1 Group2 83473.00264850094
373 | Cell762 Batch1 Group3 49202.39835700463
374 | Cell763 Batch1 Group3 70650.22771710184
375 | Cell765 Batch1 Group2 64722.34589481799
376 | Cell766 Batch1 Group2 65664.0322938361
377 | Cell767 Batch1 Group1 41868.395254806506
378 | Cell768 Batch1 Group2 47643.78854594363
379 | Cell769 Batch1 Group2 90364.64509068437
380 | Cell770 Batch1 Group3 62335.71527106323
381 | Cell772 Batch1 Group4 65766.53076444786
382 | Cell775 Batch1 Group4 47903.74573186333
383 | Cell776 Batch1 Group4 45784.37285802186
384 | Cell777 Batch1 Group2 64256.4262129188
385 | Cell778 Batch1 Group3 54068.79605332756
386 | Cell779 Batch1 Group1 78587.15708901401
387 | Cell780 Batch1 Group4 46984.13480352631
388 | Cell783 Batch1 Group4 78333.46080900206
389 | Cell784 Batch1 Group4 87659.96078369247
390 | Cell786 Batch1 Group2 56566.506938958526
391 | Cell788 Batch1 Group2 43713.58868696966
392 | Cell789 Batch1 Group4 39559.39112929476
393 | Cell794 Batch1 Group4 64575.233010369855
394 | Cell796 Batch1 Group3 52081.273242571486
395 | Cell798 Batch1 Group4 71747.80263367867
396 | Cell799 Batch1 Group3 57641.734168853494
397 | Cell800 Batch1 Group4 54863.9336016859
398 | Cell803 Batch1 Group2 56557.29627510162
399 | Cell804 Batch1 Group3 57492.37701488207
400 | Cell805 Batch1 Group3 46975.80631112637
401 | Cell808 Batch1 Group2 62551.94574169659
402 | Cell809 Batch1 Group1 51342.46987463979
403 | Cell810 Batch1 Group4 53065.80775883584
404 | Cell811 Batch1 Group2 51674.50351739878
405 | Cell818 Batch1 Group2 59266.86315473463
406 | Cell819 Batch1 Group2 72030.03576858308
407 | Cell821 Batch1 Group3 70880.8312828581
408 | Cell823 Batch1 Group2 56911.76592546331
409 | Cell824 Batch1 Group4 40685.66252848761
410 | Cell825 Batch1 Group3 62911.386250443364
411 | Cell826 Batch1 Group4 57101.73904288724
412 | Cell835 Batch1 Group3 65448.91980892367
413 | Cell837 Batch1 Group4 54238.85692070112
414 | Cell841 Batch1 Group3 44869.954906124294
415 | Cell843 Batch1 Group4 55988.887441902145
416 | Cell848 Batch1 Group2 70249.55818106886
417 | Cell851 Batch1 Group4 52649.020712481135
418 | Cell852 Batch1 Group4 61823.67740393481
419 | Cell854 Batch1 Group2 51911.90776104299
420 | Cell855 Batch1 Group2 73448.55949409209
421 | Cell856 Batch1 Group4 66099.24997674157
422 | Cell857 Batch1 Group2 58099.889187982175
423 | Cell858 Batch1 Group2 71596.41862670988
424 | Cell859 Batch1 Group2 58176.605365988544
425 | Cell866 Batch1 Group1 52303.208531428674
426 | Cell868 Batch1 Group1 43561.982005104255
427 | Cell873 Batch1 Group2 59442.417067045724
428 | Cell875 Batch1 Group2 56182.70972715173
429 | Cell878 Batch1 Group2 67139.7507150235
430 | Cell879 Batch1 Group3 55951.18801485375
431 | Cell882 Batch1 Group3 73063.24062862612
432 | Cell887 Batch1 Group1 54868.651642313365
433 | Cell888 Batch1 Group4 66045.06591550316
434 | Cell890 Batch1 Group4 53815.54419620353
435 | Cell891 Batch1 Group3 60458.723744747185
436 | Cell894 Batch1 Group4 51143.52254020912
437 | Cell895 Batch1 Group1 74432.51608438807
438 | Cell897 Batch1 Group3 72014.09703285371
439 | Cell898 Batch1 Group4 63831.86378201816
440 | Cell899 Batch1 Group4 73473.5934320163
441 | Cell901 Batch1 Group1 67425.61946244954
442 | Cell903 Batch1 Group1 58838.40718690198
443 | Cell907 Batch1 Group1 59450.43405054954
444 | Cell909 Batch1 Group4 37619.55516116481
445 | Cell910 Batch1 Group4 52222.36485086904
446 | Cell911 Batch1 Group3 47838.310240377126
447 | Cell914 Batch1 Group4 65577.56871884658
448 | Cell915 Batch1 Group2 43895.9051042549
449 | Cell918 Batch1 Group3 69769.09190087122
450 | Cell919 Batch1 Group4 61216.7945059993
451 | Cell920 Batch1 Group4 70630.25080127409
452 | Cell927 Batch1 Group2 51289.46481187577
453 | Cell931 Batch1 Group3 64130.41694661509
454 | Cell934 Batch1 Group2 78308.23885334641
455 | Cell941 Batch1 Group2 64525.783487762674
456 | Cell942 Batch1 Group4 54755.250906064444
457 | Cell943 Batch1 Group2 60170.130978370165
458 | Cell944 Batch1 Group2 57845.116354270795
459 | Cell945 Batch1 Group2 83507.54364238998
460 | Cell946 Batch1 Group2 101432.36859619302
461 | Cell951 Batch1 Group3 54296.37830393065
462 | Cell952 Batch1 Group3 45202.496889546295
463 | Cell955 Batch1 Group3 64438.533682449255
464 | Cell959 Batch1 Group4 85432.79490783332
465 | Cell962 Batch1 Group4 58692.108713920694
466 | Cell963 Batch1 Group4 73959.54568610046
467 | Cell967 Batch1 Group2 64688.77009092675
468 | Cell969 Batch1 Group3 50729.75002803493
469 | Cell970 Batch1 Group4 66381.80890662316
470 | Cell972 Batch1 Group2 50881.56943644793
471 | Cell973 Batch1 Group2 101075.54404900614
472 | Cell975 Batch1 Group2 70832.73872971303
473 | Cell976 Batch1 Group2 75159.32291042985
474 | Cell978 Batch1 Group2 70867.30161605243
475 | Cell979 Batch1 Group4 57250.88799421547
476 | Cell981 Batch1 Group3 62461.18938637405
477 | Cell982 Batch1 Group2 39859.54803445133
478 | Cell984 Batch1 Group4 66110.07926926417
479 | Cell986 Batch1 Group4 69059.71894564742
480 | Cell988 Batch1 Group4 59239.85517944907
481 | Cell992 Batch1 Group4 50334.1456987133
482 | Cell994 Batch1 Group2 61957.68939191427
483 | Cell996 Batch1 Group2 63270.44124687236
484 | Cell997 Batch1 Group2 51036.5735110997
485 | Cell998 Batch1 Group3 58885.13566264156
486 | Cell999 Batch1 Group2 53571.58520836509
487 |
--------------------------------------------------------------------------------
/inst/extdata/sim_query_gene_info.tab:
--------------------------------------------------------------------------------
1 | Gene BaseGeneMean
2 | Gene1 15.070166941865256
3 | Gene2 1.3214264125078057
4 | Gene3 2.6471940552392117
5 | Gene4 0.4314126274103263
6 | Gene5 1.3263294447897767
7 | Gene6 2.6556251736653076
8 | Gene7 1.5087404287107207
9 | Gene8 2.3980012018206116
10 | Gene9 1.174393945899882
11 | Gene10 0.16520007149995963
12 | Gene11 0.6063222399660461
13 | Gene12 1.1567203151084895
14 | Gene13 0.5697996434879279
15 | Gene14 5.910077587592439
16 | Gene15 1.2584703860405693
17 | Gene16 3.7312510321195247
18 | Gene17 3.732506094739015
19 | Gene18 0.9527541186413107
20 | Gene19 0.8299824883603495
21 | Gene20 5.419859401553251
22 | Gene21 0.12596021483220884
23 | Gene22 0.2838042849946762
24 | Gene23 0.19182208419572352
25 | Gene24 3.1676347186761196
26 | Gene25 0.5341929920646847
27 | Gene26 4.869121526664764
28 | Gene27 2.6688329911337663
29 | Gene28 0.8680950634977899
30 | Gene29 1.6676711932219812
31 | Gene30 0.7773442566244723
32 | Gene31 0.13776280782261727
33 | Gene32 1.62467446974602
34 | Gene33 3.408362066133117
35 | Gene34 0.1551462319196136
36 | Gene35 1.3693638794774319
37 | Gene36 6.539464980895205
38 | Gene37 0.9702595159605595
39 | Gene38 4.936128580990508e-4
40 | Gene39 1.4130366607242324
41 | Gene40 7.1535748703456825
42 | Gene41 0.8945476522623658
43 | Gene42 0.5154778965230534
44 | Gene43 5.282330848092755
45 | Gene44 3.53610188530163
46 | Gene45 5.439591965553024
47 | Gene46 0.5049124358836893
48 | Gene47 1.4218930614187446
49 | Gene48 0.3595905853302735
50 | Gene49 5.119043134371494
51 | Gene50 0.2502446473365546
52 | Gene51 0.006118641284641165
53 | Gene52 0.012327696958109266
54 | Gene53 2.270533237587709
55 | Gene54 0.2277377522940588
56 | Gene55 0.24543799611272676
57 | Gene56 1.0140785160835557
58 | Gene57 3.2974078405543414
59 | Gene58 3.1500104445427897
60 | Gene59 0.7890490693371935
61 | Gene60 6.322817718066329
62 | Gene61 0.418082717549265
63 | Gene62 2.034700826504123
64 | Gene63 2.440291341133348
65 | Gene64 0.002391602636955899
66 | Gene65 0.6029098216167582
67 | Gene66 3.3016014257423305
68 | Gene67 0.7482611461409928
69 | Gene68 0.12812044564547306
70 | Gene69 0.35066175036892044
71 | Gene70 0.1516936710876119
72 | Gene71 0.009882651695215254
73 | Gene72 0.31386242457573105
74 | Gene73 0.29994401190711634
75 | Gene74 1.4015772692111752
76 | Gene75 1.2497823102933978
77 | Gene76 0.5632321953622131
78 | Gene77 1.6801835300877066
79 | Gene78 0.2984888953070142
80 | Gene79 1.180053288644517
81 | Gene80 3.496870702720702
82 | Gene81 3.031213248293015
83 | Gene82 2.4544117966947323
84 | Gene83 0.6426568584477319
85 | Gene84 2.118998746932126
86 | Gene85 5.998968724366211
87 | Gene86 0.6370660786673482
88 | Gene87 4.878501798012513
89 | Gene88 4.236375736446719
90 | Gene89 2.284970524462676
91 | Gene90 0.9048320211852259
92 | Gene91 3.328864243871868
93 | Gene92 5.99485292531768
94 | Gene93 0.5110956753927606
95 | Gene94 0.667408242586006
96 | Gene95 0.4217533489766145
97 | Gene96 13.667521456964762
98 | Gene97 12.94918139728537
99 | Gene98 2.7655551566988636
100 | Gene99 0.026034366925850677
101 | Gene100 8.41958370917556
102 | Gene101 0.629002401311236
103 | Gene102 0.3984773552739324
104 | Gene103 6.648385010819624
105 | Gene104 0.52025440038014
106 | Gene105 0.6041210844214472
107 | Gene106 1.2171295700541882
108 | Gene107 2.1725070518596286
109 | Gene108 0.8655766682126647
110 | Gene109 0.20528408017518773
111 | Gene110 0.2871928791336786
112 | Gene111 0.6272521086349571
113 | Gene112 0.20938989155484852
114 | Gene113 1.1982525476674017
115 | Gene114 2.161009771439923
116 | Gene115 0.2108280997361641
117 | Gene116 5.490780408446104
118 | Gene117 7.901244927017148
119 | Gene118 1.6497365669481299
120 | Gene119 4.764385944692947
121 | Gene120 0.23214880715569838
122 | Gene121 1.826709920091376
123 | Gene122 2.222420903422962
124 | Gene123 0.28732215327512833
125 | Gene124 0.3270285059720184
126 | Gene125 0.028345690324059893
127 | Gene126 0.6385121429116922
128 | Gene127 0.45777438372310086
129 | Gene128 3.564006959063369
130 | Gene129 0.22257284891712545
131 | Gene130 3.06866579942385
132 | Gene131 0.010164450289338305
133 | Gene132 2.2159808283796263
134 | Gene133 0.18777347338931308
135 | Gene134 0.15308684636813338
136 | Gene135 5.029457080816111
137 | Gene136 0.19252884382655167
138 | Gene137 4.817288261266247
139 | Gene138 1.3571315484166406
140 | Gene139 0.9551584585099973
141 | Gene140 6.358966828719138
142 | Gene141 0.04758510724490084
143 | Gene142 0.5387642785968534
144 | Gene143 0.40903272802201124
145 | Gene144 0.9046274593654208
146 | Gene145 2.5300499331280255
147 | Gene146 0.007103927806247059
148 | Gene147 0.06617422376829078
149 | Gene148 1.1814835278920117
150 | Gene149 0.00730382422326564
151 | Gene150 4.523440167023822
152 | Gene151 5.0714787992616595
153 | Gene152 0.058381702597344695
154 | Gene153 9.849902477078747e-4
155 | Gene154 2.2714248681125615
156 | Gene155 2.8062016343972034
157 | Gene156 0.005019413599753972
158 | Gene157 1.1414874803343211
159 | Gene158 7.98467666157701
160 | Gene159 4.750130376606671
161 | Gene160 0.41449314009314947
162 | Gene161 0.15629966419234909
163 | Gene162 2.0033323934363065
164 | Gene163 1.9133958051043465
165 | Gene164 1.8693826892142416
166 | Gene165 4.264060971136903
167 | Gene166 0.9041637960610055
168 | Gene167 1.7028366821036134
169 | Gene168 0.11251934733987087
170 | Gene169 1.9744619054281058
171 | Gene170 0.7657225174288028
172 | Gene171 2.9666842911519877
173 | Gene172 2.5384890421122717
174 | Gene173 0.03553526322132183
175 | Gene174 0.17140160127119666
176 | Gene175 0.5375172760119603
177 | Gene176 1.1957388264762312
178 | Gene177 2.233141923375682
179 | Gene178 5.910669733366996
180 | Gene179 19.45900667463009
181 | Gene180 3.4749784094647818
182 | Gene181 1.813842633144611
183 | Gene182 3.875603753958344
184 | Gene183 0.36373824951414807
185 | Gene184 0.15860239608952753
186 | Gene185 1.5602047184804921
187 | Gene186 4.6993917033637205
188 | Gene187 8.789618789385647
189 | Gene188 0.7302572334579468
190 | Gene189 7.049027042557773
191 | Gene190 0.8125960912689993
192 | Gene191 0.43486651275110694
193 | Gene192 0.0038062829718254923
194 | Gene193 0.13273796373459007
195 | Gene194 0.5723144816254031
196 | Gene195 0.46429625203852326
197 | Gene196 4.781896261324305
198 | Gene197 0.0562914344264723
199 | Gene198 3.982487795926734
200 | Gene199 0.05075770320152484
201 | Gene200 1.329913146761545
202 |
--------------------------------------------------------------------------------
/inst/extdata/sim_ref_cell_info.tab:
--------------------------------------------------------------------------------
1 | Cell Batch group ExpLibSize
2 | Cell1 Batch1 Dunno 81807.75476847978
3 | Cell4 Batch1 Mystery celltype 62730.6390003853
4 | Cell8 Batch1 Mystery celltype 71526.13637673049
5 | Cell9 Batch1 Weird subtype 55765.83796711112
6 | Cell10 Batch1 Dunno 57154.759125106255
7 | Cell12 Batch1 Dunno 44653.9600760209
8 | Cell13 Batch1 Mystery celltype 56977.54532670995
9 | Cell14 Batch1 Exciting 79621.80423861962
10 | Cell15 Batch1 Mystery celltype 73021.20728154207
11 | Cell17 Batch1 Exciting 68663.57960914526
12 | Cell18 Batch1 Mystery celltype 61380.02858108507
13 | Cell19 Batch1 Exciting 49105.04257428885
14 | Cell21 Batch1 Mystery celltype 48745.50674600157
15 | Cell24 Batch1 Dunno 80939.53138566627
16 | Cell25 Batch1 Mystery celltype 60337.961437687976
17 | Cell26 Batch1 Dunno 71523.80437229766
18 | Cell28 Batch1 Dunno 51926.06440250282
19 | Cell29 Batch1 Dunno 58694.172070632536
20 | Cell37 Batch1 Mystery celltype 51880.90421612222
21 | Cell38 Batch1 Exciting 64803.71304274764
22 | Cell41 Batch1 Exciting 52603.54365645042
23 | Cell42 Batch1 Weird subtype 44843.82704880153
24 | Cell44 Batch1 Mystery celltype 68350.05378878301
25 | Cell45 Batch1 Mystery celltype 63382.24898997591
26 | Cell46 Batch1 Exciting 60935.07649125598
27 | Cell47 Batch1 Mystery celltype 56677.477360313824
28 | Cell51 Batch1 Dunno 47362.250320482985
29 | Cell52 Batch1 Exciting 57894.476854201625
30 | Cell53 Batch1 Exciting 70101.1065954678
31 | Cell54 Batch1 Dunno 59282.29335149211
32 | Cell55 Batch1 Mystery celltype 55565.21798107001
33 | Cell57 Batch1 Mystery celltype 91518.44169688094
34 | Cell58 Batch1 Exciting 56104.558317678595
35 | Cell59 Batch1 Mystery celltype 56100.782528681935
36 | Cell60 Batch1 Mystery celltype 74079.76508395211
37 | Cell64 Batch1 Dunno 51327.57192778397
38 | Cell67 Batch1 Dunno 56461.29586910634
39 | Cell68 Batch1 Dunno 54679.35005965752
40 | Cell70 Batch1 Exciting 70215.6929388313
41 | Cell75 Batch1 Exciting 57225.43833009415
42 | Cell77 Batch1 Dunno 29879.48368797497
43 | Cell78 Batch1 Mystery celltype 58249.49191451221
44 | Cell79 Batch1 Dunno 62975.96035025468
45 | Cell81 Batch1 Exciting 66246.82544356218
46 | Cell82 Batch1 Weird subtype 93246.34528982667
47 | Cell85 Batch1 Exciting 79589.44691883637
48 | Cell86 Batch1 Exciting 64036.026674086905
49 | Cell88 Batch1 Mystery celltype 58718.03624087147
50 | Cell90 Batch1 Exciting 64365.74649030141
51 | Cell92 Batch1 Mystery celltype 66218.46267656592
52 | Cell93 Batch1 Exciting 72973.980033263
53 | Cell94 Batch1 Dunno 79160.24380591838
54 | Cell95 Batch1 Weird subtype 64088.031522976635
55 | Cell98 Batch1 Dunno 53420.482214087046
56 | Cell99 Batch1 Mystery celltype 55899.847634598016
57 | Cell105 Batch1 Exciting 82964.58191610064
58 | Cell108 Batch1 Mystery celltype 64238.852406326834
59 | Cell109 Batch1 Dunno 56544.30180616188
60 | Cell110 Batch1 Dunno 58837.71107179407
61 | Cell113 Batch1 Exciting 39050.78756002819
62 | Cell115 Batch1 Mystery celltype 42787.25855694832
63 | Cell117 Batch1 Dunno 64127.35849047081
64 | Cell118 Batch1 Dunno 53634.30521743906
65 | Cell120 Batch1 Dunno 62831.76965151383
66 | Cell128 Batch1 Weird subtype 49168.365335475275
67 | Cell134 Batch1 Exciting 48870.960893990785
68 | Cell135 Batch1 Weird subtype 54873.72119222294
69 | Cell140 Batch1 Mystery celltype 54859.58984385942
70 | Cell141 Batch1 Exciting 83376.68228099668
71 | Cell142 Batch1 Mystery celltype 55971.30698920121
72 | Cell144 Batch1 Exciting 46879.47787048267
73 | Cell145 Batch1 Exciting 67878.99955034911
74 | Cell146 Batch1 Dunno 55562.100480375855
75 | Cell151 Batch1 Exciting 74276.64780383078
76 | Cell153 Batch1 Exciting 56111.32170147118
77 | Cell157 Batch1 Mystery celltype 69035.40856601347
78 | Cell160 Batch1 Mystery celltype 71638.95396712197
79 | Cell162 Batch1 Mystery celltype 59310.28826991561
80 | Cell165 Batch1 Mystery celltype 54678.946737595004
81 | Cell166 Batch1 Mystery celltype 57245.59746935213
82 | Cell168 Batch1 Exciting 73350.32963614837
83 | Cell169 Batch1 Exciting 53979.55283679607
84 | Cell171 Batch1 Mystery celltype 51236.72745269203
85 | Cell172 Batch1 Dunno 72477.87092243957
86 | Cell174 Batch1 Weird subtype 48998.17955999089
87 | Cell175 Batch1 Mystery celltype 67618.9003205934
88 | Cell178 Batch1 Mystery celltype 64338.60678140598
89 | Cell179 Batch1 Dunno 82347.14301485407
90 | Cell181 Batch1 Mystery celltype 50579.42513230821
91 | Cell182 Batch1 Dunno 70628.91792338541
92 | Cell183 Batch1 Dunno 73006.42094973422
93 | Cell185 Batch1 Exciting 70256.1424370556
94 | Cell186 Batch1 Exciting 44968.786524416486
95 | Cell189 Batch1 Dunno 69565.64579270386
96 | Cell190 Batch1 Exciting 70875.24020658476
97 | Cell192 Batch1 Dunno 60129.941634371746
98 | Cell195 Batch1 Exciting 49205.88335127697
99 | Cell196 Batch1 Mystery celltype 39889.40310194166
100 | Cell198 Batch1 Mystery celltype 63159.400769407235
101 | Cell200 Batch1 Mystery celltype 47032.689748694414
102 | Cell202 Batch1 Exciting 65503.045438960275
103 | Cell205 Batch1 Exciting 52580.34831716724
104 | Cell206 Batch1 Mystery celltype 73953.2545036953
105 | Cell209 Batch1 Exciting 86205.33155590425
106 | Cell211 Batch1 Exciting 62282.41608670635
107 | Cell213 Batch1 Mystery celltype 62199.789214546705
108 | Cell218 Batch1 Mystery celltype 59298.18336733352
109 | Cell222 Batch1 Weird subtype 43655.17044370976
110 | Cell224 Batch1 Mystery celltype 48351.48506715231
111 | Cell227 Batch1 Exciting 75763.7955813687
112 | Cell229 Batch1 Mystery celltype 60655.41686891719
113 | Cell230 Batch1 Dunno 58779.80516234371
114 | Cell231 Batch1 Exciting 49546.80040872587
115 | Cell235 Batch1 Exciting 53141.669888693665
116 | Cell240 Batch1 Mystery celltype 58713.082429138136
117 | Cell241 Batch1 Mystery celltype 76112.73294680292
118 | Cell242 Batch1 Mystery celltype 50450.87852683166
119 | Cell243 Batch1 Mystery celltype 75652.89691471095
120 | Cell244 Batch1 Weird subtype 49912.80449902912
121 | Cell245 Batch1 Exciting 51187.39363283295
122 | Cell248 Batch1 Mystery celltype 58215.93764980519
123 | Cell249 Batch1 Exciting 68906.31864078039
124 | Cell250 Batch1 Mystery celltype 68610.31209030513
125 | Cell252 Batch1 Mystery celltype 42757.503340227704
126 | Cell253 Batch1 Mystery celltype 48064.019523638206
127 | Cell254 Batch1 Exciting 70366.73081233972
128 | Cell258 Batch1 Exciting 87038.13312069164
129 | Cell263 Batch1 Dunno 88283.01861665332
130 | Cell267 Batch1 Dunno 62275.42505737917
131 | Cell268 Batch1 Dunno 55921.35946356151
132 | Cell270 Batch1 Dunno 46562.23963203289
133 | Cell272 Batch1 Dunno 58826.122208443354
134 | Cell278 Batch1 Dunno 56665.444451203424
135 | Cell279 Batch1 Mystery celltype 85926.43198037753
136 | Cell282 Batch1 Exciting 42968.7540495921
137 | Cell284 Batch1 Mystery celltype 77742.70394705818
138 | Cell285 Batch1 Mystery celltype 47166.67198543781
139 | Cell287 Batch1 Dunno 50111.2502275346
140 | Cell288 Batch1 Dunno 82493.90597167228
141 | Cell290 Batch1 Dunno 67881.40035008473
142 | Cell292 Batch1 Mystery celltype 59215.0103723406
143 | Cell297 Batch1 Mystery celltype 55774.158236177835
144 | Cell298 Batch1 Dunno 66644.28042367943
145 | Cell299 Batch1 Dunno 78552.69644137434
146 | Cell300 Batch1 Exciting 71322.50268357595
147 | Cell305 Batch1 Mystery celltype 98096.9655645084
148 | Cell307 Batch1 Exciting 73956.1818316257
149 | Cell309 Batch1 Dunno 53629.99440978181
150 | Cell310 Batch1 Mystery celltype 73586.46368562838
151 | Cell311 Batch1 Dunno 69426.46218642656
152 | Cell313 Batch1 Dunno 73736.89330866672
153 | Cell314 Batch1 Exciting 77510.7842623073
154 | Cell315 Batch1 Mystery celltype 62970.71822940966
155 | Cell316 Batch1 Exciting 87257.9578797986
156 | Cell320 Batch1 Dunno 72457.7227430852
157 | Cell321 Batch1 Exciting 57874.94767184463
158 | Cell323 Batch1 Exciting 62349.33386915656
159 | Cell324 Batch1 Dunno 58651.652601725225
160 | Cell325 Batch1 Exciting 56546.17817691532
161 | Cell326 Batch1 Exciting 54768.976058638
162 | Cell329 Batch1 Mystery celltype 87593.28762181934
163 | Cell330 Batch1 Dunno 51036.400212527355
164 | Cell331 Batch1 Dunno 81041.87901135806
165 | Cell334 Batch1 Mystery celltype 61691.92665258174
166 | Cell336 Batch1 Dunno 58992.66535025075
167 | Cell338 Batch1 Dunno 52063.4996265077
168 | Cell339 Batch1 Mystery celltype 49542.437593599825
169 | Cell341 Batch1 Dunno 60508.588401668974
170 | Cell347 Batch1 Mystery celltype 81185.27119543543
171 | Cell350 Batch1 Dunno 65539.07591285396
172 | Cell352 Batch1 Exciting 54313.644396865864
173 | Cell355 Batch1 Exciting 70403.87619190854
174 | Cell359 Batch1 Mystery celltype 58294.253745828355
175 | Cell361 Batch1 Exciting 64073.97559123764
176 | Cell362 Batch1 Exciting 48932.612529710124
177 | Cell364 Batch1 Exciting 50986.72561130079
178 | Cell368 Batch1 Mystery celltype 48451.51059068924
179 | Cell370 Batch1 Exciting 60143.409875790916
180 | Cell373 Batch1 Exciting 101255.23768731704
181 | Cell374 Batch1 Mystery celltype 68037.66852231452
182 | Cell375 Batch1 Exciting 90439.55606548753
183 | Cell376 Batch1 Exciting 68892.6046870921
184 | Cell379 Batch1 Dunno 51410.75193092789
185 | Cell382 Batch1 Mystery celltype 59950.403336530835
186 | Cell384 Batch1 Exciting 85139.82979429308
187 | Cell387 Batch1 Mystery celltype 52360.47569417261
188 | Cell388 Batch1 Dunno 58256.2485453605
189 | Cell389 Batch1 Exciting 42724.60209830365
190 | Cell390 Batch1 Exciting 57007.70297865485
191 | Cell391 Batch1 Dunno 66072.25864104759
192 | Cell394 Batch1 Mystery celltype 74182.07163435398
193 | Cell395 Batch1 Dunno 47631.59757952375
194 | Cell397 Batch1 Exciting 53165.4524317683
195 | Cell398 Batch1 Dunno 83096.6506509114
196 | Cell399 Batch1 Dunno 59888.83557096457
197 | Cell400 Batch1 Mystery celltype 60745.74881671979
198 | Cell401 Batch1 Dunno 59474.08398927108
199 | Cell402 Batch1 Mystery celltype 67329.54672421503
200 | Cell403 Batch1 Mystery celltype 63082.571728269155
201 | Cell405 Batch1 Dunno 42108.029073011436
202 | Cell409 Batch1 Exciting 56078.862237949266
203 | Cell410 Batch1 Exciting 61301.194330446066
204 | Cell411 Batch1 Exciting 93972.76158956725
205 | Cell415 Batch1 Dunno 68818.54370560612
206 | Cell417 Batch1 Mystery celltype 69596.63839047254
207 | Cell418 Batch1 Dunno 54643.34342350498
208 | Cell421 Batch1 Dunno 49502.5247815001
209 | Cell422 Batch1 Mystery celltype 69352.93633465214
210 | Cell424 Batch1 Mystery celltype 55403.50513551321
211 | Cell425 Batch1 Weird subtype 51357.607136771556
212 | Cell426 Batch1 Mystery celltype 37802.08995695506
213 | Cell428 Batch1 Dunno 47395.02408822196
214 | Cell431 Batch1 Mystery celltype 57760.294055991326
215 | Cell432 Batch1 Exciting 55727.42997261764
216 | Cell433 Batch1 Exciting 49075.48660293653
217 | Cell436 Batch1 Dunno 54957.357170455674
218 | Cell438 Batch1 Dunno 73249.70537624441
219 | Cell439 Batch1 Exciting 83176.58833450306
220 | Cell440 Batch1 Dunno 50865.81695062725
221 | Cell442 Batch1 Exciting 64701.1256678348
222 | Cell443 Batch1 Exciting 59014.02721629023
223 | Cell445 Batch1 Exciting 63773.26720757371
224 | Cell451 Batch1 Weird subtype 53764.10206251263
225 | Cell455 Batch1 Mystery celltype 75449.72182186146
226 | Cell456 Batch1 Dunno 53218.680081457984
227 | Cell457 Batch1 Exciting 85026.22745982851
228 | Cell460 Batch1 Dunno 79062.0684719292
229 | Cell461 Batch1 Dunno 58137.92521728268
230 | Cell462 Batch1 Exciting 57678.32329935796
231 | Cell465 Batch1 Dunno 55685.398676793055
232 | Cell466 Batch1 Mystery celltype 83705.23059210832
233 | Cell470 Batch1 Dunno 49351.161985836705
234 | Cell471 Batch1 Mystery celltype 46525.06377298973
235 | Cell473 Batch1 Dunno 69972.26834490715
236 | Cell474 Batch1 Dunno 75577.30087930195
237 | Cell476 Batch1 Weird subtype 69530.43643290483
238 | Cell477 Batch1 Dunno 48477.8585685877
239 | Cell478 Batch1 Mystery celltype 43432.79476529873
240 | Cell481 Batch1 Mystery celltype 80141.0003101273
241 | Cell483 Batch1 Mystery celltype 56050.46718357846
242 | Cell484 Batch1 Mystery celltype 40099.174277372105
243 | Cell485 Batch1 Dunno 46579.832656207414
244 | Cell487 Batch1 Exciting 45331.87064944953
245 | Cell488 Batch1 Exciting 67426.51336242554
246 | Cell490 Batch1 Exciting 56758.486777894155
247 | Cell492 Batch1 Mystery celltype 45025.979091999725
248 | Cell493 Batch1 Exciting 56702.40308039261
249 | Cell494 Batch1 Mystery celltype 60617.35013346793
250 | Cell496 Batch1 Mystery celltype 59412.44501506707
251 | Cell498 Batch1 Dunno 52872.094812063544
252 | Cell499 Batch1 Mystery celltype 80622.24876492086
253 | Cell501 Batch1 Mystery celltype 61024.59422595926
254 | Cell504 Batch1 Mystery celltype 83454.86757289598
255 | Cell506 Batch1 Mystery celltype 56341.76015419253
256 | Cell507 Batch1 Dunno 44974.63914653621
257 | Cell508 Batch1 Exciting 44407.01226516687
258 | Cell510 Batch1 Mystery celltype 54285.0113116402
259 | Cell514 Batch1 Exciting 63768.025745437524
260 | Cell517 Batch1 Mystery celltype 46668.24402505373
261 | Cell518 Batch1 Dunno 65325.70001000344
262 | Cell520 Batch1 Exciting 62736.84248924784
263 | Cell521 Batch1 Exciting 58380.38448777898
264 | Cell523 Batch1 Exciting 67360.46401291544
265 | Cell524 Batch1 Weird subtype 61144.817793029644
266 | Cell527 Batch1 Exciting 57796.696946227545
267 | Cell528 Batch1 Mystery celltype 64425.15603134294
268 | Cell529 Batch1 Exciting 44857.79201776551
269 | Cell532 Batch1 Exciting 75181.2654638172
270 | Cell536 Batch1 Mystery celltype 45800.23400736708
271 | Cell537 Batch1 Dunno 74263.01956322035
272 | Cell542 Batch1 Exciting 60177.77078332433
273 | Cell543 Batch1 Exciting 43778.15982525974
274 | Cell545 Batch1 Mystery celltype 67253.07115123785
275 | Cell547 Batch1 Exciting 77546.31023451254
276 | Cell549 Batch1 Dunno 55221.56146902432
277 | Cell550 Batch1 Exciting 66551.76621601294
278 | Cell552 Batch1 Mystery celltype 64645.7310966531
279 | Cell554 Batch1 Mystery celltype 52864.76383286277
280 | Cell558 Batch1 Dunno 49503.16551763513
281 | Cell561 Batch1 Mystery celltype 49747.827370732746
282 | Cell562 Batch1 Exciting 77796.29361199462
283 | Cell563 Batch1 Exciting 32517.801451913903
284 | Cell566 Batch1 Mystery celltype 49761.78778759733
285 | Cell570 Batch1 Dunno 67672.15191428228
286 | Cell571 Batch1 Mystery celltype 69175.65444702056
287 | Cell575 Batch1 Exciting 56709.75805172777
288 | Cell579 Batch1 Exciting 54052.30800505988
289 | Cell580 Batch1 Exciting 74598.90128966793
290 | Cell582 Batch1 Dunno 41938.583858660175
291 | Cell583 Batch1 Mystery celltype 72111.31916836997
292 | Cell584 Batch1 Mystery celltype 69429.64281585328
293 | Cell585 Batch1 Mystery celltype 72503.45888976833
294 | Cell586 Batch1 Mystery celltype 80708.89694882413
295 | Cell588 Batch1 Exciting 30840.752800798415
296 | Cell590 Batch1 Exciting 69172.59871626447
297 | Cell591 Batch1 Dunno 82647.26830144694
298 | Cell593 Batch1 Mystery celltype 44869.642073155
299 | Cell594 Batch1 Mystery celltype 67373.43624525052
300 | Cell595 Batch1 Dunno 60551.42027935148
301 | Cell596 Batch1 Dunno 56997.77917260365
302 | Cell599 Batch1 Dunno 58071.664457033876
303 | Cell602 Batch1 Mystery celltype 41389.43818547396
304 | Cell603 Batch1 Mystery celltype 64471.42955457428
305 | Cell605 Batch1 Exciting 59349.332968371295
306 | Cell609 Batch1 Mystery celltype 35342.969251708106
307 | Cell611 Batch1 Exciting 78384.32851069582
308 | Cell612 Batch1 Mystery celltype 59339.222517357804
309 | Cell614 Batch1 Mystery celltype 76706.61761351599
310 | Cell616 Batch1 Dunno 56949.44134777487
311 | Cell618 Batch1 Mystery celltype 54261.099929363096
312 | Cell619 Batch1 Exciting 66047.9927719561
313 | Cell620 Batch1 Exciting 73085.20289662106
314 | Cell621 Batch1 Dunno 58793.49531685555
315 | Cell622 Batch1 Mystery celltype 54943.73433204054
316 | Cell624 Batch1 Dunno 54097.59785961883
317 | Cell626 Batch1 Exciting 53855.07469884097
318 | Cell627 Batch1 Weird subtype 44416.32087036199
319 | Cell628 Batch1 Exciting 57903.75618818855
320 | Cell629 Batch1 Dunno 62644.41547684184
321 | Cell631 Batch1 Exciting 60017.41514845921
322 | Cell632 Batch1 Mystery celltype 59910.910029747654
323 | Cell633 Batch1 Mystery celltype 49732.15075467038
324 | Cell634 Batch1 Exciting 56589.77566652936
325 | Cell638 Batch1 Exciting 62447.44259068599
326 | Cell639 Batch1 Exciting 64133.64450491816
327 | Cell641 Batch1 Exciting 44089.20554144063
328 | Cell643 Batch1 Dunno 73522.18182880066
329 | Cell646 Batch1 Mystery celltype 61802.93889261675
330 | Cell647 Batch1 Mystery celltype 69012.0854740952
331 | Cell650 Batch1 Mystery celltype 74714.79586975073
332 | Cell652 Batch1 Weird subtype 56062.62499714292
333 | Cell659 Batch1 Mystery celltype 69894.64218398865
334 | Cell660 Batch1 Dunno 66592.586718327555
335 | Cell661 Batch1 Dunno 85591.02358008553
336 | Cell662 Batch1 Dunno 44516.24819323479
337 | Cell664 Batch1 Dunno 61307.004898365776
338 | Cell665 Batch1 Exciting 64705.31155970481
339 | Cell668 Batch1 Weird subtype 71455.41809041442
340 | Cell670 Batch1 Exciting 42053.06380443934
341 | Cell673 Batch1 Mystery celltype 55095.33915013706
342 | Cell674 Batch1 Dunno 45486.15169293153
343 | Cell676 Batch1 Exciting 63517.160284769445
344 | Cell677 Batch1 Exciting 73489.07959872915
345 | Cell678 Batch1 Mystery celltype 71814.41470489615
346 | Cell685 Batch1 Dunno 47441.372998551655
347 | Cell686 Batch1 Dunno 49938.59656240519
348 | Cell688 Batch1 Dunno 49738.568822691675
349 | Cell690 Batch1 Mystery celltype 52789.31899776843
350 | Cell693 Batch1 Dunno 52209.58603909123
351 | Cell694 Batch1 Exciting 65974.73334813816
352 | Cell698 Batch1 Mystery celltype 64468.974478183445
353 | Cell699 Batch1 Exciting 64012.37086466773
354 | Cell700 Batch1 Exciting 55742.29025149838
355 | Cell702 Batch1 Exciting 58774.06532284232
356 | Cell703 Batch1 Exciting 52129.503219550716
357 | Cell705 Batch1 Mystery celltype 60585.22086924787
358 | Cell706 Batch1 Exciting 68973.56697136941
359 | Cell707 Batch1 Dunno 70457.08797883583
360 | Cell709 Batch1 Mystery celltype 74559.04863400896
361 | Cell711 Batch1 Exciting 54858.45915215607
362 | Cell712 Batch1 Exciting 50166.67818033528
363 | Cell716 Batch1 Exciting 73099.67341725297
364 | Cell720 Batch1 Exciting 67445.83981177946
365 | Cell723 Batch1 Exciting 53446.52691235099
366 | Cell725 Batch1 Dunno 51813.53140782089
367 | Cell726 Batch1 Exciting 73544.824521771
368 | Cell727 Batch1 Mystery celltype 77108.1982912516
369 | Cell728 Batch1 Mystery celltype 64581.15229087629
370 | Cell729 Batch1 Dunno 61697.683973040716
371 | Cell730 Batch1 Dunno 66889.12767705396
372 | Cell731 Batch1 Mystery celltype 48456.99291942505
373 | Cell732 Batch1 Mystery celltype 40238.11346060553
374 | Cell733 Batch1 Mystery celltype 54992.66697274335
375 | Cell736 Batch1 Exciting 42050.56317254396
376 | Cell737 Batch1 Weird subtype 39975.41657072051
377 | Cell738 Batch1 Mystery celltype 76628.4179457589
378 | Cell739 Batch1 Mystery celltype 66319.98992634375
379 | Cell741 Batch1 Dunno 69603.29382237795
380 | Cell742 Batch1 Weird subtype 56063.76398353835
381 | Cell746 Batch1 Dunno 76474.29292901258
382 | Cell747 Batch1 Exciting 66562.59488683096
383 | Cell748 Batch1 Mystery celltype 110327.72657035479
384 | Cell749 Batch1 Mystery celltype 51073.62800414176
385 | Cell750 Batch1 Exciting 59000.37259510889
386 | Cell751 Batch1 Dunno 59410.0011177861
387 | Cell752 Batch1 Mystery celltype 65667.35453780026
388 | Cell754 Batch1 Exciting 75800.52544906427
389 | Cell755 Batch1 Exciting 43302.72149825256
390 | Cell758 Batch1 Exciting 80637.93996155521
391 | Cell759 Batch1 Dunno 58290.90065131635
392 | Cell764 Batch1 Mystery celltype 56589.4473843959
393 | Cell771 Batch1 Mystery celltype 74817.87862836638
394 | Cell773 Batch1 Dunno 45359.98044989292
395 | Cell774 Batch1 Mystery celltype 60468.7985422407
396 | Cell781 Batch1 Mystery celltype 56729.539384588774
397 | Cell782 Batch1 Exciting 65464.334406033464
398 | Cell785 Batch1 Dunno 84827.91766206028
399 | Cell787 Batch1 Mystery celltype 69995.89537319269
400 | Cell790 Batch1 Dunno 75428.60638060226
401 | Cell791 Batch1 Mystery celltype 51940.56168878204
402 | Cell792 Batch1 Exciting 59360.403835508274
403 | Cell793 Batch1 Exciting 49305.59270006465
404 | Cell795 Batch1 Mystery celltype 67961.80619423035
405 | Cell797 Batch1 Exciting 66215.25871847503
406 | Cell801 Batch1 Mystery celltype 74773.00831045811
407 | Cell802 Batch1 Mystery celltype 69503.48690314239
408 | Cell806 Batch1 Exciting 75483.32950416845
409 | Cell807 Batch1 Exciting 63791.25301962973
410 | Cell812 Batch1 Mystery celltype 70710.59238823927
411 | Cell813 Batch1 Dunno 66386.01610004039
412 | Cell814 Batch1 Mystery celltype 57003.8002986488
413 | Cell815 Batch1 Dunno 49434.401234209894
414 | Cell816 Batch1 Mystery celltype 81019.48316453358
415 | Cell817 Batch1 Dunno 57660.14672159395
416 | Cell820 Batch1 Mystery celltype 81139.42740797685
417 | Cell822 Batch1 Exciting 51758.33356656971
418 | Cell827 Batch1 Exciting 82163.52899908184
419 | Cell828 Batch1 Mystery celltype 44510.20628733321
420 | Cell829 Batch1 Exciting 49830.26339342397
421 | Cell830 Batch1 Weird subtype 79645.9067006031
422 | Cell831 Batch1 Mystery celltype 66678.27291821793
423 | Cell832 Batch1 Dunno 70804.76796286086
424 | Cell833 Batch1 Mystery celltype 88773.24491561459
425 | Cell834 Batch1 Mystery celltype 76984.23136078312
426 | Cell836 Batch1 Exciting 79299.10284013295
427 | Cell838 Batch1 Dunno 50232.62520234292
428 | Cell839 Batch1 Exciting 67677.05755288526
429 | Cell840 Batch1 Mystery celltype 59751.913327324786
430 | Cell842 Batch1 Dunno 69948.02665545353
431 | Cell844 Batch1 Dunno 56626.2838206254
432 | Cell845 Batch1 Exciting 54436.92059598208
433 | Cell846 Batch1 Mystery celltype 72918.98140644566
434 | Cell847 Batch1 Exciting 57252.83703691692
435 | Cell849 Batch1 Dunno 61261.36391609695
436 | Cell850 Batch1 Exciting 61237.867728430676
437 | Cell853 Batch1 Dunno 50055.0538727858
438 | Cell860 Batch1 Exciting 62464.39017823284
439 | Cell861 Batch1 Mystery celltype 43952.10003838061
440 | Cell862 Batch1 Mystery celltype 49429.93452585513
441 | Cell863 Batch1 Exciting 63089.664479451036
442 | Cell864 Batch1 Exciting 50012.16143696786
443 | Cell865 Batch1 Mystery celltype 39810.15418311501
444 | Cell867 Batch1 Mystery celltype 57648.67434413288
445 | Cell869 Batch1 Mystery celltype 52830.146137773714
446 | Cell870 Batch1 Exciting 64698.36491185619
447 | Cell871 Batch1 Mystery celltype 62351.319631986924
448 | Cell872 Batch1 Dunno 66782.4195643286
449 | Cell874 Batch1 Exciting 55992.78928064643
450 | Cell876 Batch1 Exciting 57838.32227614392
451 | Cell877 Batch1 Mystery celltype 84778.63929336573
452 | Cell880 Batch1 Exciting 39790.36493991507
453 | Cell881 Batch1 Exciting 67698.2318412466
454 | Cell883 Batch1 Dunno 75401.70206051634
455 | Cell884 Batch1 Dunno 42776.43027168227
456 | Cell885 Batch1 Exciting 66251.97025539962
457 | Cell886 Batch1 Exciting 59728.89109650733
458 | Cell889 Batch1 Dunno 65862.16280115274
459 | Cell892 Batch1 Dunno 64886.72759060191
460 | Cell893 Batch1 Exciting 62741.3812670592
461 | Cell896 Batch1 Exciting 66700.44446821426
462 | Cell900 Batch1 Mystery celltype 89838.24189988746
463 | Cell902 Batch1 Exciting 49720.91127972059
464 | Cell904 Batch1 Dunno 61698.31414189178
465 | Cell905 Batch1 Mystery celltype 75890.78025812042
466 | Cell906 Batch1 Dunno 49332.671183011495
467 | Cell908 Batch1 Mystery celltype 60911.50170090127
468 | Cell912 Batch1 Dunno 48371.35697778317
469 | Cell913 Batch1 Dunno 43300.84285321089
470 | Cell916 Batch1 Mystery celltype 41294.94179371393
471 | Cell917 Batch1 Exciting 78215.83123964709
472 | Cell921 Batch1 Dunno 68510.89876505475
473 | Cell922 Batch1 Weird subtype 47675.61036223717
474 | Cell923 Batch1 Mystery celltype 50285.32470667179
475 | Cell924 Batch1 Exciting 50810.458266410016
476 | Cell925 Batch1 Dunno 54502.55586418292
477 | Cell926 Batch1 Dunno 66428.82045412977
478 | Cell928 Batch1 Mystery celltype 50810.16890843113
479 | Cell929 Batch1 Dunno 52505.54964244125
480 | Cell930 Batch1 Dunno 48660.587675347415
481 | Cell932 Batch1 Mystery celltype 68166.19623600527
482 | Cell933 Batch1 Mystery celltype 61781.75459550566
483 | Cell935 Batch1 Exciting 61965.20553453278
484 | Cell936 Batch1 Exciting 59552.082663036985
485 | Cell937 Batch1 Weird subtype 44121.63202588367
486 | Cell938 Batch1 Dunno 86090.04391328803
487 | Cell939 Batch1 Weird subtype 51965.66952959045
488 | Cell940 Batch1 Mystery celltype 55573.07455502759
489 | Cell947 Batch1 Exciting 78966.42110849377
490 | Cell948 Batch1 Dunno 57893.58454573163
491 | Cell949 Batch1 Dunno 108399.51473060224
492 | Cell950 Batch1 Exciting 63941.65092858417
493 | Cell953 Batch1 Dunno 52494.101352971746
494 | Cell954 Batch1 Exciting 60801.11137383105
495 | Cell956 Batch1 Mystery celltype 53317.811588200435
496 | Cell957 Batch1 Mystery celltype 66956.63625350388
497 | Cell958 Batch1 Dunno 61169.891441098334
498 | Cell960 Batch1 Mystery celltype 70297.72026231191
499 | Cell961 Batch1 Mystery celltype 73180.26381577374
500 | Cell964 Batch1 Exciting 51129.50161236589
501 | Cell965 Batch1 Dunno 57332.79065619364
502 | Cell966 Batch1 Weird subtype 100920.05054716372
503 | Cell968 Batch1 Mystery celltype 56613.22063144928
504 | Cell971 Batch1 Exciting 48814.80998613285
505 | Cell974 Batch1 Mystery celltype 42222.48434667154
506 | Cell977 Batch1 Weird subtype 56934.87403202956
507 | Cell980 Batch1 Mystery celltype 63527.13835053482
508 | Cell983 Batch1 Mystery celltype 60243.19147127942
509 | Cell985 Batch1 Weird subtype 48036.537458381434
510 | Cell987 Batch1 Weird subtype 76719.92184793137
511 | Cell989 Batch1 Dunno 67576.9088681226
512 | Cell990 Batch1 Mystery celltype 76899.69013529253
513 | Cell991 Batch1 Mystery celltype 54174.509374250156
514 | Cell993 Batch1 Exciting 58906.90399288473
515 | Cell995 Batch1 Dunno 81652.84927522646
516 | Cell1000 Batch1 Mystery celltype 45436.644641530926
517 |
--------------------------------------------------------------------------------
/inst/extdata/sim_ref_gene_info.tab:
--------------------------------------------------------------------------------
1 | Gene BaseGeneMean
2 | Gene1 15.070166941865256
3 | Gene2 1.3214264125078057
4 | Gene3 2.6471940552392117
5 | Gene4 0.4314126274103263
6 | Gene5 1.3263294447897767
7 | Gene6 2.6556251736653076
8 | Gene7 1.5087404287107207
9 | Gene8 2.3980012018206116
10 | Gene9 1.174393945899882
11 | Gene10 0.16520007149995963
12 | Gene11 0.6063222399660461
13 | Gene12 1.1567203151084895
14 | Gene13 0.5697996434879279
15 | Gene14 5.910077587592439
16 | Gene15 1.2584703860405693
17 | Gene16 3.7312510321195247
18 | Gene17 3.732506094739015
19 | Gene18 0.9527541186413107
20 | Gene19 0.8299824883603495
21 | Gene20 5.419859401553251
22 | Gene21 0.12596021483220884
23 | Gene22 0.2838042849946762
24 | Gene23 0.19182208419572352
25 | Gene24 3.1676347186761196
26 | Gene25 0.5341929920646847
27 | Gene26 4.869121526664764
28 | Gene27 2.6688329911337663
29 | Gene28 0.8680950634977899
30 | Gene29 1.6676711932219812
31 | Gene30 0.7773442566244723
32 | Gene31 0.13776280782261727
33 | Gene32 1.62467446974602
34 | Gene33 3.408362066133117
35 | Gene34 0.1551462319196136
36 | Gene35 1.3693638794774319
37 | Gene36 6.539464980895205
38 | Gene37 0.9702595159605595
39 | Gene38 4.936128580990508e-4
40 | Gene39 1.4130366607242324
41 | Gene40 7.1535748703456825
42 | Gene41 0.8945476522623658
43 | Gene42 0.5154778965230534
44 | Gene43 5.282330848092755
45 | Gene44 3.53610188530163
46 | Gene45 5.439591965553024
47 | Gene46 0.5049124358836893
48 | Gene47 1.4218930614187446
49 | Gene48 0.3595905853302735
50 | Gene49 5.119043134371494
51 | Gene50 0.2502446473365546
52 | Gene51 0.006118641284641165
53 | Gene52 0.012327696958109266
54 | Gene53 2.270533237587709
55 | Gene54 0.2277377522940588
56 | Gene55 0.24543799611272676
57 | Gene56 1.0140785160835557
58 | Gene57 3.2974078405543414
59 | Gene58 3.1500104445427897
60 | Gene59 0.7890490693371935
61 | Gene60 6.322817718066329
62 | Gene61 0.418082717549265
63 | Gene62 2.034700826504123
64 | Gene63 2.440291341133348
65 | Gene64 0.002391602636955899
66 | Gene65 0.6029098216167582
67 | Gene66 3.3016014257423305
68 | Gene67 0.7482611461409928
69 | Gene68 0.12812044564547306
70 | Gene69 0.35066175036892044
71 | Gene70 0.1516936710876119
72 | Gene71 0.009882651695215254
73 | Gene72 0.31386242457573105
74 | Gene73 0.29994401190711634
75 | Gene74 1.4015772692111752
76 | Gene75 1.2497823102933978
77 | Gene76 0.5632321953622131
78 | Gene77 1.6801835300877066
79 | Gene78 0.2984888953070142
80 | Gene79 1.180053288644517
81 | Gene80 3.496870702720702
82 | Gene81 3.031213248293015
83 | Gene82 2.4544117966947323
84 | Gene83 0.6426568584477319
85 | Gene84 2.118998746932126
86 | Gene85 5.998968724366211
87 | Gene86 0.6370660786673482
88 | Gene87 4.878501798012513
89 | Gene88 4.236375736446719
90 | Gene89 2.284970524462676
91 | Gene90 0.9048320211852259
92 | Gene91 3.328864243871868
93 | Gene92 5.99485292531768
94 | Gene93 0.5110956753927606
95 | Gene94 0.667408242586006
96 | Gene95 0.4217533489766145
97 | Gene96 13.667521456964762
98 | Gene97 12.94918139728537
99 | Gene98 2.7655551566988636
100 | Gene99 0.026034366925850677
101 | Gene100 8.41958370917556
102 | Gene101 0.629002401311236
103 | Gene102 0.3984773552739324
104 | Gene103 6.648385010819624
105 | Gene104 0.52025440038014
106 | Gene105 0.6041210844214472
107 | Gene106 1.2171295700541882
108 | Gene107 2.1725070518596286
109 | Gene108 0.8655766682126647
110 | Gene109 0.20528408017518773
111 | Gene110 0.2871928791336786
112 | Gene111 0.6272521086349571
113 | Gene112 0.20938989155484852
114 | Gene113 1.1982525476674017
115 | Gene114 2.161009771439923
116 | Gene115 0.2108280997361641
117 | Gene116 5.490780408446104
118 | Gene117 7.901244927017148
119 | Gene118 1.6497365669481299
120 | Gene119 4.764385944692947
121 | Gene120 0.23214880715569838
122 | Gene121 1.826709920091376
123 | Gene122 2.222420903422962
124 | Gene123 0.28732215327512833
125 | Gene124 0.3270285059720184
126 | Gene125 0.028345690324059893
127 | Gene126 0.6385121429116922
128 | Gene127 0.45777438372310086
129 | Gene128 3.564006959063369
130 | Gene129 0.22257284891712545
131 | Gene130 3.06866579942385
132 | Gene131 0.010164450289338305
133 | Gene132 2.2159808283796263
134 | Gene133 0.18777347338931308
135 | Gene134 0.15308684636813338
136 | Gene135 5.029457080816111
137 | Gene136 0.19252884382655167
138 | Gene137 4.817288261266247
139 | Gene138 1.3571315484166406
140 | Gene139 0.9551584585099973
141 | Gene140 6.358966828719138
142 | Gene141 0.04758510724490084
143 | Gene142 0.5387642785968534
144 | Gene143 0.40903272802201124
145 | Gene144 0.9046274593654208
146 | Gene145 2.5300499331280255
147 | Gene146 0.007103927806247059
148 | Gene147 0.06617422376829078
149 | Gene148 1.1814835278920117
150 | Gene149 0.00730382422326564
151 | Gene150 4.523440167023822
152 | Gene151 5.0714787992616595
153 | Gene152 0.058381702597344695
154 | Gene153 9.849902477078747e-4
155 | Gene154 2.2714248681125615
156 | Gene155 2.8062016343972034
157 | Gene156 0.005019413599753972
158 | Gene157 1.1414874803343211
159 | Gene158 7.98467666157701
160 | Gene159 4.750130376606671
161 | Gene160 0.41449314009314947
162 | Gene161 0.15629966419234909
163 | Gene162 2.0033323934363065
164 | Gene163 1.9133958051043465
165 | Gene164 1.8693826892142416
166 | Gene165 4.264060971136903
167 | Gene166 0.9041637960610055
168 | Gene167 1.7028366821036134
169 | Gene168 0.11251934733987087
170 | Gene169 1.9744619054281058
171 | Gene170 0.7657225174288028
172 | Gene171 2.9666842911519877
173 | Gene172 2.5384890421122717
174 | Gene173 0.03553526322132183
175 | Gene174 0.17140160127119666
176 | Gene175 0.5375172760119603
177 | Gene176 1.1957388264762312
178 | Gene177 2.233141923375682
179 | Gene178 5.910669733366996
180 | Gene179 19.45900667463009
181 | Gene180 3.4749784094647818
182 | Gene181 1.813842633144611
183 | Gene182 3.875603753958344
184 | Gene183 0.36373824951414807
185 | Gene184 0.15860239608952753
186 | Gene185 1.5602047184804921
187 | Gene186 4.6993917033637205
188 | Gene187 8.789618789385647
189 | Gene188 0.7302572334579468
190 | Gene189 7.049027042557773
191 | Gene190 0.8125960912689993
192 | Gene191 0.43486651275110694
193 | Gene192 0.0038062829718254923
194 | Gene193 0.13273796373459007
195 | Gene194 0.5723144816254031
196 | Gene195 0.46429625203852326
197 | Gene196 4.781896261324305
198 | Gene197 0.0562914344264723
199 | Gene198 3.982487795926734
200 | Gene199 0.05075770320152484
201 | Gene200 1.329913146761545
202 |
--------------------------------------------------------------------------------
/man/contrast_each_group_to_the_rest.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/contrasting_functions.r
3 | \name{contrast_each_group_to_the_rest}
4 | \alias{contrast_each_group_to_the_rest}
5 | \title{contrast_each_group_to_the_rest}
6 | \usage{
7 | contrast_each_group_to_the_rest(dataset_se, dataset_name,
8 | groups2test = NA, num_cores = 1, n.group = Inf, n.other = n.group
9 | * 5, factors_to_rm = c())
10 | }
11 | \arguments{
12 | \item{dataset_se}{Summarised experiment object containing count data. Also
13 | requires 'ID' and 'group' to be set within the cell information
14 | (see \code{colData()})}
15 |
16 | \item{dataset_name}{Short, meaningful name for this dataset/experiment.}
17 |
18 | \item{groups2test}{An optional character vector specifying which groups
19 | to check. By default (set to NA), all groups will be tested.}
20 |
21 | \item{num_cores}{Number of cores to use to run MAST jobs in parallel.
22 | Ignored if parallel package not available. Set to 1 to avoid
23 | parallelisation. Default = 1}
24 |
25 | \item{n.group}{How many cells to keep for each group in groupwise
26 | comparisons. Default = Inf}
27 |
28 | \item{n.other}{How many cells to keep from everything not in the group.
29 | Default = \bold{n.group} * 5}
30 |
31 | \item{factors_to_rm}{If there are extra confounding factors that should be
32 | removed from MAST's zlm model (e.g individual, run), specify the column name(s)
33 | from colData in a vector here. Default=c().}
34 | }
35 | \value{
36 | A tibble, the within-experiment de_table (differential expression
37 | table). This is a core summary of the individual experiment/dataset,
38 | which is used for the cross-dataset comparisons.
39 |
40 | The table fields won't necessarily match across datasets, as they include
41 | cell annotation information. Important columns
42 | (used in downstream analysis) are:
43 |
44 | \describe{
45 | \item{ID}{Gene identifier}
46 | \item{ci_inner}{ Inner (conservative) 95\% confidence interval of
47 | log2 fold-change.}
48 | \item{fdr}{Multiple hypothesis corrected p-value (using BH/FDR method)}
49 | \item{group}{Cells from this group were compared to everything else}
50 | \item{sig_up}{Significantly differentially expressed (fdr < 0.01), with a
51 | positive fold change?}
52 | \item{rank}{Rank position (within group), ranked by CI inner, highest to
53 | lowest. }
54 | \item{rescaled_rank}{Rank scaled 0(top most overrepresented genes in group) -
55 | 1(top most not-present genes)}
56 | \item{dataset}{Name of dataset/experiment}
57 | }
58 | }
59 | \description{
60 | Produces a table of within-experiment differential expression results (for
61 | either query or reference experiment), where each group (cluster) is
62 | compared to the rest of the cells.
63 | }
64 | \details{
65 | Note that this function is \emph{slow}, because it runs the differential
66 | expression. It only needs to be run once per dataset though (unless group
67 | labels change).
68 | Having package \pkg{parallel} installed is highly recommended.
69 |
70 | If this function runs out of memory, consider specifying \emph{n.group} and
71 | \emph{n.other} to run on a subset of cells (taken from each group,
72 | and proportionally from the rest for each test).
73 | Alternatively use \emph{subset_cells_by_group} to subset \bold{dataset_se}
74 | for each group independently.
75 |
76 | Both reference and query datasets should be processed with this
77 | function.
78 |
79 | The tables produced by this function (usually named something like
80 | \emph{de_table.datasetname}) contain a summary of the MAST results.
81 | For each group, cells in the group are compared to cells not in the group
82 | (i.e. always a 2-group contrast; other group information is ignored).
83 | As per MAST recommendations, the proportion of genes seen in each cell is
84 | included in the model.
85 | }
86 | \examples{
87 |
88 | de_table.demo_query <- contrast_each_group_to_the_rest(
89 | demo_query_se, "a_demo_query")
90 |
91 | de_table.demo_ref <- contrast_each_group_to_the_rest(
92 | demo_ref_se, "a_demo_ref", num_cores=2)
93 |
94 |
95 | }
96 |
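
As a rough illustration of how the downstream columns listed in the value section above relate to each other, the base R sketch below derives fdr, sig_up, rank and rescaled_rank from made-up p-values and inner CIs for a single group. The `toy` data frame, its `pval` and `log2FC` columns, and the rank-divided-by-gene-count rescaling are assumptions for illustration only, not the package's internal code.

```r
# Toy per-gene results for one group (all values made up for illustration)
toy <- data.frame(
    ID       = c("GeneA", "GeneB", "GeneC", "GeneD"),
    pval     = c(0.0001, 0.5, 0.004, 0.0002),
    log2FC   = c(2.5, 0.1, -1.8, 1.2),
    ci_inner = c(1.9, 0.0, -1.1, 0.7),
    stringsAsFactors = FALSE
)

# Multiple hypothesis correction with the BH/FDR method
toy$fdr <- p.adjust(toy$pval, method = "BH")

# Significant and up: fdr below 0.01 AND a positive fold change
toy$sig_up <- toy$fdr < 0.01 & toy$log2FC > 0

# Rank within the group by inner CI, highest to lowest
toy$rank <- rank(-toy$ci_inner, ties.method = "first")

# ASSUMPTION: rescale by dividing the rank by the number of genes
toy$rescaled_rank <- toy$rank / nrow(toy)
```

Note that GeneC passes the fdr threshold here but is not flagged as sig_up, because its fold change is negative.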
--------------------------------------------------------------------------------
/man/contrast_each_group_to_the_rest_for_norm_ma_with_limma.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/contrasting_functions.r
3 | \name{contrast_each_group_to_the_rest_for_norm_ma_with_limma}
4 | \alias{contrast_each_group_to_the_rest_for_norm_ma_with_limma}
5 | \title{contrast_each_group_to_the_rest_for_norm_ma_with_limma}
6 | \usage{
7 |
8 | contrast_each_group_to_the_rest_for_norm_ma_with_limma(norm_expression_table,
9 | sample_sheet_table, dataset_name, sample_name, group_name = "group",
10 | groups2test = NA, extra_factor_name = NA, pval_threshold = 0.01)
11 | }
12 | \arguments{
13 | \item{norm_expression_table}{A logged, normalised expression table. Any
14 | filtering (removal of low-expression probes/genes) must already be done.}
15 |
16 | \item{sample_sheet_table}{Tab-separated text file of sample information.
17 | Columns must have names. Sample/microarray ids should be listed under
18 | \bold{sample_name} column. The cell-type (or 'group') of each sample should
19 | be listed under a column \bold{group_name}.}
20 |
21 | \item{dataset_name}{Short, meaningful name for this dataset/experiment.}
22 |
23 | \item{sample_name}{Name of the column in \bold{sample_sheet_table} with sample ID}
24 |
25 | \item{group_name}{Name of the column in \bold{sample_sheet_table} with group/cell-type.
26 | Default = "group"}
27 |
28 | \item{groups2test}{An optional character vector specifying which groups
29 | to check. By default (set to NA), all groups will be tested.}
30 |
31 | \item{extra_factor_name}{Optionally, an extra cross-group factor (as column
32 | name in \bold{sample_sheet_table}) to include in the model used by limma.
33 | E.g. an individual/mouse id. Refer to the limma docs. Default = NA}
34 |
35 | \item{pval_threshold}{For reporting only, a p-value threshold.
36 | Default = 0.01}
37 | }
38 | \value{
39 | A tibble, the within-experiment de_table (differential expression
40 | table)
41 | }
42 | \description{
43 | This function loads and processes microarray data (from purified cell
44 | populations) that can be used as a reference.
45 | }
46 | \details{
47 | Sometimes there are microarray studies measuring purified cell populations
48 | that would be measured together in a single-cell sequencing experiment.
49 | E.g. comparing PBMC scRNA to FACS-sorted blood cell populations.
50 | This function
51 | will process microarray data with limma and format it for comparisons.
52 |
53 | The microarray data used should consist of purified cell types
54 | from \emph{one single study/experiment} (due to batch effects).
55 | Ideally just those cell-types expected in the
56 | scRNAseq, but the method appears relatively robust to a few extra cell
57 | types.
58 |
59 | Note that unlike the single-cell workflow there are no summarisedExperiment
60 | objects (they're not really comparable) - this function reads data and
61 | generates a table of within-dataset differential expression contrasts in
62 | one step. Ie. equivalent to the output of
63 | \code{\link{contrast_each_group_to_the_rest}}.
64 |
65 | Also, note that while downstream functions can accept
66 | the microarray-derived data as query datasets,
67 | it's not really intended and assumptions might not
68 | hold (generally, it's known what got loaded onto a microarray!)
69 |
70 | The (otherwise optional) 'limma' package must be installed to use this
71 | function.
72 | }
73 | \examples{
74 |
75 | contrast_each_group_to_the_rest_for_norm_ma_with_limma(
76 | norm_expression_table=demo_microarray_expr,
77 | sample_sheet_table=demo_microarray_sample_sheet,
78 | dataset_name="DemoSimMicroarrayRef",
79 | sample_name="cell_sample", group_name="group")
80 |
81 | \dontrun{
82 | contrast_each_group_to_the_rest_for_norm_ma_with_limma(
83 | norm_expression_table, sample_sheet_table=samples_table,
84 | dataset_name="Watkins2009PBMCs", extra_factor_name='description')
85 | }
86 |
87 |
88 | }
89 | \seealso{
90 | \code{\link{contrast_each_group_to_the_rest}} is the
91 | function that makes comparable output on the scRNAseq data (dataset_se
92 | objects).
93 |
94 | \href{https://bioconductor.org/packages/release/bioc/html/limma.html}{Limma}
95 | Limma package for differential expression.
96 |
97 | Other Data loading functions: \code{\link{load_dataset_10Xdata}},
98 | \code{\link{load_se_from_tables}}
99 | }
100 | \concept{Data loading functions}
101 |
--------------------------------------------------------------------------------
/man/contrast_the_group_to_the_rest.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/contrasting_functions.r
3 | \name{contrast_the_group_to_the_rest}
4 | \alias{contrast_the_group_to_the_rest}
5 | \title{contrast_the_group_to_the_rest}
6 | \usage{
7 | contrast_the_group_to_the_rest(dataset_se, the_group,
8 | pvalue_threshold = 0.01, n.group = Inf, n.other = n.group * 5,
9 | factors_to_rm = c())
10 | }
11 | \arguments{
12 | \item{dataset_se}{Dataset summarisedExperiment object.}
13 |
14 | \item{the_group}{group to test}
15 |
16 | \item{pvalue_threshold}{Default = 0.01}
17 |
18 | \item{n.group}{How many cells to keep for each group in groupwise
19 | comparisons. Default = Inf}
20 |
21 | \item{n.other}{How many cells to keep from everything not in the group.
22 | Default = \bold{n.group} * 5}
23 |
24 | \item{factors_to_rm}{If there are extra confounding factors that should be
25 | removed from MAST's zlm model (e.g individual, run), specify the column name(s)
26 | from colData in a vector here. Default=c().}
27 | }
28 | \value{
29 | A tibble, the within-experiment de_table (differential expression
30 | table), for the group specified.
31 | }
32 | \description{
33 | Internal function to calculate differential expression within an experiment
34 | between a specified group and cells not in that group.
35 | }
36 | \details{
37 | This function should only be called by
38 | \code{contrast_each_group_to_the_rest}
39 | (which can be passed a single group name if desired). Else 'pofgenes' will
40 | not be defined.
41 |
42 | MAST is supplied with log2(counts + 1.1), and zlm called with model
43 | '~ TvsR + pofgenes' . The p-values reported are from the hurdle model. FDR
44 | is with default fdr/BH method.
45 | }
46 | \seealso{
47 | \code{\link{contrast_each_group_to_the_rest}}
48 | }
49 |
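
The details above mention three inputs to the model: log2(counts + 1.1) expression values, a test-versus-rest term (TvsR) and the proportion of genes seen in each cell (pofgenes). The base R sketch below shows one way those quantities could be derived from a toy count matrix. The matrix, the group labels and the factor level names "test" and "rest" are hypothetical, and no MAST call is made; this is not the package's internal code.

```r
# Toy count matrix: genes in rows, cells in columns (values made up)
counts <- matrix(
    c(0, 3, 5, 0,
      2, 0, 1, 0,
      7, 4, 0, 0),
    nrow = 3, byrow = TRUE,
    dimnames = list(paste0("Gene", 1:3), paste0("Cell", 1:4))
)
cell_groups <- c("Group1", "Group2", "Group1", "Group2")
the_group   <- "Group1"

# Expression values as described above: log2(counts + 1.1)
log_expr <- log2(counts + 1.1)

# Two-level factor: cells in the tested group versus everything else
# (the level names "rest" and "test" are hypothetical)
TvsR <- factor(ifelse(cell_groups == the_group, "test", "rest"),
               levels = c("rest", "test"))

# Proportion of genes detected (count > 0) in each cell
pofgenes <- colMeans(counts > 0)
```

Whatever the other group labels are, every cell outside `the_group` collapses into the single "rest" level, which is what makes this a 2-group contrast.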
--------------------------------------------------------------------------------
/man/contrast_the_group_to_the_rest_with_limma_for_microarray.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/contrasting_functions.r
3 | \name{contrast_the_group_to_the_rest_with_limma_for_microarray}
4 | \alias{contrast_the_group_to_the_rest_with_limma_for_microarray}
5 | \title{contrast_the_group_to_the_rest_with_limma_for_microarray}
6 | \usage{
7 |
8 | contrast_the_group_to_the_rest_with_limma_for_microarray(norm_expression_table,
9 | sample_sheet_table, the_group, sample_name, extra_factor_name = NA,
10 | pval_threshold = 0.01)
11 | }
12 | \arguments{
13 | \item{norm_expression_table}{A logged, normalised expression table. Any
14 | filtering (removal of low-expression probes/genes) must already be done.}
15 |
16 | \item{sample_sheet_table}{Tab-separated text file of sample information.
17 | Columns must have names. Sample/microarray ids should be listed under
18 | \bold{sample_name} column. The cell-type (or 'group') of each sample should
19 | be listed under a column \bold{group_name}.}
20 |
21 | \item{the_group}{Which query group is being tested.}
22 |
23 | \item{sample_name}{Name of the column in \bold{sample_sheet_table} with sample ID}
24 |
25 | \item{extra_factor_name}{Optionally, an extra cross-group factor (as column
26 | name in \bold{sample_sheet_table}) to include in the model used by limma.
27 | E.g. an individual/mouse id. Refer to the limma docs. Default = NA}
28 |
29 | \item{pval_threshold}{For reporting only, a p-value threshold. Default = 0.01}
30 | }
31 | \value{
32 | A tibble, the within-experiment de_table (differential expression
33 | table), for the group specified.
34 | }
35 | \description{
36 | Private function used by
37 | contrast_each_group_to_the_rest_for_norm_ma_with_limma
38 | }
39 | \seealso{
40 | \code{\link{contrast_each_group_to_the_rest_for_norm_ma_with_limma}}
41 | public calling function
42 |
43 | \href{https://bioconductor.org/packages/release/bioc/html/limma.html}{Limma}
44 | Limma package for differential expression.
45 | }
46 |
--------------------------------------------------------------------------------
/man/convert_se_gene_ids.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/loading_helper_functions.r
3 | \name{convert_se_gene_ids}
4 | \alias{convert_se_gene_ids}
5 | \title{convert_se_gene_ids}
6 | \usage{
7 | convert_se_gene_ids(dataset_se, new_id, eval_col, find_max = TRUE)
8 | }
9 | \arguments{
10 | \item{dataset_se}{Summarised experiment object containing count data. Also
11 | requires 'ID' and 'group' to be set within the cell information
12 | (see \code{colData()})}
13 |
14 | \item{new_id}{A column within the feature information (view
15 | \code{rowData(dataset_se)}) of the \bold{dataset_se}, which will become
16 | the new ID column. Non-uniqueness of this column is handled gracefully!
17 | Any \emph{NAs} will be dropped.}
18 |
19 | \item{eval_col}{Which column to use to break ties of duplicate
20 | \bold{new_id}. Must be a column within the feature information (view
21 | \code{rowData(dataset_se)}) of the \bold{dataset_se}. Total reads per gene
22 | feature is a good choice.}
23 |
24 | \item{find_max}{If false, this will choose the minimal \bold{eval_col}
25 | instead of max. Default = TRUE}
26 | }
27 | \value{
28 | A modified dataset_se - ID will now be \bold{new_id}, and unique.
29 | It will have fewer genes if old ID to new ID was not a 1:1 mapping.
30 | Genes are selected by the maximum (or minimum) of \bold{eval_col}.
31 | Ties \emph{should} resolve to the alphabetically first ID, but this could change.
32 | }
33 | \description{
34 | Change the gene IDs in the supplied dataset_se object to some other id
35 | already present in the gene info (as seen with \code{rowData()})
36 | }
37 | \examples{
38 |
39 | # The demo dataset doesn't have other names, so make some up
40 | # (don't do this)
41 | dataset_se <- demo_ref_se
42 | rowData(dataset_se)$dummyname <- toupper(rowData(dataset_se)$ID)
43 |
44 | # If not already present, define a column to evaluate,
45 | # typically total reads/gene.
46 | rowData(dataset_se)$total_count <- rowSums(assay(dataset_se))
47 |
48 | dataset_se <- convert_se_gene_ids(dataset_se, new_id='dummyname', eval_col='total_count')
49 |
50 | }
51 | \seealso{
52 | \href{https://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html}{SummarizedExperiment}
53 | For general doco on the SummarizedExperiment objects.
54 |
55 | \code{\link{load_se_from_files}} For reading data from flat
56 | files (not 10X cellRanger output)
57 | }
58 |
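The tie-breaking rule described for convert_se_gene_ids can be illustrated on a plain data frame. This is only a sketch in base R, not the package's implementation: the helper name `pick_one_row_per_new_id`, the `symbol` column and the toy values are all made up for illustration, and it assumes ties are resolved purely by the `eval_col` value.

```r
# Hypothetical gene table: two old IDs map to the same new ID "GENEA",
# and one gene has no new ID at all.
gene_info <- data.frame(
  ID          = c("g1", "g2", "g3", "g4"),
  symbol      = c("GENEA", "GENEA", "GENEB", NA),
  total_count = c(10, 250, 40, 5),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of celaref): keep one row per new ID.
pick_one_row_per_new_id <- function(gene_info, new_id, eval_col,
                                    find_max = TRUE) {
  # Drop rows where the new ID is missing.
  kept <- gene_info[!is.na(gene_info[[new_id]]), , drop = FALSE]
  # Sort by new ID, then by eval_col so the preferred row comes first.
  ord <- order(kept[[new_id]], kept[[eval_col]],
               decreasing = c(FALSE, find_max), method = "radix")
  kept <- kept[ord, , drop = FALSE]
  # Keep only the first row seen for each new ID.
  kept[!duplicated(kept[[new_id]]), , drop = FALSE]
}

deduped     <- pick_one_row_per_new_id(gene_info, "symbol", "total_count")
deduped_min <- pick_one_row_per_new_id(gene_info, "symbol", "total_count",
                                       find_max = FALSE)
```

With `find_max = TRUE` the row with the larger `total_count` survives for the duplicated symbol; with `find_max = FALSE` the smaller one does. The row with an `NA` symbol is dropped in both cases.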
--------------------------------------------------------------------------------
/man/de_table.demo_query.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/data.r
3 | \docType{data}
4 | \name{de_table.demo_query}
5 | \alias{de_table.demo_query}
6 | \title{Demo query de table}
7 | \format{An object of class \code{data.frame} with 800 rows and 13 columns.}
8 | \usage{
9 | de_table.demo_query
10 | }
11 | \value{
12 | An example de_table from
13 | \link{contrast_each_group_to_the_rest} (for demo query dataset)
14 | }
15 | \description{
16 | Small example dataset that is the output of
17 | \link{contrast_each_group_to_the_rest}. It contains the results
18 | of each group compared to the rest of the sample (i.e. within-sample
19 | differential expression)
20 | }
21 | \keyword{datasets}
22 |
--------------------------------------------------------------------------------
/man/de_table.demo_ref.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/data.r
3 | \docType{data}
4 | \name{de_table.demo_ref}
5 | \alias{de_table.demo_ref}
6 | \title{Demo ref de table}
7 | \format{An object of class \code{data.frame} with 800 rows and 13 columns.}
8 | \usage{
9 | de_table.demo_ref
10 | }
11 | \value{
12 | An example de_table from
13 | \link{contrast_each_group_to_the_rest} (for demo ref dataset)
14 | }
15 | \description{
16 | Small example dataset that is the output of
17 | \link{contrast_each_group_to_the_rest}. It contains the results
18 | of each group compared to the rest of the sample (i.e. within-sample
19 | differential expression)
20 | }
21 | \keyword{datasets}
22 |
--------------------------------------------------------------------------------
/man/demo_cell_info_table.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/data.r
3 | \docType{data}
4 | \name{demo_cell_info_table}
5 | \alias{demo_cell_info_table}
6 | \title{Demo cell info table}
7 | \format{An object of class \code{data.frame} with 515 rows and 4 columns.}
8 | \usage{
9 | demo_cell_info_table
10 | }
11 | \value{
12 | An example cell info table
13 | }
14 | \description{
15 | Sample sheet table listing each cell, its assigned cluster/group, and
16 | any other information that might be interesting (replicate, individual etc.)
17 | }
18 | \keyword{datasets}
19 |
--------------------------------------------------------------------------------
/man/demo_counts_matrix.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/data.r
3 | \docType{data}
4 | \name{demo_counts_matrix}
5 | \alias{demo_counts_matrix}
6 | \title{Demo count matrix}
7 | \format{An object of class \code{matrix} with 200 rows and 514 columns.}
8 | \usage{
9 | demo_counts_matrix
10 | }
11 | \value{
12 | An example counts matrix.
13 | }
14 | \description{
15 | Counts matrix for a small demo example dataset. Raw counts of
16 | reads per gene (row) per cell (column).
17 | }
18 | \keyword{datasets}
19 |
--------------------------------------------------------------------------------
/man/demo_gene_info_table.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/data.r
3 | \docType{data}
4 | \name{demo_gene_info_table}
5 | \alias{demo_gene_info_table}
6 | \title{Demo gene info table}
7 | \format{An object of class \code{data.frame} with 200 rows and 2 columns.}
8 | \usage{
9 | demo_gene_info_table
10 | }
11 | \value{
12 | An example table of genes.
13 | }
14 | \description{
15 | Extra table of gene-level information for the demo example dataset.
16 | Can contain anything as long as there's a unique gene id.
17 | }
18 | \keyword{datasets}
19 |
--------------------------------------------------------------------------------
/man/demo_microarray_expr.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/data.r
3 | \docType{data}
4 | \name{demo_microarray_expr}
5 | \alias{demo_microarray_expr}
6 | \title{Demo microarray expression table}
7 | \format{An object of class \code{matrix} with 200 rows and 20 columns.}
8 | \usage{
9 | demo_microarray_expr
10 | }
11 | \value{
12 | An example table of (fake) microarray data.
13 | }
14 | \description{
15 | Microarray-style expression table for the demo example dataset.
16 | Rows are genes, columns are samples, as per counts matrix.
17 | }
18 | \keyword{datasets}
19 |
--------------------------------------------------------------------------------
/man/demo_microarray_sample_sheet.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/data.r
3 | \docType{data}
4 | \name{demo_microarray_sample_sheet}
5 | \alias{demo_microarray_sample_sheet}
6 | \title{Demo microarray sample sheet table}
7 | \format{An object of class \code{grouped_df} (inherits from \code{tbl_df}, \code{tbl}, \code{data.frame}) with 20 rows and 2 columns.}
8 | \usage{
9 | demo_microarray_sample_sheet
10 | }
11 | \value{
12 | An example microarray sample sheet
13 | }
14 | \description{
15 | Microarray sample sheet table for the demo example dataset.
16 | Contains array identifiers, their group and any other information that could
17 | be useful.
18 | }
19 | \keyword{datasets}
20 |
--------------------------------------------------------------------------------
/man/demo_query_se.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/data.r
3 | \docType{data}
4 | \name{demo_query_se}
5 | \alias{demo_query_se}
6 | \title{Demo query se (summarizedExperiment)}
7 | \format{An object of class \code{SummarizedExperiment} with 200 rows and 485 columns.}
8 | \usage{
9 | demo_query_se
10 | }
11 | \value{
12 | An example summarised experiment (for demo query dataset)
13 | }
14 | \description{
15 | A SummarizedExperiment object loaded from demo info tables, for a query set.
16 | }
17 | \keyword{datasets}
18 |
--------------------------------------------------------------------------------
/man/demo_ref_se.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/data.r
3 | \docType{data}
4 | \name{demo_ref_se}
5 | \alias{demo_ref_se}
6 | \title{Demo reference se (summarizedExperiment)}
7 | \format{An object of class \code{SummarizedExperiment} with 200 rows and 515 columns.}
8 | \usage{
9 | demo_ref_se
10 | }
11 | \value{
12 | An example summarised experiment (for demo reference dataset)
13 | }
14 | \description{
15 | A SummarizedExperiment object loaded from demo info tables, for a reference
16 | set.
17 | }
18 | \keyword{datasets}
19 |
--------------------------------------------------------------------------------
/man/find_within_match_differences.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/group_labelling_functions.r
3 | \name{find_within_match_differences}
4 | \alias{find_within_match_differences}
5 | \title{find_within_match_differences}
6 | \usage{
7 | find_within_match_differences(de_table.ref.marked, matches, the_test_group,
8 | the_test_dataset, the_ref_dataset, the_pval)
9 | }
10 | \arguments{
11 | \item{de_table.ref.marked}{see make_ref_similarity_names_for_group}
12 |
13 | \item{matches}{see make_ref_similarity_names_for_group}
14 |
15 | \item{the_test_group}{see make_ref_similarity_names_for_group}
16 |
17 | \item{the_test_dataset}{see make_ref_similarity_names_for_group}
18 |
19 | \item{the_ref_dataset}{see make_ref_similarity_names_for_group}
20 |
21 | \item{the_pval}{see make_ref_similarity_names_for_group}
22 | }
23 | \value{
24 | String of within match differences
25 | }
26 | \description{
27 | Internal function to find if there are significant differences between the
28 | distributions, when there are multiple match groups.
29 | }
30 | \details{
31 | For use by make_ref_similarity_names_for_group
32 | }
33 | \seealso{
34 | \code{\link{make_ref_similarity_names_for_group}}
35 | }
36 |
--------------------------------------------------------------------------------
/man/get_counts_index.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/contrasting_functions.r
3 | \name{get_counts_index}
4 | \alias{get_counts_index}
5 | \title{get_counts_index}
6 | \usage{
7 | get_counts_index(n_assays, assay_names)
8 | }
9 | \arguments{
10 | \item{n_assays}{How many assays are there? ie: length(assays(dataset_se))}
11 |
12 | \item{assay_names}{What are the assays called? ie: names(assays(dataset_se))}
13 | }
14 | \value{
15 | The index of an assay in assays called 'counts', or, if there's just
16 | the one unnamed assay - happily assume that that is counts.
17 | }
18 | \description{
19 | \code{get_counts_index} is an internal utility function to find out where
20 | the counts are (if anywhere). Stops if there's no assay called 'counts'
21 | (unless there is only a single unnamed assay).
22 | }
23 |
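The lookup rule that get_counts_index documents can be sketched in a few lines of base R. This is an illustrative reimplementation under the assumptions stated in the Rd text above, not the package's code; the name `find_counts_index` and the error message are hypothetical.

```r
# Hypothetical stand-in for the documented behaviour (not celaref's code).
find_counts_index <- function(n_assays, assay_names) {
  # Preferred case: an assay explicitly named 'counts'.
  if ("counts" %in% assay_names) {
    return(which(assay_names == "counts")[1])
  }
  # Fallback: exactly one assay and it has no usable name.
  unnamed <- is.null(assay_names) || all(assay_names %in% c("", NA))
  if (n_assays == 1 && unnamed) {
    return(1L)
  }
  stop("No assay called 'counts' found.")
}

idx_named   <- find_counts_index(2, c("logcounts", "counts"))
idx_unnamed <- find_counts_index(1, NULL)
# Two assays, neither called 'counts': this is expected to stop.
err <- tryCatch(find_counts_index(2, c("a", "b")), error = function(e) e)
```

In real use the two arguments would come from `length(assays(dataset_se))` and `names(assays(dataset_se))`, as noted in the argument descriptions.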
--------------------------------------------------------------------------------
/man/get_inner_or_outer_ci.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/contrasting_functions.r
3 | \name{get_inner_or_outer_ci}
4 | \alias{get_inner_or_outer_ci}
5 | \title{get_inner_or_outer_ci}
6 | \usage{
7 | get_inner_or_outer_ci(fc, ci.hi, ci.lo, get_inner = TRUE)
8 | }
9 | \arguments{
10 | \item{fc}{Fold-change}
11 |
12 | \item{ci.hi}{Upper bound of the fold-change CI (numerically larger)}
13 |
14 | \item{ci.lo}{Lower bound of the fold-change CI (numerically smaller)}
15 |
16 | \item{get_inner}{If TRUE, get the more conservative inner CI, else the
17 | bigger outside one.}
18 | }
19 | \value{
20 | inner or outer CI from \bold{ci.hi} or \bold{ci.lo}
21 | }
22 | \description{
23 | Given a fold-change, and high and low confidence interval (where lower <
24 | higher), pick the innermost/most conservative one.
25 | }
26 |
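One way to read "inner" here is as the CI bound closest to zero on the same side as the fold-change. The sketch below assumes that reading; it is not taken from the package source. The helper name `pick_ci` is hypothetical, and a fold-change of exactly zero is treated as positive.

```r
# Hypothetical helper illustrating the inner/outer choice (an assumption,
# not celaref's implementation).
pick_ci <- function(fc, ci.hi, ci.lo, get_inner = TRUE) {
  if (get_inner) {
    # Positive change: the lower bound is the conservative one.
    # Negative change: the upper bound is.
    ifelse(fc >= 0, ci.lo, ci.hi)
  } else {
    ifelse(fc >= 0, ci.hi, ci.lo)
  }
}

fc    <- c(2, -2)
ci.hi <- c(3, -1)
ci.lo <- c(1, -3)

inner <- pick_ci(fc, ci.hi, ci.lo)
outer <- pick_ci(fc, ci.hi, ci.lo, get_inner = FALSE)
```

Ranking genes by the inner bound rewards changes that are both large and tightly estimated, which is why it is described as the more conservative choice.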
--------------------------------------------------------------------------------
/man/get_limma_top_table_with_ci.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/contrasting_functions.r
3 | \name{get_limma_top_table_with_ci}
4 | \alias{get_limma_top_table_with_ci}
5 | \title{get_limma_top_table_with_ci}
6 | \usage{
7 | get_limma_top_table_with_ci(fit2, the_coef, ci = 0.95)
8 | }
9 | \arguments{
10 | \item{fit2}{The fit2 object after calling eBayes as per standard limma
11 | workflow. I.e. the object that topTable gets called on.}
12 |
13 | \item{the_coef}{Coefficient. As passed to topTable.}
14 |
15 | \item{ci}{Confidence interval. Number between 0 and 1, default 0.95 (95\%)}
16 | }
17 | \value{
18 | Output of topTable, but with the (95\%) confidence interval reported
19 | for the logFC.
20 | }
21 | \description{
22 | Internal function that wraps limma topTable output but also adds upper and
23 | lower confidence intervals to the logFC. Calculated according to
24 | \url{https://support.bioconductor.org/p/36108/}
25 | }
26 | \seealso{
27 | \code{\link{contrast_the_group_to_the_rest_with_limma_for_microarray}}
28 | Calling function.
29 | }
30 |
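A confidence interval on a logFC is usually built from a t-quantile times a standard error. The sketch below shows that arithmetic in base R. It assumes the interval is the standard two-sided t-based one, and that the standard error and degrees of freedom have already been extracted from the fit (in limma these are commonly derived from the `s2.post`, `stdev.unscaled` and `df.total` components, which is stated here as an assumption about how the package obtains them). The function name `logfc_ci` and the output column names are hypothetical.

```r
# Hypothetical helper: two-sided t-based CI around a log fold-change.
logfc_ci <- function(logfc, se, df, ci = 0.95) {
  # Half-width = t quantile for the requested coverage, times the SE.
  half_width <- qt(1 - (1 - ci) / 2, df) * se
  data.frame(
    logFC = logfc,
    lower = logfc - half_width,
    upper = logfc + half_width
  )
}

# Made-up values for two genes, purely for illustration.
res <- logfc_ci(logfc = c(1.5, -0.8), se = c(0.2, 0.3), df = c(18, 18))
```

A wider `ci` (for example 0.99) gives a larger half-width, so fewer genes keep an interval that excludes zero.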
--------------------------------------------------------------------------------
/man/get_matched_stepped_mwtest_res_table.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/group_labelling_functions.r
3 | \name{get_matched_stepped_mwtest_res_table}
4 | \alias{get_matched_stepped_mwtest_res_table}
5 | \title{get_matched_stepped_mwtest_res_table}
6 | \usage{
7 | get_matched_stepped_mwtest_res_table(mwtest_res_table.this, the_pval)
8 | }
9 | \arguments{
10 | \item{mwtest_res_table.this}{Combined output of
11 | \code{\link{get_ranking_and_test_results}}}
12 |
13 | \item{the_pval}{Pvalue threshold}
14 | }
15 | \value{
16 | Stepped pvalues string
17 | }
18 | \description{
19 | Internal function to grab a table of the matched group(s).
20 | }
21 | \details{
22 | For use by make_ref_similarity_names_for_group
23 | }
24 | \seealso{
25 | \code{\link{make_ref_similarity_names_for_group}}
26 | }
27 |
--------------------------------------------------------------------------------
/man/get_ranking_and_test_results.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/group_labelling_functions.r
3 | \name{get_ranking_and_test_results}
4 | \alias{get_ranking_and_test_results}
5 | \title{get_ranking_and_test_results}
6 | \usage{
7 | get_ranking_and_test_results(de_table.ref.marked, the_test_group,
8 | the_test_dataset, the_ref_dataset, num_steps, pval = 0.01)
9 | }
10 | \arguments{
11 | \item{de_table.ref.marked}{see
12 | \link{make_ref_similarity_names_using_marked}}
13 |
14 | \item{the_test_group}{The group to calculate the stats on.}
15 |
16 | \item{the_test_dataset}{see
17 | \link{make_ref_similarity_names_using_marked}}
18 |
19 | \item{the_ref_dataset}{see
20 | \link{make_ref_similarity_names_using_marked}}
21 |
22 | \item{num_steps}{see
23 | \link{make_ref_similarity_names_using_marked}}
24 |
25 | \item{pval}{see
26 | \link{make_ref_similarity_names_using_marked}}
27 | }
28 | \value{
29 | Table of similarity contrast results/assigned names
30 | etc. for a single group.
31 | Used internally for populating mwtest_res_table tables.
32 | }
33 | \description{
34 | Internal function to get reference group similarity contrasts for an
35 | individual query group.
36 | }
37 | \details{
38 | For use by \bold{make_ref_similarity_names_using_marked}, see that function
39 | for parameter details.
40 | This function just runs this for a single query group \bold{the_test_group}
41 | }
42 | \seealso{
43 | \code{\link{make_ref_similarity_names_using_marked}}
44 | which calls this.
45 | }
46 |
--------------------------------------------------------------------------------
/man/get_rankstat_table.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/group_labelling_functions.r
3 | \name{get_rankstat_table}
4 | \alias{get_rankstat_table}
5 | \title{get_rankstat_table}
6 | \usage{
7 | get_rankstat_table(de_table.ref.marked, the_test_group)
8 | }
9 | \arguments{
10 | \item{de_table.ref.marked}{The output of
11 | \code{\link{get_the_up_genes_for_all_possible_groups}} for the contrast
12 | of interest.}
13 |
14 | \item{the_test_group}{Name of query group to test}
15 | }
16 | \value{
17 | A tibble of query group name (test_group),
18 | number of 'top' genes (n),
19 | reference dataset group (group) with its ranking (grouprank) and the median
20 | (rescaled 0..1) ranking of 'top' genes (median_rank).
21 | }
22 | \description{
23 | Summarise the comparison of the specified query group in
24 | \bold{de_table.ref.marked} - number of 'top' genes and their
25 | median rank in each of the reference groups, with reference group rankings.
26 | }
27 | \examples{
28 |
29 | # Make input
30 | # de_table.demo_query <- contrast_each_group_to_the_rest(demo_query_se, "demo_query")
31 | # de_table.demo_ref <- contrast_each_group_to_the_rest(demo_ref_se, "demo_ref")
32 |
33 | de_table.marked.query_vs_ref <- get_the_up_genes_for_all_possible_groups(
34 | de_table.demo_query,
35 | de_table.demo_ref)
36 |
37 | get_rankstat_table(de_table.marked.query_vs_ref, "Group3")
38 |
39 | }
40 | \seealso{
41 | \code{\link{get_the_up_genes_for_all_possible_groups}} To
42 | prepare the \bold{de_table.ref.marked} input.
43 | }
44 |
--------------------------------------------------------------------------------
/man/get_reciprocal_matches.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/group_labelling_functions.r
3 | \name{get_reciprocal_matches}
4 | \alias{get_reciprocal_matches}
5 | \title{get_reciprocal_matches}
6 | \usage{
7 | get_reciprocal_matches(mwtest_res_table.recip, de_table.recip.marked,
8 | the_pval)
9 | }
10 | \arguments{
11 | \item{mwtest_res_table.recip}{Combined output of
12 | \code{\link{get_ranking_and_test_results}} for reciprocal test -
13 | ref vs query.}
14 |
15 | \item{de_table.recip.marked}{Reciprocal ref vs query de_table.ref.marked}
16 |
17 | \item{the_pval}{See make_ref_similarity_names_using_marked}
18 | }
19 | \value{
20 | List of table of reciprocal matches tested from reference to query.
21 | }
22 | \description{
23 | Internal function to run a binomial test of
24 | median test rank > 0.5 (random).
25 | }
26 | \details{
27 | For use by make_ref_similarity_names_using_marked
28 | }
29 | \seealso{
30 | \code{\link{make_ref_similarity_names_using_marked}}
31 | }
32 |
--------------------------------------------------------------------------------
/man/get_stepped_pvals_str.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/group_labelling_functions.r
3 | \name{get_stepped_pvals_str}
4 | \alias{get_stepped_pvals_str}
5 | \title{get_stepped_pvals_str}
6 | \usage{
7 | get_stepped_pvals_str(mwtest_res_table.this)
8 | }
9 | \arguments{
10 | \item{mwtest_res_table.this}{Combined output of
11 | \code{\link{get_ranking_and_test_results}}}
12 | }
13 | \value{
14 | Stepped pvalues string
15 | }
16 | \description{
17 | Internal function to construct the string of stepped pvalues reported by
18 | make_ref_similarity_names_using_marked
19 | }
20 | \details{
21 | For use by make_ref_similarity_names_for_group
22 | }
23 | \seealso{
24 | \code{\link{make_ref_similarity_names_for_group}}
25 | }
26 |
--------------------------------------------------------------------------------
/man/get_the_up_genes_for_all_possible_groups.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/contrasting_functions.r
3 | \name{get_the_up_genes_for_all_possible_groups}
4 | \alias{get_the_up_genes_for_all_possible_groups}
5 | \title{get_the_up_genes_for_all_possible_groups}
6 | \usage{
7 | get_the_up_genes_for_all_possible_groups(de_table.test, de_table.ref,
8 | rankmetric = "TOP100_LOWER_CI_GTE1", n = 100)
9 | }
10 | \arguments{
11 | \item{de_table.test}{A differential expression table of the query
12 | experiment, as generated from
13 | \code{\link{contrast_each_group_to_the_rest}}}
14 |
15 | \item{de_table.ref}{A differential expression table of the reference
16 | dataset, as generated from
17 | \code{\link{contrast_each_group_to_the_rest}}}
18 |
19 | \item{rankmetric}{Specify ranking method used to pick the
20 | 'top' genes. The default 'TOP100_LOWER_CI_GTE1' picks genes from the top 100
21 | overrepresented genes (ranked by inner 95\% confidence interval) - appears to
22 | work best for distinct cell types (e.g. tissue sample). 'TOP100_SIG' again
23 | picks from the top 100 ranked genes, but requires only statistical
24 | significance, 95\% CI threshold - may perform better on more similar cell
25 | clusters (e.g. PBMCs).}
26 |
27 | \item{n}{For tweaking maximum returned genes from different ranking methods.
28 | Will change the p-values! Suggest leaving as default unless you're keen.}
29 | }
30 | \value{
31 | \emph{de_table.marked} This will almost be a subset of
32 | \bold{de_table.ref},
33 | with an added column \emph{test_group} set to the query groups, and
34 | \emph{test_dataset} set to \bold{test_dataset_name}.
35 |
36 | If nothing passes the rankmetric criteria, a warning is thrown and NA is
37 | returned. (This can be a genuine inability to pick out the
38 | representative 'up' genes, or due to some problem in the analysis)
39 | }
40 | \description{
41 | For the most overrepresented genes of each group in the test
42 | dataset, get their rankings in all the groups of the reference dataset.
43 | }
44 | \details{
45 | This is effectively a subset of the reference data, 'marked' with the 'top'
46 | genes that represent the groups in the query data. The
47 | distribution of the \emph{rescaled ranks} of these marked genes in each
48 | reference data group indicate how similar they are to the query group.
49 |
50 | This function is simply a convenient wrapper for
51 | \code{\link{get_the_up_genes_for_group}} that merges output for
52 | each group in the query into one table.
53 | }
54 | \examples{
55 | de_table.marked.query_vs_ref <- get_the_up_genes_for_all_possible_groups(
56 | de_table.test=de_table.demo_query ,
57 | de_table.ref=de_table.demo_ref )
58 |
59 | }
60 | \seealso{
61 | \code{\link{get_the_up_genes_for_group}} Function for
62 | testing a single group.
63 | }
64 |
--------------------------------------------------------------------------------
/man/get_the_up_genes_for_group.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/contrasting_functions.r
3 | \name{get_the_up_genes_for_group}
4 | \alias{get_the_up_genes_for_group}
5 | \title{get_the_up_genes_for_group}
6 | \usage{
7 | get_the_up_genes_for_group(the_group, de_table.test, de_table.ref,
8 | rankmetric = "TOP100_LOWER_CI_GTE1", n = 100)
9 | }
10 | \arguments{
11 | \item{the_group}{The group (from the test/query experiment) to examine.}
12 |
13 | \item{de_table.test}{A differential expression table of the query
14 | experiment, as generated from
15 | \code{\link{contrast_each_group_to_the_rest}}}
16 |
17 | \item{de_table.ref}{A differential expression table of the reference
18 | dataset, as generated from
19 | \code{\link{contrast_each_group_to_the_rest}}}
20 |
21 | \item{rankmetric}{Specify ranking method used to pick the
22 | 'top' genes. The default 'TOP100_LOWER_CI_GTE1' picks genes from the top 100
23 | overrepresented genes (ranked by inner 95\% confidence interval) - appears to
24 | work best for distinct cell types (e.g. tissue sample). 'TOP100_SIG' again
25 | picks from the top 100 ranked genes, but requires only statistical
26 | significance, 95\% CI threshold - may perform better on more similar cell
27 | clusters (e.g. PBMCs).}
28 |
29 | \item{n}{For tweaking maximum returned genes from different ranking methods.
30 | Will change the p-values! Suggest leaving as default unless you're keen.}
31 | }
32 | \value{
33 | \emph{de_table.marked} This will be a subset of
34 | \bold{de_table.ref}, with an added column \emph{test_group} set to
35 | \bold{the_group}. If nothing passes the rankmetric criteria, NA.
36 | }
37 | \description{
38 | For the most overrepresented genes of the specified group in the test
39 | dataset, get their rankings in all the groups of the reference dataset.
40 | }
41 | \details{
42 | This is effectively a subset of the reference data, 'marked' with the 'top'
43 | genes that represent the group of interest in the query data. The
44 | distribution of the \emph{rescaled ranks} of these marked genes in each
45 | reference data group indicate how similar they are to the query group.
46 | }
47 | \examples{
48 | de_table.marked.Group3vsRef <- get_the_up_genes_for_group(
49 | the_group="Group3",
50 | de_table.test=de_table.demo_query,
51 | de_table.ref=de_table.demo_ref)
52 |
53 | }
54 | \seealso{
55 | \code{\link{contrast_each_group_to_the_rest}} For preparing the
56 | de_table.* tables.
57 | \code{\link{get_the_up_genes_for_all_possible_groups}} For running
58 | all query groups at once.
59 | }
60 |
--------------------------------------------------------------------------------
/man/get_vs_random_pval.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/group_labelling_functions.r
3 | \name{get_vs_random_pval}
4 | \alias{get_vs_random_pval}
5 | \title{get_vs_random_pval}
6 | \usage{
7 | get_vs_random_pval(de_table.ref.marked, the_group, the_test_group)
8 | }
9 | \arguments{
10 | \item{de_table.ref.marked}{see make_ref_similarity_names_for_group}
11 |
12 | \item{the_group}{Reference group name}
13 |
14 | \item{the_test_group}{Test group name
15 | }
16 | }
17 | \value{
18 | Pvalue result of a binomial test of each 'top gene' being greater
19 | than the theoretical random median rank of 0.5 (halfway).
20 | }
21 | \description{
22 | Internal function to run a binomial test of
23 | median test rank > 0.5 (random).
24 | }
25 | \details{
26 | For use by make_ref_similarity_names_for_group
27 | }
28 | \seealso{
29 | \code{\link{make_ref_similarity_names_for_group}}
30 | }
31 |
--------------------------------------------------------------------------------
/man/load_dataset_10Xdata.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/loading_helper_functions.r
3 | \name{load_dataset_10Xdata}
4 | \alias{load_dataset_10Xdata}
5 | \title{load_dataset_10Xdata}
6 | \usage{
7 | load_dataset_10Xdata(dataset_path, dataset_genome, clustering_set,
8 | gene_id_cols_10X = c("ensembl_ID", "GeneSymbol"),
9 | id_to_use = gene_id_cols_10X[1])
10 | }
11 | \arguments{
12 | \item{dataset_path}{Path to the directory of 10X data, as generated by the
13 | cellRanger pipeline (versions 2.1.0 and 2.0.1). The directory should have
14 | subdirectories \emph{analysis}, \emph{filtered_gene_bc_matrices} and
15 | \emph{raw_gene_bc_matrices} (only the first 2 are read).}
16 |
17 | \item{dataset_genome}{The genome that the reads were aligned against,
18 | e.g. GRCh38. Check for this as a directory name under the
19 | \emph{filtered_gene_bc_matrices} subdirectory if unsure.}
20 |
21 | \item{clustering_set}{The 10X cellRanger pipeline produces several
22 | different cluster definitions per dataset. Specify which one to use e.g.
23 | kmeans_10_clusters. Find them as directory names under
24 | \emph{analysis/clustering/}}
25 |
26 | \item{gene_id_cols_10X}{Vector of the names of the columns in the gene
27 | description file (\emph{filtered_gene_bc_matrices/GRCh38/genes.tsv}). The
28 | first element of this will become the ID.
29 | Default = c("ensembl_ID","GeneSymbol")}
30 |
31 | \item{id_to_use}{Column from \bold{gene_id_cols_10X} that defines the gene
32 | identifier to use as 'ID' in the returned SummarisedExperiment object.
33 | Many-to-one relationships between the assumed unique first element of
34 | \bold{gene_id_cols_10X} and \bold{id_to_use} will be handled gracefully by
35 | \code{\link{convert_se_gene_ids}}.
36 | Defaults to first element of \bold{gene_id_cols_10X}}
37 | }
38 | \value{
39 | A SummarisedExperiment object containing the count data, cell info
40 | and gene info.
41 | }
42 | \description{
43 | Convenience function to create a SummarizedExperiment object (dataset_se)
44 | from the output of a 10X cellRanger pipeline run.
45 | }
46 | \details{
47 | This function makes a SummarizedExperiment object in a form that
48 | should work for celaref functions. Specifically, that means it will have an
49 | 'ID' field for genes (view with \code{rowData(dataset_se)}), and both
50 | 'cell_sample' and 'group' fields for cells (view with
51 | \code{colData(dataset_se)}). See parameters for detail.
52 | Additionally, the counts will be an integer matrix (not a
53 | sparse matrix), and the \emph{group} field (but not \emph{cell_sample}
54 | or \emph{ID}) will be a factor.
55 |
56 | The clustering information can be read from whichever cluster is specified,
57 | usually there will be several choices.
58 |
59 | This function is designed to work with output of version 2.0.1 of the
60 | cellRanger pipeline, and may not work with others (will not work for 1.x).
61 | }
62 | \examples{
63 | example_10X_dir <- system.file("extdata", "sim_cr_dataset", package = "celaref")
64 | dataset_se <- load_dataset_10Xdata(example_10X_dir, dataset_genome="GRCh38",
65 | clustering_set="kmeans_4_clusters", gene_id_cols_10X=c("gene"))
66 |
67 | \dontrun{
68 | dataset_se <- load_dataset_10Xdata('~/path/to/data/10X_pbmc4k',
69 | dataset_genome="GRCh38",
70 | clustering_set="kmeans_7_clusters")
71 | }
72 |
73 | }
74 | \seealso{
75 | \href{https://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html}{SummarizedExperiment}
76 | For general doco on the SummarizedExperiment objects.
77 |
78 | \code{\link{convert_se_gene_ids}} describes method for
79 | converting IDs.
80 |
81 | Other Data loading functions: \code{\link{contrast_each_group_to_the_rest_for_norm_ma_with_limma}},
82 | \code{\link{load_se_from_tables}}
83 | }
84 | \concept{Data loading functions}
85 |
--------------------------------------------------------------------------------
/man/load_se_from_tables.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/loading_helper_functions.r
3 | \name{load_se_from_tables}
4 | \alias{load_se_from_tables}
5 | \alias{load_se_from_files}
6 | \title{load_se_from_tables}
7 | \usage{
8 | load_se_from_tables(counts_matrix, cell_info_table, gene_info_table = NA,
9 | group_col_name = "group", cell_col_name = NA)
10 |
11 | load_se_from_files(counts_file, cell_info_file, gene_info_file = NA,
12 | group_col_name = "group", cell_col_name = NA)
13 | }
14 | \arguments{
15 | \item{counts_matrix}{A matrix of read counts for each gene
16 | (row) and each cell (column). Columns and rows should be named.}
17 |
18 | \item{cell_info_table}{Table of cell information.
19 | If there is a column labelled
20 | \emph{cell_sample}, that will be used as the unique cell identifiers.
21 | If not, the first column is assumed to be cell identifiers, and will be
22 | copied to a new field labelled \emph{cell_sample}.
23 | Similarly - the clusters of these cells should be listed in one column -
24 | which can be called 'group' (case-sensitive) or specified with
25 | \bold{group_col_name}. \emph{Minimal data format: }}
26 |
27 | \item{gene_info_table}{Optional table of gene information. If there is a
28 | column labelled
29 | \emph{ID}, that will be used as the gene identifiers (they must be unique!).
30 | If not, the first column is assumed to be a gene identifier, and will be
31 | copied to a
32 | new field labelled \emph{ID}. Must match all rownames in
33 | \bold{counts_matrix}.
34 | If omitted, ID will be generated from the rownames of counts_matrix.
35 | Default=NA}
36 |
37 | \item{group_col_name}{Name of the column in \bold{cell_info_table}
38 | containing
39 | the cluster/group that each cell belongs to. Case-sensitive. Default='group'}
40 |
41 | \item{cell_col_name}{Name of the column in \bold{cell_info_table} containing
42 | a cell id. Ignored if \emph{cell_sample} column is already present.
43 | If omitted, (and no \emph{cell_sample} column) will use first column.
44 | Case-sensitive. Default=NA}
45 |
46 | \item{counts_file}{A tab-separated file of a matrix of read counts. As per
47 | \bold{counts_matrix}. First column should be gene ID, and top row cell ids.}
48 |
49 | \item{cell_info_file}{Tab-separated text file of cell information, as per
50 | \bold{cell_info_table}. Columns must have names.}
51 |
52 | \item{gene_info_file}{Optional tab-separated text file of gene information,
53 | as per \bold{gene_info_table}. Columns must have names. Default=NA}
54 | }
55 | \value{
56 | A SummarizedExperiment object containing the count data, cell info
57 | and gene info.
58 | }
59 | \description{
60 | Create a SummarizedExperiment object (dataset_se) from a count matrix, cell
61 | information and optionally gene information.
62 |
63 | \code{load_se_from_files} is a wrapper for \code{load_se_from_tables} that
64 | will read in tables from specified files.
65 | }
66 | \details{
67 | This function makes a SummarizedExperiment object in a form that
68 | should work for celaref functions. Specifically, that means it will have an
69 | 'ID' field for genes (view with \code{rowData(dataset_se)}), and both
70 | 'cell_sample' and 'group' fields for cells (view with
71 | \code{colData(dataset_se)}). See parameters for detail.
72 | Additionally, the counts will be an integer matrix (not a
73 | sparse matrix), and the \emph{group} field (but not \emph{cell_sample}
74 | or \emph{ID}) will be a factor.
75 |
76 | Note that data will be subsetted to cells present in both the counts matrix
77 | and cell info; this is handy for loading subsets of cells.
78 | However, if \bold{gene_info_file} is defined, all genes must match exactly.
79 |
80 | The \code{load_se_from_files} form of this function will run the same
81 | checks, but will read everything from files in one go. The
82 | \code{load_se_from_tables}
83 | form is perhaps more useful when the annotations need to be modified (e.g.
84 | programmatically adding a different gene identifier, renaming groups,
85 | removing unwanted samples).
86 |
87 | Note that the SummarizedExperiment object can also be created without using
88 | these functions; it just needs the \emph{cell_sample}, \emph{ID} and
89 | \emph{group} fields as described above. Sometimes it might be easier
90 | to add these to an existing \emph{SummarizedExperiment} from upstream
91 | analyses.
92 | }
93 | \section{Functions}{
94 | \itemize{
95 | \item \code{load_se_from_files}: To read from files
96 | }}
97 |
98 | \examples{
99 |
100 | # From data frames (or a matrix for counts) :
101 | demo_se <- load_se_from_tables(counts_matrix=demo_counts_matrix,
102 | cell_info_table=demo_cell_info_table)
103 | demo_se <- load_se_from_tables(counts_matrix=demo_counts_matrix,
104 | cell_info_table=demo_cell_info_table,
105 | gene_info_table=demo_gene_info_table)
106 |
107 | # Or from data files :
108 | counts_filepath <- system.file("extdata", "sim_query_counts.tab", package = "celaref")
109 | cell_info_filepath <- system.file("extdata", "sim_query_cell_info.tab", package = "celaref")
110 | gene_info_filepath <- system.file("extdata", "sim_query_gene_info.tab", package = "celaref")
111 |
112 | demo_se <- load_se_from_files(counts_file=counts_filepath, cell_info_file=cell_info_filepath)
113 | demo_se <- load_se_from_files(counts_file=counts_filepath, cell_info_file=cell_info_filepath,
114 | gene_info_file=gene_info_filepath )
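
# A minimal sketch of using non-default column names, on made-up toy data.
# The column names 'cell_id' and 'cluster' are illustrative assumptions only;
# they are mapped onto 'cell_sample' and 'group' via cell_col_name and
# group_col_name.
toy_counts <- matrix(seq_len(12), nrow=3,
    dimnames=list(paste0("gene", 1:3), paste0("cell", 1:4)))
toy_cell_info <- data.frame(cell_id=paste0("cell", 1:4),
    cluster=c("A", "A", "B", "B"), stringsAsFactors=FALSE)
toy_se <- load_se_from_tables(counts_matrix=toy_counts,
    cell_info_table=toy_cell_info,
    group_col_name="cluster", cell_col_name="cell_id")
# Inspect the cell annotations built from the toy table
SummarizedExperiment::colData(toy_se)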
115 |
116 | }
117 | \seealso{
118 | \href{https://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html}{SummarizedExperiment} For general documentation on SummarizedExperiment objects.
119 |
120 | Other Data loading functions: \code{\link{contrast_each_group_to_the_rest_for_norm_ma_with_limma}},
121 | \code{\link{load_dataset_10Xdata}}
122 | }
123 | \concept{Data loading functions}
124 | \concept{Data-loading functions}
125 |
--------------------------------------------------------------------------------
/man/make_ranking_violin_plot.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/plotting_functions.R
3 | \name{make_ranking_violin_plot}
4 | \alias{make_ranking_violin_plot}
5 | \title{make_ranking_violin_plot}
6 | \usage{
7 | make_ranking_violin_plot(de_table.marked = NA, de_table.test = NA,
8 | de_table.ref = NA, log10trans = FALSE, ...)
9 | }
10 | \arguments{
11 | \item{de_table.marked}{The output of
12 | \code{\link{get_the_up_genes_for_all_possible_groups}}
13 | for the contrast of interest.}
14 |
15 | \item{de_table.test}{A differential expression table of the
16 | query experiment,
17 | as generated from \code{\link{contrast_each_group_to_the_rest}}}
18 |
19 | \item{de_table.ref}{A differential expression table of the
20 | reference dataset,
21 | as generated from \code{\link{contrast_each_group_to_the_rest}}}
22 |
23 | \item{log10trans}{Plot on a log scale? Useful for distinguishing multiple
24 | similar, yet distinct cell types that bunch at top of plot. Default=FALSE.}
25 |
26 | \item{...}{Further options to be passed to
27 | \code{\link{get_the_up_genes_for_all_possible_groups}},
28 | e.g. rankmetric}
29 | }
30 | \value{
31 | A ggplot object.
32 | }
33 | \description{
34 | Plot a panel of violin plots showing the distribution of the 'top' genes of
35 | each query group, across the reference dataset.
36 | }
37 | \details{
38 | In the plot output, each panel corresponds to a different group/cluster in
39 | the query experiment. The x-axis has the groups in the reference dataset.
40 | The y-axis is the rescaled rank of each 'top' gene from the query group,
41 | within each reference group.
42 |
43 | Only the 'top' genes for each query group are plotted, forming the violin
44 | plots - each individual gene is shown as a tickmark. Some groups have few
45 | top genes, and so their uncertainty can be seen on this plot.
46 |
47 | The thick black lines represent the median gene rescaled ranking for each
48 | query group / reference group combination. Having this fall above the dotted
49 | median threshold marker is a quick indication of potential similarity.
50 | A complete lack of similarity would have a median rank around 0.5. Median
51 | rankings much less than 0.5 are common though (an 'anti-cell-groupA'
52 | signature), because genes overrepresented in one group in an experiment,
53 | are likely to be relatively 'underrepresented' in the other groups.
54 | Taken to an
55 | extreme, if there are only two reference groups, they'll be complete
56 | opposites.
57 |
58 | Input can be either the precomputed \emph{de_table.marked} object for the
59 | comparison, OR both \emph{de_table.test} and \emph{de_table.ref}
60 | differential expression results to compare from
61 | \code{\link{contrast_each_group_to_the_rest}}
62 | }
63 | \examples{
64 |
65 | # Make input
66 | # de_table.demo_query <- contrast_each_group_to_the_rest(demo_query_se, "demo_query")
67 | # de_table.demo_ref <- contrast_each_group_to_the_rest(demo_ref_se, "demo_ref")
68 |
69 | # This:
70 | make_ranking_violin_plot(de_table.test=de_table.demo_query,
71 | de_table.ref=de_table.demo_ref )
72 |
73 | # Is equivalent to this:
74 | de_table.marked.query_vs_ref <-
75 | get_the_up_genes_for_all_possible_groups( de_table.test=de_table.demo_query,
76 | de_table.ref=de_table.demo_ref)
77 | make_ranking_violin_plot(de_table.marked.query_vs_ref)
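
# A sketch of the optional arguments described above: a log-scale y-axis, and
# an alternative rankmetric passed through '...'. The choice of 'TOP100_SIG'
# here is only for illustration.
make_ranking_violin_plot(de_table.test=de_table.demo_query,
    de_table.ref=de_table.demo_ref, log10trans=TRUE)
make_ranking_violin_plot(de_table.test=de_table.demo_query,
    de_table.ref=de_table.demo_ref, rankmetric="TOP100_SIG")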
78 |
79 |
80 | }
81 | \seealso{
82 | \code{\link{get_the_up_genes_for_all_possible_groups}} To make
83 | the input data.
84 | }
85 |
--------------------------------------------------------------------------------
/man/make_ref_similarity_names.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/group_labelling_functions.r
3 | \name{make_ref_similarity_names}
4 | \alias{make_ref_similarity_names}
5 | \alias{make_ref_similarity_names_using_marked}
6 | \title{make_ref_similarity_names}
7 | \usage{
8 | make_ref_similarity_names(de_table.test, de_table.ref, pval = 0.01,
9 | num_steps = 5, rankmetric = "TOP100_LOWER_CI_GTE1", n = 100)
10 |
11 | make_ref_similarity_names_using_marked(de_table.ref.marked,
12 | de_table.recip.marked = NA, the_test_dataset = NA,
13 | the_ref_dataset = NA, pval = 0.01, num_steps = 5)
14 | }
15 | \arguments{
16 | \item{de_table.test}{A differential expression table of the query
17 | experiment, as generated from
18 | \code{\link{contrast_each_group_to_the_rest}}}
19 |
20 | \item{de_table.ref}{A differential expression table of the reference
21 | dataset, as generated from
22 | \code{\link{contrast_each_group_to_the_rest}}}
23 |
24 | \item{pval}{Differences between the rescaled ranking distribution of 'top'
25 | genes on different reference groups are tested with a Mann-Whitney U test.
26 | If they are \emph{significantly different},
27 | only the top group(s) are reported.
28 | It isn't a simple cutoff threshold as it can change the number of similar
29 | groups reported. ie. A more stringent \bold{pval} is more likely to decide
30 | that groups are similar -
31 | which would result in multiple group reporting, or no similarity at all.
32 | Unlikely that this parameter will ever need to change. Default = 0.01.}
33 |
34 | \item{num_steps}{After ranking reference groups according to median 'top'
35 | gene ranking, how many adjacent pairs to test for differences.
36 | Set to 1 to only compare each group to the next, or NA to perform an
37 | all-vs-all comparison.
38 | Setting too low may mean it is possible to miss groups with some similarity
39 | to the reported matches (\emph{similar_non_match} column).
40 | Too high (or NA) with a large number of reference groups could be slow.
41 | Default = 5.}
42 |
43 | \item{rankmetric}{Specify the ranking method used to pick the
44 | 'top' genes. The default 'TOP100_LOWER_CI_GTE1' picks genes from the top 100
45 | overrepresented genes (ranked by inner 95\% confidence interval) - appears to
46 | work best for distinct cell types (e.g. tissue samples). 'TOP100_SIG' again
47 | picks from the top 100 ranked genes, but requires only statistical
48 | significance, 95\% CI threshold - may perform better on more similar cell
49 | clusters (e.g. PBMCs).}
50 |
51 | \item{n}{For tweaking maximum returned genes from different ranking methods.}
52 |
53 | \item{de_table.ref.marked}{The output of
54 | \code{\link{get_the_up_genes_for_all_possible_groups}} for the contrast
55 | of interest.}
56 |
57 | \item{de_table.recip.marked}{Optional. The (reciprocal) output of
58 | \code{\link{get_the_up_genes_for_all_possible_groups}} with the test and
59 | reference datasets swapped.
60 | If omitted a reciprocal test will not be done. Default = NA.}
61 |
62 | \item{the_test_dataset}{Optional. A short meaningful name for the
63 | experiment.
64 | (Should match \emph{test_dataset} column in \bold{de_table.marked}).
65 | Only needed in a table of more than one dataset. Default = NA.}
66 |
67 | \item{the_ref_dataset}{Optional. A short meaningful name for the
68 | experiment.
69 | (Should match \emph{dataset} column in \bold{de_table.marked}).
70 | Only needed in a table of more than one dataset. Default = NA.}
71 | }
72 | \value{
73 | A table of automagically-generated labels for each query group,
74 | given their similarity to reference groups.
75 |
76 | The columns in this table:
77 | \itemize{
78 | \item \bold{test_group} : Query group e.g. "c1"
79 | \item \bold{shortlab} : The cluster label described above e.g.
80 | "c1:macrophage"
81 | \item \bold{pval} : If there is a similarity flagged, this is the P-value
82 | from a Mann-Whitney U test from the last 'matched' group to the adjacent
83 | 'non-matched' group. Ie. If only one label in shortlab, this will be the
84 | first of the stepped_pvals, if there are 2, it will be the second.
85 | If there is 'no_similarity' this will be NA
86 | (Because there is no confidence in what
87 | is the most appropriate of the all non-significant stepped pvalues.).
88 | \item \bold{stepped_pvals} : P-values from Mann-Whitney U tests across
89 | adjacent pairs of reference groups ordered from most to least similar
90 | (ascending median rank).
91 | i.e. 1st-2nd most similar first, 2nd-3rd, 3rd-4th, etc. The last value
92 | will always be NA (no more reference group).
93 | e.g.
94 | refA:8.44e-10,refB:2.37e-06,refC:0.000818,refD:0.435,refE:0.245,refF:NA
95 | \item \bold{pval_to_random} : P-value of test of median rank (of last
96 | matched reference group) < random, from binomial test on top gene
97 | ranks (being < 0.5).
98 | \item \bold{matches} : List of all reference groups that 'match',
99 | as described, except it also includes (rare) examples where
100 | pval_to_random is not significant. "|" delimited.
101 | \item \bold{reciprocal_matches} : List of all reference groups that
102 | flagged test group as a match when direction of comparison is reversed.
103 | (significant pval and pval_to_random). "|" delimited.
104 | \item \bold{similar_non_match}: This column lists any reference groups
105 | outside of shortlab that are not significantly different to a reported
106 | match group. Limited by \emph{num_steps}, and will never find anything
107 | if num_steps==1. "|" delimited. Usually NA.
108 | \item \bold{similar_non_match_detail} : P-values for any details about
109 | similar_non_match results. These p-values will always be non-significant.
110 | E.g. "A > C (p=0.0214,n.s)". "|" delimited. Usually NA.
111 | \item \bold{differences_within} : This field lists any pairs of
112 | reference groups in shortlab that are significantly different.
113 | "|" delimited. Usually NA.
114 | }
115 | }
116 | \description{
117 | Construct some sensible labels for the groups/clusters in the query dataset,
118 | based on similarity to the reference dataset.
119 |
120 | \code{make_ref_similarity_names_using_marked} is a lower-level, more
121 | customisable version of \code{\link{make_ref_similarity_names}} (which would usually be used instead).
122 | It suits the rare cases where an existing \bold{de_table.ref.marked}
123 | object is reused, where a \bold{de_table.ref.marked} table contains more than one dataset
124 | (discouraged), or where the reciprocal comparison step should be skipped.
125 | }
126 | \details{
127 | This function aims to report a) the top most similar reference group, if
128 | there's a clear frontrunner, b) A list of multiple similar groups if they
129 | have similar similarity, or c) 'No similarity', if there is none.
130 |
131 | Each group is named according to the following rules.
132 | Testing for significant
133 | (smaller) differences with a one-directional Mann-Whitney U test on their
134 | rescaled ranks:
135 | \enumerate{
136 | \item The first (as ranked by median rescaled rank) reference group is
137 | significantly more similar than the next: Report \emph{first only}.
138 | \item When comparing differences between groups stepwise ranked by
139 | median rescaled rank - no group is significantly different to its
140 | neighbour: Report \emph{no similarity}
141 | \item There's no significant differences in the stepwise comparisons
142 | of the first N reference groups - but there is a significant
143 | difference later on : Report \emph{multiple group similarity}
144 | }
145 |
146 | There are some further heuristic caveats:
147 | \enumerate{
148 | \item The distribution of top genes in the last (or only) match group is
149 | tested versus a theoretical random distribution around 0.5 (as reported
150 | in the \emph{pval_to_random} column). If the distribution is not
151 | significantly above random
152 | (this is possible in edge cases where there is a skewed dataset
153 | and no/few matches),
154 | \emph{no similarity} is reported. The significant \emph{pval} column is
155 | left intact.
156 | \item The comparison is repeated reciprocally - reference groups vs the
157 | query groups. This helps sensitivity of heterogeneous query groups -
158 | and investigating the reciprocal matches can be informative in these
159 | cases.
160 | If a query group doesn't 'match' a reference group, but the reference
161 | group does match that query group - it is reported in the group label in
162 | brackets.
163 | e.g. \emph{c1:th_lymphocytes(tc_lymphocytes)}.
164 | It's even possible if there was no match (and pval = NA)
165 | e.g. \emph{c2:(tc_lymphocytes)}
166 | }
167 |
168 |
169 |
170 | The similarity is formatted into a group label. Where there are
171 | multiple similar groups, they're listed from most to least similar by their
172 | median ranks.
173 |
174 | For instance, a query dataset of clusters c1, c2, c3 and c4 against a
175 | cell-type labelled reference dataset might get names like:
176 |
177 | \itemize{
178 | \item c1:macrophage
179 | \item c2:endothelial|mesodermal
180 | \item c3:no_similarity
181 | \item c4:mesodermal(endothelial)
182 | }
183 |
184 | Function \code{make_ref_similarity_names} is a convenience wrapper function
185 | for \code{make_ref_similarity_names_using_marked}. It accepts two 'de_table'
186 | outputs of function \code{contrast_each_group_to_the_rest} to compare
187 | and handles running
188 | \code{\link{get_the_up_genes_for_all_possible_groups}}.
189 | Sister function \code{make_ref_similarity_names_using_marked} may (rarely) be
190 | of use if the \bold{de_table.ref.marked} object has already been created,
191 | or if reciprocal tests are not wanted.
192 | }
193 | \section{Functions}{
194 | \itemize{
195 | \item \code{make_ref_similarity_names_using_marked}: Construct some sensible cluster
196 | labels, but using a premade marked table.
197 | }}
198 |
199 | \examples{
200 |
201 | # Make input
202 | # de_table.demo_query <- contrast_each_group_to_the_rest(demo_query_se, "demo_query")
203 | # de_table.demo_ref <- contrast_each_group_to_the_rest(demo_ref_se, "demo_ref")
204 |
205 | make_ref_similarity_names(de_table.demo_query, de_table.demo_ref)
206 | make_ref_similarity_names(de_table.demo_query, de_table.demo_ref, num_steps=3)
207 | make_ref_similarity_names(de_table.demo_query, de_table.demo_ref, num_steps=NA)
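
# A sketch of inspecting the result: keep the returned table and pull out a
# few of the columns listed under Value. 'demo_labels' is just an
# illustrative variable name.
demo_labels <- make_ref_similarity_names(de_table.demo_query, de_table.demo_ref)
demo_labels[, c("test_group", "shortlab", "pval")]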
208 |
209 |
210 | # Make input
211 | # de_table.demo_query <- contrast_each_group_to_the_rest(demo_query_se, "demo_query")
212 | # de_table.demo_ref <- contrast_each_group_to_the_rest(demo_ref_se, "demo_ref")
213 |
214 | de_table.marked.query_vs_ref <- get_the_up_genes_for_all_possible_groups(
215 | de_table.demo_query, de_table.demo_ref)
216 | de_table.marked.reciprocal <- get_the_up_genes_for_all_possible_groups(
217 | de_table.demo_ref, de_table.demo_query)
218 |
219 |
220 | make_ref_similarity_names_using_marked(de_table.marked.query_vs_ref,
221 | de_table.marked.reciprocal)
222 |
223 | make_ref_similarity_names_using_marked(de_table.marked.query_vs_ref)
224 |
225 |
226 | }
227 | \seealso{
228 | \code{\link{contrast_each_group_to_the_rest}} For
229 | preparing de_table input
230 |
231 | \code{\link{get_the_up_genes_for_all_possible_groups}}
232 | To prepare the \bold{de_table.ref.marked} input.
233 | }
234 |
--------------------------------------------------------------------------------
/man/make_ref_similarity_names_for_group.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/group_labelling_functions.r
3 | \name{make_ref_similarity_names_for_group}
4 | \alias{make_ref_similarity_names_for_group}
5 | \title{make_ref_similarity_names_for_group}
6 | \usage{
7 | make_ref_similarity_names_for_group(the_test_group, mwtest_res_table,
8 | de_table.ref.marked, reciprocal_matches = NA, the_test_dataset,
9 | the_ref_dataset, the_pval)
10 | }
11 | \arguments{
12 | \item{the_test_group}{Query group to make name for}
13 |
14 | \item{mwtest_res_table}{Mann-Whitney test results as constructed
15 | in \code{\link{make_ref_similarity_names_using_marked}}}
16 |
17 | \item{de_table.ref.marked}{The output of
18 | \code{\link{get_the_up_genes_for_all_possible_groups}} for the contrast of
19 | interest.}
20 |
21 | \item{reciprocal_matches}{Simplified table of reciprocal matches prepared
22 | within \code{\link{make_ref_similarity_names_using_marked}}.
23 | If omitted no reciprocal matching done. Default = NA.}
24 |
25 | \item{the_test_dataset}{A short meaningful name for the experiment.
26 | (Should match \emph{test_dataset} column in \bold{de_table.marked})}
27 |
28 | \item{the_ref_dataset}{A short meaningful name for the experiment.
29 | (Should match \emph{dataset} column in \bold{de_table.marked})}
30 |
31 | \item{the_pval}{pval as per
32 | \code{\link{make_ref_similarity_names_using_marked}}}
33 | }
34 | \value{
35 | A tibble with just one group's labelling information, as per
36 | \code{\link{make_ref_similarity_names_using_marked}}
37 | }
38 | \description{
39 | Internal function, called by make_ref_similarity_names_using_marked
40 | for each group.
41 | }
42 | \seealso{
43 | \code{\link{make_ref_similarity_names_using_marked}}
44 | Only place that uses this function, details there.
45 | }
46 |
--------------------------------------------------------------------------------
/man/run_pair_test_stats.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/group_labelling_functions.r
3 | \name{run_pair_test_stats}
4 | \alias{run_pair_test_stats}
5 | \title{run_pair_test_stats}
6 | \usage{
7 | run_pair_test_stats(de_table.ref.marked, the_test_group, groupA, groupB,
8 | enforceAgtB = TRUE)
9 | }
10 | \arguments{
11 | \item{de_table.ref.marked}{The output of
12 | \code{\link{get_the_up_genes_for_all_possible_groups}} for the contrast
13 | of interest.}
14 |
15 | \item{the_test_group}{Name of the test group in query dataset.}
16 |
17 | \item{groupA}{One of the reference group names}
18 |
19 | \item{groupB}{Another of the reference group names}
20 |
21 | \item{enforceAgtB}{Do a one tailed test of A 'less' B (more similar)?
22 | Or two-tailed. Default = TRUE.}
23 | }
24 | \value{
25 | A tibble of Wilcoxon / Mann-Whitney U test results for this contrast.
26 | }
27 | \description{
28 | Internal function to compare the distribution of a query dataset's 'top'
29 | genes between two different reference dataset groups with a
30 | Mann–Whitney U test. One directional test if groupA median < group B.
31 | }
32 | \details{
33 | For use by make_ref_similarity_names_using_marked
34 | }
35 | \seealso{
36 | \code{\link{make_ref_similarity_names_using_marked}}
37 | }
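\examples{

# Not this function's internals: a base-R sketch of the kind of one-tailed
# Mann-Whitney U test described above, run on made-up rescaled ranks.
# Assumption: a lower rescaled rank means 'more similar', so group A is
# tested as 'less' than group B. All 'toy_' names are hypothetical.
toy_ranks_A <- c(0.02, 0.05, 0.08, 0.11, 0.15, 0.20)
toy_ranks_B <- c(0.30, 0.42, 0.55, 0.61, 0.70, 0.85)
toy_res <- stats::wilcox.test(toy_ranks_A, toy_ranks_B, alternative="less")
toy_res$p.value

}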
38 |
--------------------------------------------------------------------------------
/man/subset_cells_by_group.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/contrasting_functions.r
3 | \name{subset_cells_by_group}
4 | \alias{subset_cells_by_group}
5 | \title{subset_cells_by_group}
6 | \usage{
7 | subset_cells_by_group(dataset_se, n.group = 1000)
8 | }
9 | \arguments{
10 | \item{dataset_se}{Summarised experiment object containing count data. Also
11 | requires 'ID' and 'group' to be set within the cell information.}
12 |
13 | \item{n.group}{How many cells to keep for each group. Default = 1000}
14 | }
15 | \value{
16 | \emph{dataset_se} A hopefully more manageably subsetted version of
17 | the inputted \bold{dataset_se}.
18 | }
19 | \description{
20 | Utility function to randomly subset very large datasets (that use too much
21 | memory). Specify a maximum number of cells to keep per group and use the
22 | subsetted version for analysis.
23 | }
24 | \details{
25 | The resulting
26 | differential expression table \emph{de_table} will have reduced statistical
27 | power.
28 | But as long as enough cells are left to reasonably accurately
29 | calculate differential expression between groups this should be enough for
30 | celaref to work with.
31 |
32 | Also, this function will lose proportionality of groups
33 | (there'll be \emph{n.group} or fewer of each).
34 | Consider using the n.group/n.other parameters in
35 | \emph{contrast_each_group_to_the_rest} or
36 | \emph{contrast_the_group_to_the_rest} -
37 | which subset non-group cells independently for each group.
38 | That may be more appropriate for tissue type samples which would have similar
39 | compositions of cells.
40 |
41 | So this function is intended for use when either the
42 | proportionality isn't relevant (e.g. FACS-purified cell populations),
43 | or, the data is just too big to work with otherwise.
44 |
45 | Cells are randomly sampled, so set the random seed (with \emph{set.seed()})
46 | for consistent results across runs.
47 | }
48 | \examples{
49 |
50 | dataset_se.30pergroup <- subset_cells_by_group(demo_query_se, n.group=30)
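
# Sampling is random; a sketch of fixing the seed for a repeatable subset.
# The seed value 12 is arbitrary.
set.seed(12)
dataset_se.30a <- subset_cells_by_group(demo_query_se, n.group=30)
set.seed(12)
dataset_se.30b <- subset_cells_by_group(demo_query_se, n.group=30)
# Should be TRUE if the sampling depends only on R's random seed
identical(colnames(dataset_se.30a), colnames(dataset_se.30b))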
51 |
52 | }
53 | \seealso{
54 | \code{\link{contrast_each_group_to_the_rest}} For alternative method
55 | of subsetting cells proportionally.
56 | }
57 |
--------------------------------------------------------------------------------
/man/subset_se_cells_for_group_test.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/contrasting_functions.r
3 | \name{subset_se_cells_for_group_test}
4 | \alias{subset_se_cells_for_group_test}
5 | \title{subset_se_cells_for_group_test}
6 | \usage{
7 | subset_se_cells_for_group_test(dataset_se, the_group, n.group = Inf,
8 | n.other = n.group * 5)
9 | }
10 | \arguments{
11 | \item{dataset_se}{Summarised experiment object containing count data. Also
12 | requires 'ID' and 'group' to be set within the cell information.}
13 |
14 | \item{the_group}{The group being subsetted for}
15 |
16 | \item{n.group}{How many cells to keep for each group. Default = Inf}
17 |
18 | \item{n.other}{How many cells to keep from everything not in the group.
19 | Default = \bold{n.group} * 5}
20 | }
21 | \value{
22 | \emph{dataset_se} A hopefully more manageably subsetted version of
23 | the inputted \bold{dataset_se}
24 | }
25 | \description{
26 | This function, for use by \code{\link{contrast_each_group_to_the_rest}},
27 | downsamples cells from a SummarizedExperiment
28 | (\emph{dataset_se}) - keeping \bold{n.group} (or all if fewer)
29 | cells from the specified group, and \bold{n.other} from the rest.
30 | This maintains the proportions of cells in the 'other' part of the
31 | differential expression comparisons.
32 | }
33 | \details{
34 | Cells are randomly sampled, so set the random seed (with \emph{set.seed()})
35 | for consistent results across runs.
36 | }
37 | \seealso{
38 | Calling function \code{\link{contrast_each_group_to_the_rest}}
39 |
40 | \code{\link{subset_cells_by_group}} Exported function for
41 | subsetting each group independently upfront.
42 | (For when this approach is still unmanageable)
43 | }
44 |
--------------------------------------------------------------------------------
/man/trim_small_groups_and_low_expression_genes.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/loading_helper_functions.r
3 | \name{trim_small_groups_and_low_expression_genes}
4 | \alias{trim_small_groups_and_low_expression_genes}
5 | \title{trim_small_groups_and_low_expression_genes}
6 | \usage{
7 | trim_small_groups_and_low_expression_genes(dataset_se,
8 | min_lib_size = 1000, min_group_membership = 5,
9 | min_reads_in_sample = 1, min_detected_by_min_samples = 5)
10 | }
11 | \arguments{
12 | \item{dataset_se}{Summarised experiment object containing count data. Also
13 | requires 'ID' and 'group' to be set within the cell information
14 | (see \code{colData()})}
15 |
16 | \item{min_lib_size}{Minimum library size. Cells with fewer than this many
17 | reads removed. Default = 1000}
18 |
19 | \item{min_group_membership}{Throw out groups/clusters with fewer than this
20 | many cells. May change with experiment size. Default = 5}
21 |
22 | \item{min_reads_in_sample}{Require this many reads to consider a gene
23 | detected in a sample. Default = 1}
24 |
25 | \item{min_detected_by_min_samples}{Keep genes detected in this many
26 | samples. May change with experiment size. Default = 5}
27 | }
28 | \value{
29 | A filtered dataset_se, ready for use.
30 | }
31 | \description{
32 | Filter and return a SummarizedExperiment object (dataset_se) by several
33 | metrics:
34 | \itemize{
35 | \item Cells with at least \bold{min_lib_size} total reads.
36 | \item Genes expressed in at least \bold{min_detected_by_min_samples}
37 | cells, at a threshold of \bold{min_reads_in_sample} per cell.
38 | \item Remove entire groups (clusters) of cells where there are fewer than
39 | \bold{min_group_membership} cells in that group.
40 | }
41 | }
42 | \details{
43 | If it hasn't been done already, it is highly recommended to use this
44 | function to filter out genes with no/low total counts
45 | (especially in single cell data,
46 | there can be many) - without expression they are not useful and may reduce
47 | statistical power.
48 |
49 | Likewise, very small groups (<5 cells) are unlikely to give useful
50 | results with this method. And cells with abnormally small library sizes may
51 | not be desirable.
52 |
53 | Of course 'reasonable' thresholds for filtering cells/genes are subjective.
54 | Defaults are moderately sensible starting points.
55 | }
56 | \examples{
57 |
58 | demo_query_se.trimmed <-
59 | trim_small_groups_and_low_expression_genes(demo_query_se)
60 | demo_query_se.trimmed2 <-
61 | trim_small_groups_and_low_expression_genes(demo_ref_se,
62 | min_group_membership = 10)
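
# A sketch of stricter gene filtering with a before/after size check.
# The threshold values are arbitrary illustrations, not recommendations.
demo_query_se.trimmed3 <-
    trim_small_groups_and_low_expression_genes(demo_query_se,
        min_reads_in_sample = 2, min_detected_by_min_samples = 10)
dim(demo_query_se)
dim(demo_query_se.trimmed3)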
63 |
64 | }
65 |
--------------------------------------------------------------------------------
/tests/testthat.R:
--------------------------------------------------------------------------------
1 | library(testthat)
2 | library(celaref)
3 |
4 | test_check("celaref")
5 |
--------------------------------------------------------------------------------
/tests/testthat/test-contrasting_functions.R:
--------------------------------------------------------------------------------
1 | context("test-contrasting_functions")
2 |
3 |
4 | test_that("MAST contrasts - dense, sparse and empty", {
5 |
6 | # Check the first 5 genes are the same.
7 | asubset <- seq_len(30)
8 | top_check <- seq_len(5)
9 | de_genes <- c("Gene3", "Gene23", "Gene10", "Gene25", "Gene30")
10 | de_genes.3 <- c("Gene1", "Gene4", "Gene3", "Gene30", "Gene10") #altered data.
11 |
12 | # Dense
13 | demo_query_se.1 <- demo_query_se[asubset,asubset]
14 | de_table1.demo_query <- contrast_each_group_to_the_rest(
15 | demo_query_se.1, "a_demo_query", num_cores=1)
16 |
17 | expect_equal(de_table1.demo_query$ID[top_check], de_genes )
18 |
19 |
20 | #now sparse
21 | demo_query_se.2 <- demo_query_se.1
22 | assays(demo_query_se.2)[[1]] <- Matrix::Matrix(assay(demo_query_se.1), sparse=TRUE)
23 | de_table2.demo_query <- contrast_each_group_to_the_rest(
24 | demo_query_se.2, "a_demo_query", num_cores=1)
25 | expect_equal(de_table2.demo_query$ID[top_check], de_genes )
26 |
27 |
28 | # what if one gene's expression is totally empty for an entire group?
29 | # (previously caused errors and there's a workaround now.)
30 | demo_query_se.3 <- demo_query_se.1
31 | assays(demo_query_se.3)[[1]][1,demo_query_se.1$group =="Group2"] <- 0
32 | de_table3.demo_query <- contrast_each_group_to_the_rest(
33 | demo_query_se.3, "a_demo_query", num_cores=1)
34 | expect_equal(de_table3.demo_query$ID[top_check], de_genes.3 )
35 | #ID pval log2FC ci_inner ci_outer fdr group sig sig_up gene_count rank rescaled_rank dataset
36 | #1 Gene1 0.06701189 -1.98643911 1.734379 -5.707257 0.4505910 Group1 FALSE FALSE 30 1 0.03333333 a_demo_query
37 | #2 Gene4 0.99903639 -0.04353777 1.726446 -1.813522 0.9990364 Group1 FALSE FALSE 30 2 0.06666667 a_demo_query
38 | #3 Gene3 0.50595300 -2.71865608 1.394369 -6.831681 0.5837919 Group1 FALSE FALSE 30 3 0.10000000 a_demo_query
39 | #4 Gene30 0.67979358 -0.94039185 1.281941 -3.162725 0.7279509 Group1 FALSE FALSE 30 4 0.13333333 a_demo_query
40 | #5 Gene10 0.68075516 -0.46524523 1.147719 -2.078209 0.7279509 Group1 FALSE FALSE 30 5 0.16666667 a_demo_query
41 | })
42 |
43 |
44 |
45 |
46 | test_that("MAST contrasts - hdf5-backed assays and SCE objects", {
47 | # Just test that it runs -
48 | # these things are susceptible to format / object changes.
49 |
50 | # dense sce
51 | d.sce.den <- as(demo_query_se, "SingleCellExperiment")
52 | expect_equal( 10, nrow(
53 | contrast_each_group_to_the_rest(d.sce.den[1:10,],'test',
54 | groups2test = "Group2", n.group = 20, num_cores = 1)))
55 |
56 | # sparse SCE
57 | d.sce.sp <- d.sce.den
58 | assays(d.sce.sp)[[1]] <- Matrix::Matrix(assays(d.sce.sp)[[1]], sparse=TRUE)
59 | expect_equal( 10, nrow(
60 | contrast_each_group_to_the_rest(d.sce.sp[1:10,],'test',
61 | groups2test = "Group2", n.group = 20, num_cores = 1)))
62 |
63 |
64 | # hdf5 SCE
65 | d.sce.hdf <- HDF5Array::saveHDF5SummarizedExperiment( d.sce.sp , replace=TRUE)
66 | expect_equal( 10, nrow(
67 | contrast_each_group_to_the_rest(d.sce.hdf[1:10,],'test',
68 | groups2test = "Group2", n.group = 20, num_cores = 1)))
69 |
70 | })
71 |
72 |
73 |
74 |
75 |
76 | test_that("Microarray reference", {
77 |
78 | top5 <- c("Gene100", "Gene150", "Gene57", "Gene80", "Gene21" )
79 | de_table.ma <- contrast_each_group_to_the_rest_for_norm_ma_with_limma(
80 | norm_expression_table=demo_microarray_expr,
81 | sample_sheet_table=demo_microarray_sample_sheet,
82 | dataset_name="DemoSimMicroarrayRef",
83 | sample_name="cell_sample", group_name="group")
84 |
85 |
86 | expect_equal(de_table.ma$ID[seq_len(5)], top5)
87 |
88 | })
89 |
90 |
91 |
92 |
93 | test_that("Rankmetrics", {
94 |
95 | # Ask for just 10 genes and check them. Actually the same for both methods.
96 | genes.TOP100_LOWER_CI_GTE1 <-
97 | c("Gene100", "Gene150", "Gene57", "Gene80", "Gene21",
98 | "Gene30", "Gene23", "Gene65", "Gene101", "Gene10")
99 | genes.TOP100_SIG <- genes.TOP100_LOWER_CI_GTE1 # are same
100 |
101 |
102 | de_table.marked.Group3vsRef.TOP100_LOWER_CI_GTE1 <-
103 | get_the_up_genes_for_group(
104 | the_group="Group3",
105 | de_table.test=de_table.demo_query,
106 | de_table.ref=de_table.demo_ref,
107 | rankmetric = "TOP100_LOWER_CI_GTE1",
108 | n=10)
109 | expect_equal(de_table.marked.Group3vsRef.TOP100_LOWER_CI_GTE1$ID[
110 | de_table.marked.Group3vsRef.TOP100_LOWER_CI_GTE1$group == "Dunno"],
111 | genes.TOP100_LOWER_CI_GTE1)
112 |
113 |
114 |
115 | de_table.marked.Group3vsRef.TOP100_SIG <-
116 | get_the_up_genes_for_group(
117 | the_group="Group3",
118 | de_table.test=de_table.demo_query,
119 | de_table.ref=de_table.demo_ref,
120 | rankmetric = 'TOP100_SIG', n=10)
121 |
122 | expect_equal(de_table.marked.Group3vsRef.TOP100_SIG$ID[
123 | de_table.marked.Group3vsRef.TOP100_SIG$group == "Dunno"],
124 | genes.TOP100_SIG)
125 |
126 |
127 | })
128 |
129 |
130 |
131 |
132 | test_that("Subsetting ses", {
133 |
134 | dataset_se.30pergroup <- subset_cells_by_group(demo_query_se, n.group=30)
135 | expect_equal(sum(dataset_se.30pergroup$group == "Group3"),30)
136 | expect_equal(sum(dataset_se.30pergroup$group == "Group1"),28)
137 |
138 | demo_query_se.subset2 <- subset_se_cells_for_group_test(demo_query_se,
139 | the_group="Group3",
140 | n.group=20,
141 | n.other=30)
142 | expect_equal(sum(demo_query_se.subset2$group == "Group3"),20)
143 | expect_equal(sum(demo_query_se.subset2$group != "Group3"),30)
144 |
145 | })
146 |
147 |
148 |
149 | #test_that("Finding counts", {
150 | #
151 | #})
152 |
153 |
--------------------------------------------------------------------------------
/tests/testthat/test-loading_helper_functions.R:
--------------------------------------------------------------------------------
1 | context("Loading functions")
2 | library(celaref)
3 |
4 |
5 | test_that("Load se from files, tables, 10X", {
6 |
7 |
8 | expect_something_in_demo_se <- function(test_se) {
9 |
10 | # Any 0-length (or 1-length) dimension is a fail,
11 | # and is the usual failure case.
12 | # But don't check what's actually there, because it could change.
13 | # These are different sized datasets anyway.
14 |
15 | expect_gt(base::ncol(test_se), 1) # cells
16 | expect_gt(nrow(test_se), 1) # genes # 1 gene would be wrong too.
17 | expect_gt(sum(assays(test_se)[[1]]), 1) #total counts aren't all 0
18 |
19 | expect_gt(nrow(colData(test_se)) , 1 )
20 | expect_gt(ncol(colData(test_se)) , 1 )
21 |
22 | expect_gt(nrow(rowData(test_se)) , 1 )
23 | expect_gt(ncol(rowData(test_se)) , 1 )
24 | }
25 |
26 |
27 | counts_filepath <- system.file("extdata", "sim_query_counts.tab", package = "celaref")
28 | cell_info_filepath <- system.file("extdata", "sim_query_cell_info.tab", package = "celaref")
29 | gene_info_filepath <- system.file("extdata", "sim_query_gene_info.tab", package = "celaref")
30 |
31 | demo_se.files <- load_se_from_files(counts_filepath,
32 | cell_info_file = cell_info_filepath,
33 | gene_info_file = gene_info_filepath)
34 | expect_something_in_demo_se(demo_se.files)
35 |
36 |
37 |
38 | demo_se.tables <- load_se_from_tables(counts_matrix=demo_counts_matrix,
39 | cell_info_table=demo_cell_info_table,
40 | gene_info_table=demo_gene_info_table)
41 | expect_something_in_demo_se(demo_se.tables)
42 |
43 |
44 | example_10X_dir <- system.file("extdata", "sim_cr_dataset", package = "celaref")
45 | dataset_se.10X <- load_dataset_10Xdata(example_10X_dir, dataset_genome="GRCh38",
46 | clustering_set="kmeans_4_clusters", gene_id_cols_10X=c("gene"))
47 |
48 | expect_something_in_demo_se(dataset_se.10X)
49 |
50 | })
51 |
52 |
53 |
54 | test_that("Filtering low expression genes and groups", {
55 |
56 | demo_ref_se.trim <- trim_small_groups_and_low_expression_genes(
57 | dataset_se=demo_ref_se,
58 | min_lib_size=1000, min_group_membership=50,
59 | min_reads_in_sample=1, min_detected_by_min_samples=20
60 | )
61 |
62 | expect_equal(length(levels(colData(demo_ref_se.trim)$group)), 3)
63 | expect_equal(nrow(demo_ref_se.trim), 199)
64 | expect_equal(ncol(demo_ref_se.trim), 489)
65 |
66 | })
67 |
68 |
69 |
70 |
71 | test_that("Converting gene ids",{
72 |
73 | dataset_se <- demo_ref_se[1:10, 1:10]
74 | rowData(dataset_se)$dummyname <- c(rep("A",5), rep("B",5))
75 | rowData(dataset_se)$total_not_count <- 1:10
76 |
77 | dataset_se.2 <- convert_se_gene_ids(dataset_se, new_id='dummyname', eval_col='total_not_count')
78 |
79 | expect_equal(rowData(dataset_se.2)["A","total_not_count"], 5)
80 |
81 | })
--------------------------------------------------------------------------------
/vignettes/celaref.bib:
--------------------------------------------------------------------------------
1 | @article{Farmer2017,
2 | abstract = {The tear producing lacrimal gland is a tubular organ that protects and lubricates the ocular surface. While the lacrimal gland possesses many features that make it an excellent model to understand tubulogenesis, the cell types and lineage relationships that drive lacrimal gland formation are unclear. Using single cell sequencing and other molecular tools, we reveal novel cell identities and epithelial lineage dynamics that underlie lacrimal gland development. We show that the lacrimal gland from its earliest developmental stages is composed of multiple subpopulations of immune, epithelial, and mesenchymal cell lineages. The epithelial lineage exhibits the most substantiative cellular changes, transitioning through a series of unique transcriptional states to become terminally differentiated acinar, ductal and myoepithelial cells. Furthermore, lineage tracing in postnatal and adult glands provides the first direct evidence of unipotent KRT5+ epithelial cells in the lacrimal gland. Finally, we show conservation of developmental markers between the developing mouse and human lacrimal gland, supporting the use of mice to understand human development. Together, our data reveal critical features of lacrimal gland development that have broad implications for understanding epithelial organogenesis.},
3 | author = {Farmer, D'Juan T. and Nathan, Sara and Finley, Jennifer K. and {Shengyang Yu}, Kevin and Emmerson, Elaine and Byrnes, Lauren E. and Sneddon, Julie B. and McManus, Michael T. and Tward, Aaron D. and Knox, Sarah M.},
4 | doi = {10.1242/dev.150789},
5 | file = {:Users/swil0005/mendeley/mendeley{\_}pdfs/Farmer et al. - 2017 - Defining epithelial cell dynamics and lineage relationships in the developing lacrimal gland.pdf:pdf},
6 | issn = {0950-1991},
7 | journal = {Development},
8 | keywords = {development,epithelia,lacrimal gland,single cell sequencing,tubulogenesis},
9 | number = {13},
10 | pages = {2517--2528},
11 | pmid = {28576768},
12 | title = {{Defining epithelial cell dynamics and lineage relationships in the developing lacrimal gland}},
13 | url = {http://dev.biologists.org/lookup/doi/10.1242/dev.150789},
14 | volume = {144},
15 | year = {2017}
16 | }
17 | @article{Finak2015,
18 | abstract = {Single-cell transcriptomics reveals gene expression heterogeneity but suffers from stochastic dropout and characteristic bimodal expression distributions in which expression is either strongly non-zero or non-detectable. We propose a two-part, generalized linear model for such bimodal data that parameterizes both of these features. We argue that the cellular detection rate, the fraction of genes expressed in a cell, should be adjusted for as a source of nuisance variation. Our model provides gene set enrichment analysis tailored to single-cell data. It provides insights into how networks of co-expressed genes evolve across an experimental treatment. MAST is available at https://github.com/RGLab/MAST .},
19 | author = {Finak, Greg and McDavid, Andrew and Yajima, Masanao and Deng, Jingyuan and Gersuk, Vivian and Shalek, Alex K. and Slichter, Chloe K. and Miller, Hannah W. and McElrath, M. Juliana and Prlic, Martin and Linsley, Peter S. and Gottardo, Raphael},
20 | doi = {10.1186/s13059-015-0844-5},
21 | file = {:Users/swil0005/mendeley/mendeley{\_}pdfs/Finak et al. - 2015 - MAST A flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in si.pdf:pdf},
22 | isbn = {10.1186/s13059-015-0844-5},
23 | issn = {1474760X},
24 | journal = {Genome Biology},
25 | keywords = {Bimodality,Cellular detection rate,Co-expression,Empirical Bayes,Gene set enrichment analysis,Generalized linear model},
26 | number = {1},
27 | pages = {1--13},
28 | pmid = {26653891},
29 | publisher = {Genome Biology},
30 | title = {{MAST: A flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data}},
31 | url = {http://dx.doi.org/10.1186/s13059-015-0844-5},
32 | volume = {16},
33 | year = {2015}
34 | }
35 | @article{Freytag2017,
36 | abstract = {The commercially available 10X Genomics protocol to generate droplet-based single cell RNA-seq (scRNA-seq) data is enjoying growing popularity among researchers. Fundamental to the analysis of such scRNA-seq data is the ability to cluster similar or same cells into non-overlapping groups. Many competing methods have been proposed for this task, but there is currently little guidance with regards to which method offers most accuracy. Answering this question is complicated by the fact that 10X Genomics data lack cell labels that would allow a direct performance evaluation. Thus in this review, we focused on comparing clustering solutions of a dozen methods for three datasets on human peripheral mononuclear cells generated with the 10X Genomics technology. While clustering solutions appeared robust, we found that solutions produced by different methods have little in common with each other. They also failed to replicate cell type assignment generated with supervised labeling approaches. Furthermore, we demonstrate that all clustering methods tested clustered cells to a large degree according to the amount of ribosomal RNA in each cell.},
37 | author = {Freytag, Saskia and Lonnstedt, Ingrid and Ng, Milica and Bahlo, Melanie},
38 | doi = {10.1101/203752},
39 | file = {:Users/swil0005/mendeley/mendeley{\_}pdfs/Freytag et al. - 2017 - Cluster Headache Comparing Clustering Tools for 10X Single Cell Sequencing Data.pdf:pdf},
40 | number = {4},
41 | title = {{Cluster Headache: Comparing Clustering Tools for 10X Single Cell Sequencing Data}},
42 | year = {2017}
43 | }
44 | @article{DeGraaf2016,
45 | abstract = {Hematopoiesis is a multistage process involving the differentiation of stem and progenitor cells into distinct mature cell lineages. Here we present Haemopedia, an atlas of murine gene-expression data containing 54 hematopoietic cell types, covering all the mature lineages in hematopoiesis. We include rare cell populations such as eosinophils, mast cells, basophils, and megakaryocytes, and a broad collection of progenitor and stem cells. We show that lineage branching and maturation during hematopoiesis can be reconstructed using the expression patterns of small sets of genes. We also have identified genes with enriched expression in each of the mature blood cell lineages, many of which show conserved lineage-enriched expression in human hematopoiesis. We have created an online web portal called Haemosphere to make analyses of Haemopedia and other blood cell transcriptional datasets easier. This resource provides simple tools to interrogate gene-expression-based relationships between hematopoietic cell types and genes of interest.},
46 | author = {de Graaf, Carolyn A. and Choi, Jarny and Baldwin, Tracey M. and Bolden, Jessica E. and Fairfax, Kirsten A. and Robinson, Aaron J. and Biben, Christine and Morgan, Clare and Ramsay, Kerry and Ng, Ashley P. and Kauppi, Maria and Kruse, Elizabeth A. and Sargeant, Tobias J. and Seidenman, Nick and D'Amico, Angela and D'Ombrain, Marthe C. and Lucas, Erin C. and Koernig, Sandra and {Baz Morelli}, Adriana and Wilson, Michael J. and Dower, Steven K. and Williams, Brenda and Heazlewood, Shen Y. and Hu, Yifang and Nilsson, Susan K. and Wu, Li and Smyth, Gordon K. and Alexander, Warren S. and Hilton, Douglas J.},
47 | doi = {10.1016/j.stemcr.2016.07.007},
48 | file = {:Users/swil0005/mendeley/mendeley{\_}pdfs/de Graaf et al. - 2016 - Haemopedia An Expression Atlas of Murine Hematopoietic Cells.pdf:pdf},
49 | isbn = {2213-6711 (Electronic)$\backslash$r2213-6711 (Linking)},
50 | issn = {22136711},
51 | journal = {Stem Cell Reports},
52 | number = {3},
53 | pages = {571--582},
54 | pmid = {27499199},
55 | title = {{Haemopedia: An Expression Atlas of Murine Hematopoietic Cells}},
56 | volume = {7},
57 | year = {2016}
58 | }
59 | @article{Harrison2018,
60 | abstract = {},
61 | author = {Harrison, Paul and Pattison, Andrew and Powell, David and Beilharz, Traude},
62 | doi = {10.1101/343145},
63 | file = {:Users/swil0005/mendeley/mendeley{\_}pdfs/Harrison et al. - 2018 - Topconfects a package for confident effect sizes in differential expression analysis provides improved usabilit.pdf:pdf},
64 | title = {{Topconfects: a package for confident effect sizes in differential expression analysis provides improved usability ranking genes of interest}},
65 | year = {2018}
66 | }
67 | @article{Kiselev2018,
68 | abstract = {Single-cell RNA-seq (scRNA-seq) allows researchers to define cell types on the basis of unsupervised clustering of the transcriptome. However, differences in experimental methods and computational analyses make it challenging to compare data across experiments. Here we present scmap (http://bioconductor.org/packages/scmap; web version at http://www.sanger.ac.uk/science/tools/scmap), a method for projecting cells from an scRNA-seq data set onto cell types or individual cells from other experiments.},
69 | author = {Kiselev, Vladimir Yu and Yiu, Andrew and Hemberg, Martin},
70 | doi = {10.1038/nmeth.4644},
71 | file = {:Users/swil0005/mendeley/mendeley{\_}pdfs/Kiselev, Yiu, Hemberg - 2018 - scmap projection of single-cell RNA-seq data across data sets.pdf:pdf},
72 | issn = {1548-7091},
73 | journal = {Nature Methods},
74 | month = {apr},
75 | number = {5},
76 | pages = {359--362},
77 | pmid = {29608555},
78 | publisher = {Nature Publishing Group},
79 | title = {{scmap: projection of single-cell RNA-seq data across data sets}},
80 | url = {http://dx.doi.org/10.1038/nmeth.4644 http://www.ncbi.nlm.nih.gov/pubmed/29608555 http://www.nature.com/doifinder/10.1038/nmeth.4644},
81 | volume = {15},
82 | year = {2018}
83 | }
84 | @article{Kiselev2017,
85 | abstract = {Single-cell RNA-seq enables the quantitative characterization of cell types based on global transcriptome profiles. We present single-cell consensus clustering (SC3), a user-friendly tool for unsupervised clustering, which achieves high accuracy and robustness by combining multiple clustering solutions through a consensus approach (http://bioconductor.org/packages/SC3). We demonstrate that SC3 is capable of identifying subclones from the transcriptomes of neoplastic cells collected from patients.},
86 | author = {Kiselev, Vladimir Yu and Kirschner, Kristina and Schaub, Michael T. and Andrews, Tallulah and Yiu, Andrew and Chandra, Tamir and Natarajan, Kedar N. and Reik, Wolf and Barahona, Mauricio and Green, Anthony R. and Hemberg, Martin},
87 | doi = {10.1038/nmeth.4236},
88 | file = {:Users/swil0005/mendeley/mendeley{\_}pdfs/Kiselev et al. - 2017 - SC3 Consensus clustering of single-cell RNA-seq data.pdf:pdf},
89 | issn = {15487105},
90 | journal = {Nature Methods},
91 | number = {5},
92 | pages = {483--486},
93 | pmid = {28346451},
94 | title = {{SC3: Consensus clustering of single-cell RNA-seq data}},
95 | volume = {14},
96 | year = {2017}
97 | }
98 | @article{Satija2015,
99 | abstract = {Spatial localization is a key determinant of cellular fate and behavior, but methods for spatially resolved, transcriptome-wide gene expression profiling across complex tissues are lacking. RNA staining methods assay only a small number of transcripts, whereas single-cell RNA-seq, which measures global gene expression, separates cells from their native spatial context. Here we present Seurat, a computational strategy to infer cellular localization by integrating single-cell RNA-seq data with in situ RNA patterns. We applied Seurat to spatially map 851 single cells from dissociated zebrafish (Danio rerio) embryos and generated a transcriptome-wide map of spatial patterning. We confirmed Seurat's accuracy using several experimental approaches, then used the strategy to identify a set of archetypal expression patterns and spatial markers. Seurat correctly localizes rare subpopulations, accurately mapping both spatially restricted and scattered groups. Seurat will be applicable to mapping cellular localization within complex patterned tissues in diverse systems.},
100 | author = {Satija, Rahul and Farrell, Jeffrey A and Gennert, David and Schier, Alexander F and Regev, Aviv},
101 | doi = {10.1038/nbt.3192},
102 | file = {:Users/swil0005/mendeley/mendeley{\_}pdfs/Satija et al. - 2015 - Spatial reconstruction of single-cell gene expression data.pdf:pdf},
103 | isbn = {1546-1696 (Electronic)$\backslash$r1087-0156 (Linking)},
104 | issn = {1087-0156},
105 | journal = {Nature Biotechnology},
106 | number = {5},
107 | pages = {495--502},
108 | pmid = {25867923},
109 | title = {{Spatial reconstruction of single-cell gene expression data}},
110 | url = {http://dx.doi.org/10.1038/nbt.3192},
111 | volume = {33},
112 | year = {2015}
113 | }
114 | @article{Watkins2009,
115 | abstract = {Hematopoiesis is a carefully controlled process that is regulated by complex networks of transcription factors that are, in part, controlled by signals resulting from ligand binding to cell-surface receptors. To further understand hematopoiesis, we have compared gene expression profiles of human erythroblasts, megakaryocytes, B cells, cytotoxic and helper T cells, natural killer cells, granulocytes, and monocytes using whole genome microarrays. A bioinformatics analysis of these data was performed focusing on transcription factors, immunoglobulin superfamily members, and lineage-specific transcripts. We observed that the numbers of lineage-specific genes varies by 2 orders of magnitude, ranging from 5 for cytotoxic T cells to 878 for granulocytes. In addition, we have identified novel coexpression patterns for key transcription factors involved in hematopoiesis (eg, GATA3-GFI1 and GATA2-KLF1). This study represents the most comprehensive analysis of gene expression in hematopoietic cells to date and has identified genes that play key roles in lineage commitment and cell function. The data, which are freely accessible, will be invaluable for future studies on hematopoiesis and the role of specific genes and will also aid the understanding of the recent genome-wide association studies.},
116 | author = {Watkins, Nicholas a and Gusnanto, Arief and de Bono, Bernard and De, Subhajyoti and Miranda-Saavedra, Diego and Hardie, Debbie L and Angenent, Will G J and Attwood, Antony P and Ellis, Peter D and Erber, Wendy and Foad, Nicola S and Garner, Stephen F and Isacke, Clare M and Jolley, Jennifer and Koch, Kerstin and Macaulay, Iain C and Morley, Sarah L and Rendon, Augusto and Rice, Kate M and Taylor, Niall and Thijssen-Timmer, Daphne C and Tijssen, Marloes R and van der Schoot, C Ellen and Wernisch, Lorenz and Winzer, Thilo and Dudbridge, Frank and Buckley, Christopher D and Langford, Cordelia F and Teichmann, Sarah and G{\"{o}}ttgens, Berthold and Ouwehand, Willem H and {Bloodomics Consortium}},
117 | doi = {10.1182/blood-2008-06-162958},
118 | file = {:Users/swil0005/mendeley/mendeley{\_}pdfs/Watkins et al. - 2009 - A HaemAtlas characterizing gene expression in differentiated human blood cells.pdf:pdf},
119 | issn = {1528-0020},
120 | journal = {Blood},
121 | month = {may},
122 | number = {19},
123 | pages = {e1--9},
124 | pmid = {19228925},
125 | title = {{A HaemAtlas: characterizing gene expression in differentiated human blood cells.}},
126 | url = {http://www.ncbi.nlm.nih.gov/pubmed/19228925 http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=PMC2680378},
127 | volume = {113},
128 | year = {2009}
129 | }
130 | @article{Zappia2017,
131 | abstract = {As single-cell RNA sequencing (scRNA-seq) technologies have rapidly developed, so have analysis methods. Many methods have been tested, developed, and validated using simulated datasets. Unfortunately, current simulations are often poorly documented, their similarity to real data is not demonstrated, or reproducible code is not available. Here, we present the Splatter Bioconductor package for simple, reproducible, and well-documented simulation of scRNA-seq data. Splatter provides an interface to multiple simulation methods including Splat, our own simulation, based on a gamma-Poisson distribution. Splat can simulate single populations of cells, populations with multiple cell types, or differentiation paths.},
132 | author = {Zappia, Luke and Phipson, Belinda and Oshlack, Alicia},
133 | doi = {10.1186/s13059-017-1305-0},
134 | file = {:Users/swil0005/mendeley/mendeley{\_}pdfs/Zappia, Phipson, Oshlack - 2017 - Splatter Simulation of single-cell RNA sequencing data.pdf:pdf},
135 | issn = {1474760X},
136 | journal = {Genome Biology},
137 | keywords = {RNA-seq,Simulation,Single-cell,Software},
138 | number = {1},
139 | pages = {1--15},
140 | pmid = {28899397},
141 | publisher = {Genome Biology},
142 | title = {{Splatter: Simulation of single-cell RNA sequencing data}},
143 | volume = {18},
144 | year = {2017}
145 | }
146 | @article{Zeisel2015,
147 | abstract = {The mammalian cerebral cortex supports cognitive functions such as sensorimotor integration, memory, and social behaviors. Normal brain function relies on a diverse set of differentiated cell types, including neurons, glia, and vasculature. Here, we have used large-scale single-cell RNA sequencing (RNA-seq) to classify cells in the mouse somatosensory cortex and hippocampal CA1 region. We found 47 molecularly distinct subclasses, comprising all known major cell types in the cortex. We identified numerous marker genes, which allowed alignment with known cell types, morphology, and location. We found a layer I interneuron expressing Pax6 and a distinct postmitotic oligodendrocyte subclass marked by Itpr2. Across the diversity of cortical cell types, transcription factors formed a complex, layered regulatory code, suggesting a mechanism for the maintenance of adult cell type identity.},
150 | author = {Zeisel, A. and Manchado, A. B. M. and Codeluppi, S. and Lonnerberg, P. and {La Manno}, G. and Jureus, A. and Marques, S. and Munguba, H. and He, L. and Betsholtz, C. and Rolny, C. and Castelo-Branco, G. and Hjerling-Leffler, J. and Linnarsson, S.},
151 | doi = {10.1126/science.aaa1934},
153 | file = {:Users/swil0005/mendeley/mendeley{\_}pdfs/Zeisel et al. - 2015 - Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq.pdf:pdf;:Users/swil0005/mendeley/mendeley{\_}pdfs/Zeisel et al. - 2015 - Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq(2).pdf:pdf},
154 | isbn = {1095-9203 (Electronic)$\backslash$r0036-8075 (Linking)},
155 | issn = {0036-8075},
156 | journal = {Science},
157 | keywords = {mgstuff},
158 | mendeley-tags = {mgstuff},
159 | number = {6226},
160 | pages = {1138--42},
161 | pmid = {25700174},
163 | title = {{Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq}},
164 | url = {http://science.sciencemag.org.docelec.univ-lyon1.fr/content/347/6226/1138.abstract},
165 | volume = {347},
166 | year = {2015}
167 | }
168 |
169 |
170 |
171 |
172 |
--------------------------------------------------------------------------------
/vignettes/images/pbmc4k_cloupe_kmeans7.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashBioinformaticsPlatform/celaref/585f2fb96f8d382803cebea3c6fc7adefa8d2054/vignettes/images/pbmc4k_cloupe_kmeans7.png
--------------------------------------------------------------------------------
/vignettes/images/violin_plot_example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashBioinformaticsPlatform/celaref/585f2fb96f8d382803cebea3c6fc7adefa8d2054/vignettes/images/violin_plot_example.png
--------------------------------------------------------------------------------
/vignettes/images/workflow_diagram.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashBioinformaticsPlatform/celaref/585f2fb96f8d382803cebea3c6fc7adefa8d2054/vignettes/images/workflow_diagram.png
--------------------------------------------------------------------------------