Cluster Markers
Overview
This block identifies marker genes for each cell cluster. It runs a Wilcoxon rank-sum test comparing each cluster against all other cells to find significantly enriched genes, then filters them by fold change, adjusted p-value, and the proportion of cells expressing the gene. The selected top markers are visualized as a dot plot.
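As a rough illustration of the test described above, the following pure-Python sketch runs a one-vs-rest Wilcoxon rank-sum test for a single gene. It assumes the large-sample normal approximation with a tie correction and a two-sided p-value. The helper names `average_ranks` and `wilcoxon_rank_sum` are hypothetical, and the block's actual implementation (library, continuity correction, exact versus approximate p-values) may differ.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_rank_sum(in_cluster: Sequence[float], rest: Sequence[float]) -> tuple[float, float]:
    """Two-sided rank-sum test using the normal approximation with tie correction.

    Returns (z, p_value); z > 0 means the gene tends to be higher in the cluster.
    """
    n1, n2 = len(in_cluster), len(rest)
    combined = list(in_cluster) + list(rest)
    ranks = average_ranks(combined)
    rank_sum = sum(ranks[:n1])  # rank sum of the in-cluster cells
    n = n1 + n2
    mean = n1 * (n + 1) / 2
    # Tie correction: sum of (t^3 - t) over every group of t tied values.
    counts: dict[float, int] = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return 0.0, 1.0  # every value is tied: no evidence of a difference
    z = (rank_sum - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the normal tail
    return z, p
```

For example, `wilcoxon_rank_sum([5, 7, 9, 8], [0, 1, 0, 2, 1, 0])` yields a positive z-score and a p-value below 0.05, because all four in-cluster values outrank the six others.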
Pipeline context
This block typically follows Leiden Clustering, as it requires cluster assignments to find marker genes.
```text
  Blocks                                      Result pool
                                              ┌───────────
                ┌─────────────────────────────┤
                │                             │
                v                             │
  ╔═══════════════════════════╗   exports     │
  ║     Leiden Clustering     ║─────────────>─┤ Cluster assignments
  ╚═══════════════════════════╝               │ ---------------------
                                              │
                                              ├ [sampleId][cellId] -> clusterId
                ┌─────────────────────────────┤
                │                             │
                v                             │
  ╔═══════════════════════════╗   exports     │
  ║      Cluster Markers      ║─────────────>─┤ Marker gene statistics
  ╚═══════════════════════════╝               │ ----------------------
                                              │
                                              ├ [clusterId][geneId] -> log2FC, p-value, ...
                ┌─────────────────────────────┤
                │                             │
                v                             │
    ╔═══════════════════════╗                 │
    ║  Downstream Analysis  ║                 │
    ╚═══════════════════════╝                 │
```
Core structure: axes and p-columns
The block consumes two p-columns, the cell cluster assignments and the raw count matrix, and produces a p-frame containing marker gene statistics for each cluster.
Primary axes
| Axis Name | Type | Description |
|---|---|---|
| pl7.app/sampleId | String | Uniquely identifies the sample. |
| pl7.app/cellId | String | Uniquely identifies a single cell within a sample. |
| pl7.app/rna-seq/cluster-num | String | Uniquely identifies a cell cluster. |
| pl7.app/rna-seq/geneId | String | Uniquely identifies a gene. |
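Since a cell ID is unique only within a sample, a cell is addressed by the (sampleId, cellId) pair rather than by cellId alone. One way to picture this, assuming nothing about how Platforma actually stores p-columns, is a Python dictionary keyed by a tuple of axis values. The toy identifiers (placeholders, not real Ensembl IDs) and the `slice_gene` helper are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical toy data; keys follow the axis order of each p-column.
# leidencluster: [sampleId][cellId] -> clusterId
leiden_cluster: Dict[Tuple[str, str], str] = {
    ("s1", "AAAC"): "0",
    ("s1", "AAAG"): "1",
    ("s2", "CCTA"): "0",
}

# countMatrix: [sampleId][cellId][geneId] -> raw count
# (keys that are absent stand for zero counts in a sparse layout)
count_matrix: Dict[Tuple[str, str, str], int] = {
    ("s1", "AAAC", "ENSMUSG01"): 12,
    ("s1", "AAAG", "ENSMUSG02"): 3,
    ("s2", "CCTA", "ENSMUSG01"): 7,
}


def slice_gene(matrix: Dict[Tuple[str, str, str], int], gene: str) -> Dict[Tuple[str, str], int]:
    """Fix the geneId axis and return a [sampleId][cellId] -> count view."""
    return {(s, c): v for (s, c, g), v in matrix.items() if g == gene}
```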
Input P-Columns
The block requires two inputs: the cell cluster assignments and the raw gene expression count matrix from which the clusters were derived.
1. Leiden Cluster
- P-column name: pl7.app/rna-seq/leidencluster
- Description: The cluster assignment for each cell. This is used to group cells for comparison.
- Requirement: Required.
- Specification:

```yaml
# --- Core Identity ---
name: pl7.app/rna-seq/leidencluster
valueType: String
# --- Axes ---
axesSpec:
  - name: pl7.app/sampleId
    type: String
  - name: pl7.app/cellId
    type: String
# --- Domain ---
domain:
  pl7.app/blockId: "..." # blockId from the upstream clustering run
```
2. Raw Counts
- P-column name: pl7.app/rna-seq/countMatrix
- Description: The raw number of reads (or UMIs) for each gene in each cell. This is used to calculate the average expression and fold changes.
- Requirement: Required.
- Specification:

```yaml
# --- Core Identity ---
name: pl7.app/rna-seq/countMatrix
valueType: Long
# --- Axes ---
axesSpec:
  - name: pl7.app/sampleId
    type: String
  - name: pl7.app/sc/cellId
    type: String
  - name: pl7.app/geneId
    type: String
# --- Domain ---
domain:
  pl7.app/rna-seq/normalized: "false"
# --- Annotations ---
annotations:
  pl7.app/label: "Raw Count Matrix"
```
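For a one-vs-rest comparison, the two inputs must be matched cell by cell on the (sampleId, cellId) key so that one gene's counts can be split into the cluster of interest and all remaining cells. The sketch below assumes both inputs identify cells with the same values (even though the specs above name the cell axis differently), treats a missing count as zero as in a sparse matrix, and uses a hypothetical helper name, `split_by_cluster`.

```python
from typing import Dict, List, Tuple

CellKey = Tuple[str, str]  # (sampleId, cellId)


def split_by_cluster(
    clusters: Dict[CellKey, str],
    gene_counts: Dict[CellKey, int],
    cluster_id: str,
) -> Tuple[List[int], List[int]]:
    """Return (counts in cluster_id, counts in all other clustered cells).

    Cells absent from gene_counts are treated as zero (sparse matrix).
    Cells without a cluster assignment are ignored.
    """
    in_cluster: List[int] = []
    rest: List[int] = []
    for cell, cid in clusters.items():
        value = gene_counts.get(cell, 0)
        (in_cluster if cid == cluster_id else rest).append(value)
    return in_cluster, rest
```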
Exported P-Columns
The block exports a single p-frame containing all marker gene statistics. This includes statistics for all genes (not just the top N) and is intended for programmatic use by downstream blocks.
Common Axes Specification: All exported p-columns in this p-frame share the following axes:
```yaml
# --- Axes ---
axesSpec:
  - name: pl7.app/rna-seq/cluster-num
    type: String
    domain:
      pl7.app/blockId: "..." # blockId from this block run
    annotations:
      pl7.app/label: "Cluster"
  - name: pl7.app/rna-seq/geneId
    type: String
    domain:
      pl7.app/species: "..." # e.g., "mus_musculus"
    annotations:
      pl7.app/label: "Ensembl Id"
```
1. Log2 Fold Change
- P-column name: pl7.app/rna-seq/log2foldchange
- Description: The log2 fold change of the average expression of a gene in the current cluster compared to the average expression in all other cells.
- Specification:

```yaml
name: pl7.app/rna-seq/log2foldchange
valueType: Double
annotations:
  pl7.app/label: "Log2FC"
```
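The exact formula is not specified here, so the following is only a sketch: it takes the log2 ratio of mean counts in the cluster versus all other cells and adds a pseudocount of 1, an assumption made to keep the ratio finite when a gene is absent from one group. The function name `log2_fold_change` is hypothetical.

```python
import math
from statistics import fmean
from typing import Sequence


def log2_fold_change(in_cluster: Sequence[float], rest: Sequence[float], pseudocount: float = 1.0) -> float:
    """log2 of (mean in cluster + pseudocount) / (mean in other cells + pseudocount).

    The pseudocount is an assumption, not taken from the block; it keeps the
    ratio finite when a gene has zero mean in one of the groups.
    """
    return math.log2((fmean(in_cluster) + pseudocount) / (fmean(rest) + pseudocount))
```

With this definition, `log2_fold_change([7, 7], [1, 1])` is `log2(8 / 2) = 2.0`, i.e. a fourfold higher (pseudocount-adjusted) mean in the cluster.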
2. Adjusted p-value
- P-column name: pl7.app/rna-seq/padj
- Description: The adjusted p-value from the Wilcoxon rank-sum test.
- Specification:

```yaml
name: pl7.app/rna-seq/padj
valueType: Double
annotations:
  pl7.app/label: "Adjusted p-value"
```
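The correction method is not stated here. Benjamini-Hochberg false discovery rate control is a common choice for marker tests, so the sketch below assumes it, applied across all genes tested for a single cluster. `benjamini_hochberg` is a hypothetical helper; it returns adjusted values in the input order, capped at 1 and kept monotone by a running minimum taken from the largest p-value down.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk from the largest p-value (rank m) down to the smallest (rank 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```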
3. Cell Percentage
- P-column name: pl7.app/rna-seq/percentcells
- Description: The percentage of cells within the cluster that express the gene.
- Specification:

```yaml
name: pl7.app/rna-seq/percentcells
valueType: Double
annotations:
  pl7.app/label: "Cell percentage expressed"
```
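A minimal sketch of this statistic, assuming a gene counts as expressed in a cell when its raw count is above zero and that the value is reported on a 0-100 scale. Both the definition and the helper name `percent_expressing` are assumptions.

```python
from typing import Sequence


def percent_expressing(in_cluster_counts: Sequence[int], threshold: int = 0) -> float:
    """Percentage (0-100) of cluster cells whose count exceeds threshold.

    Assumes "expressed" means a raw count above zero; the block's own
    definition and scale (percent versus fraction) may differ.
    """
    if not in_cluster_counts:
        return 0.0
    expressed = sum(1 for c in in_cluster_counts if c > threshold)
    return 100.0 * expressed / len(in_cluster_counts)
```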
4. Mean Expression
- P-column name: pl7.app/rna-seq/meanexpression
- Description: The mean expression of the gene within the cluster.
- Specification:

```yaml
name: pl7.app/rna-seq/meanexpression
valueType: Double
annotations:
  pl7.app/label: "Mean expression in cluster"
```
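The sketch below computes the per-cluster mean for one gene in a single pass, reusing the (sampleId, cellId) keying of the input p-columns. It assumes the mean is taken over raw counts with unobserved cells counted as zero; the block may instead average normalized or log-transformed values, and `mean_expression_by_cluster` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Tuple

CellKey = Tuple[str, str]  # (sampleId, cellId)


def mean_expression_by_cluster(
    clusters: Dict[CellKey, str],
    gene_counts: Dict[CellKey, int],
) -> Dict[str, float]:
    """Mean count of one gene in each cluster, counting missing cells as zero.

    Assumes the mean is over raw counts; cells without a cluster are ignored.
    """
    totals: Dict[str, float] = defaultdict(float)
    sizes: Dict[str, int] = defaultdict(int)
    for cell, cid in clusters.items():
        totals[cid] += gene_counts.get(cell, 0)
        sizes[cid] += 1
    return {cid: totals[cid] / sizes[cid] for cid in sizes}
```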
Summary of Exported P-Columns
| P-Column Name | Description | Axes | Requirement |
|---|---|---|---|
| pl7.app/rna-seq/log2foldchange | Log2 fold change of gene expression. | [cluster-num][geneId] | Required |
| pl7.app/rna-seq/padj | Adjusted p-value. | [cluster-num][geneId] | Required |
| pl7.app/rna-seq/percentcells | Percentage of cells expressing the gene. | [cluster-num][geneId] | Required |
| pl7.app/rna-seq/meanexpression | Mean expression in the cluster. | [cluster-num][geneId] | Required |
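The overview notes that markers are filtered by fold change, adjusted p-value, and the proportion of expressing cells before the top ones are plotted. The sketch below applies such filters to rows of the exported frame and keeps the top `n` genes per cluster. The threshold defaults, the choice to rank by log2FC, and the names `MarkerStats` and `top_markers` are all hypothetical; the block's actual cut-offs and ranking are not given in this document.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class MarkerStats:
    """One row of the exported frame: [cluster-num][geneId] -> statistics."""
    cluster: str
    gene: str
    log2fc: float
    padj: float
    percent_cells: float
    mean_expression: float


def top_markers(
    rows: Iterable[MarkerStats],
    n: int = 5,
    min_log2fc: float = 1.0,    # hypothetical threshold
    max_padj: float = 0.05,     # hypothetical threshold
    min_percent: float = 10.0,  # hypothetical threshold (percent of cells)
) -> Dict[str, List[str]]:
    """Filter rows, then keep the n genes with the largest log2FC per cluster."""
    passing: Dict[str, List[MarkerStats]] = {}
    for r in rows:
        if r.log2fc >= min_log2fc and r.padj <= max_padj and r.percent_cells >= min_percent:
            passing.setdefault(r.cluster, []).append(r)
    return {
        cluster: [r.gene for r in sorted(rs, key=lambda r: r.log2fc, reverse=True)[:n]]
        for cluster, rs in passing.items()
    }
```

Clusters with no gene passing all filters are simply absent from the result.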