Cluster Markers

Overview

This block identifies marker genes for each cell cluster. It runs a Wilcoxon rank-sum test to find genes significantly enriched in each cluster, then filters them by fold change, adjusted p-value, and the proportion of cells expressing the gene. The selected top markers are visualized as a dot plot.
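The filtering step described above can be sketched as follows. This is a minimal illustration, not the block's actual implementation: the `MarkerStat` record, the `select_top_markers` helper, and every threshold value are hypothetical names and defaults chosen for the example.

```python
from typing import NamedTuple


class MarkerStat(NamedTuple):
    """One (cluster, gene) row of marker statistics (hypothetical record type)."""
    cluster: str
    gene: str
    log2fc: float
    padj: float
    pct_cells: float  # percentage of cells in the cluster expressing the gene


def select_top_markers(stats, min_log2fc=1.0, max_padj=0.05, min_pct=10.0, top_n=3):
    """Keep rows passing all three thresholds, then take the top_n per cluster by log2FC.

    The threshold defaults are illustrative assumptions, not the block's defaults.
    """
    by_cluster = {}
    for s in stats:
        if s.log2fc >= min_log2fc and s.padj <= max_padj and s.pct_cells >= min_pct:
            by_cluster.setdefault(s.cluster, []).append(s)
    return {
        cluster: sorted(rows, key=lambda s: s.log2fc, reverse=True)[:top_n]
        for cluster, rows in by_cluster.items()
    }
```

Only the rows that survive all three filters would be candidates for the dot plot; the full table is still kept for export.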

Pipeline context

This block typically follows Leiden Clustering, as it requires cluster assignments to find marker genes.

 Blocks                                  Result pool
                                         ┌───────────
               ┌─────────────────────────┤
               │                         │
               v                         │
 ╔═══════════════════════════╗  exports  │
 ║     Leiden Clustering     ║─────>─────┤ Cluster assignments
 ╚═══════════════════════════╝           │ ---------------------
                                         ├ [sampleId][cellId] -> clusterId
               ┌─────────────────────────┤
               │                         │
               v                         │
 ╔═══════════════════════════╗  exports  │
 ║      Cluster Markers      ║─────>─────┤ Marker gene statistics
 ╚═══════════════════════════╝           │ ----------------------
                                         ├ [clusterId][geneId] -> log2FC, p-value, ...
               ┌─────────────────────────┤
               │                         │
               v                         │
   ╔═══════════════════════╗             │
   ║  Downstream Analysis  ║             │
   ╚═══════════════════════╝             │

Core structure: axes and p-columns

The block consumes two p-columns, the cluster assignments and the raw count matrix, and produces a p-frame containing marker gene statistics for each cluster.

Primary axes

Axis Name                      Type     Description
pl7.app/sampleId               String   Uniquely identifies the sample.
pl7.app/cellId                 String   Uniquely identifies a single cell within a sample.
pl7.app/rna-seq/cluster-num    String   Uniquely identifies a cell cluster.
pl7.app/rna-seq/geneId         String   Uniquely identifies a gene.

Input P-Columns

The block requires two inputs: the cell cluster assignments and the raw gene expression count matrix from which the clusters were derived.

1. Leiden Cluster

  • P-column name: pl7.app/rna-seq/leidencluster
  • Description: The cluster assignment for each cell. This is used to group cells for comparison.
  • Requirement: Required.
  • Specification:
# --- Core Identity ---
name: pl7.app/rna-seq/leidencluster
valueType: String

# --- Axes ---
axesSpec:
  - name: pl7.app/sampleId
    type: String
  - name: pl7.app/cellId
    type: String

# --- Domain ---
domain:
  pl7.app/blockId: "..." # blockId from the upstream clustering run
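To make the shape of this input concrete, the sketch below models the cluster-assignment p-column as a plain dictionary keyed by its two axes and groups cells by cluster. The dictionary representation and the `cells_by_cluster` helper are assumptions made for illustration; they are not part of the platform's API.

```python
from collections import defaultdict


def cells_by_cluster(assignments):
    """Group cells by cluster.

    assignments: dict mapping (sampleId, cellId) -> clusterId, mirroring the
    two axes and the String value of the p-column described above.
    """
    groups = defaultdict(list)
    for (sample_id, cell_id), cluster_id in assignments.items():
        groups[cluster_id].append((sample_id, cell_id))
    return dict(groups)


# Example with made-up identifiers.
example = {("s1", "c1"): "0", ("s1", "c2"): "1", ("s2", "c1"): "0"}
groups = cells_by_cluster(example)
```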

2. Raw Counts

  • P-column name: pl7.app/rna-seq/countMatrix
  • Description: The raw number of reads (or UMIs) for each gene in each cell. This is used to calculate the average expression and fold changes.
  • Requirement: Required.
  • Specification:
# --- Core Identity ---
name: pl7.app/rna-seq/countMatrix
valueType: Long

# --- Axes ---
axesSpec:
  - name: pl7.app/sampleId
    type: String
  - name: pl7.app/sc/cellId
    type: String
  - name: pl7.app/geneId
    type: String

# --- Domain ---
domain:
  pl7.app/rna-seq/normalized: "false"

# --- Annotations ---
annotations:
  pl7.app/label: "Raw Count Matrix"
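Conceptually, the two inputs are joined on their shared sample and cell axes so that each count can be attributed to a cluster. The sketch below shows that join on plain dictionaries; the sparse dictionary layout and the `counts_per_cluster` helper are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def counts_per_cluster(counts, assignments):
    """Sum raw counts per (clusterId, geneId).

    counts:      dict (sampleId, cellId, geneId) -> int, absent keys mean zero
    assignments: dict (sampleId, cellId) -> clusterId
    Cells without a cluster assignment are skipped (an assumption of this sketch).
    """
    totals = defaultdict(int)
    for (sample_id, cell_id, gene_id), n in counts.items():
        cluster_id = assignments.get((sample_id, cell_id))
        if cluster_id is None:
            continue
        totals[(cluster_id, gene_id)] += n
    return dict(totals)
```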

Exported P-Columns

The block exports a single p-frame containing all marker gene statistics. This includes statistics for all genes (not just the top N) and is intended for programmatic use by downstream blocks.

Common Axes Specification: All exported p-columns in this p-frame share the following axes:

# --- Axes ---
axesSpec:
  - name: pl7.app/rna-seq/cluster-num
    type: String
    domain:
      pl7.app/blockId: "..." # blockId from this block run
    annotations:
      pl7.app/label: "Cluster"
  - name: pl7.app/rna-seq/geneId
    type: String
    domain:
      pl7.app/species: "..." # e.g., "mus_musculus"
    annotations:
      pl7.app/label: "Ensembl Id"

1. Log2 Fold Change

  • P-column name: pl7.app/rna-seq/log2foldchange
  • Description: The log2 fold change of the average expression of a gene in the current cluster compared to the average expression in all other cells.
  • Specification:
name: pl7.app/rna-seq/log2foldchange
valueType: Double
annotations:
  pl7.app/label: "Log2FC"
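As a worked illustration of the description above, the following sketch computes a log2 fold change from two lists of per-cell expression values. The pseudocount is an assumption added here to avoid division by zero; the source does not say whether the block uses one, or whether the averages are taken over raw or normalized values.

```python
import math


def log2_fold_change(in_cluster, other_cells, pseudocount=1.0):
    """log2 of (mean expression in cluster) / (mean expression in all other cells).

    Both arguments are non-empty lists of per-cell expression values for one gene.
    The pseudocount is a hypothetical choice for this sketch.
    """
    mean_in = sum(in_cluster) / len(in_cluster)
    mean_out = sum(other_cells) / len(other_cells)
    return math.log2((mean_in + pseudocount) / (mean_out + pseudocount))
```

With a pseudocount of 1, a gene averaging 7 in the cluster and 1 elsewhere gives log2(8 / 2) = 2.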

2. Adjusted p-value

  • P-column name: pl7.app/rna-seq/padj
  • Description: The adjusted p-value from the Wilcoxon rank-sum test.
  • Specification:
name: pl7.app/rna-seq/padj
valueType: Double
annotations:
  pl7.app/label: "Adjusted p-value"
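The two steps behind this column can be sketched in plain Python. The first function is a Wilcoxon rank-sum test using the normal approximation, without tie or continuity correction. The second applies a Benjamini-Hochberg adjustment; the source does not state which multiple-testing correction the block uses, so Benjamini-Hochberg is an assumption here, and both function names are hypothetical.

```python
import math


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, average ranks for ties)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    rank_sum_x = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In practice one p-value would be computed per gene within a cluster (cluster cells versus all other cells), and the adjustment applied across those genes.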

3. Cell Percentage

  • P-column name: pl7.app/rna-seq/percentcells
  • Description: The percentage of cells within the cluster that express the gene.
  • Specification:
name: pl7.app/rna-seq/percentcells
valueType: Double
annotations:
  pl7.app/label: "Cell percentage expressed"
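A minimal sketch of this statistic, assuming that a cell "expresses" a gene when its raw count is greater than zero (the source does not define the cutoff, and `percent_expressing` is a hypothetical helper):

```python
def percent_expressing(counts_in_cluster):
    """Percentage of cells in a cluster whose count for one gene is above zero.

    counts_in_cluster: list with one raw count per cell in the cluster.
    Returns 0.0 for an empty cluster.
    """
    if not counts_in_cluster:
        return 0.0
    expressing = sum(1 for c in counts_in_cluster if c > 0)
    return 100.0 * expressing / len(counts_in_cluster)
```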

4. Mean Expression

  • P-column name: pl7.app/rna-seq/meanexpression
  • Description: The mean expression of the gene within the cluster.
  • Specification:
name: pl7.app/rna-seq/meanexpression
valueType: Double
annotations:
  pl7.app/label: "Mean expression in cluster"

Summary of Exported P-Columns

P-Column Name                     Description                                Axes                     Requirement
pl7.app/rna-seq/log2foldchange    Log2 fold change of gene expression.       [cluster-num][geneId]    Required
pl7.app/rna-seq/padj              Adjusted p-value.                          [cluster-num][geneId]    Required
pl7.app/rna-seq/percentcells      Percentage of cells expressing the gene.   [cluster-num][geneId]    Required
pl7.app/rna-seq/meanexpression    Mean expression in the cluster.            [cluster-num][geneId]    Required
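Because all four exported p-columns share the same [cluster-num][geneId] axes, a downstream consumer can line them up into one record per cluster and gene. The sketch below does this on plain dictionaries; the in-memory layout and the `frame_to_rows` helper are assumptions for illustration and do not reflect how the platform stores p-frames.

```python
def frame_to_rows(columns):
    """Join columns that share the (clusterId, geneId) key into per-key records.

    columns: dict column_name -> {(clusterId, geneId): value}
    Only keys present in every column are kept.
    """
    if not columns:
        return {}
    shared_keys = set.intersection(*(set(col) for col in columns.values()))
    return {
        key: {name: col[key] for name, col in columns.items()}
        for key in sorted(shared_keys)
    }


# Example with made-up values.
frame = {
    "pl7.app/rna-seq/log2foldchange": {("0", "g1"): 2.0, ("0", "g2"): 0.1},
    "pl7.app/rna-seq/padj": {("0", "g1"): 0.001, ("0", "g2"): 0.8},
}
rows = frame_to_rows(frame)
```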