Cluster Markers

Overview

This block identifies marker genes for each cell cluster. It runs a Wilcoxon rank-sum test that compares each cluster against all other cells to find significantly enriched genes, and filters the results by fold change, adjusted p-value, and the proportion of cells expressing the gene. Results for the selected top markers are visualized as a dot plot.
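The filtering and top-marker selection described above can be sketched as follows. This is a minimal illustration, not the block's implementation: the `MarkerStat` record, the `select_top_markers` function, the threshold defaults, and ranking by log2FC are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class MarkerStat:
    cluster: str      # value on the cluster-num axis
    gene: str         # value on the geneId axis
    log2fc: float     # log2 fold change, cluster vs. all other cells
    padj: float       # adjusted p-value from the Wilcoxon rank-sum test
    pct_cells: float  # percentage (0-100) of cluster cells expressing the gene


def select_top_markers(stats, min_log2fc=1.0, max_padj=0.05, min_pct=10.0, top_n=3):
    """Keep genes passing all three filters, then the top_n per cluster by log2FC.

    Threshold defaults and the ranking key are illustrative assumptions,
    not the block's documented settings.
    """
    by_cluster = {}
    for s in stats:
        if s.log2fc >= min_log2fc and s.padj <= max_padj and s.pct_cells >= min_pct:
            by_cluster.setdefault(s.cluster, []).append(s)
    return {
        cluster: [s.gene for s in sorted(hits, key=lambda s: s.log2fc, reverse=True)[:top_n]]
        for cluster, hits in sorted(by_cluster.items())
    }


stats = [
    MarkerStat("0", "GeneA", 2.5, 1e-8, 80.0),
    MarkerStat("0", "GeneB", 0.4, 1e-3, 90.0),   # fails the fold-change filter
    MarkerStat("0", "GeneC", 1.8, 1e-5, 60.0),
    MarkerStat("1", "GeneB", 3.1, 1e-10, 70.0),
    MarkerStat("1", "GeneD", 1.2, 0.2, 50.0),    # fails the p-value filter
]
print(select_top_markers(stats))  # {'0': ['GeneA', 'GeneC'], '1': ['GeneB']}
```

Clusters with no gene passing the filters simply do not appear in the result, so a dot plot built from it would show no markers for them.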

Pipeline context

This block typically follows Leiden Clustering, as it requires cluster assignments to find marker genes.

 Blocks                                 Result pool
                                       ┌───────────
             ┌─────────────────────────┤
             │                         │
             v                         │
 ╔═══════════════════════════╗ exports │
 ║     Leiden Clustering     ║─────>───┤ Cluster assignments
 ╚═══════════════════════════╝         │ ---------------------
                                       │
                                       ├ [sampleId][cellId] -> clusterId
             ┌─────────────────────────┤
             │                         │
             v                         │
 ╔═══════════════════════════╗ exports │
 ║      Cluster Markers      ║─────>───┤ Marker gene statistics
 ╚═══════════════════════════╝         │ ----------------------
                                       │
                                       ├ [clusterId][geneId] -> log2FC, p-value, ...
             ┌─────────────────────────┤
             │                         │
             v                         │
 ╔═══════════════════════╗             │
 ║  Downstream Analysis  ║             │
 ╚═══════════════════════╝             │

Core structure: axes and p-columns

The block consumes two p-columns, the cell cluster assignments and the raw count matrix, and produces a p-frame containing marker gene statistics for each cluster.

Primary axes

Axis Name	Type	Description
`pl7.app/sampleId`	`String`	Uniquely identifies the sample.
`pl7.app/cellId`	`String`	Uniquely identifies a single cell within a sample.
`pl7.app/rna-seq/cluster-num`	`String`	Uniquely identifies a cell cluster.
`pl7.app/rna-seq/geneId`	`String`	Uniquely identifies a gene.
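The axes above act as composite keys: because `cellId` is unique only within a sample, a cell is addressed by the `(sampleId, cellId)` pair. The sketch below models the Leiden cluster p-column as a Python dict keyed by axis-value tuples (an illustrative assumption, not how Platforma stores p-columns) and inverts it to list the cells in each cluster. `cells_by_cluster` and the sample data are hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-in for pl7.app/rna-seq/leidencluster:
# keys follow the axesSpec order [sampleId][cellId], values are cluster ids.
leiden_cluster = {
    ("sample1", "AAAC"): "0",
    ("sample1", "AAAG"): "1",
    ("sample2", "AAAC"): "0",  # same cellId as above, but a different cell
}


def cells_by_cluster(cluster_column):
    """Invert [sampleId][cellId] -> cluster into cluster -> list of (sampleId, cellId)."""
    groups = defaultdict(list)
    for (sample_id, cell_id), cluster in cluster_column.items():
        groups[cluster].append((sample_id, cell_id))
    return dict(groups)


print(cells_by_cluster(leiden_cluster))
# {'0': [('sample1', 'AAAC'), ('sample2', 'AAAC')], '1': [('sample1', 'AAAG')]}
```

Keying on `cellId` alone would merge the two `AAAC` cells from different samples, which is why the sample axis is kept alongside it.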

Input P-Columns

The block requires two inputs: the cell cluster assignments and the raw gene expression count matrix from which the clusters were derived.

1. Leiden Cluster

P-column name: pl7.app/rna-seq/leidencluster
Description: The cluster assignment for each cell. This is used to group cells for comparison.
Requirement: Required.
Specification:

# --- Core Identity ---
name: pl7.app/rna-seq/leidencluster
valueType: String

# --- Axes ---
axesSpec:
  - name: pl7.app/sampleId
    type: String
  - name: pl7.app/cellId
    type: String

# --- Domain ---
domain:
  pl7.app/blockId: "..." # blockId from the upstream clustering run

2. Raw Counts

P-column name: pl7.app/rna-seq/countMatrix
Description: The raw number of reads (or UMIs) for each gene in each cell. This is used to calculate the average expression and fold changes.
Requirement: Required.
Specification:

# --- Core Identity ---
name: pl7.app/rna-seq/countMatrix
valueType: Long

# --- Axes ---
axesSpec:
  - name: pl7.app/sampleId
    type: String
  - name: pl7.app/sc/cellId
    type: String
  - name: pl7.app/geneId
    type: String

# --- Domain ---
domain:
  pl7.app/rna-seq/normalized: "false"

# --- Annotations ---
annotations:
  pl7.app/label: "Raw Count Matrix"
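Both inputs share the sample and cell axes, so per-cluster statistics start by joining them on the `(sampleId, cellId)` pair. A minimal sketch using plain dicts as stand-ins for the two p-columns (a hypothetical representation, not the Platforma API); it assumes that a cell-gene pair missing from a sparse count matrix has a count of 0. The function name `split_counts_by_cluster` is illustrative.

```python
def split_counts_by_cluster(cluster_column, count_matrix, genes):
    """For each (cluster, gene), collect raw counts of cells inside and outside the cluster.

    cluster_column: {(sampleId, cellId): clusterId}
    count_matrix:   {(sampleId, cellId, geneId): count}; missing entries count as 0
    Returns {(clusterId, geneId): (in_cluster_counts, rest_counts)}.
    Cells with counts but no cluster assignment are ignored.
    """
    result = {}
    for cluster in sorted(set(cluster_column.values())):
        for gene in genes:
            inside, rest = [], []
            for (sample_id, cell_id), cell_cluster in cluster_column.items():
                count = count_matrix.get((sample_id, cell_id, gene), 0)
                (inside if cell_cluster == cluster else rest).append(count)
            result[(cluster, gene)] = (inside, rest)
    return result


clusters = {("s1", "c1"): "0", ("s1", "c2"): "0", ("s1", "c3"): "1"}
counts = {("s1", "c1", "GeneA"): 5, ("s1", "c2", "GeneA"): 3, ("s1", "c3", "GeneB"): 7}
split = split_counts_by_cluster(clusters, counts, ["GeneA", "GeneB"])
print(split[("0", "GeneA")])  # ([5, 3], [0])
print(split[("1", "GeneB")])  # ([7], [0, 0])
```

The "rest" list holds every other cell, which matches the one-cluster-versus-all-other-cells comparison used for the exported statistics.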

Exported P-Columns

The block exports a single p-frame containing all marker gene statistics. It covers all genes, not just the top N markers shown in the dot plot, and is intended for programmatic use by downstream blocks.

Common Axes Specification: All exported p-columns in this p-frame share the following axes:

# --- Axes ---
axesSpec:
  - name: pl7.app/rna-seq/cluster-num
    type: String
    domain:
      pl7.app/blockId: "..." # blockId from this block run
    annotations:
      pl7.app/label: "Cluster"
  - name: pl7.app/rna-seq/geneId
    type: String
    domain:
      pl7.app/species: "..." # e.g., "mus_musculus"
    annotations:
      pl7.app/label: "Ensembl Id"

1. Log2 Fold Change

P-column name: pl7.app/rna-seq/log2foldchange
Description: The log2 fold change of the average expression of a gene in the current cluster compared to the average expression in all other cells.
Specification:

name: pl7.app/rna-seq/log2foldchange
valueType: Double
annotations:
  pl7.app/label: "Log2FC"
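The fold change compares mean expression in the cluster with mean expression in all other cells. A minimal sketch; the pseudocount that keeps the ratio finite when a mean is zero, and its value of 1, are assumptions, so the block's exact formula may differ. `log2_fold_change` is a hypothetical name.

```python
import math
from statistics import fmean


def log2_fold_change(in_cluster, rest, pseudocount=1.0):
    """log2 of (mean in cluster + pseudocount) / (mean in other cells + pseudocount)."""
    return math.log2((fmean(in_cluster) + pseudocount) / (fmean(rest) + pseudocount))


print(log2_fold_change([7, 7, 7], [1, 1, 1, 1]))  # 2.0
```

Positive values mean higher average expression in the cluster; negative values mean the gene is depleted there relative to the other cells.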

2. Adjusted p-value

P-column name: pl7.app/rna-seq/padj
Description: The adjusted p-value from the Wilcoxon rank-sum test.
Specification:

name: pl7.app/rna-seq/padj
valueType: Double
annotations:
  pl7.app/label: "Adjusted p-value"
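The adjusted p-value combines two steps: a rank-sum test per (cluster, gene) and a multiple-testing correction across genes. The sketch below uses the normal approximation of the Mann-Whitney U statistic with a tie correction, plus Benjamini-Hochberg adjustment. The correction method, and whether the block adjusts per cluster or across all tests, are assumptions not stated in this document; all function names are hypothetical.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def wilcoxon_rank_sum_p(in_cluster, rest):
    """Two-sided p-value of the Wilcoxon rank-sum (Mann-Whitney U) test.

    Normal approximation with tie correction and no continuity correction;
    ties are common in count data because of the many zeros.
    """
    if not in_cluster or not rest:
        raise ValueError("both groups need at least one cell")
    n1, n2 = len(in_cluster), len(rest)
    n = n1 + n2
    pooled = list(in_cluster) + list(rest)
    ranks = rank_with_ties(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction term: sum of t^3 - t over each group of t tied values.
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_sizes.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return 1.0  # every value is identical, so the groups cannot differ
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    descending = sorted(range(m), key=lambda j: pvalues[j], reverse=True)
    for position, i in enumerate(descending):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p = wilcoxon_rank_sum_p([10, 11, 12], [1, 2, 3])
print(round(p, 4))  # 0.0495
```

For real analyses, a library routine such as SciPy's `mannwhitneyu` would normally replace the hand-written test; the version above only makes the computation visible.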

3. Cell Percentage

P-column name: pl7.app/rna-seq/percentcells
Description: The percentage of cells within the cluster that express the gene.
Specification:

name: pl7.app/rna-seq/percentcells
valueType: Double
annotations:
  pl7.app/label: "Cell percentage expressed"
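A minimal sketch of the cell percentage, assuming that a cell "expresses" a gene when its raw count is greater than zero and that the value is stored on a 0-100 scale (both assumptions). `percent_cells_expressing` is a hypothetical name.

```python
def percent_cells_expressing(in_cluster_counts):
    """Percentage (0-100) of the cluster's cells with a non-zero raw count for the gene."""
    if not in_cluster_counts:
        return 0.0
    expressing = sum(1 for c in in_cluster_counts if c > 0)
    return 100.0 * expressing / len(in_cluster_counts)


print(percent_cells_expressing([0, 3, 0, 1]))  # 50.0
```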

4. Mean Expression

P-column name: pl7.app/rna-seq/meanexpression
Description: The mean expression of the gene within the cluster.
Specification:

name: pl7.app/rna-seq/meanexpression
valueType: Double
annotations:
  pl7.app/label: "Mean expression in cluster"
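Since the raw count matrix is the input used for average expression, the mean can be sketched as the arithmetic mean of raw counts over all cells in the cluster, zeros included. Whether the block applies any normalization first is not stated here; `mean_expression` is a hypothetical name.

```python
from statistics import fmean


def mean_expression(in_cluster_counts):
    """Mean raw count of the gene over every cell in the cluster (zeros included)."""
    return fmean(in_cluster_counts) if in_cluster_counts else 0.0


print(mean_expression([0, 3, 0, 1]))  # 1.0
```

Including zeros means a gene expressed strongly in only a few cells can still have a low cluster mean, which is why the cell percentage is reported alongside it.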

Summary of Exported P-Columns

P-Column Name	Description	Axes	Requirement
`pl7.app/rna-seq/log2foldchange`	Log2 fold change of gene expression.	`[cluster-num][geneId]`	Required
`pl7.app/rna-seq/padj`	Adjusted p-value.	`[cluster-num][geneId]`	Required
`pl7.app/rna-seq/percentcells`	Percentage of cells expressing the gene.	`[cluster-num][geneId]`	Required
`pl7.app/rna-seq/meanexpression`	Mean expression in the cluster.	`[cluster-num][geneId]`	Required
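Because all four exported columns share the `[cluster-num][geneId]` axes, they can be read together as rows of one table. The sketch below joins dict-based stand-ins for the p-columns into such rows. `build_marker_frame` is hypothetical, and requiring every column to cover exactly the same keys is a choice of this sketch, not a documented rule of the block.

```python
def build_marker_frame(columns):
    """Join p-columns keyed by (clusterId, geneId) into rows of a single frame.

    columns maps a p-column name to {(clusterId, geneId): value}.
    Raises ValueError if the columns do not cover the same keys.
    """
    if not columns:
        return []
    key_sets = [set(values) for values in columns.values()]
    if any(keys != key_sets[0] for keys in key_sets[1:]):
        raise ValueError("columns must share the same (cluster, gene) keys")
    rows = []
    for cluster, gene in sorted(key_sets[0]):
        row = {"cluster-num": cluster, "geneId": gene}
        for name, values in columns.items():
            row[name] = values[(cluster, gene)]
        rows.append(row)
    return rows


frame = build_marker_frame({
    "pl7.app/rna-seq/log2foldchange": {("0", "GeneA"): 2.0, ("1", "GeneA"): -1.0},
    "pl7.app/rna-seq/padj": {("0", "GeneA"): 0.001, ("1", "GeneA"): 0.2},
})
for row in frame:
    print(row)
```

Cluster ids are strings here, so sorting is lexicographic (`"10"` sorts before `"2"`); a downstream consumer that needs numeric order would have to convert them.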
