Dimensionality Reduction

Overview

This block projects high-dimensional gene expression data into a lower-dimensional space while preserving the most important biological variation. It takes the output of any scRNA-seq preprocessing block (e.g., Cell Ranger) as input, generates plots of the t-SNE and UMAP projections to aid dataset exploration, and exports the embedding coordinates for use by downstream blocks (e.g., Leiden Clustering, Cell-type Annotation).
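To make "a lower-dimensional space" concrete, the following is a minimal sketch of principal component analysis in plain Python. It is not the block's implementation: this page does not state which library or parameters the block uses, so the toy matrix, the function names, and the power-iteration approach are all assumptions chosen for illustration.

```python
import math
import random


def center_columns(matrix):
    """Subtract each gene's (column's) mean so PCA measures variation, not level."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    return [[row[j] - means[j] for j in range(n_cols)] for row in matrix]


def first_principal_component(centered, n_iter=200, seed=0):
    """Approximate the top principal axis by power iteration on X^T X."""
    rng = random.Random(seed)
    n_cols = len(centered[0])
    vec = [rng.uniform(-1.0, 1.0) for _ in range(n_cols)]
    for _ in range(n_iter):
        # scores = X v (one value per cell)
        scores = [sum(x * v for x, v in zip(row, vec)) for row in centered]
        # new_vec = X^T scores (one value per gene)
        new_vec = [
            sum(row[j] * s for row, s in zip(centered, scores))
            for j in range(n_cols)
        ]
        norm = math.sqrt(sum(v * v for v in new_vec))
        if norm == 0.0:
            break  # no variation in the data
        vec = [v / norm for v in new_vec]
    return vec


def project(centered, axis):
    """Score of each cell along the given axis."""
    return [sum(x * a for x, a in zip(row, axis)) for row in centered]


# Toy data (made up): 4 cells x 3 genes; the third gene is constant.
cells = [
    [10.0, 0.0, 1.0],
    [12.0, 1.0, 1.0],
    [0.0, 9.0, 1.0],
    [1.0, 11.0, 1.0],
]
centered = center_columns(cells)
pc1_axis = first_principal_component(centered)
pc1_scores = project(centered, pc1_axis)
```

In this toy example a single number per cell (`pc1_scores`) is enough to separate the first two cells from the last two, and the constant gene contributes nothing to the axis. Real pipelines keep many components and then feed them into UMAP or t-SNE.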

Pipeline context

This block is one of the first steps after preprocessing in a typical scRNA-seq analysis workflow.

 Blocks                                    Result pool
                                         ┌───────────
               ┌─────────────────────────┤
               │                         │
               v                         │
 ╔═══════════════════════════╗  exports  │
 ║  scRNA-seq Preprocessing  ║─────>─────┤ Count Matrices, Gene & Cell Properties
 ╚═══════════════════════════╝           │ --------------------------------------
                                         │
                                         ├ [sampleId][cellId][geneId] -> raw & normalized counts
               ┌─────────────────────────┤
               │                         │
               v                         │
 ╔═══════════════════════════╗  exports  │
 ║  Dimensionality Reduction ║─────>─────┤ UMAP, t-SNE, PCA embeddings
 ╚═══════════════════════════╝           │ -----------------------------
                                         │
                                         ├ [sampleId][cellId] -> umap1, umap2, ...
               ┌─────────────────────────┤
               │                         │
               v                         │
 ╔═══════════════════════════╗           │
 ║    Downstream Analysis    ║           │
 ╚═══════════════════════════╝           │

Core structure: axes and p-columns

The block consumes a raw count matrix and produces a p-frame containing UMAP, t-SNE, and PCA embeddings.

Primary axes

| Axis Name | Type | Description |
|---|---|---|
| pl7.app/sampleId | String | Uniquely identifies the sample. |
| pl7.app/cellId | String | Uniquely identifies a single cell within a sample. |

Input P-Columns

The block requires a raw gene expression count matrix as input.

1. Raw Counts

  • P-column name: pl7.app/rna-seq/countMatrix
  • Description: The raw number of reads (or UMIs) for each gene in each cell.
  • Requirement: Required.
  • Specification: The input p-column must have the pl7.app/rna-seq/normalized domain key set to "false".
# --- Core Identity ---
name: pl7.app/rna-seq/countMatrix
valueType: Long

# --- Axes ---
axesSpec:
  - name: pl7.app/sampleId
    type: String
  - name: pl7.app/sc/cellId
    type: String
  - name: pl7.app/geneId
    type: String

# --- Domain ---
domain:
  pl7.app/rna-seq/normalized: "false"

# --- Annotations ---
annotations:
  pl7.app/label: "Raw Count Matrix"
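The input requirement can be expressed as a small check. The sketch below assumes the specification has been loaded into a plain Python dict that mirrors the YAML fields. The helper `is_raw_count_matrix` is hypothetical and is not part of any Platforma API, and the `"true"` value used for the counter-example is an assumption (this page only documents `"false"`).

```python
RAW_COUNT_NAME = "pl7.app/rna-seq/countMatrix"
NORMALIZED_KEY = "pl7.app/rna-seq/normalized"


def is_raw_count_matrix(spec):
    """Return True if a column spec (plain dict) matches the raw-count requirement."""
    if spec.get("name") != RAW_COUNT_NAME:
        return False
    # Domain values are strings in the specification,
    # so compare with "false", not the boolean False.
    return spec.get("domain", {}).get(NORMALIZED_KEY) == "false"


raw_spec = {
    "name": "pl7.app/rna-seq/countMatrix",
    "valueType": "Long",
    "axesSpec": [
        {"name": "pl7.app/sampleId", "type": "String"},
        {"name": "pl7.app/sc/cellId", "type": "String"},
        {"name": "pl7.app/geneId", "type": "String"},
    ],
    "domain": {"pl7.app/rna-seq/normalized": "false"},
    "annotations": {"pl7.app/label": "Raw Count Matrix"},
}

# Assumed counter-example: the same column flagged as normalized.
normalized_spec = dict(raw_spec, domain={"pl7.app/rna-seq/normalized": "true"})

print(is_raw_count_matrix(raw_spec))
print(is_raw_count_matrix(normalized_spec))
```

Comparing against the string `"false"` matters here: a boolean comparison would silently reject every valid input.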

Exported P-Columns

The block exports a single p-frame containing all the generated embeddings: UMAP, t-SNE, and PCA. Downstream blocks can consume this p-frame to access any of the calculated dimensions.

1. UMAP Coordinates

The block generates three p-columns for the UMAP dimensions.

  • P-column names: pl7.app/rna-seq/umap1, pl7.app/rna-seq/umap2, pl7.app/rna-seq/umap3
  • Description: The coordinates for each cell in the UMAP embedding space.
  • Requirement: Required.
  • Specification (UMAP Dim1 example):
# --- Core Identity ---
name: pl7.app/rna-seq/umap1
valueType: Double

# --- Axes ---
axesSpec:
  - name: pl7.app/sampleId
    type: String
  - name: pl7.app/cellId
    type: String

# --- Domain ---
domain:
  pl7.app/blockId: "..." # a unique identifier for the block run
  pl7.app/rna-seq/batch-corrected: "false"

# --- Annotations ---
annotations:
  pl7.app/label: "UMAP Dim1"
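Because each UMAP dimension is a separate p-column over the same two axes, a consumer has to join the columns on the (sampleId, cellId) key to obtain per-cell coordinates. The sketch below models each column as a plain dict keyed by that axis tuple. The representation, the helper `join_on_axes`, and the values are assumptions for illustration, not how the platform stores or serves p-columns.

```python
def join_on_axes(*columns):
    """Join columns keyed by the same axis tuple; keep only keys present in all of them."""
    shared_keys = set(columns[0])
    for column in columns[1:]:
        shared_keys &= set(column)
    return {
        key: tuple(column[key] for column in columns)
        for key in sorted(shared_keys)
    }


# Made-up values, keyed by (sampleId, cellId).
umap1 = {("S1", "AAAC"): 1.5, ("S1", "AAAG"): -0.25, ("S2", "AAAC"): 3.0}
umap2 = {("S1", "AAAC"): 0.5, ("S1", "AAAG"): 2.0, ("S2", "AAAC"): -1.0}

coordinates = join_on_axes(umap1, umap2)
for key, point in coordinates.items():
    print(key, point)
```

Note that the barcode `AAAC` appears in both samples: the cell identifier is only unique within a sample, which is why both axes are needed in the key.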

2. t-SNE Coordinates

The block generates three p-columns for the t-SNE dimensions.

  • P-column names: pl7.app/rna-seq/tsne1, pl7.app/rna-seq/tsne2, pl7.app/rna-seq/tsne3
  • Description: The coordinates for each cell in the t-SNE embedding space.
  • Requirement: Required.
  • Specification (t-SNE Dim1 example):
# --- Core Identity ---
name: pl7.app/rna-seq/tsne1
valueType: Double

# --- Axes ---
axesSpec:
  - name: pl7.app/sampleId
    type: String
  - name: pl7.app/cellId
    type: String

# --- Domain ---
domain:
  pl7.app/blockId: "..." # a unique identifier for the block run
  pl7.app/rna-seq/batch-corrected: "false"

# --- Annotations ---
annotations:
  pl7.app/label: "tSNE Dim1"

3. PCA Coordinates

The block generates a single p-column, pcvalue, for all principal components. The specific component is identified by the pc-num axis.

  • P-column name: pl7.app/rna-seq/pcvalue
  • Description: The principal component value for each cell. These are critical inputs for downstream clustering.
  • Requirement: Required.
  • Specification:
# --- Core Identity ---
name: pl7.app/rna-seq/pcvalue
valueType: Double

# --- Axes ---
axesSpec:
  - name: pl7.app/sampleId
    type: String
  - name: pl7.app/sc/cellId
    type: String
  - name: pl7.app/rna-seq/pc-num
    type: String

# --- Domain ---
domain:
  pl7.app/blockId: "..." # a unique identifier for the block run
  pl7.app/rna-seq/batch-corrected: "false"

# --- Annotations ---
annotations:
  pl7.app/label: "Principal Component Value"
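Unlike UMAP and t-SNE, the PCA output is a single long-format column in which the component index is an axis. A downstream consumer such as a clustering step typically needs one vector per cell, so it has to pivot on pc-num. The sketch below shows one way to do that with plain dicts. The helper `pcs_per_cell` and the values are hypothetical, and it assumes pc-num labels are integer-like strings such as "1" and "2", which this page does not confirm.

```python
from collections import defaultdict


def pcs_per_cell(pcvalue):
    """Group long-format PC values into one ordered vector per (sampleId, cellId)."""
    grouped = defaultdict(dict)
    for (sample_id, cell_id, pc_num), value in pcvalue.items():
        # The pc-num axis is typed String in the spec, so convert it
        # before sorting; otherwise "10" would sort ahead of "2".
        grouped[(sample_id, cell_id)][int(pc_num)] = value
    return {
        cell: [components[i] for i in sorted(components)]
        for cell, components in grouped.items()
    }


# Made-up values, keyed by (sampleId, cellId, pc-num).
pcvalue = {
    ("S1", "AAAC", "1"): 4.0,
    ("S1", "AAAC", "2"): -1.0,
    ("S1", "AAAC", "10"): 0.5,
    ("S1", "AAAG", "1"): -3.0,
    ("S1", "AAAG", "2"): 2.0,
    ("S1", "AAAG", "10"): 0.0,
}
vectors = pcs_per_cell(pcvalue)
```

Keeping the component index as an axis means the number of principal components can change without changing the column schema, at the cost of this extra pivot on the consumer side.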

Summary of Exported P-Columns

| P-Column Name | Description | Axes | Requirement |
|---|---|---|---|
| pl7.app/rna-seq/umap1, .../umap2, .../umap3 | UMAP coordinates per cell. | [sampleId][cellId] | Required |
| pl7.app/rna-seq/tsne1, .../tsne2, .../tsne3 | t-SNE coordinates per cell. | [sampleId][cellId] | Required |
| pl7.app/rna-seq/pcvalue | PCA coordinate per cell. | [sampleId][sc/cellId][pc-num] | Required |