Dimensionality Reduction
Overview
This block projects high-dimensional gene expression data into a lower-dimensional space while preserving the most important biological variation. It takes the output of any scRNA-seq preprocessing block (e.g., Cell Ranger) as input, generates t-SNE and UMAP plots to aid dataset exploration, and exports the embedding coordinates for use by downstream blocks (e.g., Leiden Clustering, Cell-type Annotation).
Pipeline context
This block is one of the first steps after preprocessing in a typical scRNA-seq analysis workflow.
           Blocks                       Result pool
                                        ┌───────────
              ┌─────────────────────────┤
              │                         │
              v                         │
╔═══════════════════════════╗  exports  │
║  scRNA-seq Preprocessing  ║────->─────┤  Count Matrices, Gene & Cell Properties
╚═══════════════════════════╝           │  --------------------------------------
                                        │
                                        ├  [sampleId][cellId][geneId] -> raw & normalized counts
              ┌─────────────────────────┤
              │                         │
              v                         │
╔═══════════════════════════╗  exports  │
║ Dimensionality Reduction  ║────->─────┤  UMAP, t-SNE, PCA embeddings
╚═══════════════════════════╝           │  -----------------------------
                                        │
                                        ├  [sampleId][cellId] -> umap1, umap2, ...
              ┌─────────────────────────┤
              │                         │
              v                         │
  ╔═══════════════════════╗             │
  ║  Downstream Analysis  ║             │
  ╚═══════════════════════╝             │
Core structure: axes and p-columns
The block consumes a raw count matrix and produces a p-frame containing UMAP, t-SNE, and PCA embeddings.
Primary axes
| Axis Name | Type | Description |
|---|---|---|
| `pl7.app/sampleId` | String | Uniquely identifies the sample. |
| `pl7.app/cellId` | String | Uniquely identifies a single cell within a sample. |
Input P-Columns
The block requires a raw gene expression count matrix as input.
1. Raw Counts
- P-column name: `pl7.app/rna-seq/countMatrix`
- Description: The raw number of reads (or UMIs) for each gene in each cell.
- Requirement: Required.
- Specification: The input p-column must have the `pl7.app/rna-seq/normalized` domain key set to `"false"`.
# --- Core Identity ---
name: pl7.app/rna-seq/countMatrix
valueType: Long
# --- Axes ---
axesSpec:
  - name: pl7.app/sampleId
    type: String
  - name: pl7.app/sc/cellId
    type: String
  - name: pl7.app/geneId
    type: String
# --- Domain ---
domain:
  pl7.app/rna-seq/normalized: "false"
# --- Annotations ---
annotations:
  pl7.app/label: "Raw Count Matrix"
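The requirement above can be checked programmatically. The following is a minimal sketch, assuming the spec is available as a plain Python dictionary that mirrors the YAML; the helper name `is_valid_raw_count_spec` is hypothetical and not part of any SDK. It only checks the column name and the `normalized` domain key described in this section.

```python
from typing import Any, Mapping

REQUIRED_NAME = "pl7.app/rna-seq/countMatrix"
NORMALIZED_KEY = "pl7.app/rna-seq/normalized"


def is_valid_raw_count_spec(spec: Mapping[str, Any]) -> bool:
    """Return True if `spec` describes a raw (non-normalized) count matrix.

    Hypothetical helper for illustration only.
    """
    if spec.get("name") != REQUIRED_NAME:
        return False
    domain = spec.get("domain") or {}
    # The domain value is the string "false", not a boolean.
    return domain.get(NORMALIZED_KEY) == "false"


# Example specs (dictionary form of the YAML above; axes omitted for brevity).
raw_spec = {
    "name": "pl7.app/rna-seq/countMatrix",
    "valueType": "Long",
    "domain": {"pl7.app/rna-seq/normalized": "false"},
}
normalized_spec = {**raw_spec, "domain": {"pl7.app/rna-seq/normalized": "true"}}

print(is_valid_raw_count_spec(raw_spec))         # True
print(is_valid_raw_count_spec(normalized_spec))  # False
```

Comparing against the string `"false"` rather than a boolean matters here, because domain values in the spec are quoted strings.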
Exported P-Columns
The block exports a single p-frame containing all the generated embeddings: UMAP, t-SNE, and PCA. Downstream blocks can consume this p-frame to access any of the calculated dimensions.
1. UMAP Coordinates
The block generates three p-columns for the UMAP dimensions.
- P-column names: `pl7.app/rna-seq/umap1`, `pl7.app/rna-seq/umap2`, `pl7.app/rna-seq/umap3`
- Description: The coordinates for each cell in the UMAP embedding space.
- Requirement: Required.
- Specification (UMAP Dim1 example):
# --- Core Identity ---
name: pl7.app/rna-seq/umap1
valueType: Double
# --- Axes ---
axesSpec:
  - name: pl7.app/sampleId
    type: String
  - name: pl7.app/cellId
    type: String
# --- Domain ---
domain:
  pl7.app/blockId: "..." # a unique identifier for the block run
  pl7.app/rna-seq/batch-corrected: "false"
# --- Annotations ---
annotations:
  pl7.app/label: "UMAP Dim1"
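Since the three UMAP columns differ only in their dimension index, their specs can be generated from one template. The sketch below is illustrative only: `make_embedding_spec` is a hypothetical helper, the specs are modeled as plain dictionaries mirroring the YAML above, and `"example-block-id"` is a placeholder for the real block run identifier. The label prefixes follow the examples in this document.

```python
from typing import Any, Dict

# Label prefixes as they appear in the spec examples of this document.
LABEL_PREFIX = {"umap": "UMAP", "tsne": "tSNE"}


def make_embedding_spec(kind: str, dim: int, block_id: str) -> Dict[str, Any]:
    """Build the spec dictionary for one embedding dimension (hypothetical helper)."""
    if kind not in LABEL_PREFIX:
        raise ValueError(f"unsupported embedding kind: {kind!r}")
    return {
        "name": f"pl7.app/rna-seq/{kind}{dim}",
        "valueType": "Double",
        "axesSpec": [
            {"name": "pl7.app/sampleId", "type": "String"},
            {"name": "pl7.app/cellId", "type": "String"},
        ],
        "domain": {
            "pl7.app/blockId": block_id,  # placeholder value in this example
            "pl7.app/rna-seq/batch-corrected": "false",
        },
        "annotations": {"pl7.app/label": f"{LABEL_PREFIX[kind]} Dim{dim}"},
    }


umap_specs = [make_embedding_spec("umap", d, "example-block-id") for d in (1, 2, 3)]
print([spec["name"] for spec in umap_specs])
```

The same helper covers the t-SNE columns by passing `"tsne"` as the kind.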
2. t-SNE Coordinates
The block generates three p-columns for the t-SNE dimensions.
- P-column names: `pl7.app/rna-seq/tsne1`, `pl7.app/rna-seq/tsne2`, `pl7.app/rna-seq/tsne3`
- Description: The coordinates for each cell in the t-SNE embedding space.
- Requirement: Required.
- Specification (t-SNE Dim1 example):
# --- Core Identity ---
name: pl7.app/rna-seq/tsne1
valueType: Double
# --- Axes ---
axesSpec:
  - name: pl7.app/sampleId
    type: String
  - name: pl7.app/cellId
    type: String
# --- Domain ---
domain:
  pl7.app/blockId: "..." # a unique identifier for the block run
  pl7.app/rna-seq/batch-corrected: "false"
# --- Annotations ---
annotations:
  pl7.app/label: "tSNE Dim1"
3. PCA Coordinates
The block generates a single p-column, `pcvalue`, that holds the values of all principal components. The `pc-num` axis identifies the specific component.
- P-column name: `pl7.app/rna-seq/pcvalue`
- Description: The principal component value for each cell. These are critical inputs for downstream clustering.
- Requirement: Required.
- Specification:
# --- Core Identity ---
name: pl7.app/rna-seq/pcvalue
valueType: Double
# --- Axes ---
axesSpec:
  - name: pl7.app/sampleId
    type: String
  - name: pl7.app/sc/cellId
    type: String
  - name: pl7.app/rna-seq/pc-num
    type: String
# --- Domain ---
domain:
  pl7.app/blockId: "..." # a unique identifier for the block run
  pl7.app/rna-seq/batch-corrected: "false"
# --- Annotations ---
annotations:
  pl7.app/label: "Principal Component Value"
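Because `pcvalue` is stored in long format (one value per sample, cell, and component), a consumer such as a clustering step typically needs to pivot it into one vector per cell. The sketch below shows one way to do that in plain Python. It assumes the `pc-num` axis values are strings holding integers (e.g. `"1"`, `"2"`), which this document does not state; the function name and the numeric values are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One long-format record: (sampleId, cellId, pc-num, value).
Record = Tuple[str, str, str, float]
CellKey = Tuple[str, str]


def pivot_pc_values(records: Iterable[Record]) -> Dict[CellKey, List[float]]:
    """Group long-format PC values into one vector per (sampleId, cellId).

    Assumes pc-num strings parse as integers; components are ordered numerically.
    """
    per_cell: Dict[CellKey, Dict[int, float]] = defaultdict(dict)
    for sample_id, cell_id, pc_num, value in records:
        per_cell[(sample_id, cell_id)][int(pc_num)] = value
    return {
        key: [components[i] for i in sorted(components)]
        for key, components in per_cell.items()
    }


# Illustrative values, not real data.
records = [
    ("s1", "c1", "2", -0.5),
    ("s1", "c1", "1", 1.25),
    ("s1", "c2", "1", 0.75),
    ("s1", "c2", "2", 2.0),
]
pc_matrix = pivot_pc_values(records)
print(pc_matrix[("s1", "c1")])  # [1.25, -0.5]
```

Sorting by the integer value rather than the string avoids placing component `"10"` before `"2"`.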
Summary of Exported P-Columns
| P-Column Name | Description | Axes | Requirement |
|---|---|---|---|
| `pl7.app/rna-seq/umap1`, `.../umap2`, `.../umap3` | UMAP coordinates per cell. | [sampleId][cellId] | Required |
| `pl7.app/rna-seq/tsne1`, `.../tsne2`, `.../tsne3` | t-SNE coordinates per cell. | [sampleId][cellId] | Required |
| `pl7.app/rna-seq/pcvalue` | PCA coordinate per cell. | [sampleId][sc/cellId][pc-num] | Required |
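The UMAP and t-SNE columns in the table share the same `[sampleId][cellId]` axes, so a downstream consumer can join them on that key, for example to obtain 2D points for plotting. The following is a minimal sketch under the assumption that each column has been loaded into a dictionary keyed by `(sampleId, cellId)`; `join_columns` and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple

CellKey = Tuple[str, str]  # (sampleId, cellId)


def join_columns(
    columns: Dict[str, Dict[CellKey, float]], names: List[str]
) -> Dict[CellKey, Tuple[float, ...]]:
    """Inner-join the named columns on their shared (sampleId, cellId) keys."""
    if not names:
        return {}
    shared = set.intersection(*(set(columns[name]) for name in names))
    return {
        key: tuple(columns[name][key] for name in names)
        for key in sorted(shared)
    }


# Illustrative values, not real data.
columns = {
    "pl7.app/rna-seq/umap1": {("s1", "c1"): 0.5, ("s1", "c2"): -1.5},
    "pl7.app/rna-seq/umap2": {("s1", "c1"): 2.0, ("s1", "c2"): 3.25},
}
points = join_columns(columns, ["pl7.app/rna-seq/umap1", "pl7.app/rna-seq/umap2"])
print(points[("s1", "c1")])  # (0.5, 2.0)
```

An inner join is used so that a cell missing from either column is dropped rather than plotted with a partial coordinate.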