Clonotype Enrichment Block Guide

This document outlines the standard inputs and outputs for a downstream block that performs clonotype enrichment analysis. By adhering to this standard, an enrichment block can seamlessly process VDJ datasets from any compliant clonotyping block and produce results that identify and rank clonotypes enriched through selection rounds, showing how their frequencies change across conditions.

Overview

The diagram below illustrates where an enrichment block fits in a typical VDJ analysis pipeline. It consumes a VDJ dataset and produces a new, augmented dataset with enrichment scores and analysis results.

 Blocks                                 Result pool
                                       ┌───────────
             ┌─────────────────────────┤
             │                         │
             v                         │
 ╔═══════════════════════╗   exports   │
 ║   Clonotyping Block   ║───────>─────┤ Abundance
 ╚═══════════════════════╝             │ -------------------------------
                                       │
                                       ├ [sampleId][clonotypeKey] -> abundance
             ┌─────────────────────────┤
             │                         │
             v                         │
 ╔═══════════════════════╗   exports   │
 ║   Enrichment Block    ║───────>─────┤ Enrichment Score and frequencies
 ╚═══════════════════════╝             │ ---------------------------------
                                       │
                                       ├ [clonotypeKey] -> pl7.app/vdj/maxEnrichment
                                       ├ [clonotypeKey] -> pl7.app/vdj/frequency
             ┌─────────────────────────┤
             │                         │
             v                         │
 ╔═══════════════════════╗   exports   │
 ║   Clustering Block    ║───────>─────┤ Cluster Abundance & Properties
 ╚═══════════════════════╝             │ --------------------------------
                                       │
                                       ├ [clusterId] -> cluster props
                                       ├ [sampleId][clusterId] -> abundance
                                       ├ [clonotypeKey][clusterId] -> 1 (linker)
             ┌─────────────────────────┤
             │                         │
             v                         │
 ╔═══════════════════════╗   exports   │
 ║   Enrichment Block    ║───────>─────┤ Enrichment Score and frequencies
 ╚═══════════════════════╝             │ ---------------------------------
                                       │
                                       ├ [clusterId] -> pl7.app/vdj/maxEnrichment
                                       ├ [clusterId] -> pl7.app/vdj/frequency
             ┌─────────────────────────┤
             │                         │
             v                         │
 ╔═══════════════════════╗   exports   │
 ║  Downstream Blocks    ║───────>─────┤ (Downstream Results)
 ╚═══════════════════════╝             │ --------------------
                                       │

Inputs

A standard enrichment block operates on a VDJ dataset (either bulk or single-cell) or on a clustered VDJ dataset. A developer implementing an enrichment block should ensure that it accepts the following inputs: an abundance p-column from an upstream clonotyping or clustering block, a condition metadata p-column, and two analysis parameters:

Abundance Reference (abundanceRef): A reference to a p-column with axes [pl7.app/sampleId][any], where the second axis can be any entity identifier (e.g., pl7.app/vdj/clonotypeKey, pl7.app/vdj/scClonotypeKey, pl7.app/vdj/clusterId, or another identifier).
This p-column must have these annotations:
- pl7.app/isAbundance: "true"
- pl7.app/abundance/normalized: "false"
- pl7.app/abundance/isPrimary: "true"
Condition Column Reference (conditionColumnRef): A reference to a metadata p-column that shares the sample axis (pl7.app/sampleId) with the abundance column and assigns each sample to a condition (for example, a selection round).
Condition Order (conditionOrder): An array of strings defining the order in which conditions are compared in the enrichment analysis. Its values are taken from the column referenced by conditionColumnRef.
Downsampling Parameters (downsampling): Optional parameters for downsampling abundances so that conditions with different sequencing depths can be compared.
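The abundance requirements above can be checked mechanically before any analysis runs. The following is a minimal Python sketch that represents a p-column spec as a plain dict; that dict shape, the function name `is_valid_abundance_column`, and the example specs are hypothetical illustrations, not the Platforma SDK's API.

```python
from typing import Any

# Required annotations on the abundance column, copied from this guide.
# Values are compared as strings, matching the spec notation.
REQUIRED_ABUNDANCE_ANNOTATIONS = {
    "pl7.app/isAbundance": "true",
    "pl7.app/abundance/normalized": "false",
    "pl7.app/abundance/isPrimary": "true",
}


def is_valid_abundance_column(spec: dict[str, Any]) -> bool:
    """Check a hypothetical dict-shaped p-column spec against the abundanceRef rules."""
    axes = spec.get("axesSpec", [])
    # Exactly two axes: pl7.app/sampleId first, then any entity identifier.
    if len(axes) != 2 or axes[0].get("name") != "pl7.app/sampleId":
        return False
    annotations = spec.get("annotations", {})
    # Every required annotation must be present with exactly the expected value.
    return all(
        annotations.get(key) == value
        for key, value in REQUIRED_ABUNDANCE_ANNOTATIONS.items()
    )


# Illustrative specs: a raw clonotype abundance column and a normalized one.
raw_counts = {
    "axesSpec": [{"name": "pl7.app/sampleId"}, {"name": "pl7.app/vdj/clonotypeKey"}],
    "annotations": dict(REQUIRED_ABUNDANCE_ANNOTATIONS),
}
normalized = {
    "axesSpec": raw_counts["axesSpec"],
    "annotations": {**REQUIRED_ABUNDANCE_ANNOTATIONS, "pl7.app/abundance/normalized": "true"},
}

print(is_valid_abundance_column(raw_counts))  # True
print(is_valid_abundance_column(normalized))  # False
```

Rejecting normalized columns up front keeps the block working on raw counts, which the optional downsampling step needs.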

Input Dataset Flexibility

The block adapts automatically to the type of input dataset, which it determines from the second axis of the abundance p-column:

Individual Clonotypes: When the second axis is pl7.app/vdj/clonotypeKey (bulk) or pl7.app/vdj/scClonotypeKey (single-cell)
Clonotype Clusters: When the second axis is pl7.app/vdj/clusterId
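Other entities: When the second axis is any other identifier. Because the block only requires abundances with axes [pl7.app/sampleId][any], new dataset types can be supported as they are introduced.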

This flexibility allows the block to perform enrichment analysis on both individual clonotypes and clustered data without requiring different configurations.
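One way to get this behavior is to derive the output axis and labels from the second axis name alone. The sketch below assumes hypothetical helpers (`entity_axis`, `entity_kind`, `max_enrichment_label`) and a generic fallback label "Entity" for identifiers the guide does not list; both the helpers and the fallback are illustrative assumptions.

```python
# Entity kind per second-axis name, following the label comments in the
# export specs ("Clonotype" vs "Cluster").
ENTITY_KIND_BY_AXIS = {
    "pl7.app/vdj/clonotypeKey": "Clonotype",    # bulk clonotypes
    "pl7.app/vdj/scClonotypeKey": "Clonotype",  # single-cell clonotypes
    "pl7.app/vdj/clusterId": "Cluster",         # clonotype clusters
}


def entity_axis(abundance_axes: list[str]) -> str:
    """Exported columns are keyed by the abundance column's second axis."""
    return abundance_axes[1]


def entity_kind(abundance_axes: list[str]) -> str:
    # Assumption: identifiers not listed above get a generic label.
    return ENTITY_KIND_BY_AXIS.get(entity_axis(abundance_axes), "Entity")


def max_enrichment_label(abundance_axes: list[str]) -> str:
    return f"Maximum {entity_kind(abundance_axes)} Enrichment"


axes = ["pl7.app/sampleId", "pl7.app/vdj/clusterId"]
print(entity_axis(axes))           # pl7.app/vdj/clusterId
print(max_enrichment_label(axes))  # Maximum Cluster Enrichment
```

Since nothing else in the logic depends on the entity type, the same code path serves clonotypes, clusters, and future identifiers.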

Exports

An enrichment block ingests a VDJ dataset and produces a new, augmented p-frame. The core of this output is a set of p-columns that describe the enrichment analysis results.

Enrichment scores

These p-columns describe the enrichment analysis results for each clonotype or cluster.

1. Maximum enrichment (`pl7.app/vdj/maxEnrichment`)

Description: The primary enrichment score representing the maximum enrichment value for each clonotype/cluster across all condition comparisons.
Specification:

name: pl7.app/vdj/maxEnrichment
valueType: Double
axesSpec:
  - name: pl7.app/vdj/clonotypeKey # or scClonotypeKey or clusterId
    type: String
domain:
  pl7.app/downsampling: [canonical.encode(downsampling)] # JSON-formatted map of the selected downsampling parameters, e.g. {"type":"hypergeometric","valueChooser":"auto"}
  pl7.app/conditionsOrder: [canonical.encode(condition_order)]
  pl7.app/blockId: [blockId]
annotations:
  pl7.app/label: "Maximum Clonotype Enrichment" # or "Maximum Cluster Enrichment"
  pl7.app/min: [minValue] # Minimum enrichment in data
  pl7.app/max: [maxValue] # Maximum enrichment in data
  pl7.app/isScore: "true"
  pl7.app/score/rankValues: "increasing"
  pl7.app/score/defaultCutoff: [75th_percentile_value] # 75th percentile of all enrichment values
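This guide does not define how an individual enrichment value is computed. For illustration only, the sketch below assumes that enrichment is the log2 fold change in frequency from an earlier to a later condition in conditionOrder, that every (earlier, later) pair is compared, and that a small pseudocount guards against zero frequencies. It then derives values for the pl7.app/min, pl7.app/max and pl7.app/score/defaultCutoff annotations, taking the 75th percentile (with linear interpolation) over the per-entity maximum scores. The function names and example data are hypothetical.

```python
import math
import statistics
from itertools import combinations

PSEUDOCOUNT = 1e-6  # assumption: keeps log2 finite when a frequency is zero


def max_enrichment(freqs: dict[str, float], condition_order: list[str]) -> float:
    """Largest log2 fold change over every (earlier, later) pair in condition_order.

    Requires at least two conditions; a missing condition counts as frequency 0.
    """
    return max(
        math.log2((freqs.get(later, 0.0) + PSEUDOCOUNT)
                  / (freqs.get(earlier, 0.0) + PSEUDOCOUNT))
        for earlier, later in combinations(condition_order, 2)
    )


def score_annotations(scores: list[float]) -> dict[str, float]:
    """Values for pl7.app/min, pl7.app/max and pl7.app/score/defaultCutoff."""
    if len(scores) >= 2:
        # "inclusive" interpolates linearly between order statistics; index 2 is Q3.
        cutoff = statistics.quantiles(scores, n=4, method="inclusive")[2]
    else:
        cutoff = scores[0]
    return {
        "pl7.app/min": min(scores),
        "pl7.app/max": max(scores),
        "pl7.app/score/defaultCutoff": cutoff,
    }


order = ["R0", "R1", "R2"]  # conditionOrder, e.g. selection rounds
frequencies = {             # entity -> condition -> frequency (illustrative)
    "cl1": {"R0": 0.01, "R1": 0.04, "R2": 0.08},
    "cl2": {"R0": 0.10, "R1": 0.05, "R2": 0.02},
    "cl3": {"R0": 0.02, "R1": 0.02, "R2": 0.02},
}
scores = {key: max_enrichment(f, order) for key, f in frequencies.items()}
annotations = score_annotations(list(scores.values()))
print({k: round(v, 3) for k, v in scores.items()})
# {'cl1': 3.0, 'cl2': -1.0, 'cl3': 0.0}
```

Taking the maximum over all ordered pairs rewards an entity that grows at any point in the selection, not only between the first and last conditions; a real block may define its comparisons differently.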

2. Frequency per condition (`pl7.app/vdj/frequency`)

Description: The frequency (relative abundance) of each clonotype/cluster in each experimental condition. One p-column is exported per condition, distinguished by the pl7.app/condition domain.
Specification:

name: pl7.app/vdj/frequency
valueType: Double
axesSpec:
  - name: pl7.app/vdj/clonotypeKey # or scClonotypeKey or clusterId
    type: String
domain:
  pl7.app/downsampling: [canonical.encode(downsampling)] # JSON-formatted map of the selected downsampling parameters, e.g. {"type":"hypergeometric","valueChooser":"auto"}
  pl7.app/condition: [condition_name]
  pl7.app/blockId: [blockId]
annotations:
  pl7.app/label: "Clonotype Frequency [condition_name]" # or "Cluster Frequency [condition_name]"
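The sketch below shows how per-condition frequencies could be produced. It assumes that abundances are read counts keyed by sample and entity, that samples sharing a condition are pooled, and that hypergeometric downsampling draws reads without replacement down to the smallest sample depth (a guess at what `"valueChooser":"auto"` selects). It then builds one frequency spec per condition, distinguished by the pl7.app/condition domain. The dict layout, function names, and the `block-1` block ID are hypothetical, and the encoded downsampling domain is omitted.

```python
import random
from collections import Counter, defaultdict


def downsample(counts: dict[str, int], depth: int, rng: random.Random) -> Counter:
    """Draw `depth` reads without replacement (multivariate hypergeometric)."""
    keys = list(counts)
    drawn = rng.sample(keys, depth, counts=[counts[k] for k in keys])
    return Counter(drawn)  # entities with no drawn reads are simply absent


def condition_frequencies(abundance, sample_condition, downsampling=None, seed=0):
    """abundance: {sampleId: {entityKey: count}}; sample_condition: {sampleId: condition}.

    Returns {condition: {entityKey: frequency}}.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    if downsampling and downsampling.get("type") == "hypergeometric":
        # Assumption: "auto" downsamples every sample to the smallest depth.
        depth = min(sum(c.values()) for c in abundance.values())
        abundance = {s: downsample(c, depth, rng) for s, c in abundance.items()}
    pooled = defaultdict(Counter)
    for sample, counts in abundance.items():
        # Assumption: samples that share a condition are pooled.
        pooled[sample_condition[sample]].update(counts)
    result = {}
    for condition, counts in pooled.items():
        total = sum(counts.values())
        result[condition] = {k: counts[k] / total for k in sorted(counts)} if total else {}
    return result


def frequency_specs(conditions, axis_name, kind, block_id):
    """One pl7.app/vdj/frequency spec per condition (downsampling domain omitted)."""
    return [
        {
            "name": "pl7.app/vdj/frequency",
            "valueType": "Double",
            "axesSpec": [{"name": axis_name, "type": "String"}],
            "domain": {"pl7.app/condition": c, "pl7.app/blockId": block_id},
            "annotations": {"pl7.app/label": f"{kind} Frequency {c}"},
        }
        for c in conditions
    ]


abundance = {"s1": {"a": 3, "b": 1}, "s2": {"a": 1, "b": 3}, "s3": {"a": 2, "b": 2}}
sample_condition = {"s1": "R0", "s2": "R1", "s3": "R1"}
freqs = condition_frequencies(abundance, sample_condition,
                              {"type": "hypergeometric", "valueChooser": "auto"})
print(freqs)  # {'R0': {'a': 0.75, 'b': 0.25}, 'R1': {'a': 0.375, 'b': 0.625}}
specs = frequency_specs(["R0", "R1"], "pl7.app/vdj/clonotypeKey", "Clonotype", "block-1")
```

In this example every sample already has four reads, so downsampling keeps all of them and the frequencies match the undownsampled result; with unequal depths the output would vary with the seed.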

Summary of standard exports

The following table provides a summary of all standard p-columns that a developer can expect to be produced by a compliant enrichment block.

| Category | P-Column Name | Description | Axes |
|---|---|---|---|
| Enrichment scores | `pl7.app/vdj/maxEnrichment` | Maximum enrichment score for each clonotype/cluster. | `[scClonotypeKey/clonotypeKey/clusterId]` |
| Frequency data | `pl7.app/vdj/frequency` | Frequency/abundance per condition. | `[scClonotypeKey/clonotypeKey/clusterId]` |
