Bulk Clonotyping

This document outlines the standard for VDJ datasets generated from bulk sequencing data. Upstream clonotyping blocks that process bulk FASTQ files (e.g. from targeted library sequencing or bulk RNA-Seq) should produce a p-frame containing the p-columns defined here. This ensures that downstream tools for analysis, visualization, and comparison can operate on a consistent and predictable data structure. See the P-frames and p-columns guide for more foundational information.

The MiXCR Clonotyping block is the reference implementation for this standard.

Overview

The diagram below illustrates a typical user flow involving a clonotyping block.

 Blocks                                 Result pool
                                       ┌───────────
             ┌─────────────────────────┤
             │                         │
             v                         │
 ╔═══════════════════════╗   exports   │
 ║    Samples & Data     ║───────>─────┤ Sequencing Dataset
 ╚═══════════════════════╝             │ ------------------
                                       │
                                       ├ [sampleId](readIndex)(lane) -> file
             ┌─────────────────────────┤
             │                         │
             v                         │
 ╔═══════════════════════╗   exports   │
 ║   Clonotyping Block   ║───────>─────┤ Abundance & Properties (per chain)
 ╚═══════════════════════╝             │ ------------------------------------
                                       │
                                       ├ [sampleId][clonotypeKey] -> number
                                       ├ [clonotypeKey] -> property
             ┌─────────────────────────┤
             │                         │
             v                         │
 ╔═══════════════════════╗             │
 ║  Downstream Analysis  ║             │
 ╚═══════════════════════╝             │

Samples & Data: The flow starts with "Samples & Data", a universal entry point for importing and organizing raw sequencing data. It produces a p-frame where each sample's FASTQ/FASTA files are keyed by sampleId and, optionally, by readIndex, lane, and other axes.
Clonotyping Block: This block takes the sequencing dataset as input. It performs VDJ clonotyping analysis (e.g., using MiXCR) and generates one or more standardized VDJ datasets as its primary output, one for each immunological chain selected for analysis (e.g., one for IGH, one for IGK, etc.). Each dataset consists of two main types of p-columns:
- Abundance p-columns: These have a composite key of [sampleId][clonotypeKey] and store quantitative data, such as the read count for each clonotype in each sample.
- Property p-columns: These are keyed only by [clonotypeKey] and store descriptive attributes of the clonotype, such as its CDR3 sequence or V/J gene calls.
Downstream Blocks: Subsequent blocks (e.g., for clustering, diversity analysis, or visualization) consume one of the VDJ datasets. They use a special "anchor" p-column to easily identify the dataset and then access the various abundance and property p-columns for their specific analysis.

Core structure: axes and p-columns

A standard bulk VDJ dataset is a p-frame composed of p-columns that describe clonotypes and their abundance. The structure of these p-columns hinges on two primary axes.

Primary axes

Axis Name	Type	Description
`pl7.app/sampleId`	`String`	Uniquely identifies the sample from which the data was derived. This is typically inherited from the upstream "Samples & Data" block.
`pl7.app/vdj/clonotypeKey`	`String`	A composite key that uniquely identifies a clonotype. Its structure is precisely defined by its `domain`, making it a self-contained and unambiguous identifier. See the detailed description below.

The `clonotypeKey` axis in detail

A crucial concept of the VDJ standard is that a clonotyping block produces a separate p-frame for each immunological chain (e.g., IGHeavy, TCRBeta, etc.) included in the analysis. This separation is enforced by the domain of the clonotypeKey axis.

This design is powerful because it allows downstream blocks to unambiguously select the exact data they need. For example, a liability prediction block designed to analyze antibody heavy chains can specifically query for the p-frame where the clonotypeKey domain is {"pl7.app/vdj/chain": "IGHeavy"} and ignore all other chains.

A typical AxisSpec for a clonotypeKey illustrates how this works (in YAML for readability):

name: pl7.app/vdj/clonotypeKey
type: String
domain:
  pl7.app/vdj/chain: "IGHeavy"
  pl7.app/vdj/clonotypingRunId: "7e2eac8d-95b5-47dc-adac-eb0455134e87"
  pl7.app/vdj/clonotypeKey/structure: "VDJRegion-nt-VGene-JGene-CGene"
annotations:
  pl7.app/label: "Clonotype ID"

See the P-Column Specification guide for more details on the AxisSpec structure.

The domain contains both biological and technical keys that guarantee uniqueness and provide context.

Biological context: `chain`

The pl7.app/vdj/chain domain is the most important for data interpretation.

Values: It must be one of the following: IGHeavy, IGLight, TCRAlpha, TCRBeta, TCRGamma, TCRDelta.
Purpose: It specifies the immunological chain for all clonotypes in the p-frame. This ensures that a clonotypeKey from a TCRBeta analysis cannot be accidentally joined with one from an IGHeavy analysis, as they will exist in separate p-frames with distinct clonotypeKey axes.
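
As an illustration of how a downstream block might rely on this domain, the sketch below reads the chain from a clonotypeKey axis and rejects values outside the list above. The `AxisSpecLike` interface and the `chainOf` helper are hypothetical names introduced for this example; they are not part of the Platforma SDK.

```typescript
// The six chain values listed in this standard.
const VDJ_CHAINS = ["IGHeavy", "IGLight", "TCRAlpha", "TCRBeta", "TCRGamma", "TCRDelta"] as const;
type VdjChain = (typeof VDJ_CHAINS)[number];

// Hypothetical minimal shape of an axis spec (only the fields used here).
interface AxisSpecLike {
  name: string;
  domain?: Record<string, string>;
}

// Type guard: true only for one of the six allowed chain values.
function isVdjChain(value: string | undefined): value is VdjChain {
  return value !== undefined && (VDJ_CHAINS as readonly string[]).indexOf(value) !== -1;
}

// Returns the chain of a clonotypeKey axis, or undefined if the axis is not
// a clonotypeKey axis or carries a missing or unknown chain value.
function chainOf(axis: AxisSpecLike): VdjChain | undefined {
  if (axis.name !== "pl7.app/vdj/clonotypeKey") return undefined;
  const chain = axis.domain ? axis.domain["pl7.app/vdj/chain"] : undefined;
  return isVdjChain(chain) ? chain : undefined;
}
```

A block that only handles heavy chains could then keep an axis only when `chainOf(axis) === "IGHeavy"`.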

Technical uniqueness: `clonotypingRunId` and `structure`

"pl7.app/vdj/clonotypingRunId": A unique identifier for the specific execution of the clonotyping block. This prevents key collisions if the same block is run multiple times with different parameters on the same data.
"pl7.app/vdj/clonotypeKey/structure": A string that describes the gene features used to define the clonotype (e.g., which parts of the V/D/J genes and which feature sequences). This provides downstream blocks with a clear understanding of how the clonotype was defined.

Calculating the `clonotypeKey`

The clonotypeKey value itself must be a deterministic, unique identifier for the clonotype. It is the responsibility of the upstream block to generate this key. A reliable method is to compute a SHA1 hash of the clonotype's defining properties, such as the main sequence and key gene hits. This ensures that the exact same clonotype will always have the exact same key.
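
A minimal sketch of the hashing approach described above, using Node's built-in `crypto` module. The `ClonotypeDefinition` shape and the choice of fields are assumptions made for illustration; this is not necessarily how the MiXCR Clonotyping block derives its keys.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape: the properties assumed to define a clonotype.
interface ClonotypeDefinition {
  mainSequence: string; // e.g. nucleotide sequence of the assembling feature
  vGene: string;
  jGene: string;
  cGene?: string;
}

// Computes a deterministic key: same defining properties, same key.
function clonotypeKey(def: ClonotypeDefinition): string {
  // A fixed field order plus JSON encoding keeps field boundaries unambiguous,
  // so ("AB", "C") and ("A", "BC") cannot produce the same input string.
  const canonical = JSON.stringify([
    def.mainSequence,
    def.vGene,
    def.jGene,
    def.cGene === undefined ? "" : def.cGene,
  ]);
  return createHash("sha1").update(canonical, "utf8").digest("hex");
}
```

The result is a 40-character hexadecimal string. Whichever properties are hashed should match what the `pl7.app/vdj/clonotypeKey/structure` domain declares.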

Clonotype labels

To provide a short, human-readable label for the clonotypeKey, a clonotyping block should also generate a pl7.app/label p-column. This column is keyed by pl7.app/vdj/clonotypeKey and has a specific format.

Format: The label should be a string prefixed with "C-", followed by 6-7 alphanumeric characters in upper case (e.g., "C-7VCA13" or "C-A4B1C9D").
Specification:

name: pl7.app/label
valueType: String
axesSpec:
  - name: pl7.app/vdj/clonotypeKey
    type: String
annotations:
  pl7.app/isLabel: "true"
  pl7.app/description: "A short, human-readable label for the clonotype."
  pl7.app/label: "Clonotype Label"

See the Label p-columns guide for a more detailed explanation of how labels work.
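
The label format described above can be checked with a single regular expression. The sketch below is only a format validator; how labels are derived from clonotypes is not covered here, and the function name is hypothetical.

```typescript
// "C-" followed by 6 or 7 upper-case alphanumeric characters.
const CLONOTYPE_LABEL_PATTERN = /^C-[A-Z0-9]{6,7}$/;

// Returns true if the label matches the format described in this standard.
function isValidClonotypeLabel(label: string): boolean {
  return CLONOTYPE_LABEL_PATTERN.test(label);
}
```

Both examples from the text, "C-7VCA13" and "C-A4B1C9D", pass this check, while a lower-case or shorter label does not.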

The anchor column

To facilitate discovery by downstream blocks, one of the primary abundance p-columns must be designated as the "anchor" for the dataset. This is done by including the annotation {"pl7.app/isAnchor": "true"} in its spec.

Guarantee: A VDJ dataset contains exactly one anchor p-column.
Structure: The anchor always has the axes [pl7.app/sampleId][pl7.app/vdj/clonotypeKey].
Purpose: The anchor acts as a stable entry point for other blocks. For example, a "Clonotype Clustering" block doesn't need to know the exact name of the abundance column. It can simply ask the user to select an input dataset by searching the result pool for p-columns with the pl7.app/isAnchor: "true" annotation. Once the user selects the anchor, the block can inspect its axesSpec to find the sampleId and clonotypeKey axes and use them to retrieve all other associated property and abundance p-columns from the same p-frame.

See the Standard Annotations documentation for more details on isAnchor.
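
The guarantee and structure rules above could be verified by a producing block before it exports a dataset. The following sketch operates on a plain array of spec-like objects; `PColumnSpecLike` and `findAnchor` are hypothetical names for this example, not SDK types.

```typescript
// Hypothetical minimal shape of a p-column spec (only the fields used here).
interface PColumnSpecLike {
  name: string;
  axesSpec: { name: string }[];
  annotations?: Record<string, string>;
}

// Returns the single anchor of a dataset, or throws if the dataset has
// zero or several anchors, or if the anchor has unexpected axes.
function findAnchor(columns: PColumnSpecLike[]): PColumnSpecLike {
  const anchors = columns.filter(
    (c) => c.annotations !== undefined && c.annotations["pl7.app/isAnchor"] === "true"
  );
  if (anchors.length !== 1) {
    throw new Error("expected exactly one anchor, found " + anchors.length);
  }
  const anchor = anchors[0];
  const axes = anchor.axesSpec.map((a) => a.name);
  if (axes.length !== 2 || axes[0] !== "pl7.app/sampleId" || axes[1] !== "pl7.app/vdj/clonotypeKey") {
    throw new Error("anchor must be keyed by [sampleId][clonotypeKey]");
  }
  return anchor;
}
```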

Anchor column in practice: an example

To make the concept of an anchor p-column concrete, let's look at a simplified but realistic example of how a downstream "Clonotype Clustering" block would discover bulk VDJ datasets. The following TypeScript snippet from a block's model file shows how it populates UI dropdowns.

This example is derived from the real clustering block but has been simplified to focus only on the bulk data standard described in this document.

import { BlockModel } from "@platforma-sdk/model";

export const model = BlockModel.create()
  // ... (other model definitions)

  // Step 1: Discover available Bulk VDJ datasets for the UI dropdown.
  .output("datasetOptions", (ctx) =>
    ctx.resultPool.getOptions(
      // This matcher finds anchor p-columns for Bulk VDJ datasets by looking
      // for the specific axes and the `isAnchor` annotation.
      {
        axes: [
          { name: "pl7.app/sampleId" },
          { name: "pl7.app/vdj/clonotypeKey" },
        ],
        annotations: { "pl7.app/isAnchor": "true" },
      },
      {
        // This setting improves the UI by showing only the dataset label,
        // not the specific name of the anchor p-column (e.g. "Read Count").
        label: { includeNativeLabel: false },
      }
    )
  )

  // Step 2: Discover compatible sequences based on the selected bulk dataset.
  // This runs after the user has selected a dataset from the dropdown populated above.
  .output("sequenceOptions", (ctx) => {
    const anchorRef = ctx.args.datasetRef;
    if (anchorRef === undefined) return undefined;

    // Use `getCanonicalOptions` to perform an anchored query. This will find
    // sequence p-columns that are compatible with the selected anchor.
    return ctx.resultPool.getCanonicalOptions(
      // The anchor provides the context for the query
      { main: anchorRef },
      // The matcher defines what kind of p-columns we're looking for.
      {
        // This is the key part: it ensures the sequence p-column is keyed
        // by the *exact same clonotypeKey axis* as our anchor.
        axes: [{ anchor: "main", idx: 1 }],
        name: "pl7.app/vdj/sequence",
        domain: {
          "pl7.app/alphabet": ctx.args.sequenceType, // e.g., 'aminoacid'
        },
      }
    );
  })

  // ... (rest of the model)
  .done();

See Model API documentation for further examples.

This real-world example demonstrates the power of standardization:

The Clustering block can find any compliant bulk VDJ dataset by searching for an anchor p-column with the standard [sampleId][clonotypeKey] axis structure.
By using an anchored query (getCanonicalOptions), the block robustly discovers all compatible sequence p-columns associated with the selected dataset. This makes the system flexible and extensible, as new sequence features can be added by clonotyping blocks and will be automatically discovered by downstream tools without requiring code changes.

Abundance p-columns

Abundance p-columns quantify each clonotype within a specific sample. They are always keyed by [pl7.app/sampleId][pl7.app/vdj/clonotypeKey].

A core concept of this standard is the primary abundance. For any VDJ dataset, one abundance measure (reads or UMIs) is designated as "primary" using the pl7.app/abundance/isPrimary: "true" annotation, which is set on both the raw count p-column and its normalized fraction counterpart. This is critical for interoperability: downstream blocks can request the most relevant quantitative data without needing to know whether the dataset was generated with or without UMIs.

The rule for determining the primary abundance is straightforward:

For datasets generated with UMIs, pl7.app/vdj/uniqueMoleculeCount is the primary abundance.
For datasets generated without UMIs, pl7.app/vdj/readCount is the primary abundance.

This simple standard ensures that downstream blocks can always find the most meaningful quantitative measure by querying for the isPrimary annotation.

See the Standard Annotations documentation for more details on abundance-related annotations.
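
To make the rule concrete, the sketch below selects the primary abundance from a plain list of spec-like objects using only annotations. `PColumnSpecLike` and `findPrimaryAbundance` are hypothetical names, and the example assumes that non-primary columns simply lack the isPrimary annotation, which this document does not state explicitly.

```typescript
// Hypothetical minimal shape of a p-column spec (only the fields used here).
interface PColumnSpecLike {
  name: string;
  annotations?: Record<string, string>;
}

// Finds the primary abundance column, raw (normalized = false) or
// fractional (normalized = true). Returns undefined unless exactly one matches.
function findPrimaryAbundance(
  columns: PColumnSpecLike[],
  normalized: boolean
): PColumnSpecLike | undefined {
  const wanted = normalized ? "true" : "false";
  const matches = columns.filter((c) => {
    const a = c.annotations === undefined ? {} : c.annotations;
    return (
      a["pl7.app/isAbundance"] === "true" &&
      a["pl7.app/abundance/isPrimary"] === "true" &&
      a["pl7.app/abundance/normalized"] === wanted
    );
  });
  return matches.length === 1 ? matches[0] : undefined;
}
```

For a UMI dataset this returns pl7.app/vdj/uniqueMoleculeCount when `normalized` is false; for a dataset without UMIs it returns pl7.app/vdj/readCount. The caller never inspects column names.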

The following p-columns are the standard abundance measures.

1. Read count

P-column name: pl7.app/vdj/readCount
Description: The raw number of sequencing reads assigned to the clonotype.
Requirement: Required. This column must always be present.
Specification: When no UMIs are used, readCount serves as the primary abundance and the dataset anchor.

# --- Core Identity ---
name: pl7.app/vdj/readCount
valueType: Long

# --- Axes ---
axesSpec:
  - name: pl7.app/sampleId
    type: String
  - name: pl7.app/vdj/clonotypeKey
    type: String

# --- Annotations ---
annotations:
  # --- Abundance & Discovery ---
  # Identifies this as the main entry point for the dataset.
  pl7.app/isAnchor: "true"
  # Marks this as a quantifiable abundance measure.
  pl7.app/isAbundance: "true"
  # Designates this as the primary abundance for this dataset.
  pl7.app/abundance/isPrimary: "true"
  # Specifies the unit of measurement.
  pl7.app/abundance/unit: "reads"
  # Indicates the value is a raw count, not a fraction.
  pl7.app/abundance/normalized: "false"

  pl7.app/description: "The raw number of sequencing reads in the sample assigned to the clonotype."
  # --- UI & Formatting ---
  pl7.app/label: "Number Of Reads"
  # Controls sort order in tables (higher is further left).
  pl7.app/table/orderPriority: "90000"
  # Controls default visibility in tables.
  pl7.app/table/visibility: "default"
  # A hint that the minimum value is 1.
  pl7.app/min: "1"

2. Read fraction

P-column name: pl7.app/vdj/readFraction
Description: The fraction of total reads in the sample assigned to the clonotype.
Requirement: Required. This column must always be present.
Specification:

name: pl7.app/vdj/readFraction
valueType: Double
axesSpec:
  # ... (same as readCount)
annotations:
  # --- Abundance & Discovery ---
  pl7.app/isAbundance: "true"
  # For non-UMI data, this is also considered a primary measure, but normalized.
  pl7.app/abundance/isPrimary: "true"
  pl7.app/abundance/unit: "reads"
  pl7.app/abundance/normalized: "true"

  pl7.app/description: "The fraction of total reads in the sample assigned to the clonotype."
  # --- UI & Formatting ---
  pl7.app/label: "Fraction of Reads"
  pl7.app/table/orderPriority: "89000"
  pl7.app/table/visibility: "default"
  # Specifies the value is a percentage-like fraction.
  pl7.app/format: ".2p"
  pl7.app/min: "0"
  pl7.app/max: "1"
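
As a rough illustration of how readFraction relates to readCount, the sketch below divides each clonotype's read count by the sample total. It assumes the denominator is the sum of read counts over all clonotypes of one sample in the dataset; a clonotyping block may define the total differently. The function name is hypothetical.

```typescript
// Computes per-clonotype read fractions for a single sample.
// Input and output are keyed by clonotypeKey.
function readFractions(counts: Record<string, number>): Record<string, number> {
  const keys = Object.keys(counts);
  let total = 0;
  for (const key of keys) total += counts[key];

  const fractions: Record<string, number> = {};
  for (const key of keys) {
    // Guard against an empty sample to avoid dividing by zero.
    fractions[key] = total === 0 ? 0 : counts[key] / total;
  }
  return fractions;
}
```

With counts of 3 and 1 for two clonotypes, the fractions are 0.75 and 0.25, which stay within the min and max hints of 0 and 1 given in the spec.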

3. UMI count

P-column name: pl7.app/vdj/uniqueMoleculeCount
Description: The number of unique molecules (UMIs) assigned to the clonotype.
Requirement: Conditional. Must be present if and only if the data was generated with UMIs.
Specification: When UMIs are present, this column becomes the primary abundance and the dataset anchor.

name: pl7.app/vdj/uniqueMoleculeCount
valueType: Long
axesSpec:
  # ... (same as readCount)
annotations:
  # --- Abundance & Discovery ---
  # With UMIs, this becomes the anchor.
  pl7.app/isAnchor: "true"
  pl7.app/isAbundance: "true"
  # Designates this as the primary abundance.
  pl7.app/abundance/isPrimary: "true"
  pl7.app/abundance/unit: "molecules"
  pl7.app/abundance/normalized: "false"

  pl7.app/description: "The number of unique molecules (UMIs) in the sample assigned to the clonotype."
  # --- UI & Formatting ---
  pl7.app/label: "Number of UMIs"
  pl7.app/table/orderPriority: "88000"
  pl7.app/table/visibility: "default"
  pl7.app/min: "1"

4. UMI fraction

P-column name: pl7.app/vdj/uniqueMoleculeFraction
Description: The fraction of total UMIs in the sample assigned to the clonotype.
Requirement: Conditional. Must be present if and only if the data was generated with UMIs.
Specification:

name: pl7.app/vdj/uniqueMoleculeFraction
valueType: Double
axesSpec:
  # ... (same as readCount)
annotations:
  # --- Abundance & Discovery ---
  pl7.app/isAbundance: "true"
  pl7.app/abundance/isPrimary: "true"
  pl7.app/abundance/unit: "molecules"
  pl7.app/abundance/normalized: "true"

  pl7.app/description: "The fraction of total UMIs in the sample assigned to the clonotype."
  # --- UI & Formatting ---
  pl7.app/label: "Fraction of UMIs"
  pl7.app/table/orderPriority: "87500"
  pl7.app/table/visibility: "default"
  pl7.app/format: ".2p"
  pl7.app/min: "0"
  pl7.app/max: "1"

Querying for abundance: examples

The true power of this standard is revealed when downstream blocks need to find and use abundance data without knowing the specifics of the upstream block that generated it. The following examples show how to reliably query for primary abundance p-columns in both the model (TypeScript) and the workflow (Tengo).

Model: Populating a UI with normalized abundance

This TypeScript example shows how a block's model can find all normalized, primary abundance p-columns across all available datasets. This is useful for populating a UI dropdown that allows a user to select a frequency measure for visualization.

// In a block's model file (`/model/src/index.ts`)

import { BlockModel } from "@platforma-sdk/model";

export const model = BlockModel.create()
  // ...
  .output("normalizedAbundanceOptions", (ctx) =>
    ctx.resultPool.getOptions({
      // The core of the query:
      annotations: {
        "pl7.app/isAbundance": "true",
        "pl7.app/abundance/normalized": "true",
        "pl7.app/abundance/isPrimary": "true",
      },
    })
  )
  //...
  .done();

See Model API documentation for further examples.

Workflow: Using abundance counts for calculations

This Tengo example shows how a workflow, having been given an inputAnchor dataset by the user, can reliably fetch the associated non-normalized, primary abundance p-column. This raw count data can then be used for statistical calculations.

// In a block's workflow file (`/workflow/src/main.tpl.tengo`)

wf := import("@platforma-sdk/workflow-tengo:workflow")

wf.prepare(func(args) {
	bundleBuilder := wf.createPBundleBuilder()
	bundleBuilder.addAnchor("main", args.inputAnchor)

	// Add the primary, non-normalized abundance column to the bundle.
	// The platform will find the correct p-column (readCount or uniqueMoleculeCount)
	// by matching the annotations against the provided anchor.
	bundleBuilder.addSingle({
		axes: [{ anchor: "main", idx: 0 }, { anchor: "main", idx: 1 }],
		annotations: {
			"pl7.app/isAbundance": "true",
			"pl7.app/abundance/normalized": "false",
			"pl7.app/abundance/isPrimary": "true"
		}
	}, "primaryAbundance")

	return {
		bundle: bundleBuilder.build()
	}
})

wf.body(func(args) {
	// ...
	// The `primaryAbundance` p-column is now available
	// in the `args.bundle` bundle for processing.
	// ...
})

See Workflow API documentation for further examples.

Clonotype property p-columns

Property p-columns describe the intrinsic characteristics of a clonotype, independent of any single sample. They are always keyed by a single axis: pl7.app/vdj/clonotypeKey. The following sections detail the standard property columns generated by a clonotyping block.

1. Sequence features

This group of p-columns stores the nucleotide and amino acid sequences of various clonotype features (e.g., CDR3, VDJ region).

A crucial concept here is the main sequence. The clonotyping block is configured to define clonotypes based on a specific sequence, known as the "assembling feature" (e.g., the VDJRegion or just CDR3). The resulting p-column for this specific sequence is marked with two annotations:

pl7.app/vdj/isAssemblingFeature: "true"
pl7.app/vdj/isMainSequence: "true"

This allows downstream blocks to unambiguously identify the most important sequence data that defines the clonotype.

Feature sequence (`pl7.app/vdj/sequence`)

Description: The sequence of a specific feature. The domain is critical, as it specifies both the alphabet (aminoacid or nucleotide) and the feature (e.g., CDR3, FR1, VDJRegion).
Specification (Main sequence example):

# --- Core Identity ---
name: pl7.app/vdj/sequence
valueType: String

# --- Axes ---
axesSpec:
  - name: pl7.app/vdj/clonotypeKey
    type: String

# --- Domain (defines which sequence this is) ---
domain:
  pl7.app/vdj/feature: "VDJRegion"
  pl7.app/alphabet: "nucleotide"

# --- Annotations ---
annotations:
  # --- Discovery ---
  pl7.app/vdj/isAssemblingFeature: "true"
  pl7.app/vdj/isMainSequence: "true"

  pl7.app/description: "The sequence of a specific feature (e.g., CDR3, VDJRegion)."
  # --- UI & Formatting ---
  pl7.app/label: "VDJRegion nt"
  pl7.app/table/fontFamily: "monospace"
  pl7.app/table/visibility: "default"
  pl7.app/table/orderPriority: "78300"

Other common sequence-related p-columns include:

pl7.app/vdj/sequenceLength: The length of a feature sequence. Its domain links it to the corresponding sequence p-column.
pl7.app/vdj/sequence/productive: A flag ("True" or "False") indicating whether the main sequence is productive (in-frame and without stop codons).
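
The productivity criterion above (in-frame and without stop codons) can be sketched as follows. This is a simplification that assumes the nucleotide sequence starts at codon position 0 and uses the standard stop codons; an actual clonotyping block determines the reading frame from its alignments and may apply further checks. The function name is hypothetical.

```typescript
// Standard stop codons in the DNA alphabet.
const STOP_CODONS = ["TAA", "TAG", "TGA"];

// Returns true if the sequence length is a multiple of three and no
// in-frame codon is a stop codon. Assumes the frame starts at index 0.
function isProductive(nucleotides: string): boolean {
  const seq = nucleotides.toUpperCase();
  if (seq.length === 0 || seq.length % 3 !== 0) return false;
  for (let i = 0; i < seq.length; i += 3) {
    if (STOP_CODONS.indexOf(seq.slice(i, i + 3)) !== -1) return false;
  }
  return true;
}
```

Note that a stop codon pattern spanning two codons (out of frame) does not make the sequence unproductive under this check.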

2. V(D)J gene hits

These columns identify the best-matching V, D, J, and C gene segments. The standard provides two variants for each: geneHit (e.g., TRBV7-2) and geneHitWithAllele (e.g., TRBV7-2*01). The pattern is the same for all segments; only the domain changes.

Gene hit (`pl7.app/vdj/geneHit` / `pl7.app/vdj/geneHitWithAllele`)

Description: The name of the best-matching gene segment.
Specification (V-gene example):

# Using geneHit as the example
name: pl7.app/vdj/geneHit
valueType: String
axesSpec:
  - name: pl7.app/vdj/clonotypeKey
    type: String
domain:
  # Specifies the gene segment type. Can be VGene, DGene, JGene, or CGene.
  pl7.app/vdj/reference: "VGene"
annotations:
  pl7.app/description: "The name of the best-matching V gene."
  pl7.app/label: "Best V gene"
  # Allows this column to be used as a filter in the UI.
  pl7.app/isDiscreteFilter: "true"
  pl7.app/table/visibility: "default"
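
The relationship between the two variants can be shown with a small helper that derives a geneHit-style name from a geneHitWithAllele-style name. It assumes the allele is separated by an asterisk, as in the TRBV7-2*01 example above; the function name is hypothetical.

```typescript
// Drops the allele suffix from a gene name, e.g. "TRBV7-2*01" -> "TRBV7-2".
// Names without an asterisk are returned unchanged.
function stripAllele(geneHitWithAllele: string): string {
  const starIndex = geneHitWithAllele.indexOf("*");
  return starIndex === -1 ? geneHitWithAllele : geneHitWithAllele.slice(0, starIndex);
}
```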

3. Somatic hypermutations (SHM)

These p-columns store the number or rate of mutations within a given gene region (e.g., the V gene), which is essential for analyzing affinity maturation. Both nucleotide (n...) and amino acid (aa...) variants exist.

These columns are often used to rank clonotypes. To facilitate this, they should be annotated as "score" columns, specifying that a higher number of mutations is typically of greater interest.

Mutation count (`pl7.app/vdj/sequence/nMutationsCount`)

Description: The number of nucleotide mutations in a specified gene region relative to the germline sequence.
Specification (V-gene example):

name: pl7.app/vdj/sequence/nMutationsCount
valueType: Int
axesSpec:
  - name: pl7.app/vdj/clonotypeKey
    type: String
domain:
  # Specifies the gene region where mutations were counted (V or J).
  pl7.app/vdj/gene: "V"
annotations:
  pl7.app/description: "The number of nucleotide mutations in a specified gene region relative to the germline sequence."
  pl7.app/label: "Nt mutations count in V gene"
  pl7.app/table/visibility: "optional"
  # --- Score & Ranking Annotations ---
  pl7.app/isScore: "true"
  pl7.app/score/rankingOrder: "decreasing" # Higher mutation count is ranked higher

See the Standard Annotations guide for more on score and ranking annotations.
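
A downstream block could use the rankingOrder annotation to sort clonotypes by a score column, roughly as sketched below. The function and type names are hypothetical, and "increasing" is assumed here to be the counterpart of the "decreasing" value shown above; this document only shows "decreasing".

```typescript
// "decreasing" appears in this standard; "increasing" is an assumed counterpart.
type RankingOrder = "increasing" | "decreasing";

// Returns clonotype keys ordered by score. With "decreasing", the highest
// score comes first. Ties are broken by key so the result is deterministic.
function rankClonotypes(scores: Record<string, number>, order: RankingOrder): string[] {
  const sign = order === "decreasing" ? -1 : 1;
  return Object.keys(scores).sort((a, b) => {
    const diff = sign * (scores[a] - scores[b]);
    if (diff !== 0) return diff;
    return a < b ? -1 : a > b ? 1 : 0;
  });
}
```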

4. Other key properties

This section includes other important high-level properties of the clonotypes.

pl7.app/label: A default human-readable label for the clonotype, often derived from its key features.
pl7.app/vdj/isotype: The isotype of the receptor (e.g., IgG1, IgM). Primarily for B-cell receptors.
pl7.app/vdj/chain: The receptor chain type (e.g. IGH, TRB). This provides a high-level grouping for the clonotype.

5. Cross-sample aggregations

These p-columns provide summary statistics for each clonotype across all samples in the dataset. They are particularly useful for identifying clonotypes that are abundant or prevalent across the entire cohort. These are keyed by pl7.app/vdj/clonotypeKey.

Number of samples (`pl7.app/vdj/sampleCount`)

Description: The number of samples in which a given clonotype is detected (i.e., has an abundance > 0).
Requirement: Required.
Specification:

name: pl7.app/vdj/sampleCount
valueType: Int
axesSpec:
  - name: pl7.app/vdj/clonotypeKey
    type: String
annotations:
  pl7.app/description: "The number of samples in which a given clonotype is detected."
  pl7.app/label: "Number of Samples"
  pl7.app/isAbundance: "true"
  pl7.app/abundance/unit: "samples"
  pl7.app/abundance/normalized: "false"
  pl7.app/table/visibility: "default"
  pl7.app/table/orderPriority: "87110"
  pl7.app/min: "1"

Total abundance (`.../readCountTotal` or `.../uniqueMoleculeCountTotal`)

Description: The sum of a clonotype's abundance across all samples. This is calculated for the primary abundance measure. If UMIs were used, this will be uniqueMoleculeCountTotal; otherwise, it will be readCountTotal.
Requirement: Required. One of the two variants must be present.
Specification (read count example):

name: pl7.app/vdj/readCountTotal
valueType: Long
axesSpec:
  - name: pl7.app/vdj/clonotypeKey
    type: String
annotations:
  pl7.app/description: "The sum of a clonotype's read counts across all samples."
  pl7.app/label: "Supporting Reads"
  pl7.app/isAbundance: "true"
  pl7.app/abundance/unit: "reads"
  pl7.app/abundance/normalized: "false"
  pl7.app/table/visibility: "default"
  pl7.app/table/orderPriority: "87120"
  pl7.app/min: "1"

Mean abundance fraction (`.../readFractionMean` or `.../uniqueMoleculeFractionMean`)

Description: The average fraction of a clonotype across all samples in which it was observed. This is calculated for the primary abundance measure. If UMIs were used, this will be uniqueMoleculeFractionMean; otherwise, it will be readFractionMean.
Requirement: Required. One of the two variants must be present.
Specification (read fraction example):

name: pl7.app/vdj/readFractionMean
valueType: Double
axesSpec:
  - name: pl7.app/vdj/clonotypeKey
    type: String
annotations:
  pl7.app/description: "The average fraction of a clonotype across all samples in which it was observed."
  pl7.app/label: "Mean Fraction of Reads"
  pl7.app/isAbundance: "true"
  pl7.app/abundance/unit: "reads"
  pl7.app/abundance/normalized: "true"
  pl7.app/table/visibility: "default"
  pl7.app/table/orderPriority: "87130"
  pl7.app/format: ".2p"
  pl7.app/min: "0"
  pl7.app/max: "1"
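
The three aggregations in this section can be derived from per-sample abundance rows, as in the sketch below. It assumes at most one row per sample and clonotype pair and treats a count above zero as "detected"; the `AbundanceRow` and `ClonotypeAggregate` shapes are hypothetical.

```typescript
// Hypothetical row of per-sample abundance data.
interface AbundanceRow {
  sampleId: string;
  clonotypeKey: string;
  count: number;    // primary abundance (reads or UMIs)
  fraction: number; // matching normalized abundance
}

interface ClonotypeAggregate {
  sampleCount: number;  // samples where the clonotype is detected
  countTotal: number;   // sum of counts across samples
  fractionMean: number; // mean fraction over samples where it was observed
}

function aggregateAcrossSamples(rows: AbundanceRow[]): Record<string, ClonotypeAggregate> {
  const sums: Record<string, { samples: number; count: number; fraction: number }> = {};
  for (const row of rows) {
    if (row.count <= 0) continue; // not detected in this sample
    if (sums[row.clonotypeKey] === undefined) {
      sums[row.clonotypeKey] = { samples: 0, count: 0, fraction: 0 };
    }
    const acc = sums[row.clonotypeKey];
    acc.samples += 1;
    acc.count += row.count;
    acc.fraction += row.fraction;
  }

  const result: Record<string, ClonotypeAggregate> = {};
  for (const key of Object.keys(sums)) {
    const acc = sums[key];
    result[key] = {
      sampleCount: acc.samples,
      countTotal: acc.count,
      fractionMean: acc.fraction / acc.samples,
    };
  }
  return result;
}
```

Because the mean is taken only over samples where the clonotype was observed, a clonotype present in a single sample keeps its fraction from that sample rather than being diluted by absent samples.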

Querying for property p-columns: examples

The following examples show how to reliably query for specific property p-columns in both the model (TypeScript) and the workflow (Tengo).

Model: Finding all amino acid sequences in a dataset

This TypeScript example shows how a block's model, given a dataset anchor (inputAnchor), can find all associated amino acid sequence p-columns. This is useful for populating a UI dropdown that allows a user to select a sequence for clustering or visualization.

// In a block's model file (`/model/src/index.ts`)

import { BlockModel } from "@platforma-sdk/model";

export const model = BlockModel.create()
  // ...
  .output("sequenceOptions", (ctx) => {
    const anchorRef = ctx.args.inputAnchor;
    if (anchorRef === undefined) return undefined;

    // Use `getCanonicalOptions` to perform an anchored query.
    return ctx.resultPool.getCanonicalOptions(
      // The anchor provides the context for the query.
      { main: anchorRef },
      // The matcher defines what kind of p-columns we're looking for.
      {
        // This is the key part: it ensures the p-column is keyed
        // by the *exact same clonotypeKey axis* as our anchor.
        axes: [{ anchor: "main", idx: 1 }],
        name: "pl7.app/vdj/sequence",
        domain: {
          "pl7.app/alphabet": "aminoacid",
        },
      }
    );
  })
  //...
  .done();

See Model API documentation for further examples.

Workflow: Using V and J genes for calculations

This Tengo example shows how a workflow, having been given an inputAnchor, can reliably fetch the associated V and J gene hit p-columns to use in processing.

// In a block's workflow file (`/workflow/src/main.tpl.tengo`)

wf := import("@platforma-sdk/workflow-tengo:workflow")

wf.prepare(func(args) {
	bundleBuilder := wf.createPBundleBuilder()
	bundleBuilder.addAnchor("main", args.inputAnchor)

	// Add the V and J gene hit columns to the bundle.
	for gene in ["V", "J"] {
		bundleBuilder.addSingle({
			axes: [{ anchor: "main", idx: 1 }],
			name: "pl7.app/vdj/geneHit",
			domain: {
				"pl7.app/vdj/reference": gene + "Gene",
			}
		},
		gene + "GeneHit")
	}

	return {
		bundle: bundleBuilder.build()
	}
})

wf.body(func(args) {
	// ...
	// The `VGeneHit` and `JGeneHit` p-columns are now available
	// in the `args.bundle` bundle for processing.
	// ...
})

See Workflow API documentation for further examples.

Summary of standard p-columns

The following table provides a summary of all standard p-columns that a developer can expect to be produced by a compliant bulk VDJ clonotyping block. Downstream block developers can use this as a quick reference to see what data is available for their tools.

P-Column Name	Description	Axes	Requirement
Abundance P-Columns
`pl7.app/vdj/readCount`	The raw number of sequencing reads assigned to the clonotype.	`[sampleId][clonotypeKey]`	Required
`pl7.app/vdj/readFraction`	The fraction of total reads in the sample assigned to the clonotype.	`[sampleId][clonotypeKey]`	Required
`pl7.app/vdj/uniqueMoleculeCount`	The number of unique molecules (UMIs) assigned to the clonotype.	`[sampleId][clonotypeKey]`	Conditional
`pl7.app/vdj/uniqueMoleculeFraction`	The fraction of total UMIs in the sample assigned to the clonotype.	`[sampleId][clonotypeKey]`	Conditional
Property P-Columns
`pl7.app/vdj/sequence`	The sequence of a specific feature (e.g., CDR3, VDJRegion).	`[clonotypeKey]`	Required¹
`pl7.app/vdj/sequenceLength`	The length of a feature sequence.	`[clonotypeKey]`	Optional
`pl7.app/vdj/sequence/productive`	A flag indicating whether the main sequence is productive (in-frame and without stop codons).	`[clonotypeKey]`	Required
`pl7.app/vdj/geneHit`	The name of the best-matching V/D/J/C gene segment, without allele information.	`[clonotypeKey]`	Required²
`pl7.app/vdj/geneHitWithAllele`	The name of the best-matching V/D/J/C gene segment, including allele information.	`[clonotypeKey]`	Optional
`pl7.app/vdj/sequence/nMutationsCount`	The number of nucleotide mutations in a specified gene region relative to the germline sequence.	`[clonotypeKey]`	Optional
`pl7.app/vdj/sequence/aaMutationsRate`	The rate of amino acid mutations in a specified gene region relative to the germline.	`[clonotypeKey]`	Optional
`pl7.app/vdj/isotype`	The isotype of the receptor chain (e.g., `IgG1`, `IgM`).	`[clonotypeKey]`	Conditional³
`pl7.app/vdj/chain`	The high-level receptor chain type (e.g., `IGH`, `TRB`).	`[clonotypeKey]`	Required
`pl7.app/label`	A short, human-readable label for the clonotype.	`[clonotypeKey]`	Optional
`pl7.app/vdj/sampleCount`	The number of samples in which the clonotype is detected.	`[clonotypeKey]`	Required
`pl7.app/vdj/readCountTotal`	The sum of a clonotype's read counts across all samples.	`[clonotypeKey]`	Conditional⁴
`pl7.app/vdj/uniqueMoleculeCountTotal`	The sum of a clonotype's UMI counts across all samples.	`[clonotypeKey]`	Conditional⁴
`pl7.app/vdj/readFractionMean`	The average read fraction of a clonotype across all samples where it is present.	`[clonotypeKey]`	Conditional⁴
`pl7.app/vdj/uniqueMoleculeFractionMean`	The average UMI fraction of a clonotype across all samples where it is present.	`[clonotypeKey]`	Conditional⁴

¹ At a minimum, the sequence p-column designated as the isMainSequence is required. Additional sequence p-columns (e.g., for other CDRs or Framework Regions) are optional.
² The geneHit p-column is required for V and J genes. For D and C genes, it is optional. The geneHitWithAllele is always optional.
³ Required for B-cell receptor chains (IGHeavy, IGLight).
⁴ One of readCountTotal or uniqueMoleculeCountTotal is required. Similarly, one of readFractionMean or uniqueMoleculeFractionMean is required. These are determined by the primary abundance metric (reads vs. UMIs).
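
The requirement column of the table can be turned into a simple name-level compliance check, sketched below. It only looks at p-column names; domain-level rules from the footnotes (for example, geneHit being required for both V and J, or isotype for B-cell receptor chains) are not checked. The constant and function names are hypothetical.

```typescript
// Names marked "Required" in the summary table, regardless of UMI usage.
const ALWAYS_REQUIRED: string[] = [
  "pl7.app/vdj/readCount",
  "pl7.app/vdj/readFraction",
  "pl7.app/vdj/sequence",
  "pl7.app/vdj/sequence/productive",
  "pl7.app/vdj/geneHit",
  "pl7.app/vdj/chain",
  "pl7.app/vdj/sampleCount",
];

// Required only when the data was generated with UMIs.
const UMI_REQUIRED: string[] = [
  "pl7.app/vdj/uniqueMoleculeCount",
  "pl7.app/vdj/uniqueMoleculeFraction",
  "pl7.app/vdj/uniqueMoleculeCountTotal",
  "pl7.app/vdj/uniqueMoleculeFractionMean",
];

// Required only when reads are the primary abundance (no UMIs).
const NON_UMI_REQUIRED: string[] = [
  "pl7.app/vdj/readCountTotal",
  "pl7.app/vdj/readFractionMean",
];

// Returns the required p-column names that are absent from the dataset.
function missingRequiredColumns(presentNames: string[], hasUmis: boolean): string[] {
  const required = ALWAYS_REQUIRED.concat(hasUmis ? UMI_REQUIRED : NON_UMI_REQUIRED);
  return required.filter((name) => presentNames.indexOf(name) === -1);
}
```

An empty result means every name-level requirement from the table is met for the given UMI setting.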
