Bulk Clonotyping
This document outlines the standard for VDJ datasets generated from bulk sequencing data. Upstream clonotyping blocks that process bulk FASTQ files (e.g. from targeted library sequencing or bulk RNA-Seq) should produce a p-frame containing the p-columns defined here. This ensures that downstream tools for analysis, visualization, and comparison can operate on a consistent and predictable data structure. See the P-frames and p-columns guide for more foundational information.
The MiXCR Clonotyping block is the reference implementation for this standard.
Overview
The diagram below illustrates a typical user flow involving a clonotyping block.
Blocks Result pool
┌───────────
┌─────────────────────────┤
│ │
v │
╔═══════════════════════╗ exports │
║ Samples & Data ║───────>─────┤ Sequencing Dataset
╚═══════════════════════╝ │ ------------------
│
├ [sampleId](readIndex)(lane) -> file
┌─────────────────────────┤
│ │
v │
╔═══════════════════════╗ exports │
║ Clonotyping Block ║───────>─────┤ Abundance & Properties (per chain)
╚═══════════════════════╝ │ ------------------------------------
│
├ [sampleId][clonotypeKey] -> number
├ [clonotypeKey] -> property
┌─────────────────────────┤
│ │
v │
╔═══════════════════════╗ │
║ Downstream Analysis ║ │
╚═══════════════════════╝ │
- Samples & Data: The flow starts with "Samples & Data", a universal entry point for importing and organizing raw sequencing data. It produces a p-frame where each sample's FASTQ/FASTA files are keyed by sampleId and optionally readIndex, lane and others.
- Clonotyping Block: This block takes the sequencing dataset as input. It performs VDJ clonotyping analysis (e.g., using MiXCR) and generates one or more standardized VDJ datasets as its primary output, one for each immunological chain selected for analysis (e.g., one for IGH, one for IGK, etc.). Each dataset consists of two main types of p-columns:
  - Abundance p-columns: These have a composite key of [sampleId][clonotypeKey] and store quantitative data, such as the read count for each clonotype in each sample.
  - Property p-columns: These are keyed only by [clonotypeKey] and store descriptive attributes of the clonotype, such as its CDR3 sequence or V/J gene calls.
- Downstream Blocks: Subsequent blocks (e.g., for clustering, diversity analysis, or visualization) consume one of the VDJ datasets. They use a special "anchor" p-column to easily identify the dataset and then access the various abundance and property p-columns for their specific analysis.
Core structure: axes and p-columns
A standard bulk VDJ dataset is a p-frame composed of p-columns that describe clonotypes and their abundance. The structure of these p-columns hinges on two primary axes.
Primary axes
| Axis Name | Type | Description |
|---|---|---|
| pl7.app/sampleId | String | Uniquely identifies the sample from which the data was derived. This is typically inherited from the upstream "Samples & Data" block. |
| pl7.app/vdj/clonotypeKey | String | A composite key that uniquely identifies a clonotype. Its structure is precisely defined by its domain, making it a self-contained and unambiguous identifier. See the detailed description below. |
The clonotypeKey axis in detail
A crucial concept of the VDJ standard is that a clonotyping block produces a separate p-frame for each immunological chain (e.g., IGHeavy, TCRBeta, etc.) included in the analysis. This separation is enforced by the domain of the clonotypeKey axis.
This design is powerful because it allows downstream blocks to unambiguously select the exact data they need. For example, a liability prediction block designed to analyze antibody heavy chains can specifically query for the p-frame where the clonotypeKey domain is {"pl7.app/vdj/chain": "IGHeavy"} and ignore all other chains.
A typical AxisSpec for a clonotypeKey illustrates how this works (in YAML for readability):
name: pl7.app/vdj/clonotypeKey
type: String
domain:
  pl7.app/vdj/chain: "IGHeavy"
  pl7.app/vdj/clonotypingRunId: "7e2eac8d-95b5-47dc-adac-eb0455134e87"
  pl7.app/vdj/clonotypeKey/structure: "VDJRegion-nt-VGene-JGene-CGene"
annotations:
  pl7.app/label: "Clonotype ID"
See the P-Column Specification guide for more details on the AxisSpec structure.
The domain contains both biological and technical keys that guarantee uniqueness and provide context.
Biological context: chain
The pl7.app/vdj/chain domain is the most important for data interpretation.
- Values: It must be one of the following: IGHeavy, IGLight, TCRAlpha, TCRBeta, TCRGamma, TCRDelta.
- Purpose: It specifies the immunological chain for all clonotypes in the p-frame. This ensures that a clonotypeKey from a TCRBeta analysis cannot be accidentally joined with one from an IGHeavy analysis, as they will exist in separate p-frames with distinct clonotypeKey axes.
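The separation by domain can be illustrated with a small sketch. The `AxisSpec` shape, `clonotypeAxis` and `sameAxis` below are hypothetical helpers written for this illustration and are not part of the Platforma SDK; they only show why two clonotypeKey axes with different chain domains never match.

```typescript
// Hypothetical, simplified axis spec used only for this illustration;
// the SDK's own AxisSpec type may differ.
interface AxisSpec {
  name: string;
  type: string;
  domain?: { [key: string]: string };
}

// Treat two axes as joinable only when name, type and every domain
// entry are identical.
function sameAxis(a: AxisSpec, b: AxisSpec): boolean {
  if (a.name !== b.name || a.type !== b.type) return false;
  const da = a.domain || {};
  const db = b.domain || {};
  const keysA = Object.keys(da);
  if (keysA.length !== Object.keys(db).length) return false;
  for (const domainKey of keysA) {
    // A key missing from `db` yields undefined and fails the comparison.
    if (da[domainKey] !== db[domainKey]) return false;
  }
  return true;
}

// Builds a clonotypeKey axis for a given chain (other domain keys omitted).
function clonotypeAxis(chain: string): AxisSpec {
  return {
    name: "pl7.app/vdj/clonotypeKey",
    type: "String",
    domain: { "pl7.app/vdj/chain": chain },
  };
}

const heavy = clonotypeAxis("IGHeavy");
const beta = clonotypeAxis("TCRBeta");
console.log(sameAxis(heavy, clonotypeAxis("IGHeavy"))); // true
console.log(sameAxis(heavy, beta)); // false
```

Because the chain is part of the axis identity, a join between the two axes above is rejected before any key values are compared.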
Technical uniqueness: clonotypingRunId and structure
- "pl7.app/vdj/clonotypingRunId": A unique identifier for the specific execution of the clonotyping block. This prevents key collisions if the same block is run multiple times with different parameters on the same data.
- "pl7.app/vdj/clonotypeKey/structure": A string that describes the gene features used to define the clonotype (e.g., which parts of the V/D/J genes and which feature sequences). This provides downstream blocks with a clear understanding of how the clonotype was defined.
Calculating the clonotypeKey
The clonotypeKey value itself must be a deterministic, unique identifier for the clonotype. It is the responsibility of the upstream block to generate this key. A reliable method is to compute a SHA1 hash of the clonotype's defining properties, such as the main sequence and key gene hits. This ensures that the exact same clonotype will always have the exact same key.
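A minimal sketch of such a hash-based key, assuming Node.js and its built-in `node:crypto` module. The `ClonotypeDefinition` fields and the newline separator are assumptions chosen for illustration; a real block decides which properties define a clonotype and how they are serialized.

```typescript
import { createHash } from "node:crypto";

// Hypothetical set of defining properties (an assumption for this sketch).
interface ClonotypeDefinition {
  sequence: string; // e.g. nucleotide sequence of the assembling feature
  vGene: string;
  jGene: string;
  cGene?: string;
}

// Builds a deterministic key: the fields are joined in a fixed order with a
// separator that is not expected to occur in the values, then hashed with SHA-1.
function clonotypeKey(def: ClonotypeDefinition): string {
  const parts = [def.sequence, def.vGene, def.jGene, def.cGene || ""];
  return createHash("sha1").update(parts.join("\n"), "utf8").digest("hex");
}

const keyA = clonotypeKey({ sequence: "TGTGCCAGC", vGene: "TRBV7-2", jGene: "TRBJ2-1" });
const keyB = clonotypeKey({ sequence: "TGTGCCAGC", vGene: "TRBV7-2", jGene: "TRBJ2-1" });
const keyC = clonotypeKey({ sequence: "TGTGCCAGC", vGene: "TRBV7-2", jGene: "TRBJ2-7" });
console.log(keyA === keyB); // true: same properties, same key
console.log(keyA === keyC); // false: a different J gene changes the key
```

Joining the fields in a fixed order keeps the key stable across runs, and the separator prevents two different field combinations from concatenating to the same string.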
Clonotype labels
To provide a short, human-readable label for the clonotypeKey, a clonotyping block should also generate a pl7.app/label p-column. This column is keyed by pl7.app/vdj/clonotypeKey and has a specific format.
- Format: The label should be a string prefixed with "C-", followed by 6-7 alphanumeric characters in upper case (e.g., "C-7VCA13" or "C-A4B1C9D").
- Specification:
name: pl7.app/label
valueType: String
axesSpec:
  - name: pl7.app/vdj/clonotypeKey
    type: String
annotations:
  pl7.app/isLabel: "true"
  pl7.app/description: "A short, human-readable label for the clonotype."
  pl7.app/label: "Clonotype Label"
See the Label p-columns guide for a more detailed explanation of how labels work.
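The label format described above can be checked with a short regular expression. This is a sketch only; `isValidClonotypeLabel` is a hypothetical helper, and the pattern assumes "alphanumeric" means ASCII digits and upper-case letters A-Z.

```typescript
// "C-" followed by 6 or 7 upper-case ASCII letters or digits.
const CLONOTYPE_LABEL = /^C-[A-Z0-9]{6,7}$/;

function isValidClonotypeLabel(label: string): boolean {
  return CLONOTYPE_LABEL.test(label);
}

console.log(isValidClonotypeLabel("C-7VCA13")); // true
console.log(isValidClonotypeLabel("C-A4B1C9D")); // true
console.log(isValidClonotypeLabel("C-abc123")); // false: lower case
console.log(isValidClonotypeLabel("C-12345")); // false: only 5 characters
```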
The anchor column
To facilitate discovery by downstream blocks, one of the primary abundance p-columns must be designated as the "anchor" for the dataset. This is done by including the annotation {"pl7.app/isAnchor": "true"} in its spec.
- Guarantee: It is guaranteed that there is only one anchor p-column in a VDJ dataset.
- Structure: The anchor always has the axes [pl7.app/sampleId][pl7.app/vdj/clonotypeKey].
- Purpose: The anchor acts as a stable entry point for other blocks. For example, a "Clonotype Clustering" block doesn't need to know the exact name of the abundance column. It can simply ask the user to select an input dataset by searching the result pool for p-columns with the pl7.app/isAnchor: "true" annotation. Once the user selects the anchor, the block can inspect its axesSpec to find the sampleId and clonotypeKey axes and use them to retrieve all other associated property and abundance p-columns from the same p-frame.
See the Standard Annotations documentation for more details on isAnchor.
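On the producing side, the guarantee and the axis structure can be verified before a dataset is exported. The sketch below works on plain objects; `ColumnSpec` and `findAnchor` are hypothetical names for this illustration and do not come from the SDK.

```typescript
// Hypothetical, simplified p-column spec (an assumption for this sketch).
interface ColumnSpec {
  name: string;
  axesSpec: { name: string }[];
  annotations?: { [key: string]: string };
}

// Returns the single anchor of a dataset, or throws if the dataset
// violates the "exactly one anchor" guarantee or the axis structure.
function findAnchor(columns: ColumnSpec[]): ColumnSpec {
  const anchors = columns.filter(
    (c) => (c.annotations || {})["pl7.app/isAnchor"] === "true"
  );
  if (anchors.length !== 1) {
    throw new Error("expected exactly one anchor, found " + anchors.length);
  }
  const anchor = anchors[0];
  const axes = anchor.axesSpec.map((a) => a.name);
  if (
    axes.length !== 2 ||
    axes[0] !== "pl7.app/sampleId" ||
    axes[1] !== "pl7.app/vdj/clonotypeKey"
  ) {
    throw new Error("anchor must be keyed by [sampleId][clonotypeKey]");
  }
  return anchor;
}

const datasetColumns: ColumnSpec[] = [
  {
    name: "pl7.app/vdj/readCount",
    axesSpec: [{ name: "pl7.app/sampleId" }, { name: "pl7.app/vdj/clonotypeKey" }],
    annotations: { "pl7.app/isAnchor": "true" },
  },
  {
    name: "pl7.app/vdj/geneHit",
    axesSpec: [{ name: "pl7.app/vdj/clonotypeKey" }],
  },
];
console.log(findAnchor(datasetColumns).name); // pl7.app/vdj/readCount
```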
Anchor column in practice: an example
To make the concept of an anchor p-column concrete, let's look at a simplified but realistic example of how a downstream "Clonotype Clustering" block would discover bulk VDJ datasets. The following TypeScript snippet from a block's model file shows how it populates UI dropdowns.
This example is derived from the real clustering block but has been simplified to focus only on the bulk data standard described in this document.
import { BlockModel } from "@platforma-sdk/model";

export const model = BlockModel.create()
  // ... (other model definitions)

  // Step 1: Discover available Bulk VDJ datasets for the UI dropdown.
  .output("datasetOptions", (ctx) =>
    ctx.resultPool.getOptions(
      // This matcher finds anchor p-columns for Bulk VDJ datasets by looking
      // for the specific axes and the `isAnchor` annotation.
      {
        axes: [
          { name: "pl7.app/sampleId" },
          { name: "pl7.app/vdj/clonotypeKey" },
        ],
        annotations: { "pl7.app/isAnchor": "true" },
      },
      {
        // This setting improves the UI by showing only the dataset label,
        // not the specific name of the anchor p-column (e.g. "Read Count").
        label: { includeNativeLabel: false },
      }
    )
  )

  // Step 2: Discover compatible sequences based on the selected bulk dataset.
  // This runs after the user has selected a dataset from the dropdown populated above.
  .output("sequenceOptions", (ctx) => {
    const anchorRef = ctx.args.datasetRef;
    if (anchorRef === undefined) return undefined;

    // Use `getCanonicalOptions` to perform an anchored query. This will find
    // sequence p-columns that are compatible with the selected anchor.
    return ctx.resultPool.getCanonicalOptions(
      // The anchor provides the context for the query
      { main: anchorRef },
      // The matcher defines what kind of p-columns we're looking for.
      {
        // This is the key part: it ensures the sequence p-column is keyed
        // by the *exact same clonotypeKey axis* as our anchor.
        axes: [{ anchor: "main", idx: 1 }],
        name: "pl7.app/vdj/sequence",
        domain: {
          "pl7.app/alphabet": ctx.args.sequenceType, // e.g., 'aminoacid'
        },
      }
    );
  })

  // ... (rest of the model)
  .done();
See Model API documentation for further examples.
This real-world example demonstrates the power of standardization:
- The Clustering block can find any compliant bulk VDJ dataset by searching for an anchor p-column with the standard [sampleId][clonotypeKey] axis structure.
- By using an anchored query (getCanonicalOptions), the block robustly discovers all compatible sequence p-columns associated with the selected dataset. This makes the system flexible and extensible, as new sequence features can be added by clonotyping blocks and will be automatically discovered by downstream tools without requiring code changes.
Abundance p-columns
Abundance p-columns quantify each clonotype within a specific sample. They are always keyed by [pl7.app/sampleId][pl7.app/vdj/clonotypeKey].
A core concept of this standard is the primary abundance. For any VDJ dataset, one abundance p-column is designated as "primary" using the pl7.app/abundance/isPrimary: "true" annotation. This is a critical feature for interoperability, as it allows downstream blocks to request the most relevant quantitative data without needing to know if the dataset was generated with or without UMIs.
The rule for determining the primary abundance is straightforward:
- For datasets generated with UMIs, pl7.app/vdj/uniqueMoleculeCount is the primary abundance.
- For datasets generated without UMIs, pl7.app/vdj/readCount is the primary abundance.
This simple standard ensures that downstream blocks can always find the most meaningful quantitative measure by querying for the isPrimary annotation.
See the Standard Annotations documentation for more details on abundance-related annotations.
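The selection logic a consumer relies on can be sketched on plain objects. `ColumnSpec` and `pickPrimaryAbundance` are hypothetical names for this illustration, and the sample data assumes that, in a UMI dataset, readCount carries no isPrimary annotation.

```typescript
// Hypothetical, simplified p-column spec (an assumption for this sketch).
interface ColumnSpec {
  name: string;
  annotations: { [key: string]: string };
}

// Picks the primary abundance column, either raw (normalized = false)
// or fractional (normalized = true). Returns undefined unless exactly
// one column matches.
function pickPrimaryAbundance(
  columns: ColumnSpec[],
  normalized: boolean
): ColumnSpec | undefined {
  const wanted = normalized ? "true" : "false";
  const matches = columns.filter(
    (c) =>
      c.annotations["pl7.app/isAbundance"] === "true" &&
      c.annotations["pl7.app/abundance/isPrimary"] === "true" &&
      c.annotations["pl7.app/abundance/normalized"] === wanted
  );
  return matches.length === 1 ? matches[0] : undefined;
}

// A dataset generated with UMIs.
const umiDataset: ColumnSpec[] = [
  {
    name: "pl7.app/vdj/readCount",
    annotations: { "pl7.app/isAbundance": "true", "pl7.app/abundance/normalized": "false" },
  },
  {
    name: "pl7.app/vdj/uniqueMoleculeCount",
    annotations: {
      "pl7.app/isAbundance": "true",
      "pl7.app/abundance/isPrimary": "true",
      "pl7.app/abundance/normalized": "false",
    },
  },
  {
    name: "pl7.app/vdj/uniqueMoleculeFraction",
    annotations: {
      "pl7.app/isAbundance": "true",
      "pl7.app/abundance/isPrimary": "true",
      "pl7.app/abundance/normalized": "true",
    },
  },
];

const rawPrimary = pickPrimaryAbundance(umiDataset, false);
const normalizedPrimary = pickPrimaryAbundance(umiDataset, true);
console.log(rawPrimary && rawPrimary.name); // pl7.app/vdj/uniqueMoleculeCount
console.log(normalizedPrimary && normalizedPrimary.name); // pl7.app/vdj/uniqueMoleculeFraction
```

The consumer never names readCount or uniqueMoleculeCount directly, so the same code works for datasets with and without UMIs.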
The following p-columns are the standard abundance measures.
1. Read count
- P-column name: pl7.app/vdj/readCount
- Description: The raw number of sequencing reads assigned to the clonotype.
- Requirement: Required. This column must always be present.
- Specification: When no UMIs are used, readCount serves as the primary abundance and the dataset anchor.
# --- Core Identity ---
name: pl7.app/vdj/readCount
valueType: Long
# --- Axes ---
axesSpec:
  - name: pl7.app/sampleId
    type: String
  - name: pl7.app/vdj/clonotypeKey
    type: String
# --- Annotations ---
annotations:
  # --- Abundance & Discovery ---
  # Identifies this as the main entry point for the dataset.
  pl7.app/isAnchor: "true"
  # Marks this as a quantifiable abundance measure.
  pl7.app/isAbundance: "true"
  # Designates this as the primary abundance for this dataset.
  pl7.app/abundance/isPrimary: "true"
  # Specifies the unit of measurement.
  pl7.app/abundance/unit: "reads"
  # Indicates the value is a raw count, not a fraction.
  pl7.app/abundance/normalized: "false"
  pl7.app/description: "The raw number of sequencing reads in the sample assigned to the clonotype."
  # --- UI & Formatting ---
  pl7.app/label: "Number Of Reads"
  # Controls sort order in tables (higher is further left).
  pl7.app/table/orderPriority: "90000"
  # Controls default visibility in tables.
  pl7.app/table/visibility: "default"
  # A hint that the minimum value is 1.
  pl7.app/min: "1"
2. Read fraction
- P-column name: pl7.app/vdj/readFraction
- Description: The fraction of total reads in the sample assigned to the clonotype.
- Requirement: Required. This column must always be present.
- Specification:
name: pl7.app/vdj/readFraction
valueType: Double
axesSpec:
  # ... (same as readCount)
annotations:
  # --- Abundance & Discovery ---
  pl7.app/isAbundance: "true"
  # For non-UMI data, this is also considered a primary measure, but normalized.
  pl7.app/abundance/isPrimary: "true"
  pl7.app/abundance/unit: "reads"
  pl7.app/abundance/normalized: "true"
  pl7.app/description: "The fraction of total reads in the sample assigned to the clonotype."
  # --- UI & Formatting ---
  pl7.app/label: "Fraction of Reads"
  pl7.app/table/orderPriority: "89000"
  pl7.app/table/visibility: "default"
  # Specifies the value is a percentage-like fraction.
  pl7.app/format: ".2p"
  pl7.app/min: "0"
  pl7.app/max: "1"
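The relationship between readCount and readFraction can be shown with a small worked example. This sketch assumes that "total reads in the sample" means the sum of reads over all clonotypes of that sample in the dataset; a real clonotyping block may use a different denominator. The row types and `readFractions` are hypothetical names.

```typescript
interface ReadCountRow {
  sampleId: string;
  clonotypeKey: string;
  readCount: number;
}

interface ReadFractionRow {
  sampleId: string;
  clonotypeKey: string;
  readFraction: number;
}

// Divides each read count by the total read count of its sample.
function readFractions(rows: ReadCountRow[]): ReadFractionRow[] {
  const totals: { [sampleId: string]: number } = {};
  for (const row of rows) {
    totals[row.sampleId] = (totals[row.sampleId] || 0) + row.readCount;
  }
  return rows.map((row) => ({
    sampleId: row.sampleId,
    clonotypeKey: row.clonotypeKey,
    readFraction: row.readCount / totals[row.sampleId],
  }));
}

const fractions = readFractions([
  { sampleId: "S1", clonotypeKey: "A", readCount: 30 },
  { sampleId: "S1", clonotypeKey: "B", readCount: 10 },
  { sampleId: "S2", clonotypeKey: "A", readCount: 5 },
]);
console.log(fractions.map((f) => f.readFraction)); // [ 0.75, 0.25, 1 ]
```

Fractions are computed per sample, so the values of each sample sum to 1 and stay within the min/max hints of the specification.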
3. UMI count
- P-column name: pl7.app/vdj/uniqueMoleculeCount
- Description: The number of unique molecules (UMIs) assigned to the clonotype.
- Requirement: Conditional. Must be present if and only if the data was generated with UMIs.
- Specification: When UMIs are present, this column becomes the primary abundance and the dataset anchor.
name: pl7.app/vdj/uniqueMoleculeCount
valueType: Long
axesSpec:
  # ... (same as readCount)
annotations:
  # --- Abundance & Discovery ---
  # With UMIs, this becomes the anchor.
  pl7.app/isAnchor: "true"
  pl7.app/isAbundance: "true"
  # Designates this as the primary abundance.
  pl7.app/abundance/isPrimary: "true"
  pl7.app/abundance/unit: "molecules"
  pl7.app/abundance/normalized: "false"
  pl7.app/description: "The number of unique molecules (UMIs) in the sample assigned to the clonotype."
  # --- UI & Formatting ---
  pl7.app/label: "Number of UMIs"
  pl7.app/table/orderPriority: "88000"
  pl7.app/table/visibility: "default"
  pl7.app/min: "1"
4. UMI fraction
- P-column name: pl7.app/vdj/uniqueMoleculeFraction
- Description: The fraction of total UMIs in the sample assigned to the clonotype.
- Requirement: Conditional. Must be present if and only if the data was generated with UMIs.
- Specification:
name: pl7.app/vdj/uniqueMoleculeFraction
valueType: Double
axesSpec:
  # ... (same as readCount)
annotations:
  # --- Abundance & Discovery ---
  pl7.app/isAbundance: "true"
  pl7.app/abundance/isPrimary: "true"
  pl7.app/abundance/unit: "molecules"
  pl7.app/abundance/normalized: "true"
  pl7.app/description: "The fraction of total UMIs in the sample assigned to the clonotype."
  # --- UI & Formatting ---
  pl7.app/label: "Fraction of UMIs"
  pl7.app/table/orderPriority: "87500"
  pl7.app/table/visibility: "default"
  pl7.app/format: ".2p"
  pl7.app/min: "0"
  pl7.app/max: "1"
Querying for abundance: examples
The true power of this standard is revealed when downstream blocks need to find and use abundance data without knowing the specifics of the upstream block that generated it. The following examples show how to reliably query for primary abundance p-columns in both the model (TypeScript) and the workflow (Tengo).
Model: Populating a UI with normalized abundance
This TypeScript example shows how a block's model can find all normalized, primary abundance p-columns across all available datasets. This is useful for populating a UI dropdown that allows a user to select a frequency measure for visualization.
// In a block's model file (`/model/src/index.ts`)
import { BlockModel } from "@platforma-sdk/model";

export const model = BlockModel.create()
  // ...
  .output("normalizedAbundanceOptions", (ctx) =>
    ctx.resultPool.getOptions({
      // The core of the query:
      annotations: {
        "pl7.app/isAbundance": "true",
        "pl7.app/abundance/normalized": "true",
        "pl7.app/abundance/isPrimary": "true",
      },
    })
  )
  //...
  .done();
See Model API documentation for further examples.
Workflow: Using abundance counts for calculations
This Tengo example shows how a workflow, having been given an inputAnchor dataset by the user, can reliably fetch the associated non-normalized, primary abundance p-column. This raw count data can then be used for statistical calculations.
// In a block's workflow file (`/workflow/src/main.tpl.tengo`)
wf := import("@platforma-sdk/workflow-tengo:workflow")

wf.prepare(func(args) {
  bundleBuilder := wf.createPBundleBuilder()
  bundleBuilder.addAnchor("main", args.inputAnchor)

  // Add the primary, non-normalized abundance column to the bundle.
  // The platform will find the correct p-column (readCount or uniqueMoleculeCount)
  // by matching the annotations against the provided anchor.
  bundleBuilder.addSingle({
    axes: [ { anchor: "main", idx: 0 }, { anchor: "main", idx: 1 }],
    annotations: {
      "pl7.app/isAbundance": "true",
      "pl7.app/abundance/normalized": "false",
      "pl7.app/abundance/isPrimary": "true"
    }
  },
  "primaryAbundance")

  return {
    bundle: bundleBuilder.build()
  }
})

wf.body(func(args) {
  // ...
  // The `primaryAbundance` p-column is now available
  // in the `args.bundle` bundle for processing.
  // ...
})
See Workflow API documentation for further examples.
Clonotype property p-columns
Property p-columns describe the intrinsic characteristics of a clonotype, independent of any single sample. They are always keyed by a single axis: pl7.app/vdj/clonotypeKey. The following sections detail the standard property columns generated by a clonotyping block.
1. Sequence features
This group of p-columns stores the nucleotide and amino acid sequences of various clonotype features (e.g., CDR3, VDJ region).
A crucial concept here is the main sequence. The clonotyping block is configured to define clonotypes based on a specific sequence, known as the "assembling feature" (e.g., the VDJRegion or just CDR3). The resulting p-column for this specific sequence is marked with two annotations:
- pl7.app/vdj/isAssemblingFeature: "true"
- pl7.app/vdj/isMainSequence: "true"
This allows downstream blocks to unambiguously identify the most important sequence data that defines the clonotype.
Feature sequence (pl7.app/vdj/sequence)
- Description: The sequence of a specific feature. The domain is critical, as it specifies both the alphabet (aminoacid or nucleotide) and the feature (e.g., CDR3, FR1, VDJRegion).
- Specification (Main sequence example):
# --- Core Identity ---
name: pl7.app/vdj/sequence
valueType: String
# --- Axes ---
axesSpec:
  - name: pl7.app/vdj/clonotypeKey
    type: String
# --- Domain (defines which sequence this is) ---
domain:
  pl7.app/vdj/feature: "VDJRegion"
  pl7.app/alphabet: "nucleotide"
# --- Annotations ---
annotations:
  # --- Discovery ---
  pl7.app/vdj/isAssemblingFeature: "true"
  pl7.app/vdj/isMainSequence: "true"
  pl7.app/description: "The sequence of a specific feature (e.g., CDR3, VDJRegion)."
  # --- UI & Formatting ---
  pl7.app/label: "VDJRegion nt"
  pl7.app/table/fontFamily: "monospace"
  pl7.app/table/visibility: "default"
  pl7.app/table/orderPriority: "78300"
Other common sequence-related p-columns include:
- pl7.app/vdj/sequenceLength: The length of a feature sequence. Its domain links it to the corresponding sequence p-column.
- pl7.app/vdj/sequence/productive: A flag ("True" or "False") indicating whether the main sequence is productive (in-frame and without stop codons).
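The "in-frame and without stop codons" rule can be illustrated with a deliberately simplified check. The sketch assumes the nucleotide sequence starts at codon position 0 and uses the standard stop codons; real clonotyping tools determine the reading frame from reference anchor points, so this is not how any particular tool computes the flag. `isProductive` and `productiveFlag` are hypothetical names.

```typescript
const STOP_CODONS = ["TAA", "TAG", "TGA"];

// Simplified productivity check: the length must be a multiple of three
// (in-frame) and no codon read from position 0 may be a stop codon.
function isProductive(nucleotides: string): boolean {
  const seq = nucleotides.toUpperCase();
  if (seq.length === 0 || seq.length % 3 !== 0) return false;
  for (let i = 0; i < seq.length; i += 3) {
    if (STOP_CODONS.indexOf(seq.substring(i, i + 3)) !== -1) return false;
  }
  return true;
}

// Formats the result as the "True"/"False" string stored in the p-column.
function productiveFlag(nucleotides: string): string {
  return isProductive(nucleotides) ? "True" : "False";
}

console.log(productiveFlag("TGTGCCAGC")); // True: TGT GCC AGC
console.log(productiveFlag("TGTTAAAGC")); // False: TAA is an in-frame stop codon
console.log(productiveFlag("TGTGC")); // False: length is not a multiple of three
console.log(productiveFlag("TGTAAG")); // True: "TAA" occurs only out of frame
```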
2. V(D)J gene hits
These columns identify the best-matching V, D, J, and C gene segments. The standard provides two variants for each: geneHit (e.g., TRBV7-2) and geneHitWithAllele (e.g., TRBV7-2*01). The pattern is the same for all segments; only the domain changes.
Gene hit (pl7.app/vdj/geneHit / pl7.app/vdj/geneHitWithAllele)
- Description: The name of the best-matching gene segment.
- Specification (V-gene example):
# Using geneHit as the example
name: pl7.app/vdj/geneHit
valueType: String
axesSpec:
  - name: pl7.app/vdj/clonotypeKey
    type: String
domain:
  # Specifies the gene segment type. Can be VGene, DGene, JGene, or CGene.
  pl7.app/vdj/reference: "VGene"
annotations:
  pl7.app/description: "The name of the best-matching V gene."
  pl7.app/label: "Best V gene"
  # Allows this column to be used as a filter in the UI.
  pl7.app/isDiscreteFilter: "true"
  pl7.app/table/visibility: "default"
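The two variants differ only in the allele suffix. A minimal sketch of deriving the geneHit value from a geneHitWithAllele value, assuming the allele is separated by `*` as in the TRBV7-2*01 example above; `stripAllele` is a hypothetical helper.

```typescript
// Removes the allele suffix (everything from the first "*" onward).
function stripAllele(geneHitWithAllele: string): string {
  const star = geneHitWithAllele.indexOf("*");
  return star === -1 ? geneHitWithAllele : geneHitWithAllele.substring(0, star);
}

console.log(stripAllele("TRBV7-2*01")); // TRBV7-2
console.log(stripAllele("TRBV7-2")); // TRBV7-2 (no allele, returned unchanged)
```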
3. Somatic hypermutations (SHM)
These p-columns store the number or rate of mutations within a given gene region (e.g., the V gene), which is essential for analyzing affinity maturation. Both nucleotide (n...) and amino acid (aa...) variants exist.
These columns are often used to rank clonotypes. To facilitate this, they should be annotated as "score" columns, specifying that a higher number of mutations is typically of greater interest.
Mutation count (pl7.app/vdj/sequence/nMutationsCount)
- Description: The number of nucleotide mutations in a specified gene region relative to the germline sequence.
- Specification (V-gene example):
name: pl7.app/vdj/sequence/nMutationsCount
valueType: Int
axesSpec:
  - name: pl7.app/vdj/clonotypeKey
    type: String
domain:
  # Specifies the gene region where mutations were counted (V or J).
  pl7.app/vdj/gene: "V"
annotations:
  pl7.app/description: "The number of nucleotide mutations in a specified gene region relative to the germline sequence."
  pl7.app/label: "Nt mutations count in V gene"
  pl7.app/table/visibility: "optional"
  # --- Score & Ranking Annotations ---
  pl7.app/isScore: "true"
  pl7.app/score/rankingOrder: "decreasing" # Higher mutation count is ranked higher
See the Standard Annotations guide for more on score and ranking annotations.
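A consumer of a score column might apply the ranking order as follows. This is a sketch on plain data; `ScoredClonotype` and `rankByScore` are hypothetical names, and the "increasing" value is an assumption, since only "decreasing" appears in the specification above.

```typescript
// "decreasing" is taken from the spec above; "increasing" is assumed.
type RankingOrder = "increasing" | "decreasing";

interface ScoredClonotype {
  clonotypeKey: string;
  score: number;
}

// Returns a new array with the best-ranked clonotype first.
// With "decreasing", higher scores come first.
function rankByScore(
  items: ScoredClonotype[],
  order: RankingOrder
): ScoredClonotype[] {
  const sign = order === "decreasing" ? -1 : 1;
  return items.slice().sort((a, b) => sign * (a.score - b.score));
}

const ranked = rankByScore(
  [
    { clonotypeKey: "a", score: 3 },
    { clonotypeKey: "b", score: 10 },
    { clonotypeKey: "c", score: 0 },
  ],
  "decreasing"
);
console.log(ranked.map((r) => r.clonotypeKey)); // [ 'b', 'a', 'c' ]
```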
4. Other key properties
This section includes other important high-level properties of the clonotypes.
- pl7.app/label: A default human-readable label for the clonotype, often derived from its key features.
- pl7.app/vdj/isotype: The isotype of the receptor (e.g., IgG1, IgM). Primarily for B-cell receptors.
- pl7.app/vdj/chain: The receptor chain type (e.g. IGH, TRB). This provides a high-level grouping for the clonotype.
5. Cross-sample aggregations
These p-columns provide summary statistics for each clonotype across all samples in the dataset. They are particularly useful for identifying clonotypes that are abundant or prevalent across the entire cohort. These are keyed by pl7.app/vdj/clonotypeKey.
Number of samples (pl7.app/vdj/sampleCount)
- Description: The number of samples in which a given clonotype is detected (i.e., has an abundance > 0).
- Requirement: Required.
- Specification:
name: pl7.app/vdj/sampleCount
valueType: Int
axesSpec:
  - name: pl7.app/vdj/clonotypeKey
    type: String
annotations:
  pl7.app/description: "The number of samples in which a given clonotype is detected."
  pl7.app/label: "Number of Samples"
  pl7.app/isAbundance: "true"
  pl7.app/abundance/unit: "samples"
  pl7.app/abundance/normalized: "false"
  pl7.app/table/visibility: "default"
  pl7.app/table/orderPriority: "87110"
  pl7.app/min: "1"
Total abundance (.../readCountTotal or .../uniqueMoleculeCountTotal)
- Description: The sum of a clonotype's abundance across all samples. This is calculated for the primary abundance measure. If UMIs were used, this will be uniqueMoleculeCountTotal; otherwise, it will be readCountTotal.
- Requirement: Required. One of the two variants must be present.
- Specification (read count example):
name: pl7.app/vdj/readCountTotal
valueType: Long
axesSpec:
  - name: pl7.app/vdj/clonotypeKey
    type: String
annotations:
  pl7.app/description: "The sum of a clonotype's read counts across all samples."
  pl7.app/label: "Supporting Reads"
  pl7.app/isAbundance: "true"
  pl7.app/abundance/unit: "reads"
  pl7.app/abundance/normalized: "false"
  pl7.app/table/visibility: "default"
  pl7.app/table/orderPriority: "87120"
  pl7.app/min: "1"
Mean abundance fraction (.../readFractionMean or .../uniqueMoleculeFractionMean)
- Description: The average fraction of a clonotype across all samples in which it was observed. This is calculated for the primary abundance measure. If UMIs were used, this will be uniqueMoleculeFractionMean; otherwise, it will be readFractionMean.
- Requirement: Required. One of the two variants must be present.
- Specification (read fraction example):
name: pl7.app/vdj/readFractionMean
valueType: Double
axesSpec:
  - name: pl7.app/vdj/clonotypeKey
    type: String
annotations:
  pl7.app/description: "The average fraction of a clonotype across all samples in which it was observed."
  pl7.app/label: "Mean Fraction of Reads"
  pl7.app/isAbundance: "true"
  pl7.app/abundance/unit: "reads"
  pl7.app/abundance/normalized: "true"
  pl7.app/table/visibility: "default"
  pl7.app/table/orderPriority: "87130"
  pl7.app/format: ".2p"
  pl7.app/min: "0"
  pl7.app/max: "1"
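The three aggregations in this section can be derived from the per-sample abundance rows. The sketch below uses the read-based variants and assumes one row per (sampleId, clonotypeKey) pair; `AbundanceRow`, `ClonotypeSummary` and `aggregateAcrossSamples` are hypothetical names.

```typescript
interface AbundanceRow {
  sampleId: string;
  clonotypeKey: string;
  readCount: number;
  readFraction: number;
}

interface ClonotypeSummary {
  sampleCount: number; // samples with abundance > 0
  readCountTotal: number; // sum of read counts across samples
  readFractionMean: number; // mean fraction over samples where observed
}

function aggregateAcrossSamples(
  rows: AbundanceRow[]
): { [clonotypeKey: string]: ClonotypeSummary } {
  const sums: {
    [clonotypeKey: string]: { samples: number; reads: number; fractions: number };
  } = {};
  for (const row of rows) {
    if (row.readCount <= 0) continue; // the clonotype is not detected in this sample
    if (!sums[row.clonotypeKey]) {
      sums[row.clonotypeKey] = { samples: 0, reads: 0, fractions: 0 };
    }
    const acc = sums[row.clonotypeKey];
    acc.samples += 1;
    acc.reads += row.readCount;
    acc.fractions += row.readFraction;
  }
  const result: { [clonotypeKey: string]: ClonotypeSummary } = {};
  for (const clonotype of Object.keys(sums)) {
    const acc = sums[clonotype];
    result[clonotype] = {
      sampleCount: acc.samples,
      readCountTotal: acc.reads,
      readFractionMean: acc.fractions / acc.samples,
    };
  }
  return result;
}

const summaries = aggregateAcrossSamples([
  { sampleId: "S1", clonotypeKey: "A", readCount: 30, readFraction: 0.75 },
  { sampleId: "S1", clonotypeKey: "B", readCount: 10, readFraction: 0.25 },
  { sampleId: "S2", clonotypeKey: "A", readCount: 10, readFraction: 0.25 },
  { sampleId: "S2", clonotypeKey: "C", readCount: 30, readFraction: 0.75 },
]);
console.log(summaries["A"]); // { sampleCount: 2, readCountTotal: 40, readFractionMean: 0.5 }
console.log(summaries["B"]); // { sampleCount: 1, readCountTotal: 10, readFractionMean: 0.25 }
```

The mean is taken only over samples in which the clonotype was observed, which matches the description above and explains why the minimum sampleCount is 1.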
Querying for property p-columns: examples
The following examples show how to reliably query for specific property p-columns in both the model (TypeScript) and the workflow (Tengo).
Model: Finding all amino acid sequences in a dataset
This TypeScript example shows how a block's model, given a dataset anchor (inputAnchor), can find all associated amino acid sequence p-columns. This is useful for populating a UI dropdown that allows a user to select a sequence for clustering or visualization.
// In a block's model file (`/model/src/index.ts`)
import { BlockModel } from "@platforma-sdk/model";

export const model = BlockModel.create()
  // ...
  .output("sequenceOptions", (ctx) => {
    const anchorRef = ctx.args.inputAnchor;
    if (anchorRef === undefined) return undefined;

    // Use `getCanonicalOptions` to perform an anchored query.
    return ctx.resultPool.getCanonicalOptions(
      // The anchor provides the context for the query.
      { main: anchorRef },
      // The matcher defines what kind of p-columns we're looking for.
      {
        // This is the key part: it ensures the p-column is keyed
        // by the *exact same clonotypeKey axis* as our anchor.
        axes: [{ anchor: "main", idx: 1 }],
        name: "pl7.app/vdj/sequence",
        domain: {
          "pl7.app/alphabet": "aminoacid",
        },
      }
    );
  })
  //...
  .done();
See Model API documentation for further examples.
Workflow: Using V and J genes for calculations
This Tengo example shows how a workflow, having been given an inputAnchor, can reliably fetch the associated V and J gene hit p-columns to use in processing.
// In a block's workflow file (`/workflow/src/main.tpl.tengo`)
wf := import("@platforma-sdk/workflow-tengo:workflow")

wf.prepare(func(args) {
  bundleBuilder := wf.createPBundleBuilder()
  bundleBuilder.addAnchor("main", args.inputAnchor)

  // Add the V and J gene hit columns to the bundle.
  for gene in ["V", "J"] {
    bundleBuilder.addSingle({
      axes: [{ anchor: "main", idx: 1 }],
      name: "pl7.app/vdj/geneHit",
      domain: {
        "pl7.app/vdj/reference": gene + "Gene",
      }
    },
    gene + "GeneHit")
  }

  return {
    bundle: bundleBuilder.build()
  }
})

wf.body(func(args) {
  // ...
  // The `VGeneHit` and `JGeneHit` p-columns are now available
  // in the `args.bundle` bundle for processing.
  // ...
})
See Workflow API documentation for further examples.
Summary of standard p-columns
The following table provides a summary of all standard p-columns that a developer can expect to be produced by a compliant bulk VDJ clonotyping block. Downstream block developers can use this as a quick reference to see what data is available for their tools.
| P-Column Name | Description | Axes | Requirement |
|---|---|---|---|
| Abundance P-Columns | | | |
| pl7.app/vdj/readCount | The raw number of sequencing reads assigned to the clonotype. | [sampleId][clonotypeKey] | Required |
| pl7.app/vdj/readFraction | The fraction of total reads in the sample assigned to the clonotype. | [sampleId][clonotypeKey] | Required |
| pl7.app/vdj/uniqueMoleculeCount | The number of unique molecules (UMIs) assigned to the clonotype. | [sampleId][clonotypeKey] | Conditional |
| pl7.app/vdj/uniqueMoleculeFraction | The fraction of total UMIs in the sample assigned to the clonotype. | [sampleId][clonotypeKey] | Conditional |
| Property P-Columns | | | |
| pl7.app/vdj/sequence | The sequence of a specific feature (e.g., CDR3, VDJRegion). | [clonotypeKey] | Required¹ |
| pl7.app/vdj/sequenceLength | The length of a feature sequence. | [clonotypeKey] | Optional |
| pl7.app/vdj/sequence/productive | A flag indicating whether the main sequence is productive (in-frame and without stop codons). | [clonotypeKey] | Required |
| pl7.app/vdj/geneHit | The name of the best-matching V/D/J/C gene segment, without allele information. | [clonotypeKey] | Required² |
| pl7.app/vdj/geneHitWithAllele | The name of the best-matching V/D/J/C gene segment, including allele information. | [clonotypeKey] | Optional |
| pl7.app/vdj/sequence/nMutationsCount | The number of nucleotide mutations in a specified gene region relative to the germline sequence. | [clonotypeKey] | Optional |
| pl7.app/vdj/sequence/aaMutationsRate | The rate of amino acid mutations in a specified gene region relative to the germline. | [clonotypeKey] | Optional |
| pl7.app/vdj/isotype | The isotype of the receptor chain (e.g., IgG1, IgM). | [clonotypeKey] | Conditional³ |
| pl7.app/vdj/chain | The high-level receptor chain type (e.g., IGH, TRB). | [clonotypeKey] | Required |
| pl7.app/label | A short, human-readable label for the clonotype. | [clonotypeKey] | Optional |
| pl7.app/vdj/sampleCount | The number of samples in which the clonotype is detected. | [clonotypeKey] | Required |
| pl7.app/vdj/readCountTotal | The sum of a clonotype's read counts across all samples. | [clonotypeKey] | Conditional⁴ |
| pl7.app/vdj/uniqueMoleculeCountTotal | The sum of a clonotype's UMI counts across all samples. | [clonotypeKey] | Conditional⁴ |
| pl7.app/vdj/readFractionMean | The average read fraction of a clonotype across all samples where it is present. | [clonotypeKey] | Conditional⁴ |
| pl7.app/vdj/uniqueMoleculeFractionMean | The average UMI fraction of a clonotype across all samples where it is present. | [clonotypeKey] | Conditional⁴ |
¹ At a minimum, the sequence p-column designated as the isMainSequence is required. Additional sequence p-columns (e.g., for other CDRs or Framework Regions) are optional.
² The geneHit p-column is required for V and J genes. For D and C genes, it is optional. The geneHitWithAllele is always optional.
³ Required for B-cell receptor chains (IGHeavy, IGLight).
⁴ One of readCountTotal or uniqueMoleculeCountTotal is required. Similarly, one of readFractionMean or uniqueMoleculeFractionMean is required. These are determined by the primary abundance metric (reads vs. UMIs).
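The requirement column of the summary can be turned into a quick self-check for block authors. The sketch below compares p-column names only: it does not inspect domains (for example, whether geneHit exists for both V and J) and it leaves out the chain-dependent isotype rule. `missingRequiredColumns` and the constant names are hypothetical.

```typescript
const ALWAYS_REQUIRED = [
  "pl7.app/vdj/readCount",
  "pl7.app/vdj/readFraction",
  "pl7.app/vdj/sequence",
  "pl7.app/vdj/sequence/productive",
  "pl7.app/vdj/geneHit",
  "pl7.app/vdj/chain",
  "pl7.app/vdj/sampleCount",
];

const REQUIRED_WITH_UMIS = [
  "pl7.app/vdj/uniqueMoleculeCount",
  "pl7.app/vdj/uniqueMoleculeFraction",
  "pl7.app/vdj/uniqueMoleculeCountTotal",
  "pl7.app/vdj/uniqueMoleculeFractionMean",
];

const REQUIRED_WITHOUT_UMIS = [
  "pl7.app/vdj/readCountTotal",
  "pl7.app/vdj/readFractionMean",
];

// Returns the required p-column names that are absent from the dataset.
function missingRequiredColumns(presentNames: string[], hasUmis: boolean): string[] {
  const required = ALWAYS_REQUIRED.concat(
    hasUmis ? REQUIRED_WITH_UMIS : REQUIRED_WITHOUT_UMIS
  );
  return required.filter((columnName) => presentNames.indexOf(columnName) === -1);
}

// A non-UMI dataset that lacks the mean read fraction.
const present = ALWAYS_REQUIRED.concat(["pl7.app/vdj/readCountTotal"]);
console.log(missingRequiredColumns(present, false)); // [ 'pl7.app/vdj/readFractionMean' ]
```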