VDJ Naming Conventions
This document provides a comprehensive reference for VDJ-specific column names, axis names, domains, and annotations used in the Platforma VDJ analysis ecosystem. Following these conventions ensures interoperability between upstream clonotyping blocks and downstream analysis blocks.
Standard VDJ Axes
These axis names are used to index VDJ data across different analysis stages.
| Axis Name | Type | Description | Usage |
|---|---|---|---|
| pl7.app/sampleId | String | Sample identifier | Primary axis for organizing data by sample. Present in both bulk and single-cell datasets. |
| pl7.app/sampleGroupId | String | Sample group identifier | Used to organize samples into groups for comparative analysis. |
| pl7.app/vdj/clonotypeKey | String | Clonotype identifier (bulk) | Composite key identifying a unique clonotype in bulk datasets. Structure defined by domain pl7.app/vdj/clonotypeKey/structure. |
| pl7.app/vdj/scClonotypeKey | String | Clonotype identifier (single-cell) | Composite key identifying a unique clonotype in single-cell datasets. |
| pl7.app/vdj/clusterId | String | Cluster identifier | Identifies clusters created by clustering algorithms. Usually includes domain specifying the clustering algorithm and block ID. |
| pl7.app/sc/cellId | String | Cell identifier (single-cell) | Unique identifier for individual cells in single-cell datasets. |
Standard VDJ Columns
Abundance Columns
Abundance columns measure the quantity of clonotypes or cells. They follow naming patterns based on the unit of measurement.
| Column Name | Value Type | Description | Typical Annotations |
|---|---|---|---|
| pl7.app/vdj/readCount | Long | Number of reads for a clonotype | pl7.app/isAbundance: "true", pl7.app/abundance/unit: "reads", pl7.app/abundance/normalized: "false" |
| pl7.app/vdj/readFraction | Double | Fraction of total reads | pl7.app/isAbundance: "true", pl7.app/abundance/unit: "reads", pl7.app/abundance/normalized: "true" |
| pl7.app/vdj/uniqueMoleculeCount | Long | Number of unique molecules (UMIs) | pl7.app/isAbundance: "true", pl7.app/abundance/unit: "molecules", pl7.app/abundance/normalized: "false" |
| pl7.app/vdj/uniqueMoleculeFraction | Double | Fraction of unique molecules | pl7.app/isAbundance: "true", pl7.app/abundance/unit: "molecules", pl7.app/abundance/normalized: "true" |
| pl7.app/vdj/uniqueCellCount | Long | Number of unique cells (single-cell) | pl7.app/isAbundance: "true", pl7.app/abundance/unit: "cells", pl7.app/abundance/normalized: "false" |
| pl7.app/vdj/uniqueCellFraction | Double | Fraction of unique cells | pl7.app/isAbundance: "true", pl7.app/abundance/unit: "cells", pl7.app/abundance/normalized: "true" |
| pl7.app/vdj/sampleCount | Long | Number of samples containing this clonotype | pl7.app/isAbundance: "true", pl7.app/abundance/unit: "samples", pl7.app/abundance/normalized: "false" |
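The annotation pattern in the table above is regular enough to generate rather than type by hand. A minimal sketch, assuming annotations are held in a plain Python dict; the helper `abundance_annotations` is hypothetical and is not part of any Platforma SDK:

```python
# Hypothetical helper (not a Platforma API): builds the three abundance
# annotations listed in the table above.
VALID_UNITS = {"reads", "molecules", "cells", "samples"}

def abundance_annotations(unit: str, normalized: bool) -> dict:
    """Annotations for a count column (normalized=False) or a fraction column (True)."""
    if unit not in VALID_UNITS:
        raise ValueError(f"unknown abundance unit: {unit!r}")
    return {
        "pl7.app/isAbundance": "true",
        "pl7.app/abundance/unit": unit,
        # Annotation values are strings, so the flag is written as "true"/"false".
        "pl7.app/abundance/normalized": "true" if normalized else "false",
    }

read_count = abundance_annotations("reads", normalized=False)
umi_fraction = abundance_annotations("molecules", normalized=True)
```

Generating all three annotations from one call makes it harder to ship a count or fraction column with only part of the set.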
Aggregated abundance columns (across samples):
| Column Name | Value Type | Description |
|---|---|---|
| pl7.app/vdj/readCountTotal | Long | Total read count across all samples |
| pl7.app/vdj/readFractionMean | Double | Mean read fraction across samples |
| pl7.app/vdj/uniqueMoleculeCountTotal | Long | Total UMI count across all samples |
| pl7.app/vdj/uniqueMoleculeFractionMean | Double | Mean UMI fraction across samples |
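To make the relationship between per-sample and aggregated columns concrete, here is a small sketch using invented read counts. It assumes the mean fraction is taken over the samples in which a clonotype is observed; this document does not say whether samples lacking the clonotype contribute a zero, so treat that choice as an assumption.

```python
from statistics import mean

# Toy per-sample read counts: sample -> clonotype -> reads (invented numbers).
read_counts = {
    "S1": {"cloneA": 30, "cloneB": 70},
    "S2": {"cloneA": 50, "cloneC": 150},
}

def aggregate(read_counts):
    """Return (total reads per clonotype, mean per-sample read fraction per clonotype)."""
    totals, fractions = {}, {}
    for clones in read_counts.values():
        sample_total = sum(clones.values())
        for clone, n in clones.items():
            totals[clone] = totals.get(clone, 0) + n
            # Per-sample fraction, i.e. the readFraction value for this sample.
            fractions.setdefault(clone, []).append(n / sample_total)
    # Assumption: average only over samples where the clonotype was observed.
    return totals, {clone: mean(f) for clone, f in fractions.items()}

read_count_total, read_fraction_mean = aggregate(read_counts)
```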
Sequence Columns
Sequence columns store nucleotide or amino acid sequences for various gene features. The pl7.app/vdj/feature domain specifies which feature the sequence represents, and the pl7.app/alphabet domain distinguishes between nucleotide and amino acid sequences.
| Column Name Pattern | Value Type | Description | Required Domains |
|---|---|---|---|
| pl7.app/vdj/sequence | String | Sequence data for a specific feature | pl7.app/vdj/feature, pl7.app/alphabet |
| pl7.app/vdj/sequenceLength | Int | Length of the sequence | pl7.app/vdj/feature, pl7.app/alphabet |
| pl7.app/vdj/sequence/productive | String | Boolean flag, stored as a string, indicating whether the sequence is productive | - |
| pl7.app/vdj/sequence/annotation | String | Annotated regions within the sequence | Same as corresponding sequence column |
| pl7.app/vdj/sequenceQuality | String | Quality scores for sequence bases | Same as corresponding sequence column |
| pl7.app/vdj/sequence/{alphabet}{variant} | String | Variant sequences (e.g., germline) | pl7.app/vdj/gene, pl7.app/vdj/scClonotypeChain |
Common values for pl7.app/vdj/feature domain:
- CDR1, CDR2, CDR3 - Complementarity Determining Regions
- FR1, FR2, FR3, FR4 - Framework Regions
- VDJRegion - Complete V(D)J region
- VDJRegionInFrame, FR4InFrame - In-frame variants
Variant naming pattern examples:
- nGermline - Nucleotide germline sequence
- aaGermline - Amino acid germline sequence
- nTarget - Nucleotide target sequence
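The sketch below shows how a sequence column and a variant column name might be assembled from these conventions. The dict layout (`name`, `valueType`, `domain`) and both helper functions are illustrative assumptions, not a Platforma API, and the `n`/`aa` prefixes are inferred from the variant examples above.

```python
# Alphabet values come from the pl7.app/alphabet domain; the prefixes are
# inferred from the "nGermline" / "aaGermline" examples (assumption).
ALPHABET_PREFIX = {"nucleotide": "n", "aminoacid": "aa"}

def sequence_spec(feature: str, alphabet: str) -> dict:
    """Hypothetical spec for a pl7.app/vdj/sequence column with its two required domains."""
    if alphabet not in ALPHABET_PREFIX:
        raise ValueError(f"unknown alphabet: {alphabet!r}")
    return {
        "name": "pl7.app/vdj/sequence",
        "valueType": "String",
        "domain": {"pl7.app/vdj/feature": feature, "pl7.app/alphabet": alphabet},
    }

def variant_column_name(alphabet: str, variant: str) -> str:
    """Expand the pl7.app/vdj/sequence/{alphabet}{variant} pattern."""
    return f"pl7.app/vdj/sequence/{ALPHABET_PREFIX[alphabet]}{variant}"

cdr3_aa = sequence_spec("CDR3", "aminoacid")
germline_nt = variant_column_name("nucleotide", "Germline")
```

Keeping the feature and alphabet in domains, rather than in the column name, is what lets a single `pl7.app/vdj/sequence` name cover every feature.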
Gene Hit Columns
These columns identify which V, D, J, or C genes were aligned to the sequence.
| Column Name | Value Type | Description | Required Domains |
|---|---|---|---|
| pl7.app/vdj/geneHit | String | Gene name without allele information | pl7.app/vdj/reference |
| pl7.app/vdj/geneHitWithAllele | String | Gene name with allele information | pl7.app/vdj/reference |
| pl7.app/vdj/chain | String | Chain type (e.g., IGH, IGK, TRA, TRB) | - |
| pl7.app/vdj/isotype | String | Antibody isotype (e.g., IgG, IgM) | - |
Values for pl7.app/vdj/reference domain:
- VGene - Variable gene
- DGene - Diversity gene
- JGene - Joining gene
- CGene - Constant gene
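The difference between the two gene hit columns can be illustrated with IMGT-style gene names, where the allele follows an asterisk. This is a sketch: `strip_allele` and `gene_hit_spec` are hypothetical helpers, and the spec layout is an assumed stand-in for a real column spec.

```python
REFERENCES = ("VGene", "DGene", "JGene", "CGene")

def strip_allele(gene_with_allele: str) -> str:
    """Drop an IMGT-style allele suffix, e.g. 'IGHV1-69*01' -> 'IGHV1-69'."""
    return gene_with_allele.split("*", 1)[0]

def gene_hit_spec(reference: str, with_allele: bool = False) -> dict:
    """Hypothetical spec for a gene hit column carrying the pl7.app/vdj/reference domain."""
    if reference not in REFERENCES:
        raise ValueError(f"unknown reference: {reference!r}")
    name = "pl7.app/vdj/geneHitWithAllele" if with_allele else "pl7.app/vdj/geneHit"
    return {"name": name, "valueType": "String",
            "domain": {"pl7.app/vdj/reference": reference}}

v_hit = gene_hit_spec("VGene")
v_value = strip_allele("IGHV1-69*01")
```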
Clustering Columns
These columns are produced by clustering analysis blocks.
| Column Name | Value Type | Description | Required Domains |
|---|---|---|---|
| pl7.app/vdj/clusterId | String | Cluster identifier | pl7.app/vdj/clustering/algorithm, pl7.app/vdj/clustering/blockId |
| pl7.app/vdj/clustering/clusterSize | Long | Number of clonotypes in cluster | Same as clusterId |
| pl7.app/vdj/clustering/clusterRadius | Double | Maximum distance to centroid | Same as clusterId |
| pl7.app/vdj/distanceToCentroid | Double | Distance from clonotype to cluster centroid | Same as clusterId |
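The per-cluster columns follow directly from the per-clonotype ones: size is a count of members and radius is the largest distance to the centroid. A sketch with invented assignments, where plain dicts stand in for the real columns:

```python
from collections import defaultdict

# Toy data: clonotype -> (clusterId, distanceToCentroid). Values are invented.
assignments = {
    "c1": ("k0", 0.0),
    "c2": ("k0", 0.2),
    "c3": ("k0", 0.5),
    "c4": ("k1", 0.0),
}

def cluster_stats(assignments):
    """Derive clusterSize and clusterRadius for each cluster from member distances."""
    distances = defaultdict(list)
    for cluster_id, dist in assignments.values():
        distances[cluster_id].append(dist)
    return {
        cluster_id: {"clusterSize": len(d), "clusterRadius": max(d)}
        for cluster_id, d in distances.items()
    }

stats = cluster_stats(assignments)
```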
Single-Cell Specific Columns
| Column Name | Value Type | Description | Usage |
|---|---|---|---|
| pl7.app/vdj/scFv-sequence | String | Single-chain variable fragment sequence | Used in scFv clonotyping workflows |
| pl7.app/sc/cellLinker | Long | Linker column connecting cells to clonotypes | Value is typically 1 indicating the relationship exists |
Linker Columns
| Column Name | Value Type | Description | Usage |
|---|---|---|---|
| pl7.app/vdj/link | Long | Generic linker column | Connects entities across different axes, value is typically 1 |
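A linker column is effectively a sparse relation: a row exists, with value 1, for every pair of axis keys that are connected. The sketch below models one as a dict keyed by (cell ID, clonotype key) tuples; that axis pairing and the identifiers are illustrative assumptions.

```python
# Linker rows as (cellId, scClonotypeKey) -> 1. Keys are invented examples.
linker = {
    ("cell-001", "clone-A"): 1,
    ("cell-002", "clone-A"): 1,
    ("cell-003", "clone-B"): 1,
}

def cells_per_clonotype(linker):
    """Count linked cells per clonotype by following the linker rows."""
    counts = {}
    for (_cell_id, clonotype), flag in linker.items():
        if flag == 1:  # a value of 1 marks that the relationship exists
            counts[clonotype] = counts.get(clonotype, 0) + 1
    return counts

cell_counts = cells_per_clonotype(linker)
```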
Statistic Columns
These columns provide summary statistics, typically with pl7.app/sampleId as the only axis.
| Column Name | Value Type | Description |
|---|---|---|
| pl7.app/vdj/stat/clonotypeCount | Long | Total number of clonotypes in sample |
| pl7.app/vdj/stat/readCount | Long | Total number of reads in sample |
| pl7.app/vdj/stat/umiCount | Long | Total number of UMIs in sample |
Analysis Result Columns
| Column Name | Value Type | Description |
|---|---|---|
| pl7.app/vdj/library | String | Library type or protocol used |
| pl7.app/vdj/umap1, pl7.app/vdj/umap2 | Double | First and second UMAP coordinates produced by dimensionality reduction |
VDJ-Specific Domains
Domains provide additional context to distinguish otherwise similar columns or axes.
| Domain Key | Typical Values | Purpose |
|---|---|---|
| pl7.app/vdj/feature | CDR3, CDR1, CDR2, FR1-FR4, VDJRegion, etc. | Specifies which gene feature a sequence column represents |
| pl7.app/vdj/reference | VGene, DGene, JGene, CGene | Identifies which gene segment a gene hit refers to |
| pl7.app/vdj/scClonotypeChain | A, B | Distinguishes between chains in single-cell data (e.g., heavy vs light) |
| pl7.app/vdj/scClonotypeChain/index | primary, secondary | Indicates the importance of a chain within a cell |
| pl7.app/vdj/clustering/algorithm | mmseqs2, etc. | Identifies the clustering algorithm used |
| pl7.app/vdj/clustering/blockId | Block instance UUID | Distinguishes clusters from different clustering runs |
| pl7.app/vdj/clonotypingRunId | Block instance UUID | Identifies which clonotyping block produced the data |
| pl7.app/vdj/gene | V, J, D, C | Specifies gene type for sequence variants |
| pl7.app/vdj/scFv-linker | Linker sequence | Identifies the scFv linker used |
| pl7.app/vdj/scFv-hinge | Hinge sequence | Identifies the scFv hinge used |
| pl7.app/alphabet | nucleotide, aminoacid | Distinguishes between DNA/RNA and protein sequences |
Special domain for clonotype keys:
- pl7.app/vdj/clonotypeKey/structure: JSON array describing which features define the clonotype (e.g., ["nSeqCDR3","bestVGene","bestJGene"])
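Because the structure domain is a JSON array, a consumer can parse it to learn which features make up a key. The sketch below pairs the parsed feature names with component values that are assumed to be already split out; how the components are encoded inside the key string is not covered here, and `describe_key` and the sequence value are hypothetical.

```python
import json

# Domain value as it would appear on the axis (taken from the example above).
structure_domain = '["nSeqCDR3","bestVGene","bestJGene"]'

def describe_key(structure_json: str, components: list) -> dict:
    """Pair each feature named in the structure domain with its component value."""
    features = json.loads(structure_json)
    if len(features) != len(components):
        raise ValueError("key components do not match the declared structure")
    return dict(zip(features, components))

# Component values are invented for illustration.
key_parts = describe_key(structure_domain, ["TGTGCGAGAGAT", "IGHV1-69", "IGHJ4"])
```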
VDJ-Specific Annotations
| Annotation | Values | Description |
|---|---|---|
| pl7.app/vdj/imputed | "true", "false" | Marks sequences that were imputed rather than directly observed |
| pl7.app/vdj/isAssemblingFeature | "true", "false" | Marks features used in clonotype assembly |
| pl7.app/vdj/isMainSequence | "true", "false" | Marks the primary sequence for a feature |
Naming Best Practices
- Use established names: Always use the standard column and axis names documented here when they fit your use case
- Follow the pattern: When creating new feature-specific columns, follow the established naming patterns (e.g., pl7.app/vdj/sequence with appropriate domains)
- Document domains: Always include appropriate domains to distinguish between similar columns
- Abundance annotations: Always include the full set of abundance annotations for count/fraction columns
- Segmentation: Use the pl7.app/segmentedBy annotation when columns can be meaningfully merged across different analysis runs
Integration with Downstream Blocks
Downstream analysis blocks (such as Clonotype Browser and Clustering) discover input data by querying for columns with specific characteristics. For example:
- Abundance queries: Look for pl7.app/isAbundance: "true" with specific pl7.app/abundance/unit values
- Sequence queries: Look for pl7.app/vdj/sequence with specific pl7.app/vdj/feature and pl7.app/alphabet domains
- Anchor queries: Look for pl7.app/isAnchor: "true" to find the primary dataset column
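These queries amount to filtering column specs by name, domain values, and annotation values. The following sketch mimics that with plain dicts; the `matches` helper, the spec layout, and the toy column list (including which column is marked as the anchor) are assumptions for illustration, not the query API that downstream blocks actually use.

```python
def matches(spec, name=None, domain=None, annotations=None):
    """True if spec has the given name and every requested domain/annotation pair."""
    if name is not None and spec.get("name") != name:
        return False
    for field, wanted in (("domain", domain), ("annotations", annotations)):
        actual = spec.get(field, {})
        if any(actual.get(key) != value for key, value in (wanted or {}).items()):
            return False
    return True

# Toy column specs; real specs carry more fields.
columns = [
    {"name": "pl7.app/vdj/readCount",
     "annotations": {"pl7.app/isAbundance": "true", "pl7.app/abundance/unit": "reads",
                     "pl7.app/abundance/normalized": "false", "pl7.app/isAnchor": "true"}},
    {"name": "pl7.app/vdj/uniqueMoleculeCount",
     "annotations": {"pl7.app/isAbundance": "true", "pl7.app/abundance/unit": "molecules",
                     "pl7.app/abundance/normalized": "false"}},
    {"name": "pl7.app/vdj/sequence",
     "domain": {"pl7.app/vdj/feature": "CDR3", "pl7.app/alphabet": "aminoacid"}},
]

read_abundance = [c["name"] for c in columns if matches(
    c, annotations={"pl7.app/isAbundance": "true", "pl7.app/abundance/unit": "reads"})]
cdr3_aa = [c for c in columns if matches(
    c, name="pl7.app/vdj/sequence",
    domain={"pl7.app/vdj/feature": "CDR3", "pl7.app/alphabet": "aminoacid"})]
anchors = [c["name"] for c in columns if matches(
    c, annotations={"pl7.app/isAnchor": "true"})]
```

A column missing one of the expected annotations or domains simply fails the match, which is why incomplete metadata makes a column invisible to downstream blocks.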
A clonotyping block that follows these naming conventions exposes its outputs in the form these queries expect, so standard downstream analysis blocks can discover and use them without block-specific integration work.