VDJ Naming Conventions

This document provides a comprehensive reference for VDJ-specific column names, axis names, domains, and annotations used in the Platforma VDJ analysis ecosystem. Following these conventions ensures interoperability between upstream clonotyping blocks and downstream analysis blocks.

Standard VDJ Axes

These axis names are used to index VDJ data across different analysis stages.

| Axis Name | Type | Description | Usage |
| --- | --- | --- | --- |
| pl7.app/sampleId | String | Sample identifier | Primary axis for organizing data by sample. Present in both bulk and single-cell datasets. |
| pl7.app/sampleGroupId | String | Sample group identifier | Used to organize samples into groups for comparative analysis. |
| pl7.app/vdj/clonotypeKey | String | Clonotype identifier (bulk) | Composite key identifying a unique clonotype in bulk datasets. Structure defined by domain pl7.app/vdj/clonotypeKey/structure. |
| pl7.app/vdj/scClonotypeKey | String | Clonotype identifier (single-cell) | Composite key identifying a unique clonotype in single-cell datasets. |
| pl7.app/vdj/clusterId | String | Cluster identifier | Identifies clusters created by clustering algorithms. Usually includes domain specifying the clustering algorithm and block ID. |
| pl7.app/sc/cellId | String | Cell identifier (single-cell) | Unique identifier for individual cells in single-cell datasets. |

Standard VDJ Columns

Abundance Columns

Abundance columns measure the quantity of clonotypes or cells. They follow naming patterns based on the unit of measurement.

| Column Name | Value Type | Description | Typical Annotations |
| --- | --- | --- | --- |
| pl7.app/vdj/readCount | Long | Number of reads for a clonotype | pl7.app/isAbundance: "true", pl7.app/abundance/unit: "reads", pl7.app/abundance/normalized: "false" |
| pl7.app/vdj/readFraction | Double | Fraction of total reads | pl7.app/isAbundance: "true", pl7.app/abundance/unit: "reads", pl7.app/abundance/normalized: "true" |
| pl7.app/vdj/uniqueMoleculeCount | Long | Number of unique molecules (UMIs) | pl7.app/isAbundance: "true", pl7.app/abundance/unit: "molecules", pl7.app/abundance/normalized: "false" |
| pl7.app/vdj/uniqueMoleculeFraction | Double | Fraction of unique molecules | pl7.app/isAbundance: "true", pl7.app/abundance/unit: "molecules", pl7.app/abundance/normalized: "true" |
| pl7.app/vdj/uniqueCellCount | Long | Number of unique cells (single-cell) | pl7.app/isAbundance: "true", pl7.app/abundance/unit: "cells", pl7.app/abundance/normalized: "false" |
| pl7.app/vdj/uniqueCellFraction | Double | Fraction of unique cells | pl7.app/isAbundance: "true", pl7.app/abundance/unit: "cells", pl7.app/abundance/normalized: "true" |
| pl7.app/vdj/sampleCount | Long | Number of samples containing this clonotype | pl7.app/isAbundance: "true", pl7.app/abundance/unit: "samples", pl7.app/abundance/normalized: "false" |
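
To make the annotation pattern in the table concrete, here is a minimal sketch that assembles the three abundance annotations for a column. The dictionary layout (`name`, `valueType`, `annotations`) and the `abundance_column` helper are assumptions made for illustration; they are not the Platforma SDK API.

```python
def abundance_column(name, value_type, unit, normalized):
    """Return a hypothetical column spec carrying the full set of abundance annotations."""
    return {
        "name": name,
        "valueType": value_type,
        "annotations": {
            "pl7.app/isAbundance": "true",
            "pl7.app/abundance/unit": unit,
            # Annotation values are strings, so the flag is written as "true"/"false".
            "pl7.app/abundance/normalized": "true" if normalized else "false",
        },
    }


# Counts are Long and not normalized; fractions are Double and normalized.
read_count = abundance_column("pl7.app/vdj/readCount", "Long", "reads", normalized=False)
read_fraction = abundance_column("pl7.app/vdj/readFraction", "Double", "reads", normalized=True)
```

Building the annotations in one place keeps count and fraction columns consistent, which matters because downstream blocks select abundance columns by these annotations rather than by name alone.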

Aggregated abundance columns (across samples):

| Column Name | Value Type | Description |
| --- | --- | --- |
| pl7.app/vdj/readCountTotal | Long | Total read count across all samples |
| pl7.app/vdj/readFractionMean | Double | Mean read fraction across samples |
| pl7.app/vdj/uniqueMoleculeCountTotal | Long | Total UMI count across all samples |
| pl7.app/vdj/uniqueMoleculeFractionMean | Double | Mean UMI fraction across samples |
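
The relationship between per-sample and aggregated abundance values can be sketched with toy data. The input numbers are invented, and the averaging rule (mean over all samples, counting a missing clonotype as fraction 0) is an assumption; a given block may instead average only over samples in which the clonotype occurs.

```python
from collections import defaultdict

# Toy data: read_counts[sample][clonotype] -> reads (values invented for illustration).
read_counts = {
    "S1": {"c1": 30, "c2": 10},
    "S2": {"c1": 10},
}

read_count_total = defaultdict(int)  # corresponds to pl7.app/vdj/readCountTotal
sample_count = defaultdict(int)      # corresponds to pl7.app/vdj/sampleCount
fraction_sum = defaultdict(float)

for sample, counts in read_counts.items():
    sample_total = sum(counts.values())
    for clonotype, reads in counts.items():
        read_count_total[clonotype] += reads
        sample_count[clonotype] += 1
        # Per-sample read fraction, as in pl7.app/vdj/readFraction.
        fraction_sum[clonotype] += reads / sample_total

# Assumption: average over every sample, treating absence as a fraction of 0.
read_fraction_mean = {c: s / len(read_counts) for c, s in fraction_sum.items()}
```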

Sequence Columns

Sequence columns store nucleotide or amino acid sequences for various gene features. The pl7.app/vdj/feature domain specifies which feature the sequence represents, and the pl7.app/alphabet domain distinguishes between nucleotide and amino acid sequences.

| Column Name Pattern | Value Type | Description | Required Domains |
| --- | --- | --- | --- |
| pl7.app/vdj/sequence | String | Sequence data for a specific feature | pl7.app/vdj/feature, pl7.app/alphabet |
| pl7.app/vdj/sequenceLength | Int | Length of the sequence | pl7.app/vdj/feature, pl7.app/alphabet |
| pl7.app/vdj/sequence/productive | String | Boolean flag indicating if sequence is productive | - |
| pl7.app/vdj/sequence/annotation | String | Annotated regions within the sequence | Same as corresponding sequence column |
| pl7.app/vdj/sequenceQuality | String | Quality scores for sequence bases | Same as sequence column |
| pl7.app/vdj/sequence/{alphabet}{variant} | String | Variant sequences (e.g., germline) | pl7.app/vdj/gene, pl7.app/vdj/scClonotypeChain |

Common values for pl7.app/vdj/feature domain:

  • CDR1, CDR2, CDR3 - Complementarity Determining Regions
  • FR1, FR2, FR3, FR4 - Framework Regions
  • VDJRegion - Complete V(D)J region
  • VDJRegionInFrame, FR4InFrame - In-frame variants

Variant naming pattern examples:

  • nGermline - Nucleotide germline sequence
  • aaGermline - Amino acid germline sequence
  • nTarget - Nucleotide target sequence
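
As a sketch of how the sequence naming rules combine, the helpers below build a spec for a feature-specific sequence column and the name of a variant column. The spec layout and both helper functions are hypothetical; the `n`/`aa` prefixes are inferred from the variant examples above.

```python
# Prefixes inferred from the examples nGermline and aaGermline (assumption).
ALPHABET_PREFIX = {"nucleotide": "n", "aminoacid": "aa"}


def sequence_column(feature, alphabet):
    """Hypothetical spec: one shared column name, disambiguated by two domains."""
    return {
        "name": "pl7.app/vdj/sequence",
        "valueType": "String",
        "domain": {
            "pl7.app/vdj/feature": feature,
            "pl7.app/alphabet": alphabet,
        },
    }


def variant_column_name(alphabet, variant):
    """Fill in the pl7.app/vdj/sequence/{alphabet}{variant} pattern."""
    return f"pl7.app/vdj/sequence/{ALPHABET_PREFIX[alphabet]}{variant}"


cdr3_aa = sequence_column("CDR3", "aminoacid")
germline_nt = variant_column_name("nucleotide", "Germline")
```

Here `cdr3_aa` and a nucleotide CDR3 column would share the name `pl7.app/vdj/sequence` and differ only in their domains, while `germline_nt` evaluates to `pl7.app/vdj/sequence/nGermline`.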

Gene Hit Columns

These columns identify which V, D, J, or C genes were aligned to the sequence.

| Column Name | Value Type | Description | Required Domains |
| --- | --- | --- | --- |
| pl7.app/vdj/geneHit | String | Gene name without allele information | pl7.app/vdj/reference |
| pl7.app/vdj/geneHitWithAllele | String | Gene name with allele information | pl7.app/vdj/reference |
| pl7.app/vdj/chain | String | Chain type (e.g., IGH, IGK, TRA, TRB) | - |
| pl7.app/vdj/isotype | String | Antibody isotype (e.g., IgG, IgM) | - |

Values for pl7.app/vdj/reference domain:

  • VGene - Variable gene
  • DGene - Diversity gene
  • JGene - Joining gene
  • CGene - Constant gene
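
The difference between the two gene hit columns can be illustrated with a small helper. This sketch assumes allele information is written as a `*` suffix, as in IMGT-style names such as `IGHV1-69*01`; the `strip_allele` function and the example values are illustrative only.

```python
def strip_allele(gene_with_allele):
    """Drop an IMGT-style allele suffix, e.g. 'IGHV1-69*01' -> 'IGHV1-69'."""
    return gene_with_allele.split("*", 1)[0]


# One V gene hit expressed as values for both columns (domain pl7.app/vdj/reference = VGene).
hit_with_allele = "IGHV1-69*01"          # value for pl7.app/vdj/geneHitWithAllele
hit = strip_allele(hit_with_allele)      # value for pl7.app/vdj/geneHit
```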

Clustering Columns

These columns are produced by clustering analysis blocks.

| Column Name | Value Type | Description | Required Domains |
| --- | --- | --- | --- |
| pl7.app/vdj/clusterId | String | Cluster identifier | pl7.app/vdj/clustering/algorithm, pl7.app/vdj/clustering/blockId |
| pl7.app/vdj/clustering/clusterSize | Long | Number of clonotypes in cluster | Same as clusterId |
| pl7.app/vdj/clustering/clusterRadius | Double | Maximum distance to centroid | Same as clusterId |
| pl7.app/vdj/distanceToCentroid | Double | Distance from clonotype to cluster centroid | Same as clusterId |
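
The cluster-level columns follow directly from the per-clonotype distances. A minimal sketch with invented data, assuming distances are already available per (cluster, clonotype) pair; the identifiers and the block ID placeholder are hypothetical:

```python
from collections import defaultdict

# Toy input: (cluster id, clonotype key) -> distance to the cluster centroid.
distance_to_centroid = {
    ("cluster-1", "clonotype-a"): 0.0,
    ("cluster-1", "clonotype-b"): 0.2,
    ("cluster-1", "clonotype-c"): 0.5,
    ("cluster-2", "clonotype-d"): 0.0,
}

cluster_size = defaultdict(int)      # pl7.app/vdj/clustering/clusterSize
cluster_radius = defaultdict(float)  # pl7.app/vdj/clustering/clusterRadius

for (cluster_id, _clonotype), distance in distance_to_centroid.items():
    cluster_size[cluster_id] += 1
    # The radius is the largest distance from any member to the centroid.
    cluster_radius[cluster_id] = max(cluster_radius[cluster_id], distance)

# Every clustering column from one run would share the same pair of domains.
cluster_domain = {
    "pl7.app/vdj/clustering/algorithm": "mmseqs2",
    "pl7.app/vdj/clustering/blockId": "<block-instance-uuid>",  # placeholder, not a real ID
}
```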

Single-Cell Specific Columns

| Column Name | Value Type | Description | Usage |
| --- | --- | --- | --- |
| pl7.app/vdj/scFv-sequence | String | Single-chain variable fragment sequence | Used in scFv clonotyping workflows |
| pl7.app/sc/cellLinker | Long | Linker column connecting cells to clonotypes | Value is typically 1 indicating the relationship exists |

Linker Columns

| Column Name | Value Type | Description | Usage |
| --- | --- | --- | --- |
| pl7.app/vdj/link | Long | Generic linker column | Connects entities across different axes, value is typically 1 |
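
A linker column carries its information in its keys rather than its values. The sketch below models a cell-to-clonotype linker as a mapping from key pairs to 1; the choice of axes (cell ID and single-cell clonotype key), the identifiers, and the `cells_of` helper are assumptions for illustration.

```python
# Keys are (cell id, clonotype key) pairs; the value 1 only marks that the link exists.
cell_linker = {
    ("cell-1", "clonotype-a"): 1,
    ("cell-2", "clonotype-a"): 1,
    ("cell-3", "clonotype-b"): 1,
}


def cells_of(clonotype_key, linker):
    """Follow the linker from a clonotype to all cells that carry it."""
    return sorted(cell for (cell, key) in linker if key == clonotype_key)


cells_for_a = cells_of("clonotype-a", cell_linker)
```

Because only the presence of a key pair matters, a join across the two axes never needs to inspect the stored value.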

Statistic Columns

These columns provide summary statistics, typically with pl7.app/sampleId as the only axis.

| Column Name | Value Type | Description |
| --- | --- | --- |
| pl7.app/vdj/stat/clonotypeCount | Long | Total number of clonotypes in sample |
| pl7.app/vdj/stat/readCount | Long | Total number of reads in sample |
| pl7.app/vdj/stat/umiCount | Long | Total number of UMIs in sample |

Analysis Result Columns

| Column Name | Value Type | Description |
| --- | --- | --- |
| pl7.app/vdj/library | String | Library type or protocol used |
| pl7.app/vdj/umap1, pl7.app/vdj/umap2 | Double | UMAP coordinates for dimensionality reduction |

VDJ-Specific Domains

Domains provide additional context to distinguish otherwise similar columns or axes.

| Domain Key | Typical Values | Purpose |
| --- | --- | --- |
| pl7.app/vdj/feature | CDR3, CDR1, CDR2, FR1-FR4, VDJRegion, etc. | Specifies which gene feature a sequence column represents |
| pl7.app/vdj/reference | VGene, DGene, JGene, CGene | Identifies which gene segment a gene hit refers to |
| pl7.app/vdj/scClonotypeChain | A, B | Distinguishes between chains in single-cell data (e.g., heavy vs light) |
| pl7.app/vdj/scClonotypeChain/index | primary, secondary | Indicates the importance of a chain within a cell |
| pl7.app/vdj/clustering/algorithm | mmseqs2, etc. | Identifies the clustering algorithm used |
| pl7.app/vdj/clustering/blockId | Block instance UUID | Distinguishes clusters from different clustering runs |
| pl7.app/vdj/clonotypingRunId | Block instance UUID | Identifies which clonotyping block produced the data |
| pl7.app/vdj/gene | V, J, D, C | Specifies gene type for sequence variants |
| pl7.app/vdj/scFv-linker | Linker sequence | Identifies the scFv linker used |
| pl7.app/vdj/scFv-hinge | Hinge sequence | Identifies the scFv hinge used |
| pl7.app/alphabet | nucleotide, aminoacid | Distinguishes between DNA/RNA and protein sequences |

Special domain for clonotype keys:

  • pl7.app/vdj/clonotypeKey/structure: JSON array describing which features define the clonotype (e.g., ["nSeqCDR3","bestVGene","bestJGene"])
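
Since the structure domain is a JSON array, a consumer can decode it to learn which features define a clonotype and to check whether two clonotype key axes are comparable. This is a sketch under assumptions: the axis spec layout and both helper functions are hypothetical, and the domain value is assumed to be stored as a JSON string.

```python
import json

STRUCTURE_DOMAIN = "pl7.app/vdj/clonotypeKey/structure"

axis_spec = {
    "name": "pl7.app/vdj/clonotypeKey",
    "type": "String",
    "domain": {STRUCTURE_DOMAIN: '["nSeqCDR3","bestVGene","bestJGene"]'},
}


def key_features(spec):
    """Decode the list of features that define the clonotype key."""
    return json.loads(spec["domain"][STRUCTURE_DOMAIN])


def same_key_structure(a, b):
    """Two clonotype key axes are treated as comparable only if their structures match."""
    return key_features(a) == key_features(b)


features = key_features(axis_spec)
```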

VDJ-Specific Annotations

| Annotation | Values | Description |
| --- | --- | --- |
| pl7.app/vdj/imputed | "true", "false" | Marks sequences that were imputed rather than directly observed |
| pl7.app/vdj/isAssemblingFeature | "true", "false" | Marks features used in clonotype assembly |
| pl7.app/vdj/isMainSequence | "true", "false" | Marks the primary sequence for a feature |

Naming Best Practices

  1. Use established names: Always use the standard column and axis names documented here when they fit your use case
  2. Follow the pattern: When creating new feature-specific columns, follow the established naming patterns (e.g., pl7.app/vdj/sequence with appropriate domains)
  3. Document domains: Always include appropriate domains to distinguish between similar columns
  4. Abundance annotations: Always include the full set of abundance annotations for count/fraction columns
  5. Segmentation: Use the pl7.app/segmentedBy annotation when columns can be meaningfully merged across different analysis runs

Integration with Downstream Blocks

Downstream analysis blocks (such as Clonotype Browser and Clustering) discover input data by querying for columns with specific characteristics. For example:

  • Abundance queries: Look for pl7.app/isAbundance: "true" with specific pl7.app/abundance/unit values
  • Sequence queries: Look for pl7.app/vdj/sequence with specific pl7.app/vdj/feature and pl7.app/alphabet domains
  • Anchor queries: Look for pl7.app/isAnchor: "true" to find the primary dataset column
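
The three query styles above amount to filtering column specs by name, domains, and annotations. The following is a minimal sketch of such a filter over plain dictionaries; the `matches` function, the spec layout, and the toy column list are assumptions for illustration and do not reproduce the actual Platforma query API.

```python
def matches(spec, name=None, domain=None, annotations=None):
    """Return True if a column spec satisfies every constraint that was given."""
    if name is not None and spec.get("name") != name:
        return False
    spec_domain = spec.get("domain", {})
    spec_annotations = spec.get("annotations", {})
    return all(spec_domain.get(k) == v for k, v in (domain or {}).items()) and all(
        spec_annotations.get(k) == v for k, v in (annotations or {}).items()
    )


# Toy specs, invented for this example.
columns = [
    {
        "name": "pl7.app/vdj/readCount",
        "annotations": {
            "pl7.app/isAbundance": "true",
            "pl7.app/abundance/unit": "reads",
            "pl7.app/isAnchor": "true",
        },
    },
    {
        "name": "pl7.app/vdj/sequence",
        "domain": {"pl7.app/vdj/feature": "CDR3", "pl7.app/alphabet": "aminoacid"},
    },
    {
        "name": "pl7.app/vdj/sequence",
        "domain": {"pl7.app/vdj/feature": "CDR3", "pl7.app/alphabet": "nucleotide"},
    },
]

abundance = [c for c in columns if matches(
    c, annotations={"pl7.app/isAbundance": "true", "pl7.app/abundance/unit": "reads"})]
cdr3_aa = [c for c in columns if matches(
    c, name="pl7.app/vdj/sequence",
    domain={"pl7.app/vdj/feature": "CDR3", "pl7.app/alphabet": "aminoacid"})]
anchors = [c for c in columns if matches(c, annotations={"pl7.app/isAnchor": "true"})]
```

A column that omits an expected domain or annotation is simply never matched, which is why missing metadata tends to show up as a block that finds no input rather than as an explicit error.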

By following these naming conventions, your clonotyping block will automatically work with all standard downstream analysis blocks.