VDJ Naming Conventions

This document is a reference for the VDJ-specific column names, axis names, domains, and annotations used in the Platforma VDJ analysis ecosystem. Following these conventions keeps upstream clonotyping blocks and downstream analysis blocks interoperable.

Standard VDJ Axes

These axis names are used to index VDJ data across different analysis stages.

Axis Name	Type	Description	Usage
`pl7.app/sampleId`	String	Sample identifier	Primary axis for organizing data by sample. Present in both bulk and single-cell datasets.
`pl7.app/sampleGroupId`	String	Sample group identifier	Used to organize samples into groups for comparative analysis.
`pl7.app/vdj/clonotypeKey`	String	Clonotype identifier (bulk)	Composite key identifying a unique clonotype in bulk datasets. Structure defined by domain `pl7.app/vdj/clonotypeKey/structure`.
`pl7.app/vdj/scClonotypeKey`	String	Clonotype identifier (single-cell)	Composite key identifying a unique clonotype in single-cell datasets.
`pl7.app/vdj/clusterId`	String	Cluster identifier	Identifies clusters created by clustering algorithms. Usually carries domains specifying the clustering algorithm and the block ID.
`pl7.app/sc/cellId`	String	Cell identifier (single-cell)	Unique identifier for individual cells in single-cell datasets.
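
As a rough illustration of how the axes above combine, the sketch below picks the clonotype axis by dataset kind. The `axis` and `clonotype_table_axes` helpers and the dict layout are hypothetical stand-ins, not a Platforma SDK API; only the axis names come from the table.

```python
from typing import Optional

# Illustrative sketch: plain dicts stand in for axis specs (not a Platforma SDK API).

def axis(name: str, domain: Optional[dict] = None) -> dict:
    """Describe a String-typed axis by name and optional domain."""
    return {"name": name, "type": "String", "domain": dict(domain or {})}

def clonotype_table_axes(single_cell: bool) -> list:
    """Axes for a per-sample clonotype table: sample first, then clonotype key."""
    key = "pl7.app/vdj/scClonotypeKey" if single_cell else "pl7.app/vdj/clonotypeKey"
    return [axis("pl7.app/sampleId"), axis(key)]

bulk_axes = clonotype_table_axes(single_cell=False)
sc_axes = clonotype_table_axes(single_cell=True)
print([a["name"] for a in bulk_axes])
print([a["name"] for a in sc_axes])
```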

Standard VDJ Columns

Abundance Columns

Abundance columns measure the quantity of clonotypes or cells. They follow naming patterns based on the unit of measurement.

Column Name	Value Type	Description	Typical Annotations
`pl7.app/vdj/readCount`	Long	Number of reads for a clonotype	`pl7.app/isAbundance: "true"`, `pl7.app/abundance/unit: "reads"`, `pl7.app/abundance/normalized: "false"`
`pl7.app/vdj/readFraction`	Double	Fraction of total reads	`pl7.app/isAbundance: "true"`, `pl7.app/abundance/unit: "reads"`, `pl7.app/abundance/normalized: "true"`
`pl7.app/vdj/uniqueMoleculeCount`	Long	Number of unique molecules (UMIs)	`pl7.app/isAbundance: "true"`, `pl7.app/abundance/unit: "molecules"`, `pl7.app/abundance/normalized: "false"`
`pl7.app/vdj/uniqueMoleculeFraction`	Double	Fraction of unique molecules	`pl7.app/isAbundance: "true"`, `pl7.app/abundance/unit: "molecules"`, `pl7.app/abundance/normalized: "true"`
`pl7.app/vdj/uniqueCellCount`	Long	Number of unique cells (single-cell)	`pl7.app/isAbundance: "true"`, `pl7.app/abundance/unit: "cells"`, `pl7.app/abundance/normalized: "false"`
`pl7.app/vdj/uniqueCellFraction`	Double	Fraction of unique cells	`pl7.app/isAbundance: "true"`, `pl7.app/abundance/unit: "cells"`, `pl7.app/abundance/normalized: "true"`
`pl7.app/vdj/sampleCount`	Long	Number of samples containing this clonotype	`pl7.app/isAbundance: "true"`, `pl7.app/abundance/unit: "samples"`, `pl7.app/abundance/normalized: "false"`
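
The annotation pattern in the table can be generated rather than typed by hand. The sketch below is a minimal illustration with hypothetical helper names and made-up counts. It assumes the fraction is computed against the total reads of a single sample, and the column dicts are not a Platforma SDK structure.

```python
def abundance_annotations(unit: str, normalized: bool) -> dict:
    """Annotation values are strings, so booleans are written as "true"/"false"."""
    return {
        "pl7.app/isAbundance": "true",
        "pl7.app/abundance/unit": unit,
        "pl7.app/abundance/normalized": "true" if normalized else "false",
    }

# Hypothetical read counts for three clonotypes in one sample.
read_counts = {"clonotype-1": 60, "clonotype-2": 30, "clonotype-3": 10}

# Assumption: the fraction is taken relative to the sample's total read count.
total_reads = sum(read_counts.values())
read_fractions = {key: count / total_reads for key, count in read_counts.items()}

read_count_column = {
    "name": "pl7.app/vdj/readCount",
    "valueType": "Long",
    "annotations": abundance_annotations("reads", normalized=False),
}
read_fraction_column = {
    "name": "pl7.app/vdj/readFraction",
    "valueType": "Double",
    "annotations": abundance_annotations("reads", normalized=True),
}
print(read_fraction_column["annotations"])
```

Keeping the count and fraction columns on the same `unit` and differing only in `normalized` mirrors the paired rows of the table.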

Aggregated abundance columns (across samples):

Column Name	Value Type	Description
`pl7.app/vdj/readCountTotal`	Long	Total read count across all samples
`pl7.app/vdj/readFractionMean`	Double	Mean read fraction across samples
`pl7.app/vdj/uniqueMoleculeCountTotal`	Long	Total UMI count across all samples
`pl7.app/vdj/uniqueMoleculeFractionMean`	Double	Mean UMI fraction across samples
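
To make the aggregation concrete, here is a small sketch that derives totals and means from hypothetical per-sample rows. It assumes the mean is taken over the samples in which a clonotype was observed; the source does not say whether absent samples count as zero, so treat that choice as an assumption.

```python
from collections import defaultdict

# Hypothetical per-sample rows: (sample_id, clonotype_key, read_count, read_fraction).
rows = [
    ("S1", "c1", 60, 0.6),
    ("S1", "c2", 40, 0.4),
    ("S2", "c1", 10, 0.25),
    ("S2", "c3", 30, 0.75),
]

count_by_key = defaultdict(int)
fractions_by_key = defaultdict(list)
for _sample_id, key, count, fraction in rows:
    count_by_key[key] += count
    fractions_by_key[key].append(fraction)

# Values for pl7.app/vdj/readCountTotal.
read_count_total = dict(count_by_key)

# Values for pl7.app/vdj/readFractionMean.
# Assumption: averaged over samples that contain the clonotype.
read_fraction_mean = {k: sum(v) / len(v) for k, v in fractions_by_key.items()}

# Values for pl7.app/vdj/sampleCount (samples containing the clonotype).
sample_count = {k: len(v) for k, v in fractions_by_key.items()}

print(read_count_total["c1"], read_fraction_mean["c1"], sample_count["c1"])
```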

Sequence Columns

Sequence columns store nucleotide or amino acid sequences for various gene features. The `pl7.app/vdj/feature` domain specifies which feature the sequence represents, and the `pl7.app/alphabet` domain distinguishes between nucleotide and amino acid sequences.

Column Name Pattern	Value Type	Description	Required Domains
`pl7.app/vdj/sequence`	String	Sequence data for a specific feature	`pl7.app/vdj/feature`, `pl7.app/alphabet`
`pl7.app/vdj/sequenceLength`	Int	Length of the sequence	`pl7.app/vdj/feature`, `pl7.app/alphabet`
`pl7.app/vdj/sequence/productive`	String	Flag indicating whether the sequence is productive (a boolean stored as a string)	-
`pl7.app/vdj/sequence/annotation`	String	Annotated regions within the sequence	Same as corresponding sequence column
`pl7.app/vdj/sequenceQuality`	String	Quality scores for sequence bases	Same as sequence column
`pl7.app/vdj/sequence/{alphabet}{variant}`	String	Variant sequences (e.g., germline)	`pl7.app/vdj/gene`, `pl7.app/vdj/scClonotypeChain`

Common values for the `pl7.app/vdj/feature` domain:

CDR1, CDR2, CDR3 - Complementarity Determining Regions
FR1, FR2, FR3, FR4 - Framework Regions
VDJRegion - Complete V(D)J region
VDJRegionInFrame, FR4InFrame - In-frame variants

Variant naming pattern examples:

nGermline - Nucleotide germline sequence
aaGermline - Amino acid germline sequence
nTarget - Nucleotide target sequence
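
The domain and variant-name patterns above can be sketched as two small helpers. The function names and dict layout are hypothetical. The `n`/`aa` prefixes are inferred from the `nGermline` and `aaGermline` examples rather than taken from a formal specification.

```python
# Prefixes inferred from the variant examples (nGermline, aaGermline).
ALPHABET_PREFIX = {"nucleotide": "n", "aminoacid": "aa"}

def sequence_column(feature: str, alphabet: str) -> dict:
    """Spec for pl7.app/vdj/sequence, disambiguated by feature and alphabet domains."""
    if alphabet not in ALPHABET_PREFIX:
        raise ValueError(f"unknown alphabet: {alphabet!r}")
    return {
        "name": "pl7.app/vdj/sequence",
        "valueType": "String",
        "domain": {"pl7.app/vdj/feature": feature, "pl7.app/alphabet": alphabet},
    }

def variant_column_name(alphabet: str, variant: str) -> str:
    """Fill the pl7.app/vdj/sequence/{alphabet}{variant} pattern."""
    return f"pl7.app/vdj/sequence/{ALPHABET_PREFIX[alphabet]}{variant}"

cdr3_aa = sequence_column("CDR3", "aminoacid")
cdr3_nt = sequence_column("CDR3", "nucleotide")
print(cdr3_aa["domain"])
print(variant_column_name("nucleotide", "Germline"))
```

Both CDR3 columns share the same name and differ only in the `pl7.app/alphabet` domain, which is why the domains are listed as required.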

Gene Hit Columns

These columns identify which V, D, J, or C genes were aligned to the sequence.

Column Name	Value Type	Description	Required Domains
`pl7.app/vdj/geneHit`	String	Gene name without allele information	`pl7.app/vdj/reference`
`pl7.app/vdj/geneHitWithAllele`	String	Gene name with allele information	`pl7.app/vdj/reference`
`pl7.app/vdj/chain`	String	Chain type (e.g., IGH, IGK, TRA, TRB)	-
`pl7.app/vdj/isotype`	String	Antibody isotype (e.g., IgG, IgM)	-

Values for the `pl7.app/vdj/reference` domain:

VGene - Variable gene
DGene - Diversity gene
JGene - Joining gene
CGene - Constant gene
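
The difference between the two gene hit columns can be shown with a short sketch. It assumes allele information is written as a `*` suffix on the gene name, as in IMGT-style names such as `IGHV1-69*01`; the helper names and dict layout are hypothetical.

```python
def strip_allele(hit_with_allele: str) -> str:
    """Drop the allele suffix after '*', leaving the bare gene name."""
    return hit_with_allele.split("*", 1)[0]

def gene_hit_columns(reference: str) -> list:
    """Both gene hit columns for one gene segment share the same reference domain."""
    domain = {"pl7.app/vdj/reference": reference}
    return [
        {"name": "pl7.app/vdj/geneHit", "valueType": "String", "domain": dict(domain)},
        {"name": "pl7.app/vdj/geneHitWithAllele", "valueType": "String", "domain": dict(domain)},
    ]

v_columns = gene_hit_columns("VGene")
v_hit_with_allele = "IGHV1-69*01"  # example value for geneHitWithAllele
v_hit = strip_allele(v_hit_with_allele)  # corresponding value for geneHit
print(v_hit)
```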

Clustering Columns

These columns are produced by clustering analysis blocks.

Column Name	Value Type	Description	Required Domains
`pl7.app/vdj/clusterId`	String	Cluster identifier	`pl7.app/vdj/clustering/algorithm`, `pl7.app/vdj/clustering/blockId`
`pl7.app/vdj/clustering/clusterSize`	Long	Number of clonotypes in cluster	Same as clusterId
`pl7.app/vdj/clustering/clusterRadius`	Double	Maximum distance to centroid	Same as clusterId
`pl7.app/vdj/distanceToCentroid`	Double	Distance from clonotype to cluster centroid	Same as clusterId
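
The relationship between the per-clonotype distance and the per-cluster size and radius can be sketched as follows. The membership rows, cluster IDs, and the block ID placeholder are made up for illustration; the distances would come from whatever clustering algorithm is named in the domain.

```python
# Domain shared by all columns from one clustering run (block ID is a placeholder).
clustering_domain = {
    "pl7.app/vdj/clustering/algorithm": "mmseqs2",
    "pl7.app/vdj/clustering/blockId": "hypothetical-block-id",
}

# Hypothetical rows: (cluster_id, clonotype_key, distance_to_centroid).
members = [
    ("cl-1", "c1", 0.0),
    ("cl-1", "c2", 0.2),
    ("cl-1", "c3", 0.5),
    ("cl-2", "c4", 0.0),
]

cluster_size = {}    # values for pl7.app/vdj/clustering/clusterSize
cluster_radius = {}  # values for pl7.app/vdj/clustering/clusterRadius
for cluster_id, _clonotype_key, distance in members:
    cluster_size[cluster_id] = cluster_size.get(cluster_id, 0) + 1
    # The radius is the largest member-to-centroid distance in the cluster.
    cluster_radius[cluster_id] = max(cluster_radius.get(cluster_id, 0.0), distance)

print(cluster_size, cluster_radius)
```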

Single-Cell Specific Columns

Column Name	Value Type	Description	Usage
`pl7.app/vdj/scFv-sequence`	String	Single-chain variable fragment sequence	Used in scFv clonotyping workflows
`pl7.app/sc/cellLinker`	Long	Linker column connecting cells to clonotypes	Value is typically `1` indicating the relationship exists

Linker Columns

Column Name	Value Type	Description	Usage
`pl7.app/vdj/link`	Long	Generic linker column	Connects entities across different axes, value is typically `1`
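
A linker column is essentially a sparse relation: a row exists, with value `1`, wherever two entities are connected. The sketch below illustrates this for cells and clonotypes, assuming the linker is indexed by a cell axis and a clonotype axis; the identifiers are made up.

```python
from collections import Counter

# Hypothetical assignment of cells to single-cell clonotypes.
cell_to_clonotype = {"cell-1": "ct-A", "cell-2": "ct-A", "cell-3": "ct-B"}

# Linker rows keyed by (cellId, scClonotypeKey); the value 1 marks that the link exists.
linker_rows = [(cell_id, clonotype_key, 1) for cell_id, clonotype_key in cell_to_clonotype.items()]

# Following the links in reverse gives the number of cells per clonotype.
cells_per_clonotype = Counter(clonotype_key for _cell_id, clonotype_key, _value in linker_rows)

print(linker_rows)
print(dict(cells_per_clonotype))
```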

Statistic Columns

These columns provide summary statistics, typically with `pl7.app/sampleId` as the only axis.

Column Name	Value Type	Description
`pl7.app/vdj/stat/clonotypeCount`	Long	Total number of clonotypes in sample
`pl7.app/vdj/stat/readCount`	Long	Total number of reads in sample
`pl7.app/vdj/stat/umiCount`	Long	Total number of UMIs in sample
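
As an illustration of the single-axis shape, the sketch below collapses hypothetical per-clonotype rows into one record per sample. It assumes the totals are summed over clonotype rows only; a real pipeline may also count reads or UMIs that were never assigned to a clonotype.

```python
# Hypothetical rows: (sample_id, clonotype_key, read_count, umi_count).
rows = [
    ("S1", "c1", 60, 6),
    ("S1", "c2", 40, 4),
    ("S2", "c1", 10, 2),
]

stats = {}
for sample_id, _clonotype_key, reads, umis in rows:
    s = stats.setdefault(
        sample_id,
        {
            "pl7.app/vdj/stat/clonotypeCount": 0,
            "pl7.app/vdj/stat/readCount": 0,
            "pl7.app/vdj/stat/umiCount": 0,
        },
    )
    s["pl7.app/vdj/stat/clonotypeCount"] += 1  # one row per clonotype per sample
    s["pl7.app/vdj/stat/readCount"] += reads
    s["pl7.app/vdj/stat/umiCount"] += umis

print(stats["S1"])
```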

Analysis Result Columns

Column Name	Value Type	Description
`pl7.app/vdj/library`	String	Library type or protocol used
`pl7.app/vdj/umap1`, `pl7.app/vdj/umap2`	Double	First and second UMAP coordinates produced by dimensionality reduction

VDJ-Specific Domains

Domains provide additional context to distinguish otherwise similar columns or axes.

Domain Key	Typical Values	Purpose
`pl7.app/vdj/feature`	`CDR1`, `CDR2`, `CDR3`, `FR1`, `FR2`, `FR3`, `FR4`, `VDJRegion`, etc.	Specifies which gene feature a sequence column represents
`pl7.app/vdj/reference`	`VGene`, `DGene`, `JGene`, `CGene`	Identifies which gene segment a gene hit refers to
`pl7.app/vdj/scClonotypeChain`	`A`, `B`	Distinguishes between chains in single-cell data (e.g., heavy vs light)
`pl7.app/vdj/scClonotypeChain/index`	`primary`, `secondary`	Distinguishes the primary chain from a secondary chain within a cell
`pl7.app/vdj/clustering/algorithm`	`mmseqs2`, etc.	Identifies the clustering algorithm used
`pl7.app/vdj/clustering/blockId`	Block instance UUID	Distinguishes clusters from different clustering runs
`pl7.app/vdj/clonotypingRunId`	Block instance UUID	Identifies which clonotyping block produced the data
`pl7.app/vdj/gene`	`V`, `J`, `D`, `C`	Specifies gene type for sequence variants
`pl7.app/vdj/scFv-linker`	Linker sequence	Identifies the scFv linker used
`pl7.app/vdj/scFv-hinge`	Hinge sequence	Identifies the scFv hinge used
`pl7.app/alphabet`	`nucleotide`, `aminoacid`	Distinguishes between DNA/RNA and protein sequences

Special domain for clonotype keys:

`pl7.app/vdj/clonotypeKey/structure`: JSON array describing which features define the clonotype (e.g., `["nSeqCDR3","bestVGene","bestJGene"]`)
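
A consumer can read this domain to learn what a clonotype key is built from. The sketch below assumes the domain value is stored as a JSON-encoded string; the `key_features` helper and the dict layout are hypothetical.

```python
import json

STRUCTURE_DOMAIN = "pl7.app/vdj/clonotypeKey/structure"

def key_features(axis_domain: dict) -> list:
    """Parse the JSON array of features that define the clonotype key."""
    raw = axis_domain.get(STRUCTURE_DOMAIN)
    if raw is None:
        return []
    features = json.loads(raw)
    if not isinstance(features, list) or not all(isinstance(f, str) for f in features):
        raise ValueError(f"{STRUCTURE_DOMAIN} must be a JSON array of strings")
    return features

# Domain of a pl7.app/vdj/clonotypeKey axis, using the example value from above.
axis_domain = {STRUCTURE_DOMAIN: '["nSeqCDR3","bestVGene","bestJGene"]'}
features = key_features(axis_domain)
print(features)
```

Two datasets whose parsed feature lists differ define clonotypes differently, so their keys should not be treated as comparable.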

VDJ-Specific Annotations

Annotation	Values	Description
`pl7.app/vdj/imputed`	`"true"`, `"false"`	Marks sequences that were imputed rather than directly observed
`pl7.app/vdj/isAssemblingFeature`	`"true"`, `"false"`	Marks features used in clonotype assembly
`pl7.app/vdj/isMainSequence`	`"true"`, `"false"`	Marks the primary sequence for a feature

Naming Best Practices

Use established names: Always use the standard column and axis names documented here when they fit your use case.
Follow the pattern: When creating new feature-specific columns, follow the established naming patterns (e.g., `pl7.app/vdj/sequence` with appropriate domains).
Document domains: Always include appropriate domains to distinguish between similar columns.
Abundance annotations: Always include the full set of abundance annotations for count and fraction columns.
Segmentation: Use the `pl7.app/segmentedBy` annotation when columns can be meaningfully merged across different analysis runs.

Integration with Downstream Blocks

Downstream analysis blocks (such as Clonotype Browser and Clustering) discover input data by querying for columns with specific characteristics. For example:

Abundance queries: Look for `pl7.app/isAbundance: "true"` with specific `pl7.app/abundance/unit` values
Sequence queries: Look for `pl7.app/vdj/sequence` with specific `pl7.app/vdj/feature` and `pl7.app/alphabet` domains
Anchor queries: Look for `pl7.app/isAnchor: "true"` to find the primary dataset column

Following these naming conventions lets standard downstream analysis blocks discover and use the output of your clonotyping block automatically.
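
The three kinds of query listed above can be sketched as a simple matcher over column specs. This is an illustration only: `matches` and the dict layout are hypothetical, not the Platforma query API, and placing `pl7.app/isAnchor` on the read count column is an arbitrary choice for the example.

```python
from typing import Optional

def matches(column: dict, name: Optional[str] = None,
            domain: Optional[dict] = None, annotations: Optional[dict] = None) -> bool:
    """True if the column has the given name and contains all requested domain/annotation pairs."""
    if name is not None and column.get("name") != name:
        return False
    col_domain = column.get("domain", {})
    col_annotations = column.get("annotations", {})
    return (all(col_domain.get(k) == v for k, v in (domain or {}).items())
            and all(col_annotations.get(k) == v for k, v in (annotations or {}).items()))

# Hypothetical output of a clonotyping block.
columns = [
    {"name": "pl7.app/vdj/readCount", "domain": {},
     "annotations": {"pl7.app/isAbundance": "true", "pl7.app/abundance/unit": "reads",
                     "pl7.app/abundance/normalized": "false", "pl7.app/isAnchor": "true"}},
    {"name": "pl7.app/vdj/sequence",
     "domain": {"pl7.app/vdj/feature": "CDR3", "pl7.app/alphabet": "aminoacid"},
     "annotations": {}},
    {"name": "pl7.app/vdj/sequence",
     "domain": {"pl7.app/vdj/feature": "CDR3", "pl7.app/alphabet": "nucleotide"},
     "annotations": {}},
]

abundance = [c for c in columns if matches(
    c, annotations={"pl7.app/isAbundance": "true", "pl7.app/abundance/unit": "reads"})]
cdr3_aa = [c for c in columns if matches(
    c, name="pl7.app/vdj/sequence",
    domain={"pl7.app/vdj/feature": "CDR3", "pl7.app/alphabet": "aminoacid"})]
anchor = [c for c in columns if matches(c, annotations={"pl7.app/isAnchor": "true"})]

print(len(abundance), len(cdr3_aa), len(anchor))
```

A column that omits a domain or annotation simply fails the match, which is why incomplete abundance annotations or missing sequence domains make data invisible to downstream blocks.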
