Single-cell Clonotyping

This document outlines the standard for VDJ datasets generated from single-cell sequencing data (e.g., from 10x Genomics). This standard extends the Bulk Clonotyping guide, and a compliant single-cell dataset must include all required p-columns from the bulk standard.

This guide details the new axes and p-columns introduced to handle two key features of single-cell VDJ analysis: per-cell quantification and paired-chain receptors.

Core structure: the `scClonotypeKey`

The single-cell standard introduces a new primary axis, pl7.app/vdj/scClonotypeKey. This axis identifies a unique single-cell clonotype, which represents a specific pair of receptor chains (e.g., one TRA and one TRB) found together in a cell.

Unlike in the bulk standard, the clonotypeKey axis is not used here. Instead, all p-columns in a single-cell dataset are keyed by scClonotypeKey (and additionally by sampleId for abundance columns). Properties of individual chains within a pair are distinguished using additional domain keys, as described in the sections below.

The `scClonotypeKey` axis in detail

The domain of the scClonotypeKey axis itself provides the highest-level biological context for the entire p-frame.

name: pl7.app/vdj/scClonotypeKey
type: String
domain:
  # The receptor type. Must be one of: IG, TCRAB, or TCRGD.
  pl7.app/vdj/receptor: "IG"
  # Uniquely identifies the analysis run that produced this key.
  pl7.app/vdj/clonotypingRunId: "3d58beb8-0ec1-46cf-99a4-79d81dafbedc"
  # Describes which features are used to define the clonotype.
  pl7.app/vdj/scClonotypeKey/structure: "[...]"
annotations:
  pl7.app/label: "Clonotype ID"

Calculating the `scClonotypeKey`

The scClonotypeKey value itself must be a deterministic, unique identifier for the paired-chain clonotype. A reliable method is to compute a SHA1 hash of the clonotype's defining properties (e.g., the primary sequences of both chains and their key gene hits). This ensures that the exact same paired clonotype will always have the exact same key.
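
The hashing approach described above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the `ChainDefinition` shape, the choice of V and J gene hits as "key gene hits", and the JSON serialization order are all assumptions made for the example. The only requirement taken from the standard is that the same paired clonotype always yields the same key.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape: the properties that define one chain of the pair.
interface ChainDefinition {
  sequence: string; // primary sequence of the chain
  vGene: string;    // V gene hit (assumed to be part of the definition)
  jGene: string;    // J gene hit (assumed to be part of the definition)
}

// Serialize the defining properties in a fixed order, then hash them.
// A fixed order is what makes the key deterministic.
function computeScClonotypeKey(
  chainA: ChainDefinition,
  chainB: ChainDefinition,
): string {
  const canonical = JSON.stringify([
    [chainA.sequence, chainA.vGene, chainA.jGene],
    [chainB.sequence, chainB.vGene, chainB.jGene],
  ]);
  return createHash("sha1").update(canonical, "utf8").digest("hex");
}

// Illustrative inputs only; these are not real clonotypes.
const exampleHeavy: ChainDefinition = { sequence: "CARDYW", vGene: "IGHV1-2", jGene: "IGHJ4" };
const exampleLight: ChainDefinition = { sequence: "CQQYNSYPLTF", vGene: "IGKV1-5", jGene: "IGKJ4" };
const exampleKey = computeScClonotypeKey(exampleHeavy, exampleLight);
```

Serializing through `JSON.stringify` on a nested array avoids ambiguity that plain string concatenation could introduce when one field happens to end with the start of the next.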

Clonotype labels

To provide a short, human-readable label for the scClonotypeKey, a clonotyping block should also generate a pl7.app/label p-column.

Format: The label should be a string prefixed with "C-", followed by 6-7 random alphanumeric characters in upper case (e.g., "C-7VCA13" or "C-A4B1C9D").
Specification:

name: pl7.app/label
valueType: String
axesSpec:
  - name: pl7.app/vdj/scClonotypeKey
    type: String
annotations:
  pl7.app/isLabel: "true"
  pl7.app/label: "Clonotype Label"
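
The label format above can be checked and generated with a few lines of code. The sketch below is illustrative only: the function names and the injectable `random` parameter are assumptions for the example, and it does not guard against two clonotypes receiving the same label, which a real block would presumably need to handle.

```typescript
// Upper-case letters and digits, per the format described above.
const LABEL_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

// "C-" followed by 6 or 7 upper-case alphanumeric characters.
const LABEL_PATTERN = /^C-[A-Z0-9]{6,7}$/;

// Returns true when the label follows the documented format.
function isValidClonotypeLabel(label: string): boolean {
  return LABEL_PATTERN.test(label);
}

// Generates a label with a random suffix of the requested length.
// `random` must return values in [0, 1), like Math.random.
function randomClonotypeLabel(
  length: 6 | 7 = 6,
  random: () => number = Math.random,
): string {
  let suffix = "";
  for (let i = 0; i < length; i++) {
    suffix += LABEL_ALPHABET.charAt(Math.floor(random() * LABEL_ALPHABET.length));
  }
  return "C-" + suffix;
}
```

Passing a seeded generator as `random` would make label assignment reproducible across runs, if that is desired.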

Abundance p-columns

For single-cell data, the primary measure of abundance is the cell count. The primary abundance column, pl7.app/vdj/uniqueCellCount, also serves as the anchor column for the entire single-cell VDJ dataset.

1. Cell count

P-column name: pl7.app/vdj/uniqueCellCount
Description: The number of unique cells in which a specific single-cell clonotype was detected.
Requirement: Required.
Specification:

# --- Core Identity ---
name: pl7.app/vdj/uniqueCellCount
valueType: Long

# --- Axes ---
axesSpec:
  - name: pl7.app/sampleId
    type: String
  - name: pl7.app/vdj/scClonotypeKey
    type: String

# --- Annotations ---
annotations:
  # --- Abundance & Discovery ---
  pl7.app/isAnchor: "true"
  pl7.app/isAbundance: "true"
  pl7.app/abundance/isPrimary: "true"
  pl7.app/abundance/unit: "cells"
  pl7.app/abundance/normalized: "false"
  
  # --- UI & Formatting ---
  pl7.app/label: "Number of Cells"
  pl7.app/table/orderPriority: "100000"
  pl7.app/table/visibility: "default"
  pl7.app/min: "1"

2. Cell fraction

P-column name: pl7.app/vdj/uniqueCellFraction
Description: The fraction of total cells in the sample that are assigned to this clonotype.
Requirement: Required.
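
As a rough illustration of how the fraction relates to the cell count, the sketch below derives `uniqueCellFraction` values from `uniqueCellCount` rows. It assumes the denominator is the sum of cell counts over all clonotypes in the same sample; the standard itself only says "total cells in the sample", so a real block may use a different total. The row types are hypothetical.

```typescript
// Hypothetical row shapes mirroring the [sampleId][scClonotypeKey] axes.
interface CellCountRow {
  sampleId: string;
  scClonotypeKey: string;
  uniqueCellCount: number;
}

interface CellFractionRow {
  sampleId: string;
  scClonotypeKey: string;
  uniqueCellFraction: number;
}

function computeCellFractions(rows: CellCountRow[]): CellFractionRow[] {
  // Assumption: the per-sample total is the sum of clonotype cell counts.
  const totals = new Map<string, number>();
  for (const row of rows) {
    totals.set(row.sampleId, (totals.get(row.sampleId) ?? 0) + row.uniqueCellCount);
  }
  // Counts are at least 1, so every sample total is positive.
  return rows.map((row) => ({
    sampleId: row.sampleId,
    scClonotypeKey: row.scClonotypeKey,
    uniqueCellFraction: row.uniqueCellCount / (totals.get(row.sampleId) ?? 1),
  }));
}

// Illustrative data only.
const exampleFractions = computeCellFractions([
  { sampleId: "s1", scClonotypeKey: "k1", uniqueCellCount: 3 },
  { sampleId: "s1", scClonotypeKey: "k2", uniqueCellCount: 1 },
  { sampleId: "s2", scClonotypeKey: "k1", uniqueCellCount: 2 },
]);
```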

Clonotype property p-columns

The single-cell standard inherits all property p-columns from the bulk standard. The key difference is that each property p-column must also say which chain of the pair it describes. This is encoded by adding extra keys to the column's domain.

Paired-chain properties

To distinguish between the two chains in a pair (e.g., heavy/light or alpha/beta), two domain keys are added to property p-columns like pl7.app/vdj/sequence or pl7.app/vdj/geneHit:

pl7.app/vdj/scClonotypeChain: Specifies the chain within a pair, typically "A" for the heavy/alpha/delta chain and "B" for the light/beta/gamma chain.
pl7.app/vdj/scClonotypeChain/index: Indicates whether the chain is the "primary" or "secondary" sequence. This is necessary because a single cell can contain more than one productive chain of the same type (e.g., two different TRAs). In such cases, the one with the higher abundance is designated as primary.
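
The primary/secondary rule can be sketched as a simple ranking by abundance. The `DetectedChain` shape and the meaning of `abundance` (for example, a UMI or read count) are assumptions for this example, as is the choice to ignore any chain beyond the top two. Ties keep their input order because `Array.prototype.sort` is stable.

```typescript
type ChainIndex = "primary" | "secondary";

// Hypothetical shape: one productive chain of a given type found in a cell.
interface DetectedChain {
  sequence: string;
  abundance: number; // assumed measure, e.g. UMI count
}

// Ranks chains of the same type by abundance (highest first) and
// labels the top two as "primary" and "secondary".
function assignChainIndex(chains: DetectedChain[]): Map<ChainIndex, DetectedChain> {
  const ranked = [...chains].sort((a, b) => b.abundance - a.abundance);
  const result = new Map<ChainIndex, DetectedChain>();
  const first = ranked[0];
  const second = ranked[1];
  if (first !== undefined) result.set("primary", first);
  if (second !== undefined) result.set("secondary", second);
  return result;
}

// Illustrative data: two different TRA chains detected in one cell.
const exampleAssignment = assignChainIndex([
  { sequence: "CAVRDSNYQLIW", abundance: 5 },
  { sequence: "CAASGGSYIPTF", abundance: 12 },
]);
```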

Feature sequence (`pl7.app/vdj/sequence`) for a paired chain

Description: The amino acid or nucleotide sequence of a feature from one chain of a paired-chain clonotype.
Specification (Heavy chain CDR3 example):

# --- Core Identity ---
name: pl7.app/vdj/sequence
valueType: String

# --- Axes ---
axesSpec:
  - name: pl7.app/vdj/scClonotypeKey
    type: String

# --- Domain (defines which chain and feature this is) ---
domain:
  pl7.app/vdj/feature: "CDR3"
  pl7.app/alphabet: "aminoacid"
  # Identifies this as belonging to the 'A' chain of the pair.
  pl7.app/vdj/scClonotypeChain: "A"
  # Identifies this as the primary sequence for that chain.
  pl7.app/vdj/scClonotypeChain/index: "primary"

# --- Annotations ---
annotations:
  pl7.app/label: "Heavy CDR3 aa Primary"
  pl7.app/table/visibility: "default"

Querying for property p-columns: examples

Model: Finding primary sequences for both chains

This TypeScript example shows how to query for the primary amino acid sequence of both the "A" and "B" chains for a given single-cell dataset.

// In a block's model file (`/model/src/index.ts`)
import { BlockModel } from "@platforma-sdk/model";

export const model = BlockModel.create()
  // ...
  .output("pairedChainSequences", (ctx) => {
    const anchorRef = ctx.args.scDatasetAnchor;
    if (anchorRef === undefined) return undefined;

    // Find all primary AA sequences for both chains
    return ctx.resultPool.getAnchoredPColumns(
      { main: anchorRef },
      {
        axes: [{ anchor: "main", idx: 1 }], // Keyed by scClonotypeKey
        name: "pl7.app/vdj/sequence",
        domain: {
          "pl7.app/alphabet": "aminoacid",
          "pl7.app/vdj/scClonotypeChain/index": "primary",
        },
      }
    );
  })
  //...
  .done();

By omitting the pl7.app/vdj/scClonotypeChain key from the domain query, the platform returns all p-columns that match the other criteria, effectively giving us both the "A" and "B" chains.
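
To make the effect of omitting a domain key concrete, the sketch below models partial domain matching as a plain subset check. It is a conceptual illustration under that assumption, not the platform's actual matching code; `domainMatches` and the sample domains are hypothetical.

```typescript
type Domain = Record<string, string>;

// A column matches when every key in the query has the same value in the
// column's domain. Keys absent from the query are left unconstrained.
function domainMatches(columnDomain: Domain, query: Domain): boolean {
  return Object.entries(query).every(([key, value]) => columnDomain[key] === value);
}

// Illustrative domains for four pl7.app/vdj/sequence columns.
const exampleDomains: Domain[] = [
  { "pl7.app/vdj/feature": "CDR3", "pl7.app/alphabet": "aminoacid", "pl7.app/vdj/scClonotypeChain": "A", "pl7.app/vdj/scClonotypeChain/index": "primary" },
  { "pl7.app/vdj/feature": "CDR3", "pl7.app/alphabet": "aminoacid", "pl7.app/vdj/scClonotypeChain": "B", "pl7.app/vdj/scClonotypeChain/index": "primary" },
  { "pl7.app/vdj/feature": "CDR3", "pl7.app/alphabet": "aminoacid", "pl7.app/vdj/scClonotypeChain": "A", "pl7.app/vdj/scClonotypeChain/index": "secondary" },
  { "pl7.app/vdj/feature": "CDR3", "pl7.app/alphabet": "nucleotide", "pl7.app/vdj/scClonotypeChain": "A", "pl7.app/vdj/scClonotypeChain/index": "primary" },
];

// Same domain constraints as the model query above: no chain key.
const exampleQuery: Domain = {
  "pl7.app/alphabet": "aminoacid",
  "pl7.app/vdj/scClonotypeChain/index": "primary",
};

// Chains of the columns that satisfy the query.
const matchedChains = exampleDomains
  .filter((domain) => domainMatches(domain, exampleQuery))
  .map((domain) => domain["pl7.app/vdj/scClonotypeChain"]);
```

Adding `"pl7.app/vdj/scClonotypeChain": "A"` to the query in this sketch would narrow the match to the first column only.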

Summary of new single-cell p-columns

This table summarizes the key new or modified p-columns introduced in the single-cell standard. All other columns are inherited from the Bulk Clonotyping standard.

P-Column Name	Description	Axes	Requirement
`pl7.app/vdj/uniqueCellCount`	Number of unique cells for the clonotype.	`[sampleId][scClonotypeKey]`	Required
`pl7.app/vdj/uniqueCellFraction`	Fraction of cells for the clonotype.	`[sampleId][scClonotypeKey]`	Required
`pl7.app/vdj/sequence`	Sequence with added paired-chain domain keys.	`[scClonotypeKey]`	Required
`pl7.app/vdj/geneHit`	Gene hit with added paired-chain domain keys.	`[scClonotypeKey]`	Required
`pl7.app/label`	Default human-readable label for the clonotype.	`[scClonotypeKey]`	Optional
