VDJ block development guides

The aim of this documentation is to provide standardized guidelines for implementing VDJ (V, D, J gene segment) analysis blocks within the Platforma ecosystem. Adhering to these standards ensures that upstream analysis blocks and downstream application blocks can work together seamlessly, regardless of the specific bioinformatics tools they wrap.

Core concepts

Clonotyping block: An upstream block that processes raw sequencing data (e.g., FASTQ files) and generates a standardized "VDJ dataset". The MiXCR Clonotyping block is the gold-standard implementation of an upstream block.
VDJ dataset: A p-frame that contains a set of p-columns describing clonotyping information. This includes clonotype abundances, gene feature sequences (like CDR3), V/D/J gene calls, and other essential properties of the clonotypes. While the specific structure can vary slightly based on the sequencing protocol (e.g., bulk vs. single-cell), the core column definitions should follow the standards outlined in this guide. The language and definitions used here are heavily based on the output of the MiXCR tool.
Downstream blocks: Blocks that consume a standardized VDJ dataset and perform various downstream analyses, such as clonotype sequence clustering, antibody sequence liability prediction, and diversity calculations. By relying on a standard input format, these blocks can operate on the output of any compliant upstream block.
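To make the relationship between a p-frame and its p-columns concrete, here is a minimal Python sketch that models a VDJ dataset as a collection of columns, each keyed by one or more axes. The class names, field names, and column names (`example/readCount`, `example/cdr3-aa`) are hypothetical placeholders, not the Platforma SDK types or the identifiers defined by the standard, and the data is toy data.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class PColumnSpec:
    """Simplified stand-in for a p-column spec (hypothetical, not the Platforma SDK type)."""
    name: str                    # column name (placeholder values below)
    value_type: str              # e.g. "Int" or "String"
    axes: tuple[str, ...]        # axis names that key each value
    domain: dict[str, str] = field(default_factory=dict)
    annotations: dict[str, str] = field(default_factory=dict)


@dataclass
class PColumn:
    spec: PColumnSpec
    data: dict[tuple[str, ...], Any]  # axis-key tuple -> value


def make_pframe(*columns: PColumn) -> dict[str, PColumn]:
    """Model a p-frame as a mapping from column name to p-column."""
    return {col.spec.name: col for col in columns}


# Toy bulk dataset: abundance per (sample, clonotype), CDR3 per clonotype.
read_count = PColumn(
    PColumnSpec("example/readCount", "Int", ("sampleId", "clonotypeKey")),
    {("S1", "c1"): 120, ("S1", "c2"): 30, ("S2", "c1"): 5},
)
cdr3_aa = PColumn(
    PColumnSpec("example/cdr3-aa", "String", ("clonotypeKey",),
                annotations={"label": "CDR3 aa"}),
    {("c1",): "CASSLGQAYEQYF", ("c2",): "CASSPDRGNTIYF"},
)
vdj_dataset = make_pframe(read_count, cdr3_aa)


def abundance_by_cdr3(frame: dict[str, PColumn], sample: str) -> dict[str, int]:
    """Sum one sample's abundance per CDR3 sequence, joining on the clonotypeKey axis."""
    counts = frame["example/readCount"].data
    cdr3 = frame["example/cdr3-aa"].data
    totals: dict[str, int] = {}
    for (sample_id, clone), n in counts.items():
        if sample_id == sample:
            seq = cdr3[(clone,)]
            totals[seq] = totals.get(seq, 0) + n
    return totals


print(abundance_by_cdr3(vdj_dataset, "S1"))
# {'CASSLGQAYEQYF': 120, 'CASSPDRGNTIYF': 30}
```

Because both columns share the `clonotypeKey` axis, values from different columns can be joined per clonotype without knowing which tool produced them.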

The importance of standardization

The primary goal of these standards is to create a robust framework for interoperability. When block developers follow these guidelines, users can build powerful and flexible analysis pipelines. For example, a user could swap out one upstream clonotyping tool for another, and all existing downstream blocks (like the standard Clonotype Browser) would continue to work without modification.
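The interoperability argument can be illustrated with a small, hypothetical Python sketch: a downstream computation that depends only on an agreed column name and axes, so it accepts frames from two different (made-up) upstream tools unchanged. The column name, axis names, and dictionary layout are illustrative assumptions, not the actual standard.

```python
# Hypothetical contract the downstream block relies on (placeholder names).
ABUNDANCE = "example/readCount"
ABUNDANCE_AXES = ("sampleId", "clonotypeKey")


def clonotype_frequencies(frame: dict) -> dict[tuple[str, str], float]:
    """Per-sample clonotype frequencies from any frame that honors the contract."""
    column = frame[ABUNDANCE]
    if tuple(column["axes"]) != ABUNDANCE_AXES:
        raise ValueError(f"{ABUNDANCE} must be keyed by {ABUNDANCE_AXES}")
    # Total abundance per sample, used as the denominator.
    totals: dict[str, int] = {}
    for (sample, _clone), count in column["data"].items():
        totals[sample] = totals.get(sample, 0) + count
    return {
        key: (count / totals[key[0]] if totals[key[0]] else 0.0)
        for key, count in column["data"].items()
    }


# Frames from two different (made-up) upstream tools: same column, different clonotype keys.
frame_tool_a = {ABUNDANCE: {"axes": ["sampleId", "clonotypeKey"],
                            "data": {("S1", "a1"): 75, ("S1", "a2"): 25}}}
frame_tool_b = {ABUNDANCE: {"axes": ["sampleId", "clonotypeKey"],
                            "data": {("S1", "x9"): 60, ("S1", "x7"): 40}}}

for frame in (frame_tool_a, frame_tool_b):
    print(clonotype_frequencies(frame))
# {('S1', 'a1'): 0.75, ('S1', 'a2'): 0.25}
# {('S1', 'x9'): 0.6, ('S1', 'x7'): 0.4}
```

The downstream function never inspects which tool produced the frame; swapping the upstream block only changes the data, not the code that consumes it.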

The guides below describe in detail the expected structure of VDJ datasets for different data types, as well as the inputs and outputs of common downstream applications.

📄️ Bulk Clonotyping

This document outlines the standard for VDJ datasets generated from bulk sequencing data. Upstream clonotyping blocks that process bulk FASTQ files (e.g., from targeted library sequencing or bulk RNA-Seq) should produce a p-frame containing the p-columns defined here. This ensures that downstream tools for analysis, visualization, and comparison can operate on a consistent and predictable data structure. For foundational background, see the P-frames and p-columns guide.

📄️ Single-cell Clonotyping

This document outlines the standard for VDJ datasets generated from single-cell sequencing data (e.g., from 10x Genomics). This standard extends the Bulk Clonotyping guide, and a compliant single-cell dataset must include all required p-columns from the bulk standard.

📄️ Clonotype clustering

This document outlines the standard inputs and outputs for a downstream block that performs clonotype clustering. By adhering to this standard, a clustering block can seamlessly process VDJ datasets from any compliant clonotyping block and produce results that are easy to understand and use in further analyses.

📄️ Immune Assay Data

This document outlines the standard for a block that integrates functional immune assay data (e.g., from ELISA, SPR, or cell-based functional screens) with VDJ clonotype datasets. By adhering to this standard, an assay data block can link experimental measurements to specific clonotypes, making the data queryable and available for downstream analysis.

📄️ Clonotype enrichment

This document outlines the standard inputs and outputs for a downstream block that performs clonotype enrichment analysis. By adhering to this standard, an enrichment block can seamlessly process VDJ datasets from any compliant clonotyping block and produce results that identify and rank clonotypes enriched through selection rounds, showing how their frequencies change across conditions.

📄️ VDJ Naming Conventions

This document provides a comprehensive reference for VDJ-specific column names, axis names, domains, and annotations used in the Platforma VDJ analysis ecosystem. Following these conventions ensures interoperability between upstream clonotyping blocks and downstream analysis blocks.
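As a sketch of what "extends the bulk standard" implies for a compliance check, the required single-cell columns can be modeled as a superset of the required bulk columns. All column names below are hypothetical placeholders; the authoritative lists are in the Bulk Clonotyping and Single-cell Clonotyping guides.

```python
# Placeholder column names; the real required lists live in the
# Bulk and Single-cell Clonotyping guides.
BULK_REQUIRED = {
    "example/readCount",
    "example/cdr3-aa",
    "example/v-gene",
    "example/j-gene",
}
# Single-cell extends bulk: every bulk requirement still applies.
SINGLE_CELL_REQUIRED = BULK_REQUIRED | {"example/cell-count"}


def missing_columns(column_names, required: set[str]) -> set[str]:
    """Return required column names absent from a dataset."""
    return set(required) - set(column_names)


def is_single_cell_compliant(column_names) -> bool:
    return not missing_columns(column_names, SINGLE_CELL_REQUIRED)


bulk_only = set(BULK_REQUIRED)
print(sorted(missing_columns(bulk_only, SINGLE_CELL_REQUIRED)))  # ['example/cell-count']

sc_without_cdr3 = SINGLE_CELL_REQUIRED - {"example/cdr3-aa"}
print(is_single_cell_compliant(sc_without_cdr3))  # False
```

Defining the single-cell set as a union with the bulk set means any new bulk requirement automatically propagates to single-cell validation.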
