VDJ block development guides
The aim of this documentation is to provide standardized guidelines for implementing VDJ (V, D, J gene segment) analysis blocks within the Platforma ecosystem. Adhering to these standards ensures that upstream analysis blocks and downstream application blocks can work together seamlessly, regardless of the specific bioinformatics tools they wrap.
Core concepts
- Clonotyping block: An upstream block that processes raw sequencing data (e.g., FASTQ files) and generates a standardized "VDJ dataset". The MiXCR Clonotyping block is the gold-standard implementation of an upstream block.
- VDJ dataset: A p-frame that contains a set of p-columns describing clonotyping information. This includes clonotype abundances, gene feature sequences (like CDR3), V/D/J gene calls, and other essential properties of the clonotypes. While the specific structure can vary slightly with the sequencing protocol (e.g., bulk vs. single-cell), the core column definitions should follow the standards outlined in this guide. The terminology and definitions used here are closely based on the output of the MiXCR tool.
- Downstream blocks: Blocks that consume a standardized VDJ dataset and perform downstream analyses, such as clonotype sequence clustering, antibody sequence liability prediction, and diversity calculations. Because they rely on a standard input format, these blocks can operate on the output of any compliant upstream block.
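To make the relationship between these concepts concrete, here is a minimal sketch that models a VDJ dataset as a set of named columns keyed by clonotype ID, plus a toy downstream computation that reads only a column name. The classes `PColumn` and `VdjDataset`, the function `clonotype_frequencies`, and the column names `abundance` and `cdr3_aa` are all hypothetical illustrations, not the Platforma API or the standard's actual column identifiers.

```python
from dataclasses import dataclass, field
from typing import Any


# Hypothetical, simplified p-column: a named column whose values are keyed by
# clonotype ID. Real p-columns carry richer specs; this only shows the idea.
@dataclass
class PColumn:
    name: str
    values: dict[str, Any] = field(default_factory=dict)


# Hypothetical VDJ dataset: a p-frame modeled as p-columns indexed by name.
@dataclass
class VdjDataset:
    columns: dict[str, PColumn] = field(default_factory=dict)

    def add(self, column: PColumn) -> None:
        self.columns[column.name] = column

    def column(self, name: str) -> PColumn:
        return self.columns[name]


def clonotype_frequencies(dataset: VdjDataset) -> dict[str, float]:
    """Toy downstream step: turn abundances into frequencies.

    It depends only on the (hypothetical) 'abundance' column name, so it works
    regardless of which upstream tool produced the dataset.
    """
    abundance = dataset.column("abundance").values
    total = sum(abundance.values())
    return {clone_id: count / total for clone_id, count in abundance.items()}


# Usage: two toy clonotypes with illustrative values.
ds = VdjDataset()
ds.add(PColumn("abundance", {"c1": 30, "c2": 10}))
ds.add(PColumn("cdr3_aa", {"c1": "CASSLGQAYEQYF", "c2": "CASSPDRGNTEAFF"}))
print(clonotype_frequencies(ds))  # {'c1': 0.75, 'c2': 0.25}
```

The downstream function never asks which tool built the dataset; agreeing on column names and meanings is what lets it run unchanged on any compliant input.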
The importance of standardization
The primary goal of these standards is to create a robust framework for interoperability. When block developers follow these guidelines, users can build powerful and flexible analysis pipelines. For example, a user could swap one upstream clonotyping tool for another, and all existing downstream blocks (like the standard Clonotype Browser) would continue to work without modification.
The guides below detail the expected structure of VDJ datasets for different data types, as well as the inputs and outputs of common downstream applications.
Bulk Clonotyping
This document outlines the standard for VDJ datasets generated from bulk sequencing data. Upstream clonotyping blocks that process bulk FASTQ files (e.g., from targeted library sequencing or bulk RNA-Seq) should produce a p-frame containing the p-columns defined here. This ensures that downstream tools for analysis, visualization, and comparison can operate on a consistent and predictable data structure. See the P-frames and p-columns guide for more foundational information.
Single-cell Clonotyping
This document outlines the standard for VDJ datasets generated from single-cell sequencing data (e.g., from 10x Genomics). This standard extends the Bulk Clonotyping guide, and a compliant single-cell dataset must include all required p-columns from the bulk standard.
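Because the single-cell standard extends the bulk standard, a compliance check for single-cell data can be written as a superset test. The sketch below assumes made-up column names (`BULK_REQUIRED`, `SINGLE_CELL_EXTRA` and their contents are hypothetical placeholders); the real required p-columns are defined in the Bulk and Single-cell Clonotyping guides.

```python
# Hypothetical required column names, for illustration only.
BULK_REQUIRED = frozenset({"abundance", "cdr3_nt", "cdr3_aa", "v_gene", "j_gene"})
SINGLE_CELL_EXTRA = frozenset({"cell_count", "chain"})

# The single-cell requirements include every bulk requirement.
SINGLE_CELL_REQUIRED = BULK_REQUIRED | SINGLE_CELL_EXTRA


def missing_columns(present: set[str], required: frozenset[str]) -> set[str]:
    """Return the required column names that are absent from a dataset."""
    return set(required - present)


def is_compliant_single_cell(present: set[str]) -> bool:
    # A single-cell dataset must satisfy the bulk standard as well as
    # its own single-cell-specific columns.
    return not missing_columns(present, SINGLE_CELL_REQUIRED)


# Usage: a dataset that lacks one bulk column fails the single-cell check.
columns = set(SINGLE_CELL_REQUIRED) - {"v_gene"}
print(is_compliant_single_cell(columns))                 # False
print(missing_columns(columns, SINGLE_CELL_REQUIRED))    # {'v_gene'}
```

Deriving the single-cell set from the bulk set, rather than listing it separately, keeps the "extends" relationship true by construction.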
Clonotype clustering
This document outlines the standard inputs and outputs for a downstream block that performs clonotype clustering. By adhering to this standard, a clustering block can seamlessly process VDJ datasets from any compliant clonotyping block and produce results that are easy to understand and use in further analyses.
Immune Assay Data
This document outlines the standard for a block that integrates functional immune assay data (e.g., from ELISA, SPR, or cell-based functional screens) with VDJ clonotype datasets. By adhering to this standard, an assay data block can link experimental measurements to specific clonotypes, making the data queryable and available for downstream analysis.
Clonotype enrichment
This document outlines the standard inputs and outputs for a downstream block that performs clonotype enrichment analysis. By adhering to this standard, an enrichment block can seamlessly process VDJ datasets from any compliant clonotyping block and produce results that identify and rank clonotypes enriched through selection rounds, showing how their frequencies change across conditions.
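One common way to quantify how a clonotype's frequency changes between selection rounds is a log2 fold change of frequencies with a small pseudo-frequency for clonotypes missing from one round. The sketch below uses that approach as an assumption; it is not necessarily the metric the enrichment standard specifies, and the function names and the `pseudo_freq` parameter are hypothetical.

```python
import math


def frequencies(counts: dict[str, int]) -> dict[str, float]:
    """Convert per-clonotype counts in one round into frequencies."""
    total = sum(counts.values())
    return {cid: n / total for cid, n in counts.items()}


def log2_enrichment(
    before: dict[str, int], after: dict[str, int], pseudo_freq: float = 1e-6
) -> list[tuple[str, float]]:
    """Rank clonotypes by log2 fold change in frequency between two rounds.

    The pseudo-frequency avoids division by zero (and log of zero) for
    clonotypes that are absent from one of the rounds.
    """
    f_before = frequencies(before)
    f_after = frequencies(after)
    clones = set(f_before) | set(f_after)
    scores = {
        cid: math.log2(
            (f_after.get(cid, 0.0) + pseudo_freq)
            / (f_before.get(cid, 0.0) + pseudo_freq)
        )
        for cid in clones
    }
    # Most enriched clonotypes first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Usage: c1 grows from 50% to 90% of the repertoire, c2 shrinks.
round1 = {"c1": 50, "c2": 50}
round2 = {"c1": 90, "c2": 10}
for clone_id, score in log2_enrichment(round1, round2):
    print(clone_id, round(score, 3))
```

Ranking on frequencies rather than raw counts keeps rounds with different sequencing depths comparable.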