
VDJ block development guides

The aim of this documentation is to provide standardized guidelines for implementing VDJ (V, D, J gene segment) analysis blocks within the Platforma ecosystem. Adhering to these standards ensures that upstream analysis blocks and downstream application blocks can work together seamlessly, regardless of the specific bioinformatics tools they wrap.

Core concepts

  • Clonotyping block: An upstream block that processes raw sequencing data (e.g., FASTQ files) and generates a standardized "VDJ dataset". The MiXCR Clonotyping block is the gold-standard implementation of an upstream block.

  • VDJ dataset: A p-frame that contains a set of p-columns describing clonotyping information. This includes clonotype abundances, gene feature sequences (like CDR3), V/D/J gene calls, and other essential properties of the clonotypes. While the specific structure can vary slightly based on the sequencing protocol (e.g., bulk vs. single-cell), the core column definitions should follow the standards outlined in this guide. The language and definitions used here are heavily based on the output of the MiXCR tool.

  • Downstream blocks: Blocks that consume a standardized VDJ dataset and perform various downstream analyses, such as clonotype sequence clustering, antibody sequence liability prediction, diversity calculations, etc. By relying on a standard input format, these blocks can operate on the output of any compliant upstream block.
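To make the relationship between these concepts concrete, the following is a minimal sketch of a compliance check a downstream block might run before consuming a dataset. It is not the Platforma API: the `VdjDataset` class, the `REQUIRED_COLUMNS` set, and the column names (`abundance`, `cdr3_nt`, `v_gene`, `j_gene`) are hypothetical stand-ins for a p-frame and its p-columns.

```python
from dataclasses import dataclass, field

# Hypothetical column names for illustration only; the real p-column
# identifiers are defined by the standard, not by this sketch.
REQUIRED_COLUMNS = {"abundance", "cdr3_nt", "v_gene", "j_gene"}


@dataclass
class VdjDataset:
    """Toy stand-in for a p-frame: column name -> {clonotype key -> value}."""

    columns: dict = field(default_factory=dict)

    def missing_columns(self):
        # Columns a downstream block expects but the upstream block did not emit.
        return REQUIRED_COLUMNS - set(self.columns)


# An upstream block that forgot to emit the J gene call column.
dataset = VdjDataset(
    columns={
        "abundance": {"clone-1": 120, "clone-2": 30},
        "cdr3_nt": {"clone-1": "TGTGCCAGC", "clone-2": "TGTGCGAGA"},
        "v_gene": {"clone-1": "IGHV3-23", "clone-2": "IGHV1-69"},
    }
)

missing = dataset.missing_columns()
print(sorted(missing))
```

A downstream block could refuse to run, or degrade gracefully, when `missing` is non-empty.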

The importance of standardization

The primary goal of these standards is to create a robust framework for interoperability. When block developers follow these guidelines, users can build powerful and flexible analysis pipelines. For example, a user could swap out one upstream clonotyping tool for another, and all existing downstream blocks (like the standard Clonotype Browser) would continue to work without modification.
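The swap scenario can be illustrated with a small sketch. The two producer functions below are hypothetical upstream blocks wrapping different tools, and the plain dictionary with an `abundance` column is an assumed simplification of a VDJ dataset. The downstream function (a Shannon diversity calculation) depends only on the shared column, so it runs unchanged on either producer's output.

```python
import math


def clonotypes_from_tool_a():
    # Hypothetical upstream block wrapping one clonotyping tool.
    return {"abundance": {"c1": 50, "c2": 50}}


def clonotypes_from_tool_b():
    # Hypothetical upstream block wrapping a different tool;
    # clonotype keys differ, but the column layout is the same.
    return {"abundance": {"x1": 90, "x2": 5, "x3": 5}}


def shannon_diversity(dataset):
    """Downstream analysis that relies only on the shared 'abundance' column."""
    counts = list(dataset["abundance"].values())
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


# The same downstream code works for either upstream producer.
for produce in (clonotypes_from_tool_a, clonotypes_from_tool_b):
    print(produce.__name__, round(shannon_diversity(produce()), 3))
```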

This documentation details the expected structure of VDJ datasets for different data types, along with the inputs and outputs of common downstream applications.

Bulk Clonotyping

This document outlines the standard for VDJ datasets generated from bulk sequencing data. Upstream clonotyping blocks that process bulk FASTQ files (e.g. from targeted library sequencing or bulk RNA-Seq) should produce a p-frame containing the p-columns defined here. This ensures that downstream tools for analysis, visualization, and comparison can operate on a consistent and predictable data structure. See the P-frames and p-columns guide for more foundational information.