Annotating Natural Antibody Libraries

After importing your raw sequencing data, the next critical step is annotation. This process transforms your FASTQ files into structured clonotype tables by:

Aligning reads to reference V, D, J, and C genes.
Assembling clonotypes and correcting for sequencing and PCR errors.
Identifying key antibody features like CDRs and FRs.
For single-cell data, pairing heavy and light chains from the same cell.

This workflow applies to all natural antibody libraries, including high-throughput single-cell V(D)J experiments and bulk sequencing.

The MiXCR Clonotyping block is the standard tool for this task in Platforma. To keep the documentation current, we maintain a single, comprehensive guide for this block.

➡️ Proceed to the main guide: Clonotyping Analysis with MiXCR

This guide provides a detailed walkthrough of the MiXCR Clonotyping block, including parameter settings and interpretation of the results. Once you have annotated your sequences, you can proceed with downstream analysis like clustering and enrichment.

Using presets for your library type

The MiXCR Clonotyping block uses presets to automatically configure the analysis pipeline for your specific library preparation protocol.

For most common commercial kits, such as those from 10x Genomics, Takara Bio (SMARTer), or New England Biolabs, you can simply select the appropriate built-in preset from the dropdown menu in the block's interface. This handles all the complex configuration for you.

Advanced: Creating a custom preset

While built-in presets cover many standard commercial kits, you may need a custom preset if you are working with a non-standard library. This is common in antibody discovery, where research often involves innovative, in-house protocols.

You will need a custom preset if you are using:

An in-house or novel library preparation protocol.
A unique barcoding scheme (e.g., custom UMI or cell barcode structures).
A species not covered by default libraries.
A protocol requiring fine-tuned analysis parameters (e.g., for detailed somatic hypermutation analysis).

Creating a custom preset is a systematic process. Here’s a step-by-step guide to crafting your own.

Step 1: Deconstruct your sequencing library

Before writing any configuration, you need to understand the architecture of your sequencing library. This information will guide your preset configuration.

Here is a checklist of key biological and technical questions:

Starting material: Is your library built from RNA (cDNA) or genomic DNA (gDNA)? For most antibody repertoire studies, the starting material is RNA, as it represents the expressed antibody genes.
5' end strategy: How was the 5' end of the transcript captured? This is critical, as it determines how the V-gene is sequenced.
- 5' RACE with template switching: This method captures the full-length transcript, including the 5' UTR. It leaves a known adapter sequence at the 5' end of the read, so the alignment start is fixed and the preset needs rigidLeftAlignmentBoundary in MiXCR.
- Multiplex PCR with V-gene primers: This uses a pool of primers that bind to different V-genes. Because the exact binding site varies, the 5' start of the read is variable, and the preset needs floatingLeftAlignmentBoundary.
Barcode structure: Where are the Unique Molecular Identifiers (UMIs) and cell barcodes located? Are they in Read 1, Read 2, or a separate index read? Note their exact lengths and any surrounding adapter sequences. This information is essential for building the tagPattern, which MiXCR uses to extract the barcodes and quantify clonotypes accurately.
Analysis goal: What is the biological question you are asking?
- Clonality analysis: If you are focused on identifying unique clones and their frequencies, assembling by CDR3 is the standard approach.
- Somatic hypermutation (SHM) analysis: If you want to study mutations across the entire V-gene, which is common in affinity maturation studies, you must assemble by the full VDJRegion.
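
The checklist above can be summarized as a small lookup. The Python sketch below is only an illustration: suggest_preset_settings is a hypothetical helper, not part of MiXCR or Platforma, and it covers only the two 5' end strategies and two analysis goals described in this step. The keys it returns are the setting names used later in this guide.

```python
# Hypothetical helper (not part of MiXCR or Platforma): maps two of the
# Step 1 checklist answers to the preset settings named in this guide.

def suggest_preset_settings(five_prime, goal):
    """Return preset settings for a 5' end strategy and an analysis goal."""
    settings = {}

    # 5' end strategy decides how the left alignment boundary is treated.
    if five_prime == "race":
        settings["rigidLeftAlignmentBoundary"] = True
    elif five_prime == "multiplex":
        settings["floatingLeftAlignmentBoundary"] = True
    else:
        raise ValueError("five_prime must be 'race' or 'multiplex'")

    # Analysis goal decides which feature clonotypes are assembled by.
    if goal == "clonality":
        settings["assemblingFeatures"] = "CDR3"
    elif goal == "shm":
        settings["assemblingFeatures"] = "VDJRegion"
    else:
        raise ValueError("goal must be 'clonality' or 'shm'")

    return settings


print(suggest_preset_settings("race", "shm"))
```

Writing the answers down in this form before editing any YAML makes it easier to check the finished preset against your library design.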

Step 2: Find and download a base preset

It's best to start from an existing preset that is closest to your protocol. This avoids having to write a preset file from scratch.

Find a suitable base preset: The easiest way is to browse the built-in presets on the MiXCR GitHub repository. Look for a .yaml file with a name that closely matches your library type. For a novel UMI-barcoded, multiplex PCR protocol for human BCRs, a preset like milab-human-rna-bcr-umi-multiplex.yaml would be a good starting point. If you can't find a close match, generic-amplicon.yaml is a good universal template.
Download and save the preset: Download the .yaml file and save it on your computer as my_custom_preset.yaml. This file will be your starting template.

Step 3: Modify the YAML template

Open your downloaded my_custom_preset.yaml file in a text editor to fine-tune the parameters. If you started from a generic template, you will likely need to set some of the key options that describe your library manually.

Key pipeline parameters

At the top of the align step in the file, you'll need to add or modify several parameters based on your library deconstruction from Step 1. For example, for a human RNA library using 5' RACE, you would ensure the following parameters are set:

align:
  species: hsa
  rna: true
  rigidLeftAlignmentBoundary: true
  floatingRightAlignmentBoundary: C
  ...

Key parameters for antibody libraries

Gene features for alignment: For 5' RACE protocols on RNA, it's critical to align to the full transcript to capture the 5' UTR and leader sequence. This improves V-gene identification, especially for closely related genes. In the align step, set:

vParameters:
  geneFeatureToAlign: VTranscript

Barcode parsing (tagPattern): This is one of the most important and complex parameters for barcoded libraries. It tells MiXCR how to extract UMIs and cell barcodes from the raw reads.

For example, consider a library where Read 1 starts with a 16 bp cell barcode and an 8 bp UMI, and Read 2 contains the rest of the sequence. The tagPattern in the align step would be:

tagPattern: "^(CELL:N{16})(UMI:N{8})(R1:*) \\ ^(R2:*)"
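
To make the pattern concrete, the following Python sketch shows what it describes for Read 1: the first 16 bases are taken as the cell barcode, the next 8 as the UMI, and everything after that is the payload. This is not how MiXCR implements tag parsing; split_read1 and the example read are made up for illustration.

```python
# Illustration only: mimics what the Read 1 part of the pattern describes.
# split_read1 is a hypothetical helper, not a MiXCR function.

def split_read1(read1, cell_len=16, umi_len=8):
    """Split Read 1 into (cell barcode, UMI, remaining sequence)."""
    prefix_len = cell_len + umi_len
    if len(read1) < prefix_len:
        raise ValueError("Read 1 is shorter than the barcode prefix")
    cell = read1[:cell_len]            # corresponds to (CELL:N{16})
    umi = read1[cell_len:prefix_len]   # corresponds to (UMI:N{8})
    rest = read1[prefix_len:]          # corresponds to (R1:*)
    return cell, umi, rest


# Made-up read: 16 bp barcode + 8 bp UMI + 12 bp of insert.
read1 = "ACGTACGTACGTACGT" + "TTGGCCAA" + "GAGGTGCAGCTG"
cell, umi, rest = split_read1(read1)
print(cell, umi, rest)
```

If your barcodes sit at different offsets or in Read 2, the same exercise on a few real reads is a quick way to confirm the lengths before you write the pattern.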

Assembling feature: To analyze somatic hypermutation across the entire V-gene, you need to assemble clonotypes by the full VDJ region. In the assemble step, set:

assemblingFeatures: VDJRegion

Examples of custom presets

Here are two examples of what a final my_custom_preset.yaml might look like for common antibody discovery scenarios.

Example 1: Bulk VHH library from immunized llama (Multiplex PCR)

This preset is for a simple bulk library where VHH sequences were amplified from RNA using a mix of V-gene and J-gene primers. The goal is to identify unique CDR3 sequences.

rna: true and the floatingLeftAlignmentBoundary and floatingRightAlignmentBoundary settings match the multiplex PCR protocol.
assemblingFeatures: CDR3 tells MiXCR to define clones by their CDR3 sequence.

# Custom preset for a bulk llama VHH library (RNA, multiplex PCR)
steps:
  - align:
      # Key pipeline parameters
      species: llama # Or another appropriate species
      rna: true
      floatingLeftAlignmentBoundary: true
      floatingRightAlignmentBoundary: J

      # Key parameters for antibody libraries
      vParameters:
        geneFeatureToAlign: VRegion
  - assemble:
      # Key parameters for antibody libraries
      assemblingFeatures: CDR3
  - exportClones: {}

Example 2: Single-cell human antibody library (5' RACE with barcodes)

This preset is for a more complex single-cell library. It assumes a 16 bp cell barcode and a 10 bp UMI at the start of Read 1. The goal is to assemble full-length, paired VH/VL sequences.

rigidLeftAlignmentBoundary and vParameters.geneFeatureToAlign: VTranscript are set for the 5' RACE protocol.
The tagPattern is defined to extract the cell barcode and UMI.
The pipeline includes refineTagsAndSort to correct barcode errors and assemblePartial to merge partially overlapping alignments before clonotype assembly.
assemblingFeatures: VDJRegion is used to reconstruct the full-length VDJ sequence for SHM analysis.

# Custom preset for a single-cell human library (RNA, 5' RACE, UMI/CB)
steps:
  - align:
      # Key pipeline parameters
      species: hsa
      rna: true
      rigidLeftAlignmentBoundary: true
      floatingRightAlignmentBoundary: C

      # Key parameters for antibody libraries
      tagPattern: "^(CELL:N{16})(UMI:N{10})(R1:*) \\ ^(R2:*)"
      vParameters:
        geneFeatureToAlign: VTranscript
  - refineTagsAndSort: {}
  - assemblePartial: {}
  - assemble:
      # Key parameters for antibody libraries
      assemblingFeatures: VDJRegion
  - exportClones: {}
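
A common mistake in barcoded presets is a tagPattern whose barcode lengths do not match the library design (for example, 8 instead of 10 for the UMI). The Python sketch below is one way to check this before running anything. It assumes the simple fixed-length form used in this guide, (NAME:N{k}), and ignores every other pattern construct; tag_lengths is a hypothetical helper, not part of MiXCR.

```python
import re

# Hypothetical helper: reads fixed-length (NAME:N{k}) groups out of a
# tagPattern string. Other pattern constructs are ignored.

def tag_lengths(tag_pattern):
    """Return {tag name: length} for each (NAME:N{k}) group."""
    groups = re.findall(r"\((\w+):N\{(\d+)\}\)", tag_pattern)
    return {name: int(length) for name, length in groups}


# The pattern from Example 2, as it reads after YAML unescaping.
pattern = r"^(CELL:N{16})(UMI:N{10})(R1:*) \ ^(R2:*)"

# Expected lengths taken from the library design described above.
expected = {"CELL": 16, "UMI": 10}

found = tag_lengths(pattern)
if found != expected:
    raise SystemExit(f"tagPattern mismatch: found {found}, expected {expected}")
print("tagPattern lengths match the library design:", found)
```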

Step 4: Use the preset in Platforma

Once your my_custom_preset.yaml file is ready, you can use it in the MiXCR Clonotyping block:

In the block settings, select Preset From File.
Upload your custom .yaml file.
Configure the remaining settings and run the block.

Run the block first on a small subset of your data to verify that the preset works correctly, then check the MiXCR alignment report for any warnings or errors.
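
One simple way to make such a subset is to copy the first few thousand records from each FASTQ file. The Python sketch below assumes gzipped FASTQ files with standard four-line records; head_fastq and the file names in the usage comment are hypothetical. The first reads in a file are not a random sample, so treat the result as a smoke test of the preset rather than a representative repertoire.

```python
import gzip
from itertools import islice

# Hypothetical helper: copies the first n_reads records of a gzipped FASTQ.
# Assumes four lines per record (no wrapped sequence lines).

def head_fastq(src_path, dst_path, n_reads=10000):
    """Write the first n_reads records to dst_path; return records written."""
    n_lines = 0
    with gzip.open(src_path, "rt") as src, gzip.open(dst_path, "wt") as dst:
        for line in islice(src, 4 * n_reads):
            dst.write(line)
            n_lines += 1
    return n_lines // 4


# Usage (file names are placeholders). For paired-end data, use the same
# n_reads for both files so that mates stay in the same order:
# head_fastq("sample_R1.fastq.gz", "subset_R1.fastq.gz", n_reads=10000)
# head_fastq("sample_R2.fastq.gz", "subset_R2.fastq.gz", n_reads=10000)
```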

Getting help

Creating a custom preset can be challenging. If you run into issues, you can always ask for help from the community at community.platforma.bio.
