scRNA-seq Analysis
Single-cell RNA sequencing (scRNA-seq) enables high-resolution profiling of gene expression at the level of individual cells, providing powerful insights into cellular diversity, lineage relationships, and dynamic biological processes. This workflow serves as a comprehensive guide for analyzing scRNA-seq data, starting from raw sequencing reads or preprocessed count matrices and extending through advanced downstream analyses.
Key steps include processing raw FASTQ files using pipelines such as Cell Ranger for 10x Genomics data, or importing count matrices from various formats. Following data import, rigorous quality control, normalization, and batch correction are performed to ensure comparability across samples and conditions. Dimensionality reduction techniques such as PCA, UMAP, and t-SNE are then used to visualize cellular heterogeneity, followed by Leiden clustering to identify distinct cell populations.
Subsequent analyses focus on cell type annotation using marker genes or reference datasets, and differential expression analysis to detect genes defining clusters or conditions. Results from differential expression can be further explored through functional enrichment analyses (e.g., pathway and GO analysis) adapted from the bulk RNA-seq pipeline. The workflow also supports compositional analysis to assess changes in cell-type proportions across conditions and pseudotime trajectory inference to study dynamic processes such as differentiation or activation.
By following this workflow, you can derive a detailed and biologically meaningful understanding of cellular heterogeneity and dynamics within single-cell RNA-seq datasets.
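To make the quality control and normalization step mentioned above more concrete, here is a minimal sketch in plain Python. It assumes raw counts are held as a nested dictionary, drops cells with too few total counts, scales each remaining cell to a common library size, and applies a log transform. The function name, the `min_counts` threshold, and the `target_sum` value are illustrative assumptions, not Platforma settings.

```python
import math


def filter_and_normalize(counts, min_counts=500, target_sum=10_000):
    """Drop low-count cells, then library-size normalize and log-transform.

    counts: {cell_barcode: {gene: raw_count}}
    min_counts and target_sum are illustrative defaults (assumptions).
    """
    normalized = {}
    for cell, genes in counts.items():
        total = sum(genes.values())
        if total < min_counts:
            continue  # QC: too few counts, likely an empty droplet or damaged cell
        scale = target_sum / total
        # log1p keeps zero counts at zero and compresses large values
        normalized[cell] = {g: math.log1p(c * scale) for g, c in genes.items()}
    return normalized


# Hypothetical toy input: one well-covered cell and one near-empty barcode.
raw_counts = {
    "AAAC": {"GENE_A": 600, "GENE_B": 400},
    "TTTG": {"GENE_A": 3},
}
normalized_counts = filter_and_normalize(raw_counts)
```

After this step every retained cell has the same effective sequencing depth, which is what makes expression values comparable across cells and samples.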
Import raw FASTQ files and metadata
The first step in any Platforma analysis is to import your data. This involves two key parts: importing your raw sequencing files (e.g., FASTQ) and associating them with your experimental metadata (e.g., sample, treatment, genotype).
Importing Pre-processed scRNA-seq Data
While Platforma offers powerful tools for processing raw sequencing data (FASTQ), you can also start your analysis directly with pre-processed gene expression matrices. This is ideal if you have legacy data, public datasets, or outputs from other pipelines like Cell Ranger, Seurat, or Scanpy.
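As an illustration of what a pre-processed matrix can look like, Cell Ranger writes its feature-barcode matrix in the Matrix Market coordinate format, with genes as rows and cell barcodes as columns. The sketch below parses that text format with the standard library only. The function name and the toy file contents are hypothetical, and Platforma's own importer may work differently.

```python
def parse_matrix_market(text):
    """Parse a Matrix Market coordinate file into a sparse dict.

    Returns (n_rows, n_cols, entries) where entries maps zero-based
    (row, col) index pairs to integer counts.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("%%MatrixMarket"):
        raise ValueError("missing Matrix Market header")
    # Lines starting with '%' after the header are comments.
    body = [ln for ln in lines[1:] if not ln.startswith("%")]
    n_rows, n_cols, n_entries = (int(x) for x in body[0].split())
    entries = {}
    for ln in body[1:]:
        i, j, value = ln.split()
        # The file uses 1-based indices; convert to 0-based.
        entries[(int(i) - 1, int(j) - 1)] = int(value)
    if len(entries) != n_entries:
        raise ValueError("entry count does not match the size line")
    return n_rows, n_cols, entries


# Hypothetical toy matrix: 3 genes (rows) x 2 cells (columns), 4 non-zero counts.
EXAMPLE_MTX = """%%MatrixMarket matrix coordinate integer general
%
3 2 4
1 1 5
2 1 1
3 2 7
1 2 2
"""

n_genes, n_cells, sparse_counts = parse_matrix_market(EXAMPLE_MTX)

# Total counts per cell (column sums), a common first sanity check.
counts_per_cell = [0] * n_cells
for (gene_idx, cell_idx), value in sparse_counts.items():
    counts_per_cell[cell_idx] += value
```

Only the non-zero entries are stored, which matters because most gene-by-cell values in scRNA-seq data are zero.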
Run Cell Ranger for 10x dataset
Cell Ranger is the standard bioinformatics pipeline from 10x Genomics used to process raw sequencing data from their single-cell platforms. It handles demultiplexing, alignment, barcode/UMI processing, and cell calling to generate the feature-barcode matrices and preliminary clustering results that are the starting point for all downstream scRNA-seq analysis.
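To give a feel for the barcode/UMI processing step, the sketch below shows the core counting idea: reads that share the same cell barcode, gene, and UMI are treated as copies of one original molecule and counted once. This is a deliberately simplified illustration. Cell Ranger additionally corrects sequencing errors in barcodes and UMIs and performs cell calling, none of which is modeled here, and the function and variable names are hypothetical.

```python
from collections import defaultdict


def count_umis(reads):
    """Collapse duplicate reads into UMI counts per gene and cell.

    reads: iterable of (cell_barcode, umi, gene) tuples, one per aligned read.
    Returns {gene: {cell_barcode: number_of_distinct_umis}}.
    """
    umis_seen = defaultdict(set)  # (barcode, gene) -> distinct UMIs observed
    for barcode, umi, gene in reads:
        umis_seen[(barcode, gene)].add(umi)

    matrix = defaultdict(dict)
    for (barcode, gene), umis in umis_seen.items():
        matrix[gene][barcode] = len(umis)
    return dict(matrix)


# Hypothetical toy reads; the first two are PCR duplicates of one molecule.
example_reads = [
    ("AAAC", "ACGT", "GAPDH"),
    ("AAAC", "ACGT", "GAPDH"),
    ("AAAC", "TTGA", "GAPDH"),
    ("TTTG", "ACGT", "GAPDH"),
    ("TTTG", "CCAT", "CD3E"),
]
umi_matrix = count_umis(example_reads)
```

The resulting nested dictionary is a tiny feature-barcode matrix: genes index the rows and cell barcodes index the columns.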
Dimensionality reduction and batch correction
After processing your raw data with the Cell Ranger block, you have a massive dataset: for each of your thousands of cells, there is a measurement for more than 20,000 genes. This is called "high-dimensional data," and no human can visualize or find patterns in 20,000 dimensions.
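One way to see how many gene measurements can collapse into a few informative axes is principal component analysis (PCA). The sketch below finds only the first principal component, using power iteration on the gene-by-gene covariance matrix in plain Python. It is meant for tiny toy inputs, the function name and parameters are illustrative assumptions, and production tools use optimized linear algebra on normalized, log-transformed data instead.

```python
import math
import random


def first_principal_component(rows, n_iter=200, seed=0):
    """Return (direction, scores) for the first principal component.

    rows: list of cells, each a list of expression values (one per gene).
    Uses power iteration on the covariance matrix; suitable for toy data only.
    """
    n, d = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("need at least two cells")
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    rng = random.Random(seed)
    vec = [rng.random() for _ in range(d)]  # random start avoids symmetric traps
    for _ in range(n_iter):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break  # no variance left in this direction
        vec = [x / norm for x in nxt]
    # Project every cell onto the component to get its 1D coordinate.
    scores = [sum(c[j] * vec[j] for j in range(d)) for c in centered]
    return vec, scores


# Hypothetical toy data: 4 cells x 3 genes, two groups with opposite profiles.
toy_cells = [
    [10, 0, 1],
    [9, 1, 1],
    [0, 10, 1],
    [1, 9, 1],
]
pc1, pc1_scores = first_principal_component(toy_cells)
```

In this toy case a single coordinate per cell is enough to separate the two groups, and the third gene, which never varies, contributes nothing to the component. The sign of the component is arbitrary, so only relative positions along it are meaningful.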
Leiden Clustering
In the previous step, we used Dimensionality Reduction to visualize our cells on a 2D map (UMAP and t-SNE). We could see "islands" of cells that looked similar.
Cluster Markers (DEGs)
In the previous steps, we visualized our cells (Dimensionality Reduction) and grouped them into "clusters" (Leiden Clustering). This gave us groups named CL-0, CL-1, CL-2, etc.
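A rough intuition for what makes a gene a cluster marker is that its average expression inside the cluster is much higher than in all other cells. The sketch below ranks genes for one cluster by log2 fold change against the rest. It is an effect-size ranking only: real marker detection also applies a statistical test and multiple-testing correction, which this example omits. The function name, the `pseudocount` parameter, and the toy values are illustrative assumptions.

```python
import math


def rank_markers(expr, clusters, target, pseudocount=1.0):
    """Rank genes by log2 fold change of `target` cluster versus all other cells.

    expr: {gene: [expression value per cell]}
    clusters: [cluster label per cell], same cell order as in expr.
    Returns a list of (gene, log2_fold_change), highest first.
    """
    inside = [i for i, c in enumerate(clusters) if c == target]
    outside = [i for i, c in enumerate(clusters) if c != target]
    if not inside or not outside:
        raise ValueError("need cells both inside and outside the target cluster")
    ranked = []
    for gene, values in expr.items():
        mean_in = sum(values[i] for i in inside) / len(inside)
        mean_out = sum(values[i] for i in outside) / len(outside)
        # The pseudocount avoids division by zero for unexpressed genes.
        lfc = math.log2((mean_in + pseudocount) / (mean_out + pseudocount))
        ranked.append((gene, lfc))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Hypothetical toy data: 4 cells, two clusters, three genes.
toy_expr = {
    "CD3E": [7, 7, 0, 0],
    "MS4A1": [0, 0, 7, 7],
    "ACTB": [5, 5, 5, 5],
}
toy_clusters = ["CL-0", "CL-0", "CL-1", "CL-1"]
markers_cl0 = rank_markers(toy_expr, toy_clusters, target="CL-0")
```

Here the gene expressed only in CL-0 comes out on top, the uniformly expressed gene sits at zero, and the gene specific to CL-1 lands at the bottom with a negative fold change.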
Compositional Analysis: Comparing Cluster Proportions
So far, we have processed our data, visualized it (UMAP/t-SNE), and grouped our cells into clusters (CL-0, CL-1, etc.). We may even have a good idea of their cell types from the marker genes.
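The starting point for comparing cluster proportions is a simple table: for each sample, what fraction of its cells falls into each cluster. The sketch below computes that table from per-cell (sample, cluster) labels. The function name and the toy labels are hypothetical, and a real compositional analysis would add a statistical model on top of these raw fractions.

```python
from collections import Counter, defaultdict


def cluster_proportions(cells):
    """Compute the fraction of each sample's cells that falls in each cluster.

    cells: iterable of (sample, cluster) pairs, one per cell.
    Returns {sample: {cluster: fraction}}; fractions sum to 1 per sample.
    """
    counts = defaultdict(Counter)
    for sample, cluster in cells:
        counts[sample][cluster] += 1
    proportions = {}
    for sample, per_cluster in counts.items():
        total = sum(per_cluster.values())
        proportions[sample] = {cl: n / total for cl, n in per_cluster.items()}
    return proportions


# Hypothetical toy labels: 4 control cells and 4 treated cells.
toy_labels = (
    [("ctrl", "CL-0")] * 3
    + [("ctrl", "CL-1")]
    + [("treated", "CL-0")]
    + [("treated", "CL-1")] * 3
)
toy_proportions = cluster_proportions(toy_labels)
```

Working with fractions rather than raw cell counts keeps samples with different numbers of captured cells comparable.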