
P-column specification

A p-column's structure is formally defined by its specification (PColumnSpec). This specification is the metadata that details the column's identity, the type of data it holds, its dimensional axes, and various other attributes for context and behavior.

While p-column specifications are always stored as JSON objects, we use TypeScript notation in this document for clarity and to illustrate the data structure.

A well-defined spec is crucial for interoperability between blocks, UI rendering, and discoverability.

Core data types

Before diving into the PColumnSpec itself, it's important to understand the primitive types used:

/** P-frame axes may store one of these types. */
type AxisType = 'Int' | 'Long' | 'String';

/** P-frame columns may store one of these types. */
type ValueType = 'Int' | 'Long' | 'Float' | 'Double' | 'String';
  • AxisType: Defines the allowed data types for the values of an axis. Axes can be integers (Int, Long) or strings (String).
  • ValueType: Defines the allowed data types for the values stored in a p-column. This includes every AxisType, plus single-precision (Float) and double-precision (Double) floating-point numbers. In practice, ValueType can also represent more complex types like 'File'.
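
Because specifications are stored as JSON, a type string read from a stored spec is unchecked at runtime. The following is a minimal sketch of runtime guards for the two unions above. The helper names (AXIS_TYPES, VALUE_TYPES, isAxisType, isValueType) are hypothetical and not part of any SDK, and the guards cover only the primitive types listed in this document, not extended types such as 'File'.

```typescript
type AxisType = 'Int' | 'Long' | 'String';
type ValueType = 'Int' | 'Long' | 'Float' | 'Double' | 'String';

// Runtime mirrors of the unions above (hypothetical helper constants).
const AXIS_TYPES: readonly string[] = ['Int', 'Long', 'String'];
const VALUE_TYPES: readonly string[] = ['Int', 'Long', 'Float', 'Double', 'String'];

/** True if `v` is one of the axis types listed in this document. */
function isAxisType(v: unknown): v is AxisType {
  return typeof v === 'string' && AXIS_TYPES.indexOf(v) !== -1;
}

/** True if `v` is one of the primitive value types listed in this document. */
function isValueType(v: unknown): v is ValueType {
  return typeof v === 'string' && VALUE_TYPES.indexOf(v) !== -1;
}

console.log(isAxisType('Long'));    // true
console.log(isAxisType('Double'));  // false: floating-point types are not valid for axes
console.log(isValueType('Double')); // true
```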

PColumnSpec

The PColumnSpec interface includes the following key fields:

type PColumnSpec = {
  /** The data type of the column's values. */
  valueType: ValueType;

  /** The formal name of the column, often namespaced. */
  name: string;

  /** An array of AxisSpec objects that define the dimensions or keys of the p-column. */
  axesSpec: AxesSpec;

  /**
   * Adds auxiliary information to the name, type, and axes to form a unique
   * identifier for the p-column. This is crucial for distinguishing p-columns
   * that might otherwise seem similar.
   */
  domain?: Record<string, string>;

  /**
   * Any additional information attached to the column that does not affect its
   * identifier but provides context for UI rendering or workflow logic.
   */
  annotations?: Record<string, string>;
}
  • name: A string identifier for the column, e.g., "pl7.app/vdj/readCount". It's a best practice to use namespaces (like pl7.app/vdj) to avoid collisions and provide context.
  • valueType: Specifies the data type of the values, e.g., 'Long', 'Double', 'String'.
  • axesSpec: Defines the keys of the p-column. See AxisSpec below.
  • domain: A map of key-value pairs that provides semantic context and ensures the p-column's uniqueness. For example, a column of sequences might have a domain specifying the feature and alphabet: {"pl7.app/vdj/feature": "CDR3", "pl7.app/alphabet": "aminoacid"}.
  • annotations: A map of key-value pairs for additional metadata, often used for UI hints (pl7.app/label), data lineage (pl7.app/trace), or biological interpretations. See the Standard Annotations documentation for a list of common annotations.
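
As an illustration of the fields above, here is a sketch of a spec for the sequence column mentioned in the domain bullet, written as a TypeScript object and round-tripped through JSON, the storage format. The choice of axis and the label text "CDR3 aa" are assumptions made for this example, and AxesSpec is assumed to be a plain array of AxisSpec.

```typescript
type AxisType = 'Int' | 'Long' | 'String';
type ValueType = 'Int' | 'Long' | 'Float' | 'Double' | 'String';

type AxisSpec = {
  type: AxisType;
  name: string;
  domain?: Record<string, string>;
  annotations?: Record<string, string>;
};

// Assumption: AxesSpec is simply an array of AxisSpec.
type AxesSpec = AxisSpec[];

type PColumnSpec = {
  valueType: ValueType;
  name: string;
  axesSpec: AxesSpec;
  domain?: Record<string, string>;
  annotations?: Record<string, string>;
};

const cdr3Spec: PColumnSpec = {
  valueType: 'String',
  name: 'pl7.app/vdj/sequence',
  // Illustrative axis: one sequence per clonotype.
  axesSpec: [{ type: 'String', name: 'pl7.app/vdj/clonotypeKey' }],
  // Domain: part of the column's identity.
  domain: {
    'pl7.app/vdj/feature': 'CDR3',
    'pl7.app/alphabet': 'aminoacid',
  },
  // Annotations: context only; the label text is made up for this example.
  annotations: { 'pl7.app/label': 'CDR3 aa' },
};

// Specs are stored as JSON objects, so they survive a round trip unchanged.
const specJson: string = JSON.stringify(cdr3Spec);
const restoredSpec = JSON.parse(specJson) as PColumnSpec;
console.log(restoredSpec.name); // pl7.app/vdj/sequence
```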

AxisSpec

Each axis in a p-column's axesSpec array is defined by the AxisSpec interface. Together, the axes form the composite key that addresses data within the p-column.

/**
 * Specification of an individual axis.
 */
type AxisSpec = {
  /** The data type of the axis values. */
  type: AxisType;

  /** The formal name of the axis. */
  name: string;

  /**
   * Adds auxiliary information to the axis name and type to form a unique
   * identifier for the axis.
   */
  domain?: Record<string, string>;

  /**
   * Any additional information attached to the axis that does not affect its
   * identifier.
   */
  annotations?: Record<string, string>;
}
  • name: The name of the axis, e.g., "pl7.app/sampleId".
  • type: The data type of the axis values, e.g., 'String'.
  • domain: Similar to the PColumnSpec domain, this provides semantic context for the axis.
  • annotations: Provides UI hints and other metadata for the axis.
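
To show how an annotation can act as a UI hint without touching the axis identity, the sketch below picks a display label for an axis. The helper axisLabel and its fallback to the axis name are assumptions made for illustration, not documented platform behavior.

```typescript
type AxisType = 'Int' | 'Long' | 'String';

type AxisSpec = {
  type: AxisType;
  name: string;
  domain?: Record<string, string>;
  annotations?: Record<string, string>;
};

const LABEL_KEY = 'pl7.app/label';

/**
 * Hypothetical helper: prefer the human-readable label annotation,
 * and fall back to the formal axis name when no label is present.
 */
function axisLabel(axis: AxisSpec): string {
  const label = axis.annotations ? axis.annotations[LABEL_KEY] : undefined;
  return label !== undefined ? label : axis.name;
}

const labelledAxis: AxisSpec = {
  type: 'String',
  name: 'pl7.app/sampleId',
  annotations: { 'pl7.app/label': 'Sample ID' },
};
const unlabelledAxis: AxisSpec = { type: 'String', name: 'pl7.app/sampleId' };

console.log(axisLabel(labelledAxis));   // Sample ID
console.log(axisLabel(unlabelledAxis)); // pl7.app/sampleId
```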

Identifiers and Compatibility

Two of the most important concepts underpinning the p-frame system are how p-columns are uniquely identified and how their compatibility is determined. These concepts are what enable robust data querying, joining, and interoperability between blocks.

The Unique Identifier

The unique identifier for any p-column or axis is not just its name, but a composite key formed by three fields from its specification:

  • name: The base name for the entity (e.g., pl7.app/vdj/sequence).
  • valueType (for p-columns) or type (for axes): The data type (e.g., String, Long).
  • domain: A map of key-value pairs that provides fine-grained semantic context.

This composite key is what the Platforma system uses to distinguish one p-column from another. It is crucial to design the domain carefully to avoid collisions. For example, two different sequence p-columns could share the name pl7.app/vdj/sequence, but be distinguished by their domains: one might have {"pl7.app/alphabet": "aminoacid"} and the other {"pl7.app/alphabet": "nucleotide"}.

The annotations field, by contrast, contains auxiliary metadata that does not affect the identifier. Its purpose is to provide UI hints, human-readable labels, and other information that can change without altering the fundamental identity of the p-column.
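
One way to make this concrete is to derive a canonical string from the three identifying fields. The sketch below is an assumption about how such a key could be built (the function identityKey and its JSON encoding are hypothetical, not the Platforma implementation). It sorts domain keys so that key order does not matter, and it never reads annotations.

```typescript
type Identified = {
  name: string;
  /** valueType for p-columns, type for axes. */
  type: string;
  domain?: Record<string, string>;
  annotations?: Record<string, string>;
};

/** Hypothetical canonical key built from name, type, and domain only. */
function identityKey(entity: Identified): string {
  const domain = entity.domain || {};
  // Sort keys so that {a, b} and {b, a} produce the same string.
  const pairs = Object.keys(domain).sort().map((k) => [k, domain[k]]);
  return JSON.stringify([entity.name, entity.type, pairs]);
}

const aminoAcidKey = identityKey({
  name: 'pl7.app/vdj/sequence',
  type: 'String',
  domain: { 'pl7.app/alphabet': 'aminoacid' },
});

const nucleotideKey = identityKey({
  name: 'pl7.app/vdj/sequence',
  type: 'String',
  domain: { 'pl7.app/alphabet': 'nucleotide' },
});

// Same identifying fields, different label: the key is unchanged.
const relabelledKey = identityKey({
  name: 'pl7.app/vdj/sequence',
  type: 'String',
  domain: { 'pl7.app/alphabet': 'aminoacid' },
  annotations: { 'pl7.app/label': 'Any label' },
});

console.log(aminoAcidKey === nucleotideKey); // false: the domains differ
console.log(aminoAcidKey === relabelledKey); // true: annotations are ignored
```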

Axis and Column Compatibility

Compatibility is the mechanism that allows the system to understand relationships between different p-columns, which is essential for querying (e.g., with ctx.resultPool.getOptions) and for joining data in visualizations.

The core rule is as follows: An axis B is considered compatible with an axis A if the domain of axis B is a superset of the domain of axis A.

More formally, for axis B to be compatible with axis A:

  1. They must have the same name and type.
  2. Every key-value pair in axis A's domain must also be present in axis B's domain. Axis B can have additional "qualifying" key-value pairs in its domain, making it more specific.

Think of it like this: if axis A represents a general concept like "mass," its compatible descendants can represent more specific kinds of mass, but not something completely different.

  • Compatible: An axis for "mass in kilograms" is compatible with a general "mass" axis. A "mammalian gene ID" axis is compatible with a general "gene ID" axis.
  • Incompatible: An axis for "mass in kilograms" is incompatible with an axis for "number of books". They represent fundamentally different concepts.
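
The two-part rule can be sketched as a small predicate. The function name isAxisCompatible and the example axis names and domain keys (example/mass, example/unit, example/bookCount) are hypothetical, chosen only to mirror the mass analogy. This illustrates the rule as stated here and is not the platform's own code.

```typescript
type AxisType = 'Int' | 'Long' | 'String';

type AxisSpec = {
  type: AxisType;
  name: string;
  domain?: Record<string, string>;
  annotations?: Record<string, string>;
};

/** True if axis `b` is compatible with axis `a` under the rule above. */
function isAxisCompatible(b: AxisSpec, a: AxisSpec): boolean {
  // Rule 1: same name and type.
  if (b.name !== a.name || b.type !== a.type) return false;
  // Rule 2: every key-value pair of a's domain must appear in b's domain.
  const domainA = a.domain || {};
  const domainB = b.domain || {};
  return Object.keys(domainA).every((k) => domainB[k] === domainA[k]);
}

// Hypothetical axes mirroring the analogy.
const massAxis: AxisSpec = { type: 'Long', name: 'example/mass' };
const massKgAxis: AxisSpec = {
  type: 'Long',
  name: 'example/mass',
  domain: { 'example/unit': 'kg' },
};
const bookCountAxis: AxisSpec = { type: 'Long', name: 'example/bookCount' };

console.log(isAxisCompatible(massKgAxis, massAxis));      // true: more specific
console.log(isAxisCompatible(massAxis, massKgAxis));      // false: lacks the unit
console.log(isAxisCompatible(massKgAxis, bookCountAxis)); // false: different name
```

Note that the relation is directional: the more specific axis is compatible with the general one, but not the other way around.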

This system of compatibility allows downstream blocks to query for data in a flexible way. A block can ask for a general "gene sequence" and receive any p-column that meets that basic definition, regardless of additional qualifications in its domain (like the specific gene feature, e.g., CDR3). This makes the system extensible and robust.
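
The querying idea can be sketched as a filter over a pool of column specs. This sketch assumes the same superset rule applies at the column level. The function matchesQuery, the pool contents, and the "FR1" feature value are assumptions for illustration, and the sketch does not reproduce the behavior of ctx.resultPool.getOptions.

```typescript
type ColumnSpec = {
  name: string;
  valueType: string;
  domain?: Record<string, string>;
};

/** True if `spec` has the query's name and type and at least its domain pairs. */
function matchesQuery(spec: ColumnSpec, query: ColumnSpec): boolean {
  if (spec.name !== query.name || spec.valueType !== query.valueType) return false;
  const wanted = query.domain || {};
  const actual = spec.domain || {};
  return Object.keys(wanted).every((k) => actual[k] === wanted[k]);
}

// Illustrative pool of specs (axes omitted for brevity).
const specPool: ColumnSpec[] = [
  { name: 'pl7.app/vdj/sequence', valueType: 'String',
    domain: { 'pl7.app/vdj/feature': 'CDR3', 'pl7.app/alphabet': 'aminoacid' } },
  { name: 'pl7.app/vdj/sequence', valueType: 'String',
    domain: { 'pl7.app/vdj/feature': 'CDR3', 'pl7.app/alphabet': 'nucleotide' } },
  { name: 'pl7.app/vdj/sequence', valueType: 'String',
    domain: { 'pl7.app/vdj/feature': 'FR1', 'pl7.app/alphabet': 'aminoacid' } },
  { name: 'pl7.app/vdj/readCount', valueType: 'Long' },
];

// A general query: any sequence column, whatever its domain.
const anySequence = specPool.filter((s) =>
  matchesQuery(s, { name: 'pl7.app/vdj/sequence', valueType: 'String' }));

// A narrower query: amino acid sequences of any feature.
const aminoAcidOnly = specPool.filter((s) =>
  matchesQuery(s, {
    name: 'pl7.app/vdj/sequence',
    valueType: 'String',
    domain: { 'pl7.app/alphabet': 'aminoacid' },
  }));

console.log(anySequence.length);   // 3
console.log(aminoAcidOnly.length); // 2
```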

Real-world example

Here is a real-world example of a PColumnSpec for a "clonotype read abundance" p-column, which stores the number of reads for a given clonotype within a specific sample.

The spec is shown here in YAML format for readability.

name: pl7.app/vdj/readCount
valueType: Long
axesSpec:
  - name: pl7.app/sampleId
    type: String
    annotations:
      pl7.app/label: Sample ID
  - name: pl7.app/vdj/clonotypeKey
    type: String
    domain:
      # This domain specifies what makes up the clonotype key
      "pl7.app/vdj/clonotypeKey/structure": '["nSeqCDR3","bestVGene","bestJGene"]'
    annotations:
      pl7.app/label: Clonotype ID
annotations:
  pl7.app/label: Read Count
  pl7.app/isAbundance: "true"
  pl7.app/abundance/unit: reads
  pl7.app/abundance/normalized: "false"
  pl7.app/table/orderPriority: "90000"
  pl7.app/table/visibility: "default"

This example defines a p-column that maps a key composed of two axes to a single Long integer value.

  • axesSpec: This field defines the composite key.

    • The first axis, pl7.app/sampleId, identifies the sample.
    • The second axis, pl7.app/vdj/clonotypeKey, identifies the specific clonotype. Its domain further specifies that the key is composed of a CDR3 nucleotide sequence and V/J gene hits, making the axis definition very precise.
    • Therefore, each value in this p-column represents the abundance of one clonotype in one sample.
  • annotations: These provide rich metadata for the p-column's values.

    • pl7.app/label: Tells the UI to display this column with the header "Read Count".
    • pl7.app/isAbundance, pl7.app/abundance/unit, pl7.app/abundance/normalized: These carry the semantic context: the column is an abundance measure, counted in reads, and not normalized.
    • pl7.app/table/orderPriority, pl7.app/table/visibility: These give direct instructions to the UI on how to display this column in a table by default.
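
Both domain and annotations are typed as Record<string, string>, so every value in the example is a string, including the key structure list, the boolean flags, and the order priority. The sketch below shows one way a consumer might decode them. The decoding conventions (JSON.parse for the list, comparison with "true" for flags, Number for the priority) are assumptions based on how the values look in this example.

```typescript
// Values copied from the example spec above.
const clonotypeKeyDomain = {
  'pl7.app/vdj/clonotypeKey/structure': '["nSeqCDR3","bestVGene","bestJGene"]',
};

const readCountAnnotations = {
  'pl7.app/label': 'Read Count',
  'pl7.app/isAbundance': 'true',
  'pl7.app/abundance/unit': 'reads',
  'pl7.app/abundance/normalized': 'false',
  'pl7.app/table/orderPriority': '90000',
  'pl7.app/table/visibility': 'default',
};

// The key structure is a JSON-encoded array inside a string value.
const keyParts = JSON.parse(
  clonotypeKeyDomain['pl7.app/vdj/clonotypeKey/structure'],
) as string[];

// Flags are the strings "true" and "false", not booleans.
const isAbundance = readCountAnnotations['pl7.app/isAbundance'] === 'true';
const isNormalized = readCountAnnotations['pl7.app/abundance/normalized'] === 'true';

// Numeric hints are also strings and need an explicit conversion.
const orderPriority = Number(readCountAnnotations['pl7.app/table/orderPriority']);

console.log(keyParts);      // [ 'nSeqCDR3', 'bestVGene', 'bestJGene' ]
console.log(isAbundance);   // true
console.log(isNormalized);  // false
console.log(orderPriority); // 90000
```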

See also

  • Standard Annotations: Comprehensive reference of all standard annotations used in p-column specifications
  • VDJ Naming Conventions: Detailed reference for VDJ-specific column names, axes, and domains
  • Other SDK documentation pages describe domain-specific naming conventions and column definitions for their respective areas (RNA-seq, single-cell analysis, etc.)