
XSV Conversion Specification

The Platforma SDK provides tools for converting between p-frames and flat delimited files such as CSV or TSV (collectively, XSV). This is handled by the xsv and pframes libraries, which wrap the pframes-conv command-line tool. This document specifies xsv.importFile and the XSV file builders, with real-world examples.

xsv.importFile

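This function imports a tabular data file and converts it into a p-frame. Its arguments are the file resource, the format (for example "tsv"), a spec object that controls the conversion, and an optional ops map that controls the shape of the result.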

resultsPf := xsv.importFile(outputFile, "tsv", {
    // ... spec object ...
}, { splitDataAndSpec: true })

Top-Level Import Parameters

Key | Type | Description
--- | --- | ---
separator | string | A single ASCII character used as the field separator. Defaults to "," for CSV and "\t" for TSV.
commentLinePrefix | string | Rows beginning with this prefix are skipped.
skipEmptyLines | boolean | If true, empty lines are skipped. Default: false.
allowColumnLabelDuplicates | boolean | If true, duplicate column names are resolved by appending sequential suffixes. Default: true.
axes | array | (Required) Defines how XSV columns map to p-frame axes.
columns | array | (Required) Defines how XSV columns map to p-frame data columns.
storageFormat | string | Storage format for the p-columns, "Binary" or "Json". Default: "Binary".
partitionKeyLength | number | Number of leading axes used for partitioning. See "Data Partitioning". Default: 0.
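The following Go sketch models how separator, commentLinePrefix and skipEmptyLines interact when rows are read. It is a simplified illustration under stated assumptions, not how pframes-conv works internally: ReadOptions and readRows are hypothetical names, and quoting and escaping are ignored entirely.

```go
package main

import (
	"fmt"
	"strings"
)

// ReadOptions mirrors three of the top-level import parameters.
// Hypothetical model for illustration only.
type ReadOptions struct {
	Separator         string // single ASCII character, e.g. "," or "\t"
	CommentLinePrefix string // rows starting with this prefix are dropped ("" disables)
	SkipEmptyLines    bool   // drop rows that are completely empty
}

// readRows splits raw text into rows of fields. It deliberately ignores
// quoting, which a real CSV/TSV parser must handle. When SkipEmptyLines is
// false, an empty line becomes a row with a single empty field in this model.
func readRows(text string, opts ReadOptions) [][]string {
	var rows [][]string
	for _, line := range strings.Split(text, "\n") {
		if opts.CommentLinePrefix != "" && strings.HasPrefix(line, opts.CommentLinePrefix) {
			continue
		}
		if opts.SkipEmptyLines && line == "" {
			continue
		}
		rows = append(rows, strings.Split(line, opts.Separator))
	}
	return rows
}

func main() {
	input := "# exported by some tool\nsampleId\tabundance\n\nS1\t10\nS2\t7"
	rows := readRows(input, ReadOptions{Separator: "\t", CommentLinePrefix: "#", SkipEmptyLines: true})
	fmt.Println(len(rows)) // 3: the header and two data rows
	fmt.Println(rows[1])   // [S1 10]
}
```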

The axes and columns Arrays

Each entry in the axes and columns arrays maps one XSV column and accepts the following keys.

Key | Type | Description
--- | --- | ---
column | string | Name of the XSV column to read.
filterOutRegex | string | Rows whose value in this column matches this regex are dropped.
preProcess | array | Transformation steps applied to the value (e.g., regexpReplace).
naRegex | string | Values matching this regex are treated as NA.
allowNA | boolean | If false, an NA value causes an error. Default: false for axes, true for columns.
spec | object | The PAxisSpec (for axes) or PColumnSpec (for columns) of the resulting axis or column.

Data Partitioning

Partitioning produces file-based p-columns whose data is split by the values of the leading axes, which improves performance on large datasets. It is controlled by partitionKeyLength: the first partitionKeyLength entries of the axes array are combined to form the partition key. The default, 0, means no partitioning.
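Conceptually, partitionKeyLength groups rows by their leading axis values. The Go sketch below shows only that grouping step; the Row type and partition function are hypothetical, and the on-disk layout that pframes-conv actually produces is not modeled.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Row is one record of a long-format table: axis values plus a value.
type Row struct {
	Axes  []string
	Value string
}

// partition groups rows by the values of their first keyLength axes.
// keyLength == 0 yields a single group, i.e. no partitioning.
func partition(rows []Row, keyLength int) map[string][]Row {
	groups := make(map[string][]Row)
	for _, r := range rows {
		// \x1f (unit separator) keeps multi-axis keys unambiguous.
		key := strings.Join(r.Axes[:keyLength], "\x1f")
		groups[key] = append(groups[key], r)
	}
	return groups
}

func main() {
	rows := []Row{
		{Axes: []string{"S1", "c1"}, Value: "10"},
		{Axes: []string{"S1", "c2"}, Value: "4"},
		{Axes: []string{"S2", "c1"}, Value: "7"},
	}
	groups := partition(rows, 1) // partition on the first axis (sampleId)

	// Print partitions in a stable order.
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("partition %s: %d rows\n", k, len(groups[k]))
	}
	// partition S1: 2 rows
	// partition S2: 1 rows
}
```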

Real-World Example: Parsing Clustering Results

This example from the clonotype-clustering block shows how to parse a TSV file containing cluster abundance data into a p-frame.

// From: blocks/vdj/clonotype-clustering/workflow/src/main.tpl.tengo

// The 'abundances' variable holds a file resource for a TSV with columns:
// "sampleId", "clusterId", "abundance"

// Get the spec of an existing axis from the input bundle to reuse it.
sampleIdAxisSpec := args.bundle.getSpec("main").axesSpec[0]

// Define the spec for the new clusterId axis we are creating.
clusterIdAxisSpec := { name: "pl7.app/vdj/clusterId", type: "String", ... }

abundancesPf := xsv.importFile(abundances, "tsv", {
    // 1. Define the axes for the new p-frame.
    axes: [
        // Map the "sampleId" column from the TSV to an axis, reusing an existing spec.
        { column: "sampleId", spec: sampleIdAxisSpec },
        // Map the "clusterId" column from the TSV to our newly defined axis.
        { column: "clusterId", spec: clusterIdAxisSpec }
    ],
    // 2. Define the p-columns for the new p-frame.
    columns: [
        // Map the "abundance" column from the TSV to a new p-column.
        {
            column: "abundance",
            spec: {
                name: "pl7.app/vdj/readCount", // Use standard name
                valueType: "Long",
                annotations: { "pl7.app/label": "Abundance in cluster" }
            }
        }
    ]
}, { splitDataAndSpec: true }) // Get a nested map for easier iteration later
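Conceptually, the spec above turns each TSV row into one entry of the readCount p-column, keyed by its (sampleId, clusterId) axis values. The Go sketch below models that mapping with an in-memory map; axisKey and importAbundances are hypothetical names, and keeping the last value for a repeated key is an assumption, not documented behavior.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// axisKey holds the values of the two axes declared in the spec.
type axisKey struct {
	SampleID  string
	ClusterID string
}

// importAbundances reads a TSV with a header row and returns
// (sampleId, clusterId) -> abundance. A repeated key keeps the last value.
func importAbundances(tsv string) (map[axisKey]int64, error) {
	r := csv.NewReader(strings.NewReader(tsv))
	r.Comma = '\t'
	records, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) == 0 {
		return nil, fmt.Errorf("empty file")
	}
	// Locate columns by header name, as the spec's "column" keys do.
	idx := map[string]int{}
	for i, name := range records[0] {
		idx[name] = i
	}
	for _, want := range []string{"sampleId", "clusterId", "abundance"} {
		if _, ok := idx[want]; !ok {
			return nil, fmt.Errorf("missing column %q", want)
		}
	}
	out := make(map[axisKey]int64)
	for _, rec := range records[1:] {
		// valueType "Long" corresponds to a 64-bit integer.
		n, err := strconv.ParseInt(rec[idx["abundance"]], 10, 64)
		if err != nil {
			return nil, err
		}
		out[axisKey{rec[idx["sampleId"]], rec[idx["clusterId"]]}] = n
	}
	return out, nil
}

func main() {
	tsv := "sampleId\tclusterId\tabundance\nS1\tc1\t10\nS1\tc2\t4\nS2\tc1\t7\n"
	m, err := importAbundances(tsv)
	if err != nil {
		panic(err)
	}
	fmt.Println(m[axisKey{"S1", "c2"}]) // 4
}
```

Looking columns up by header name rather than by position mirrors how the column keys in the spec refer to XSV columns.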

Controlling Output Structure with splitDataAndSpec

The optional ops argument of xsv.importFile controls the shape of the returned p-frame object through the splitDataAndSpec flag.

  • splitDataAndSpec: false (default): Returns a "flat" map in which each p-column occupies two entries, one for its data and one for its spec. Each key is the column name followed by .data or .spec.

    // Example flat p-frame with one column named "abundance"
    flatPf := {
        "abundance.data": [ ...data resource... ],
        "abundance.spec": { ...spec object... }
    }
  • splitDataAndSpec: true: Returns a "nested" map with one entry per p-column, whose value holds both the data and the spec. This format is easier to work with when iterating over columns, because each column's data and spec stay together.

    // Example nested p-frame with one column named "abundance"
    nestedPf := {
        "abundance": {
            "data": [ ...data resource... ],
            "spec": { ...spec object... }
        }
    }

    // This structure makes iteration straightforward:
    for colName, col in nestedPf {
        // Now you can access col.data and col.spec directly
    }
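If you hold a flat map and want the nested shape, the regrouping amounts to stripping the .data or .spec suffix from each key. The Go sketch below shows this; Column and nestFlat are hypothetical, and the string values stand in for real data resources and spec objects. In a workflow, passing splitDataAndSpec: true is the simpler route.

```go
package main

import (
	"fmt"
	"strings"
)

// Column pairs a p-column's data with its spec, like one entry of the
// nested (splitDataAndSpec: true) map.
type Column struct {
	Data any
	Spec any
}

// nestFlat regroups "<name>.data" / "<name>.spec" keys into one entry per
// column. Keys with any other suffix are ignored.
func nestFlat(flat map[string]any) map[string]*Column {
	nested := make(map[string]*Column)
	get := func(name string) *Column {
		if c, ok := nested[name]; ok {
			return c
		}
		c := &Column{}
		nested[name] = c
		return c
	}
	for key, v := range flat {
		switch {
		case strings.HasSuffix(key, ".data"):
			get(strings.TrimSuffix(key, ".data")).Data = v
		case strings.HasSuffix(key, ".spec"):
			get(strings.TrimSuffix(key, ".spec")).Spec = v
		}
	}
	return nested
}

func main() {
	flat := map[string]any{
		"abundance.data": "<data resource>",
		"abundance.spec": "<spec object>",
	}
	for name, col := range nestFlat(flat) {
		fmt.Println(name, col.Data, col.Spec)
	}
}
```

Because only the final suffix is trimmed, column names that themselves contain dots are preserved.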

Creating XSV Files

To create an XSV file from existing p-columns (e.g., to prepare an input for a tool), use the pframes.tsvFileBuilder() or pframes.csvFileBuilder() utilities.

xsvFileBuilder Methods

Method | Description
--- | ---
.add(pCol, ops) | Adds a p-column to the builder. ops can set a custom column header, e.g. {header: "sequence_0"}.
.setAxisHeader(axisName, header) | Sets a custom header for an axis that is included implicitly through the added p-columns.
.build(params) | Builds the final file resource. params can set joinType (Inner or Left).
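The joinType parameter matters when the added p-columns do not cover the same axis keys. The Go sketch below models the two options for single-axis columns. Treating the first added column as the left side, ordering rows by key, and writing missing values as empty cells are all assumptions; Col and buildTSV are hypothetical names, not the builder's API.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Col is a single-axis p-column modeled as axis key -> value.
type Col struct {
	Header string
	Values map[string]string
}

// buildTSV writes one row per key of the first column, sorted by key.
// "Inner" keeps only keys present in every column; "Left" keeps all keys
// of the first column and leaves missing cells empty.
func buildTSV(axisHeader string, cols []Col, joinType string) string {
	keys := make([]string, 0, len(cols[0].Values))
	for k := range cols[0].Values {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	header := []string{axisHeader}
	for _, c := range cols {
		header = append(header, c.Header)
	}
	b.WriteString(strings.Join(header, "\t") + "\n")

	for _, k := range keys {
		row := []string{k}
		complete := true
		for _, c := range cols {
			v, ok := c.Values[k]
			if !ok {
				complete = false
			}
			row = append(row, v) // empty string when missing
		}
		if joinType == "Inner" && !complete {
			continue
		}
		b.WriteString(strings.Join(row, "\t") + "\n")
	}
	return b.String()
}

func main() {
	cols := []Col{
		{Header: "sequence_0", Values: map[string]string{"k1": "CASS", "k2": "CAST"}},
		{Header: "sequence_1", Values: map[string]string{"k1": "CAW"}},
	}
	fmt.Print(buildTSV("clonotypeKey", cols, "Inner"))
	fmt.Print(buildTSV("clonotypeKey", cols, "Left"))
}
```

With Inner, k2 is dropped because sequence_1 has no value for it; with Left, k2 is kept with an empty sequence_1 cell.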

Real-World Example: Preparing Input for Clustering

This example from clonotype-clustering prepares the input TSV for the mmseqs2 clustering tool: a clonotypeKey column followed by one column per selected sequence.

// From: blocks/vdj/clonotype-clustering/workflow/src/main.tpl.tengo

// Get the spec of the input dataset; its axesSpec array holds the axis specs.
datasetSpec := columns.getSpec(args.datasetRef)

// 1. Create a builder for a TSV file.
seqTableBuilder := pframes.tsvFileBuilder()

// 2. Set the header for the clonotypeKey axis, which will be implicitly added
// when we add the sequence columns.
seqTableBuilder.setAxisHeader(datasetSpec.axesSpec[1].name, "clonotypeKey")

// 3. Add each sequence p-column specified by the user to the builder,
// giving each a unique header like "sequence_0", "sequence_1", etc.
for nr, seqRef in sequencesRef {
    seqTableBuilder.add(columns.getColumn(seqRef), {header: "sequence_" + string(nr)})
}

// 4. Build the TSV file resource.
// The resulting file will have columns: "clonotypeKey", "sequence_0", ...
seqTable := seqTableBuilder.build()

This demonstrates how the builder can combine axes and values from multiple p-columns into a single, flat tabular file.