XSV Conversion Specification

The Platforma SDK provides tools for converting between p-frames and flat delimited file formats such as CSV and TSV (together, XSV). Conversion is handled by the `xsv` and `pframes` libraries, which wrap the `pframes-conv` command-line tool. This document specifies `xsv.importFile` and the XSV file builders, with real-world examples.

`xsv.importFile`

This function imports a tabular data file and converts it into a p-frame. It takes the file resource, the file format (`"csv"` or `"tsv"`), a spec object that controls the conversion, and an optional `ops` object.

```
resultsPf := xsv.importFile(outputFile, "tsv", {
    // ... spec object ...
}, { splitDataAndSpec: true })
```

Top-Level Import Parameters

| Key | Type | Description |
| --- | --- | --- |
| `separator` | string | A single ASCII character used as the field separator. Defaults to `,` for CSV and `\t` for TSV. |
| `commentLinePrefix` | string | Lines that begin with this prefix are skipped. |
| `skipEmptyLines` | boolean | If `true`, empty lines are skipped. Default: `false`. |
| `allowColumnLabelDuplicates` | boolean | If `true`, duplicate column names are made unique by appending sequential suffixes. Default: `true`. |
| `axes` | array | (Required) Defines how XSV columns map to p-frame axes. |
| `columns` | array | (Required) Defines how XSV columns map to p-frame data columns. |
| `storageFormat` | string | Storage format for the p-columns: `"Binary"` or `"Json"`. Default: `"Binary"`. |
| `partitionKeyLength` | number | Number of leading axes used to form the partition key. See "Data Partitioning". Default: `0`. |
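To make the line-level options concrete, here is a minimal Python sketch of how `separator`, `commentLinePrefix`, `skipEmptyLines`, and `allowColumnLabelDuplicates` could be applied before any axis or column mapping happens. It is a conceptual model, not the `pframes-conv` implementation. The names `read_xsv` and `dedupe_labels` are hypothetical, the `_1`, `_2` suffix format is an assumption (this specification only says the suffixes are sequential), raising an error when duplicates are disallowed is also an assumption, and the sketch does no quote handling.

```python
from typing import List, Optional


def dedupe_labels(labels: List[str], allow_duplicates: bool) -> List[str]:
    """Make repeated header labels unique (hypothetical "_N" suffix format)."""
    seen = {}
    out = []
    for label in labels:
        if label in seen:
            if not allow_duplicates:
                # Assumed behavior when allowColumnLabelDuplicates is false.
                raise ValueError(f"duplicate column label: {label}")
            seen[label] += 1
            out.append(f"{label}_{seen[label]}")
        else:
            seen[label] = 0
            out.append(label)
    return out


def read_xsv(text: str, separator: str = "\t",
             comment_line_prefix: Optional[str] = None,
             skip_empty_lines: bool = False,
             allow_column_label_duplicates: bool = True) -> List[List[str]]:
    """Return [header, *rows] after applying the top-level line options."""
    rows = []
    for line in text.splitlines():
        # commentLinePrefix: lines starting with the prefix are dropped.
        if comment_line_prefix and line.startswith(comment_line_prefix):
            continue
        # skipEmptyLines: blank lines are dropped only when enabled. Otherwise
        # this sketch keeps them as a one-field row (real behavior unspecified).
        if skip_empty_lines and line == "":
            continue
        rows.append(line.split(separator))
    if not rows:
        raise ValueError("no header row")
    header = dedupe_labels(rows[0], allow_column_label_duplicates)
    return [header] + rows[1:]
```

For example, `read_xsv("# note\nid\tval\tval\n\nx\t1\t2\n", comment_line_prefix="#", skip_empty_lines=True)` yields a header of `["id", "val", "val_1"]` followed by one data row.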

The `axes` and `columns` Arrays

Each entry in the `axes` and `columns` arrays describes one XSV column and how it maps into the p-frame. Entries in both arrays accept the following keys.

| Key | Type | Description |
| --- | --- | --- |
| `column` | string | The name of the column in the XSV file to use. |
| `filterOutRegex` | string | A regex pattern; rows whose value in this column matches it are filtered out. |
| `preProcess` | array | Steps that transform the value (e.g., `regexpReplace`). |
| `naRegex` | string | A regex pattern that identifies `NA` values. |
| `allowNA` | boolean | If `false`, an `NA` value causes an error. Default: `false` for axes, `true` for columns. |
| `spec` | object | The `PAxisSpec` or `PColumnSpec` for the new axis or column. |
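The per-entry keys can be read as a small pipeline applied to each cell. The Python sketch below is a hypothetical model of that pipeline. It assumes an order (filter first, then `preProcess`, then the `NA` check) and full-match regex semantics; neither is confirmed by this specification. The step shape `{"type": "regexpReplace", "pattern": ..., "replacement": ...}` is also hypothetical, and Python's `re.sub` replacement syntax (`\1`) may differ from whatever `pframes-conv` accepts.

```python
import re
from typing import Iterable, Mapping, Optional, Tuple

NA = None  # stand-in for an NA value in this sketch


def process_cell(raw: str, *, filter_out_regex: Optional[str] = None,
                 pre_process: Iterable[Mapping[str, str]] = (),
                 na_regex: Optional[str] = None,
                 allow_na: bool = True) -> Tuple[bool, Optional[str]]:
    """Return (keep_row, value) for one cell. Assumed order: filter, preProcess, NA check."""
    # filterOutRegex: a matching value drops the whole row (full match assumed).
    if filter_out_regex is not None and re.fullmatch(filter_out_regex, raw):
        return False, None
    value = raw
    # preProcess: apply each step in order (hypothetical step shape).
    for step in pre_process:
        if step["type"] != "regexpReplace":
            raise ValueError(f"unsupported step: {step['type']}")
        value = re.sub(step["pattern"], step["replacement"], value)
    # naRegex / allowNA: a matching value becomes NA, or an error if NA is not allowed.
    if na_regex is not None and re.fullmatch(na_regex, value):
        if not allow_na:
            raise ValueError(f"NA value not allowed: {raw!r}")
        return True, NA
    return True, value
```

Passing `allow_na=False` mirrors the documented default for axes, where an `NA` value is an error.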

Data Partitioning

Partitioning creates file-based p-columns to improve performance on large datasets. It is controlled by `partitionKeyLength`, which specifies how many of the leading entries in the `axes` array are combined to form the partition key. With the default of `0`, no axes form a partition key.
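As a mental model of `partitionKeyLength`, the Python sketch below groups rows by the tuple of their first k axis values; each group keeps the remaining axes and the value. It only illustrates how a partition key is formed and says nothing about the on-disk layout `pframes-conv` produces. The function name is hypothetical, and the range check is a guard added for the sketch.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def partition_rows(rows: Sequence[tuple], num_axes: int,
                   partition_key_length: int = 0) -> Dict[Tuple, List[tuple]]:
    """Group rows of the form (axis_1, ..., axis_n, value) by their leading axes."""
    # Guard added for this sketch; the real accepted range is not specified here.
    if not 0 <= partition_key_length <= num_axes:
        raise ValueError("partition_key_length must be between 0 and num_axes")
    parts: Dict[Tuple, List[tuple]] = defaultdict(list)
    for row in rows:
        # The partition key is the tuple of the first k axis values.
        key = tuple(row[:partition_key_length])
        # Each group keeps the remaining axes plus the value.
        parts[key].append(tuple(row[partition_key_length:]))
    return dict(parts)
```

With the default of `0`, every row lands in a single group keyed by the empty tuple, which corresponds to an unpartitioned column.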

Real-World Example: Parsing Clustering Results

This example from the clonotype-clustering block shows how to parse a TSV file containing cluster abundance data into a p-frame.

```
// From: blocks/vdj/clonotype-clustering/workflow/src/main.tpl.tengo

// The 'abundances' variable holds a file resource for a TSV with columns:
// "sampleId", "clusterId", "abundance"

// Get the spec of an existing axis from the input bundle to reuse it.
sampleIdAxisSpec := args.bundle.getSpec("main").axesSpec[0]

// Define the spec for the new clusterId axis we are creating.
clusterIdAxisSpec := { name: "pl7.app/vdj/clusterId", type: "String", ... }

abundancesPf := xsv.importFile(abundances, "tsv", {
    // 1. Define the axes for the new p-frame.
    axes: [
        // Map the "sampleId" column from the TSV to an axis, reusing an existing spec.
        { column: "sampleId", spec: sampleIdAxisSpec },
        // Map the "clusterId" column from the TSV to our newly defined axis.
        { column: "clusterId", spec: clusterIdAxisSpec }
    ],
    // 2. Define the p-columns for the new p-frame.
    columns: [
        // Map the "abundance" column from the TSV to a new p-column.
        {
            column: "abundance",
            spec: {
                name: "pl7.app/vdj/readCount", // Use standard name
                valueType: "Long",
                annotations: { "pl7.app/label": "Abundance in cluster" }
            }
        }
    ]
}, { splitDataAndSpec: true }) // Get a nested map for easier iteration later
```
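Conceptually, the import above produces a p-column whose values are addressed by the pair of axis keys (`sampleId`, `clusterId`). The Python sketch below models that mapping as a dictionary keyed by axis tuples, using an illustrative TSV. The function name is hypothetical, and rejecting duplicate axis keys is this sketch's choice; how `pframes-conv` handles duplicates is not covered here.

```python
import csv
import io
from typing import Dict, Sequence, Tuple


def tsv_to_pcolumn(text: str, axis_columns: Sequence[str],
                   value_column: str) -> Dict[Tuple[str, ...], int]:
    """Model one imported p-column as {axis-key tuple: value}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    column: Dict[Tuple[str, ...], int] = {}
    for row in reader:
        # One key component per configured axis, in the order of the `axes` array.
        key = tuple(row[name] for name in axis_columns)
        if key in column:
            # Sketch's choice: a p-column holds one value per axis key.
            raise ValueError(f"duplicate axis key: {key}")
        # valueType "Long" is modeled as a Python int.
        column[key] = int(row[value_column])
    return column
```

Calling it with `["sampleId", "clusterId"]` and `"abundance"` mirrors the `axes` and `columns` entries in the example.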

Controlling Output Structure with `splitDataAndSpec`

The optional fourth argument of `xsv.importFile`, the `ops` object, controls the structure of the returned p-frame object through the `splitDataAndSpec` flag.

`splitDataAndSpec: false` (default): returns a "flat" map in which each p-column is split into two separate entries, one for its data and one for its spec. The keys are the column name suffixed with `.data` and `.spec`.
```
// Example flat p-frame with one column named "abundance"
flatPf := {
    "abundance.data": [ ...data resource... ],
    "abundance.spec": { ...spec object... }
}
```

`splitDataAndSpec: true`: returns a "nested" map with one entry per p-column. Each value is an object holding both `data` and `spec`. This format is usually easier to work with, especially when you need to iterate over the columns.

```
// Example nested p-frame with one column named "abundance"
nestedPf := {
    "abundance": {
        "data": [ ...data resource... ],
        "spec": { ...spec object... }
    }
}

// This structure makes iteration straightforward:
for colName, col in nestedPf {
    // Now you can access col.data and col.spec directly
}
```
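The two layouts carry the same information, so converting between them is mechanical. This hypothetical Python helper turns a flat map into the nested form, splitting each key on its last `.` so that names which themselves contain dots stay intact. It is a sketch for illustration, not an SDK function; inside a workflow you would simply pass `splitDataAndSpec: true`.

```python
from typing import Any, Dict


def nest_pframe(flat: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Turn {"<name>.data": d, "<name>.spec": s} into {"<name>": {"data": d, "spec": s}}."""
    nested: Dict[str, Dict[str, Any]] = {}
    for key, value in flat.items():
        # Split on the last "." so names that contain dots are preserved.
        name, dot, part = key.rpartition(".")
        if not dot or part not in ("data", "spec"):
            raise ValueError(f"unexpected key: {key!r}")
        nested.setdefault(name, {})[part] = value
    return nested
```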

Creating XSV Files

To create an XSV file from existing p-columns (for example, to prepare input for a tool), use the `pframes.tsvFileBuilder()` or `pframes.csvFileBuilder()` utilities.

`xsvFileBuilder` Methods

| Method | Description |
| --- | --- |
| `.add(pCol, ops)` | Adds a p-column to the builder. `ops` can specify a custom `header`. |
| `.setAxisHeader(axisName, header)` | Sets a custom header for an axis. Axes are included implicitly from the added p-columns. |
| `.build(params)` | Builds the final file resource. The optional `params` can specify `joinType` (`Inner` or `Left`). |

Real-World Example: Preparing Input for Clustering

This example from clonotype-clustering prepares a simple two-column TSV file to be used as input for the mmseqs2 clustering tool.

```
// From: blocks/vdj/clonotype-clustering/workflow/src/main.tpl.tengo

// Get the spec of the input dataset column; its axesSpec lists the dataset's axes.
datasetSpec := columns.getSpec(args.datasetRef)

// 1. Create a builder for a TSV file.
seqTableBuilder := pframes.tsvFileBuilder()

// 2. Set the header for the clonotypeKey axis, which will be implicitly added
//    when we add the sequence columns.
seqTableBuilder.setAxisHeader(datasetSpec.axesSpec[1].name, "clonotypeKey")

// 3. Add each sequence p-column specified by the user to the builder,
//    giving each a unique header like "sequence_0", "sequence_1", etc.
//    (sequencesRef holds the selected column references; it is defined
//    elsewhere in the template.)
for nr, seqRef in sequencesRef {
    seqTableBuilder.add(columns.getColumn(seqRef), {header: "sequence_" + string(nr)})
}

// 4. Build the TSV file resource.
// The resulting file will have columns: "clonotypeKey", "sequence_0", ...
seqTable := seqTableBuilder.build()
```

This demonstrates how the builder can combine axes and values from multiple p-columns into a single, flat tabular file.
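The builder's job is essentially a join: rows are keyed by the shared axis, and each added p-column contributes one field. The Python sketch below models the single-axis case with both `joinType` values. Several details are assumptions rather than documented behavior: the first added column is treated as the left side of a `Left` join, missing values are written as empty fields, and rows are emitted in sorted key order. The function name and data are illustrative only.

```python
from typing import Dict, List, Tuple


def build_tsv(axis_header: str, columns: List[Tuple[str, Dict[str, str]]],
              join_type: str = "Inner") -> str:
    """Join single-axis p-columns, given as (header, {axis_key: value}), into TSV text."""
    if not columns:
        raise ValueError("add at least one column")
    if join_type == "Inner":
        # Keep only axis keys present in every column.
        keys = set(columns[0][1])
        for _, data in columns[1:]:
            keys &= set(data)
    elif join_type == "Left":
        # Assumption: the first added column is the left side.
        keys = set(columns[0][1])
    else:
        raise ValueError(f"unknown joinType: {join_type}")
    lines = ["\t".join([axis_header] + [header for header, _ in columns])]
    for key in sorted(keys):  # sorted order is this sketch's choice
        # Missing values (Left join only) are written as empty fields here.
        lines.append("\t".join([key] + [data.get(key, "") for _, data in columns]))
    return "\n".join(lines) + "\n"
```

With two sequence columns where only one clonotype has both sequences, an `Inner` join keeps that single row, while a `Left` join also keeps the clonotype that lacks the second sequence.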
