Packing software for Platforma

Platforma lets you package external software (binaries, Python scripts, R scripts, Java applications, Conda environments) and make it available for execution in block workflows. This guide covers the essential concepts you need to understand before packing your first software.

Overview

Platforma software is defined as a Node.js package (npm package). The package contains metadata and configuration that describe how to build, package, and execute your software. The actual software binaries (and containers) are called 'artifacts'. They are stored in separate registries and downloaded by the Platforma backend when needed.

The configuration that describes the software to Platforma lives in the package.json of that npm package.

The @platforma-sdk/package-builder tool reads this configuration from your package.json file and packs the software.

Software in Platforma is delivered in two formats:

Binary archives — OS and architecture-specific packages containing your software binaries, scripts, and dependencies.
Docker images — OCI-compatible container images.

Both formats are built and published by the @platforma-sdk/package-builder tool.

Configuration Structure

Software configuration lives in the block-software section of package.json:

{
  "name": "<npm package name>",
  "version": "<npm package version>",
  "files": [ "dist" ],
  "block-software": {
    "entrypoints": {
      "main": {
        "conda": {
          "artifact": { "...": "..." },
          "cmd": ["command", "to", "run"]
        }
      }
    }
  }
}

Software Entrypoints

Software is organized into entrypoints — named interfaces that define how to execute your software. Each entrypoint specifies:

Artifact configuration — Where to find files that should be delivered to the server side (and executed by Platforma Backend)
Command — The base command to run (workflow can append additional arguments to it)

To reference a specific software entrypoint from a block's workflow, use the full entrypoint ID: <package-name>:<entrypoint-name>. For example:

@platforma-open/milaboratories.software-conda-anarci:main
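
The ID format above can be taken apart mechanically. The sketch below is only an illustration: parseEntrypointId and EntrypointRef are hypothetical names, not part of any Platforma SDK, and it assumes that neither the package name nor the entrypoint name contains a colon.

```typescript
// Hypothetical helper for illustration; not part of any Platforma SDK.
interface EntrypointRef {
  packageName: string;
  entrypointName: string;
}

function parseEntrypointId(id: string): EntrypointRef {
  // Assumption: neither part contains ':' (scoped npm names use '@' and '/'),
  // so the last colon separates the package name from the entrypoint name.
  const sep = id.lastIndexOf(":");
  if (sep <= 0 || sep === id.length - 1) {
    throw new Error(`Invalid entrypoint ID: "${id}"`);
  }
  return {
    packageName: id.slice(0, sep),
    entrypointName: id.slice(sep + 1),
  };
}

const ref = parseEntrypointId(
  "@platforma-open/milaboratories.software-conda-anarci:main",
);
// ref.packageName    is "@platforma-open/milaboratories.software-conda-anarci"
// ref.entrypointName is "main"
```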

All entrypoint definitions are located in the block-software -> entrypoints section of the package.json file. Each entrypoint is indexed by its name and contains two parts:

artifact: software package to deliver to server side;
cmd: command to run on server side to start the software.

Here is an example of the entrypoint structure:

{
  "block-software": {
    "entrypoints": {
      "<entrypoint name>": {
        "<type of entrypoint: 'binary', 'docker', 'conda', ...>": {
          "artifact": { "<where to get content for software package>" },
          "cmd": ["command", "to", "run"]
        }
      }
    }
  }
}
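
As a rough illustration of how this structure maps to entrypoint IDs, the sketch below models the shape in TypeScript and derives <package-name>:<entrypoint-name> for every entrypoint. The type names and listEntrypointIds are hypothetical; the real schema is defined by @platforma-sdk/package-builder and may contain more fields than shown here.

```typescript
// Illustrative types only; the real schema belongs to
// @platforma-sdk/package-builder and may contain more fields.
interface EntrypointBody {
  artifact: unknown; // structure depends on the entrypoint type
  cmd: string[];
}

// Maps an entrypoint type ('binary', 'docker', 'conda', ...) to its body.
type Entrypoint = Record<string, EntrypointBody>;

interface SoftwarePackageJson {
  name: string;
  "block-software": { entrypoints: Record<string, Entrypoint> };
}

// Hypothetical helper: builds "<package-name>:<entrypoint-name>" for each entrypoint.
function listEntrypointIds(pkg: SoftwarePackageJson): string[] {
  return Object.keys(pkg["block-software"].entrypoints).map(
    (entrypointName) => `${pkg.name}:${entrypointName}`,
  );
}

const examplePkg: SoftwarePackageJson = {
  name: "@platforma-open/milaboratories.software-conda-anarci",
  "block-software": {
    entrypoints: {
      // The artifact body is left empty here as a placeholder.
      main: { conda: { artifact: {}, cmd: ["ANARCI"] } },
    },
  },
};

const ids = listEntrypointIds(examplePkg);
// ids[0] is "@platforma-open/milaboratories.software-conda-anarci:main"
```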

Artifact definition

An artifact, referenced by an entrypoint, is the configuration that describes how to pack the software so that it can be delivered to the remote server when that server needs to run a workflow described by a block.

The structure of an artifact definition depends on the type of software the entrypoint delivers: 'binary' software has one set of options, 'conda' and 'docker' have different ones.

The main purpose of an artifact is to tell @platforma-sdk/package-builder exactly what should be delivered to the remote server: where to get compiled binaries for a particular operating system, what dependencies to install for a Python script, and so on.

OS and architecture support

Many operating systems and architectures exist. Platforma does not support all of them, but it supports enough to cover most real-world applications:

linux-x64, linux-aarch64
macosx-x64, macosx-aarch64
windows-x64

Some languages (Java, Python, R) are isolated from OS and architecture specifics, which allows software written in them to run in almost any environment.

Other languages (like C, C++, Go) do not have such freedom out of the box: code must be compiled into executable binaries for each target OS and architecture.

Software written in such languages may support only a few of the listed operating systems and architectures. For example, CellRanger is officially available only for the Linux x64 platform.

For compiled software, @platforma-sdk/package-builder determines which operating systems and architectures are available from the roots option in the artifact configuration inside package.json, where applicable.

{
  "artifact": {
    "roots": {
      "linux-x64": "./linux-x64",
      "macosx-x64": "./macosx-x64"
    }
  }
}
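
To see at a glance which of the listed platforms a given roots map covers, a check like the following can help. It is a minimal sketch: platformCoverage is a hypothetical helper, not something package-builder provides, and the platform list is simply copied from this guide.

```typescript
// Platform identifiers listed in this guide.
const SUPPORTED_PLATFORMS = [
  "linux-x64",
  "linux-aarch64",
  "macosx-x64",
  "macosx-aarch64",
  "windows-x64",
] as const;

type Platform = (typeof SUPPORTED_PLATFORMS)[number];

// Hypothetical helper: splits the supported platforms into those present in
// `roots` and those absent, and reports keys that are not in the list at all.
function platformCoverage(roots: Record<string, string>) {
  const covered: Platform[] = [];
  const missing: Platform[] = [];
  for (const p of SUPPORTED_PLATFORMS) {
    const present = Object.prototype.hasOwnProperty.call(roots, p);
    (present ? covered : missing).push(p);
  }
  const unknown = Object.keys(roots).filter(
    (k) => (SUPPORTED_PLATFORMS as readonly string[]).indexOf(k) === -1,
  );
  return { covered, missing, unknown };
}

const coverage = platformCoverage({
  "linux-x64": "./linux-x64",
  "macosx-x64": "./macosx-x64",
});
// coverage.covered is ["linux-x64", "macosx-x64"]
// coverage.missing is ["linux-aarch64", "macosx-aarch64", "windows-x64"]
// coverage.unknown is []
```

A missing platform is not necessarily an error: as noted above, some software is only published for a subset of platforms.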

Command Execution

The cmd array in the software's package.json defines the base command to execute. For example:

    "entrypoints": {
      "main": {
        "conda": {
          "...": "...",
          "cmd": ["ANARCI"]
        }
      }
    }

When your block workflow calls the software, it can append additional arguments using .arg():

sw := assets.importSoftware("@platforma-open/milaboratories.software-conda-anarci:main")

run := exec.builder().
    software(sw).
    arg("--version").  // call 'ANARCI --version'
    run()
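
Conceptually, the final command line is the base cmd followed by the arguments the workflow appends. The sketch below only illustrates that idea: composeCommand is a hypothetical function, and the backend's actual logic is not described in this guide.

```typescript
// Conceptual illustration only; not the Platforma backend's actual code.
function composeCommand(
  baseCmd: readonly string[],
  extraArgs: readonly string[],
): string[] {
  // The base command from package.json comes first,
  // followed by the arguments appended by the workflow.
  return [...baseCmd, ...extraArgs];
}

const argv = composeCommand(["ANARCI"], ["--version"]);
// argv is ["ANARCI", "--version"]
```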

Build and Distribution

Preparing software takes three steps:

Build: Run pl-pkg build to create binary archives and Docker images.
Publish: Run pl-pkg prepublish and npm publish to upload the archives, images, and the npm package with metadata to the appropriate registries.
Use: The npm package containing the metadata files is used by block workflows to run your software.

The npm package itself contains only metadata describing where to get the software for execution. The actual software binaries and Docker images are stored in separate registries and downloaded by the Platforma backend when needed.

Next Steps

Conda software — Learn how to package Conda-based software
Package Builder README — Detailed configuration reference
