Backups
The data the Platforma Backend works with falls into two groups:
- The data kept in external storages, all of which is managed by the Data Controller. This data can be very large and can live in various storages (for example S3, a network file system, or a local RAID). The backup policy for it is entirely your choice. The most important thing to know is that everything produced by workflows started by Platforma eventually reaches the 'primary' storage, so backing up the 'primary' storage saves most of the data. You may still lose data that was produced by workflows but not yet transferred to the 'primary' storage. That loss is not dangerous, because the data is easy to reproduce: re-run the project's block and wait.
- The state of the Platforma Backend: all the projects created by users, all connections between blocks inside projects, the references to the data inside external storages, sample metadata, chart configuration options, and much more. The State is what binds everything managed by Platforma into a system that users can see and use. Losing The State usually means losing everything managed by Platforma, because Projects and all their settings are stored nowhere else. A project together with its initial data (sequences, CSV files with metadata, and so on) is the combination that makes it possible to restore all project results by re-running blocks.
Saving initial data for Projects.
Every Project has some initial data as its input: sequences, metadata tables, and so on. This is what Blocks work with to produce analysis results such as charts and tables.
Because the sources of this data vary a lot, there is no single good recommendation on how to back it up. For example, some data can be stored in a corporate library that is attached to Platforma, and some can be uploaded directly to the 'primary' storage from a user's laptop.
All direct uploads of user data are saved to the 'primary' storage. Backing up the 'primary' storage will therefore save all the required data, but the backup will also include results that blocks can easily generate again, which increases its size with 'waste'. If you are not sure that your users keep their original files in a safe place, and Platforma might be the only source of the data, back up the 'primary' storage. Otherwise, the decision is a trade-off between the storage spent on additional backups of the 'primary' storage and the computation spent on re-calculating all the lost results if the 'primary' storage dies.
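If you decide to back up the 'primary' storage and it happens to be a plain directory on a local or network file system, an incremental copy could look roughly like the sketch below. This is not a Platforma tool: the function name `backup_tree` and any paths you pass to it are hypothetical, and the sketch assumes the storage can be read as ordinary files. For S3 or other object storages, use the storage provider's own backup or replication features instead.

```python
import shutil
from pathlib import Path


def backup_tree(source: Path, destination: Path) -> list[Path]:
    """Copy files that are missing from, or newer than, the backup copy.

    Hypothetical helper: `source` stands for the 'primary' storage directory,
    `destination` for the backup location. Returns the relative paths of the
    files that were copied during this run.
    """
    copied = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dst_file = destination / rel
        # Skip a file when its backup copy has the same size and is not older.
        if dst_file.exists():
            src_stat, dst_stat = src_file.stat(), dst_file.stat()
            if (dst_stat.st_size == src_stat.st_size
                    and dst_stat.st_mtime >= src_stat.st_mtime):
                continue
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 also preserves the modification time
        copied.append(rel)
    return copied
```

Running the function a second time without changes in the source copies nothing, so it can be scheduled repeatedly. It never deletes files from the backup, which is a deliberate choice for this sketch: files removed from the 'primary' storage stay recoverable, at the cost of a growing backup.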
Saving The State of Platforma Backend.
...