(Disclaimer: I am an Arvados developer)
The Arvados project is a free and open source (AGPLv3 and Apache v2) bioinformatics platform for genomic and biomedical data, designed to address precisely the issues raised in this article. Arvados features 1) a content-addressed filesystem (blocks are addressed by a hash of their actual content rather than some arbitrarily assigned identifier) which performs end-to-end data integrity checks, 2) fine-grained access controls, 3) a cluster scheduling system that tracks the inputs and outputs of every job (enabling you to track processing pipelines and establish data provenance), and 4) data replication by default. Arvados is developed and commercially supported by Curoverse, which is 100% committed to free software (in fact, one of the founders is a former employee of the Free Software Foundation). I encourage slashdotters in the bioinformatics, big data, or data archiving space to come check it out and join the community.
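
For anyone unfamiliar with content addressing, here is a toy Python sketch of the general idea. It is not Arvados code and does not use any Arvados API: the BlockStore class, the in-memory dict, and the choice of SHA-256 are all assumptions made for illustration. The point is only that when a block's address is the hash of its content, every read can be verified against the address itself.

```python
import hashlib


class BlockStore:
    """Hypothetical in-memory content-addressed store (illustration only)."""

    def __init__(self):
        # Maps hex digest -> raw block bytes.
        self._blocks = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content, not assigned arbitrarily.
        key = hashlib.sha256(data).hexdigest()
        self._blocks[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._blocks[key]
        # Integrity check on read: re-hash and compare with the address.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("block content does not match its address")
        return data


store = BlockStore()
key = store.put(b"ACGT")
assert store.get(key) == b"ACGT"
# Identical content always maps to the same address.
assert store.put(b"ACGT") == key
```

A side effect of this design is that storing the same block twice costs nothing extra, and silent corruption shows up as a hash mismatch on the next read instead of going unnoticed.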