Thumbs up on HDFS. The next question to ask your groups how they will be analyzing it. HDFS (and Hadoop/Spark/Whatever) will hopefully fit in nicely there. Not only will your data be redundantly copied across multiple systems, but as your data needs (and cluster) grows, so does your computational power.
Getting data in & out can be done via Java API, Rest API, FUSE or NFS Mounts. The only issue is that HDFS doesn't play well with small files, but hopefully your groups will be using large files instead.
Now administration is another story, but then there's Cloudera's Manager that's supposed to greatly simplify management. I'm currently using it to store about
As far as backing up, HDFS provides snapshots, 3x replication (or more) across nodes in the cluster. Of course there's always the big hammer of just getting a second cluster. As an old HW sage once told me, "If you can't afford to buy two, don't buy one"