Remote Data Access Solutions? 54
magoldfish asks: "Our company has several terabytes of data, typically chunked into 2-6 GB files, that we need to share among users at sites around the US. Our users range in technical skill from novice to guru, data access needs to be secure, and we require client access from multiple platforms (Linux, Mac, and Windows). These files will likely be accessed infrequently, and cached on local hard drives by our power users. We've been running WebDAV, primarily to accommodate non-savvy users and guarantee access from firewalled environments, but the users are really complaining about download speed — maybe 50 KB/sec serving from a Mac. Any suggestions for better alternatives, or comments on the pros and cons of alternative access techniques?"
Infrequent access = Send out dvds (Score:3, Insightful)
Re: (Score:2)
While there is no denying the average bandwidth of a box of DVDs, there are other ways to address the download speed. 50 KB/sec from their server is horrible no matter how you slice it -- plenty of corporations manage well over 100 KB/sec per client.
The problem is that high-capacity solutions are not cheap, and I get the impression that there must not be the budget for those options, or no one would have deployed a 50 KB/sec server in the first place.
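For the skeptics, a rough back-of-the-envelope: a spindle of 100 single-layer DVDs (~4.7 GB each) shipped overnight delivers about 470 GB in roughly 24 hours, or around 5.4 MB/sec of effective bandwidth -- over a hundred times the 50 KB/sec the poster is seeing.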
Aside from DVDs, you coul
How will the data be used? (Score:4, Insightful)
Uh ... VPN? (Score:2)
Re: (Score:1)
Alternative suggestion (Score:2, Funny)
Make sure you use more than a 24-bit key, though.
One word: (Score:5, Informative)
(As always, Google is your friend).
Wide Area File Services (Score:4, Informative)
Here's Cisco's version: WAFS [cisco.com]
Secure? (Score:2, Interesting)
dbms (Score:2)
well, (Score:1)
(This is just my view tho)
Profiling? (Score:4, Informative)
The idea is simple: don't just go in and change stuff; first measure the pieces under typical load. Look at where the bottleneck is, address it, and move on to the next bottleneck. Repeat as often as needed.
Are you disk I/O bound? Buy faster disks / better controllers, or spread the load over more machines.
Are you CPU bound? Is the CPU on your server spending so much time servicing I/O requests that it has no cycles available to handle additional requests? Buy more / faster / better CPUs.
Are you network bound? Which piece of the network is the hold-up? Your switch? Get a better / faster one. Your ISP? Get a fatter pipe.
Have you optimized all of these? What about setting up remote servers that are updated hourly/daily/weekly/whatever, so the machine is close to the users network-wise for faster download speeds?
Some of the above adds complexity. Are you equipped to handle that complexity? Can you become equipped to handle it? If not, re-consider your options.
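As a concrete starting point, here is a minimal sketch of the measurement step, assuming the files can be read from the server's local disk and fetched over plain HTTP (the path and URL below are placeholders for your own setup):

# Rough throughput probe: compare local disk read speed against network
# download speed for the same file to see which side is the bottleneck.
# The path and URL are placeholders for your own environment.
import time
import urllib.request

def disk_mb_per_sec(path, chunk=1024 * 1024):
    start, total = time.time(), 0
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            total += len(data)
    return total / (time.time() - start) / 1e6

def download_mb_per_sec(url, chunk=1024 * 1024):
    start, total = time.time(), 0
    with urllib.request.urlopen(url) as resp:
        while True:
            data = resp.read(chunk)
            if not data:
                break
            total += len(data)
    return total / (time.time() - start) / 1e6

print("disk:", round(disk_mb_per_sec("/data/sample-chunk.bin"), 1), "MB/s")
print("net: ", round(download_mb_per_sec("http://fileserver.example.com/sample-chunk.bin"), 1), "MB/s")

If the disk number dwarfs the network number, the pipe or the server's network stack is where to spend money; if both are low, look at the storage first.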
Hope this helps.
Cheers,
Dave
A really good point! (Score:3, Insightful)
While the answer may reside with any of the main themes recommended by responders (improving transfer, reducing the amount of data to transfer, and eliminating the need to transfer via remote desktop solutions such as Citrix, MS Terminal Services, and VNC), the questioner really needs to define his needs. Does the data really need to be local at e
Send out DVDs == Security Risk (Score:2)
Blu-Ray (Score:1)
Remote Desktop Solution(s) (Score:5, Interesting)
Re: (Score:2)
Re: (Score:2)
It's a lot cheaper to have people download the files from a $4,000 server and crunch them locally than have them connect to a $50,000 server and crunch them remotely at 1/10th the speed.
Re: (Score:2)
To determine if a remote desktop solution is the be
more variables than the machines (Score:2)
Who says the $50K server has to be slower?
Move everyone (Score:1)
Depends on the situation (Score:3, Insightful)
Or, if you need it spread out for some reason, iFolder or rsync seem like the best choices. However, you could also look at AFS.
Basically, you have to get the long haul data transfers down somehow, or else get faster connections.
I am reminded of the old saying... (Score:2)
Of course, the latency kind of sucks, but that doesn't seem to affect your requirements. And, these days, you're just as likely to pop it in a FedEx canister, and they don't use station wagons. But the saying still holds...
More of a question (Score:3, Insightful)
But how often do these files need to be updated? Is the end user in a read-only situation? How infrequent is infrequent? How many users are you talking about, and what's the density of these users? Even though the access is "infrequent," does it modify data that would have to be shared across your entire user base?
Your scenario needs some gaps filled in as far as the requirements go. I see a lot of people suggesting large-capacity media being shipped to the users, but if this data is updated frequently that is not a solution. If you have a large number of users and the data is not updated often, you would still have to weigh the cost of sending updates out to X number of users against the cost of keeping the data centralized and upgrading the infrastructure to meet their needs. If changes need to flow back from the user end, then physical media via postal services is going to cause problems coordinating which version of the files the other end users should be working from. And god forbid several users need to update this data in a short timeframe: you're going to have disks being mailed in by users who don't know what changes other users have made. You'd almost certainly run into data integrity problems, and even if you're only talking about users updating their data every few months, you're still going to need to hire someone to coordinate these efforts and ensure that all the end users get every update in a timely fashion.
Without more information it's hard to suggest something and still be confident that we're not leading you down to a solution that is completely inappropriate.
Your chunking appears to be a problem... (Score:3, Insightful)
Figure out how the data can be broken into smaller chunks and managed...that will probably indicate what sort of tech will enable things for you.
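For what it's worth, a minimal sketch of one way to do that: split each big file into fixed-size pieces with a per-chunk checksum, so a client only re-fetches the pieces whose digests have changed (chunk size, paths, and naming below are purely illustrative):

# Split a large file into fixed-size chunks and record a SHA-256 per chunk.
# A client comparing its local manifest against the server's only needs to
# re-download chunks whose digests differ. Sizes and paths are illustrative.
import hashlib
import os

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB

def split_with_manifest(path, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    manifest = []
    with open(path, "rb") as f:
        index = 0
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            name = f"{os.path.basename(path)}.{index:05d}"
            with open(os.path.join(out_dir, name), "wb") as out:
                out.write(chunk)
            manifest.append((name, hashlib.sha256(chunk).hexdigest()))
            index += 1
    return manifest

for name, digest in split_with_manifest("/data/dataset-part1.bin", "/data/chunks"):
    print(name, digest)

WebDAV, rsync, or plain HTTP can then move 64 MB pieces instead of 6 GB monoliths, and an interrupted transfer only costs you one chunk.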
Data specifics (Score:2)
What you need to do depends a LOT on these things, over and above the size of the data. If it's TBs of customer data, you probably want it somewhere secure and centralized, with stored procedures to query it and return subsets to your users. If it's not private data, why not let Google crawl and cache it and get
go with Novell (Score:1, Informative)
Re: (Score:1)
Look at Caymas (Score:2)
AOL? (Score:2, Funny)
BitTorrent? (Score:1)
Amazon S3 (Score:2)
One word (Score:2)
The client works on Mac, Linux and Windows, can be installed from and runs in a web browser, needs only about as much bandwidth as a VNC connection, and if your connection is interrupted for whatever reason, it will save your session state without borking whatever application you happened to be working in.
Warez (Score:1, Funny)
Use an Open Source Distributed Storage Filesystem (Score:1)
Have a look at this article: http://www.linuxplanet.com/linuxplanet/reports/4361/1/ [linuxplanet.com] then choose amongst the more mature projects: Coda http://coda.cs.cmu.edu/ [cmu.edu] and OpenAFS http://www.openafs.org/ [openafs.org]. InterMezzo looked promising but hasn't been updated in a long while, so it's probably dead.
Hope this helps.
Compression? (Score:2)
A lot of people here have mentioned breaking up your data into smaller chunks, which is valid and should be the first priority.
Have you also considered serving up a compressed version of the data, say using a
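The sentence above is cut off, but whatever compressor the poster had in mind, it's worth first checking whether the data compresses at all. A quick sketch, using gzip purely as an example (already-compressed media will show a ratio near 1.0 and isn't worth the CPU):

# Quick check of how compressible a sample of the data actually is.
# If the ratio is close to 1.0, compressing before transfer buys little.
# The sample path is a placeholder.
import gzip

def compression_ratio(path, sample_bytes=256 * 1024 * 1024):
    with open(path, "rb") as f:
        data = f.read(sample_bytes)
    compressed = gzip.compress(data, compresslevel=6)
    return len(data) / len(compressed)

print("ratio:", round(compression_ratio("/data/sample-chunk.bin"), 2))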
Microsoft Server 2003 Enterprise + Terminal Server (Score:1)
Curious... just how many people are we talking about that need access to the data?
Re:Microsoft Server 2003 Enterprise + Terminal Ser (Score:3, Informative)
Terminal Server / Citrix / etc. (Score:2)
Constrained by file format? (Score:1)
It looks to me like you are dealing with some kind of media data (movies?). If you are constrained by the file format and are unable/unwilling to split the files into smaller parts, use local cache servers.
In each location, provide a small caching server that rsyncs periodically against the main data source; then tell users in each location to use the local server.
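A sketch of what each caching box could run from cron, assuming rsync over SSH; the host name, paths, and bandwidth cap are placeholders:

# Periodic pull of the master data set onto a local cache server.
# Host name and paths are placeholders; run this from cron (or a
# systemd timer) at whatever interval the data actually changes.
import subprocess

MASTER = "datamaster.example.com:/export/datasets/"
LOCAL = "/srv/cache/datasets/"

subprocess.run(
    [
        "rsync",
        "-a",               # preserve permissions, times, etc.
        "--partial",        # keep partially transferred files so restarts resume
        "--delete",         # drop files removed on the master
        "--bwlimit=20000",  # optional cap in KB/s so the sync doesn't starve users
        MASTER,
        LOCAL,
    ],
    check=True,
)

Because rsync only moves the parts of files that changed, the periodic sync stays cheap even with 2-6 GB files, and local users pull at LAN speed.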
Remote Data Access (Score:1)
The method you seem to want to follow makes for a large amount of redundant data, as well as being bandwidth-hungry.
The package we provide allows users to securely log into a terminal server located at your main office (sometimes hosted by my company) and access a full desktop with nothing more than a web browser and Java installed on the computer.
This system is ideal as it removes the nee
Thought about Citrix? (Score:1)
Unfortunately, implementing Citrix can be a bit pricey, especially if the apps your users will run are 'heavy' on resources - that just means you'd have to build a bigger, meaner server ($$$!). Plus it will cost you recurring yearly licensing fees... But for accessibility for remote users, you'll be hard pressed to find a m
AFS away! (Score:1)
WAN Optimization / Acceleration (Score:2)
Firewall friendly (Score:2)
So like ... what are they doing with the data??? (Score:1)
If it's read-only, and it's statistics, why don't you implement an analysis system instead? Find out what the users are reading from this data, break it up into logical truth tables and, if possible, implement and update an OLAP cube. Let people get at the data via remote admin/Citrix/Terminal Services, with a front end like Crystal Reports, Analysis Services, Excel, or many other pieces of software...
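A toy illustration of that pre-aggregation idea, using sqlite3 only as a stand-in for whatever database or OLAP product you'd actually deploy (table and column names are made up):

# Toy pre-aggregation: roll raw rows up into a small summary table that
# remote users query instead of pulling the raw multi-GB files.
# sqlite3 stands in for a real warehouse; all names are illustrative.
import sqlite3

conn = sqlite3.connect("warehouse.db")
conn.executescript("""
    CREATE TABLE IF NOT EXISTS measurements (
        site TEXT, day TEXT, value REAL
    );
    CREATE TABLE IF NOT EXISTS measurements_by_site_day AS
        SELECT site, day, COUNT(*) AS n, AVG(value) AS avg_value
        FROM measurements
        GROUP BY site, day;
""")
conn.commit()

# Remote users hit the small summary, not the raw rows.
for row in conn.execute("SELECT * FROM measurements_by_site_day LIMIT 10"):
    print(row)

Remote users then pull kilobytes of summaries instead of gigabytes of raw files, and the heavy crunching stays next to the data.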
A VPN with AFS. (Score:2)