
Comment Re:Moore's law is too slow (Score 1) 126

Interesting. I view this from a completely different perspective: if DNA sequencing really is outpacing Moore's Law, that just means that the results become disposable. You use them for your initial analysis and store whatever summarized results you want from this sequence, then delete the original data.

If you need the raw data again, you can just resequence the sample.

The only problem with this approach, of course, is that samples are consumable; eventually there wouldn't be any more material left to sequence. So this wouldn't be appropriate in every situation.

Comment Re:Moore's law is too slow (Score 1) 126

I assume you're talking about incoming data, not the final DNA sequence. As I understand it, the final result is 2 bits/base pair and about 3 billion base pairs, so about a CD's worth of data per human. And if you were talking about a genetic database, I guess 99%+ is common, so you could just store a "reference human" and diffs against that. So at 750 MB for the first person and 7.5 MB for each additional person, I guess you could store 200,000-300,000 full genetic profiles on a 2 TB disk. Probably the whole human race in less than 100 PB.

The incoming data is image-based, so yes, it will be huge. Regarding the sequence data: yes, in its most condensed format it could be stored in 750 MB. There are a couple of issues you're overlooking, however:
1. The reads aren't of uniform quality -- and methods of analysis that ignore the quality score of a read are quickly coming to be viewed as antiquated. So each two-bit "call" also carries a few more bits representing the confidence in that call.
2. This technology is based on redundant reads. To get to an acceptable level of quality, you want at least ~20 (+/- 10) reads at each exonic locus.
So the 750 MB you mention for a human genome grows by a factor of 20, then by another factor of 2 or 3, depending on how you store the quality scores.
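To put rough numbers on that, here's a back-of-the-envelope sketch; the coverage and quality-factor values are just the ballpark figures above, not from any particular pipeline:

    # Rough storage estimate for one genome's worth of read data.
    base_pairs     = 3_000_000_000   # ~3 billion bp in a human genome
    bits_per_call  = 2               # A/C/G/T with no quality information
    coverage       = 20              # ~20x redundant reads per locus
    quality_factor = 3               # extra room for per-call confidence scores

    naive_bytes = base_pairs * bits_per_call / 8
    full_bytes  = naive_bytes * coverage * quality_factor

    print(f"bare calls: {naive_bytes / 1e6:,.0f} MB")   # ~750 MB
    print(f"with reads: {full_bytes / 1e9:,.0f} GB")    # ~45 GB, before compression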

Your suggestion of deduplicating the experiments could work, but definitely not as well as you think because of all the "noise" that's inherent in the above two steps.

If you really just wanted the unique portions of a sample, you could use a SNP array, which reads the sample only at specific locations known to differ between individuals. Even with the advances in the technology, the cost of sequencing a genome still isn't negligible. For most labs, it's still cheaper to store the original data for reanalysis later.
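For what it's worth, the "reference human plus diffs" representation itself is trivial; the wrinkle is the noise above. A toy sketch, with positions and alleles made up purely for illustration:

    # Store an individual only as the positions where they differ from a reference.
    reference = "ACGTACGTAC"            # stand-in for a reference sequence
    variants  = {3: "G", 7: "A"}        # position -> this individual's base

    def reconstruct(ref, diffs):
        seq = list(ref)
        for pos, base in diffs.items():
            seq[pos] = base
        return "".join(seq)

    print(reconstruct(reference, variants))   # ACGGACGAAC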

Comment Working on Commission? (Score 1) 608

I often stumble across some product on Wikipedia that I'm interested in buying (album, book, etc.). I actually would find it very convenient if such pages had a "Purchase this Item" link. I'm sure Amazon would kick in a few million for that privilege, or you could use their pre-existing referral program. I think most users would view those links as added value to Wikipedia.
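For reference, Amazon's referral program just hangs an affiliate tag off an ordinary product URL; something like the following, where the ASIN placeholder and the tag value are purely illustrative:

    https://www.amazon.com/dp/<ASIN>?tag=wikipedia-20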

Comment Re:I think there is... (Score 1) 2

I think there is software for that, but it may not be out there in public. Maybe developers are just keeping it private. Nevertheless, there's a high probability that the software you need already exists. What is more important is your ETA, so you may find other means. Laptop Troubleshoot Tips

Of course! I'll just fix their laptops! Why hadn't I thought of that?

Submission + - File concurrency solution for non-programmers? 2

jda104 writes: I work with a group of about a dozen "data analysts," most of whom have some informal programming experience. We currently have an FTP server set up for file/code sharing but, as the projects get more complicated, the number of outdated versions of code and data floating around among group members has become problematic; we're looking for a more robust solution to manage our files.

I see this as a great opportunity to introduce a revision control system, though there will surely be a bit of a learning curve for non-programmers. I've primarily worked with Subversion (+TortoiseSVN), but I would rather not spend my time manually resolving file conflicts and locking issues for each user, and anything beyond commit, update, and revert (branching, merging, etc.) would probably go unused.
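For context, the day-to-day loop I'd be asking of them is small; roughly the command-line equivalent of what TortoiseSVN exposes as menu entries (the repository URL and file names below are placeholders):

    svn checkout https://example.org/svn/analysis/trunk analysis
    cd analysis
    svn update                        # pull down everyone else's changes
    svn add new_script.pl             # tell Subversion about a new file
    svn commit -m "describe change"   # publish local changes
    svn revert munge_data.R           # throw away a local edit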

We're definitely not "software developers," but we write many Perl and R scripts to process datasets that can be many dozens of GBs. The group's personal machines are evenly split between Windows and Macs and our servers are all Linux, currently.

Is there a revision control system that "just works" — even for non-programmers? Or should we just head in a different direction (network share, rsync, etc.)?

Comment Re:NoSQL? Waittaminute (Score 2, Insightful) 280

A big-ass Oracle or IBM-DB2 can do the job if you pay enough for tuning.

Why is it that, ever since key-value DBs came into vogue, relational databases instantly got perceived as so neanderthal?

A normal-ass Oracle database would surely be just fine for storing a no-fly list which, by necessity, has orders of magnitude fewer than 6-point-whatever billion names; I'm guessing it would do so without much tuning, too.

Comment Re:"Sue fucking everyone" (Score 1) 949

I'm not familiar with the intricacies of the BitTorrent protocol, but it seems like this group would need to do one of two things:
A.) Connect to a swarm as a "spectator," neither uploading nor downloading any data.
B.) Connect to the swarm and actively upload/download.

If A., it seems like it would be hard to prove that any IP logged as participating in the swarm is actively engaged in any malicious behavior. If B., aren't they (the group) guilty of the same crimes of which they're accusing these other people?

I guess I just don't see how they could convince the courts that a crime was committed without having to participate in the exact same act in order to prove it.
