


Comment Re:nanopore tech still has accuracy problems (Score 1) 33

Yes, absolutely. However, the nanopore sequencer has to have more than one limited-applicability advantage for it to be commercially successful against competitors. Just consider seriously for a minute what has actually been described (not hyped about) in this paper.
    1) A mobile lab in a suitcase including sequencer - yes, that's awesome
    2) Deployed to a region experiencing an outbreak
        - ok, can be useful, but how many outbreaks occur every year that actually benefit from on-site sequencing
        - in the case of Ebola, which spreads and mutates quickly, the advantage may be very real, but Zika? the flu? not so sure
        - is the advantage enough to offset the tremendous cost compared to alternatives?
    3) They did sequence a segment of the viral genome (not the whole genome) and successfully call base mismatches
        - but they didn't call indels
        - they ignored homopolymer regions and the ends of their amplicons
        - they did get some useful information, but there were samples that they couldn't successfully analyze after sequencing

So in the end, it is a sequencer that can be deployed to remote villages, provided you have a very limited set of analyses you intend to do, and you don't care about the cost. But is that enough to be commercially viable and displace competitors? I don't think so.

I'm not trying to rag on Oxford Nanopore, don't get me wrong. If they really could reliably sequence whole genomes fast and with minimal preparation from a USB stick, I would definitely jump on the bandwagon. I'm just tired of all the hype. They've been promising these breakthroughs for more than a decade now, but they have yet to deliver. Meanwhile other companies, namely PacBio, have appeared and been very successful at providing long reads at an affordable cost, so I'm not holding my breath for ONP.

Comment Re:nanopore tech still has accuracy problems (Score 1) 33

though if necessary you could presumably do many additional passes to bring the error rate down further.

In its present state, not really. The biggest problem with nanopore data right now is systematic errors in homopolymer regions. These can't be easily corrected out with higher coverage. Incidentally, some of the most significant mutation events are in homopolymer regions, so this is bad.

but it's more than sufficient to recognize a virus.

Correct. But you need to know more. In particular, which strain of virus? Strain variations can easily be much less than 4%.

but so long as you're taking many samples from a community you can probably make a pretty high-confidence conclusion about even SNPs

If the errors were mostly random, you would be correct. That is the problem here: the errors are systematic, not random, which is why they can't be corrected out with higher coverage. The good news, though, is that if you are looking at other types of mutations that are easier to identify than SNPs, like inversions or repeat expansions, the error rate is probably good enough.
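To make the random-versus-systematic point concrete, here is a toy sketch. It assumes substitution errors only (real homopolymer errors are mostly indels), and the read length, 20% error rate, 51x coverage, and the three "bad" positions are all made-up numbers for illustration. It takes a per-column majority vote over the reads and counts how many consensus bases still disagree with the reference.

```python
import random
from collections import Counter

BASES = "ACGT"

def consensus(reads):
    """Majority vote per column across equal-length reads."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))

def read_with_random_errors(ref, rate, rng):
    """Each base is independently miscalled with probability `rate`."""
    return "".join(
        rng.choice([b for b in BASES if b != base]) if rng.random() < rate else base
        for base in ref
    )

def read_with_systematic_errors(ref, bad_positions):
    """Every read makes the same wrong call at the same positions."""
    read = list(ref)
    for i in bad_positions:
        read[i] = "C" if ref[i] == "A" else "A"
    return "".join(read)

def mismatches(a, b):
    """Count positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

rng = random.Random(0)                       # fixed seed so the run is repeatable
ref = "".join(rng.choice(BASES) for _ in range(200))
coverage = 51                                # hypothetical depth
bad_positions = [10, 50, 120]                # hypothetical systematic error sites

random_reads = [read_with_random_errors(ref, 0.20, rng) for _ in range(coverage)]
systematic_reads = [read_with_systematic_errors(ref, bad_positions) for _ in range(coverage)]

random_consensus_errors = mismatches(consensus(random_reads), ref)
systematic_consensus_errors = mismatches(consensus(systematic_reads), ref)
print(random_consensus_errors, systematic_consensus_errors)
```

The random errors get outvoted at each position, while the systematic ones survive the vote no matter how many reads you stack up, because every read agrees on the wrong base.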

Not perfect, but a huge improvement over simply guessing at the path of infection.

You don't have to guess. You just have to use a different sequencing technology. Almost every vendor is trying to provide a rapid sequencing service for this exact reason. Illumina has MiSeq (12-24 hrs. run time), and PacBio is always fast (run time ~3 hrs) as is Ion Torrent (run time ~2 hrs). The biggest advantage that ONP has is portability, but if you need a lab (and an Internet connection) anyway to process samples, I'm not sure that this will really play out in their favor in the long run. ONP gets a lot of awe and excitement, which leads to a lot of hype, but not a lot of practical advantages.

Comment Re:nanopore tech still has accuracy problems (Score 1) 33

Or maybe they're already doing that and accuracy plateaus at 96%

Yes, this is correct.

They're not trying to do genome-research class sequencing, they just need to identify the DNA strands of interest

Well, it does depend on what kind of downstream analysis they plan to do, but 96% is not great. That is 1 error per 25 bases. Good enough for alignment procedures to work, but definitely bad if you are looking for SNPs.
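For anyone who wants the arithmetic spelled out, a quick sketch. The function names are just illustrative, and the 1000-base read length is a made-up example, not a figure from the paper.

```python
def bases_per_error(accuracy):
    """Average number of bases between errors at a given per-base accuracy."""
    return 1 / (1 - accuracy)

def expected_errors(read_length, accuracy):
    """Expected number of miscalled bases in a read of the given length."""
    return read_length * (1 - accuracy)

print(round(bases_per_error(0.96)))        # 25
print(round(expected_errors(1000, 0.96)))  # 40
```

So a hypothetical 1 kb read at 96% carries about 40 errors, which is a lot of noise to dig through if the real differences you are hunting for are a handful of SNPs.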

As one commenter on ONP has been stating for a while: what's the point of a portable sequencer if you have to haul around a full-size Illumina sequencer along with it to get the accuracy you need?

The nanopore's advantage in this example is that the virus genome is relatively small and has a well-defined reference sequence. In its present state, the nanopore is mostly useless for larger, previously unsequenced genomes on a cost/bp basis.

Comment Re:nanopore tech still has accuracy problems (Score 2) 33

> Base calling accuracy: up to 96%

Um, bullshit. See, this has been the problem with Oxford Nanopore since the beginning. They distract and confuse through a lot of misleading statements and media hype, which is why I can't trust any of their claims. The typical accuracy of single-pass 1D reads on real data is about 70%, about 80% on 2D reads. The 96% accuracy they are quoting on their site is after they error-correct the reads.

Comment Re:Very cool but has competition (Score 1) 33

An illumina can do many more samples in a single run, in batch. But you might not want to take it into the field and your latency would be higher since you would accumulate samples until you had enough to justify one run.

But they are still (probably) burning one nanopore per sample, so that's $900 per run (yeah, I'm sure ONP gave them a hefty discount). So the overall cost is much higher than with Illumina, but you are right about the latency. In cases where that matters, I would go with PacBio (cheaper and faster).

Another way this thing is superior is in read-length (50kbases)

Well, let's say "up to 50 kb". The average is a different story, especially if you need to get the higher-quality 2D reads for your downstream analysis. ONP has been promising >100 kb reads for a long time but has yet to deliver. Much better than Illumina, as you say, but not better than PacBio.

Comment Re:This breaks my brain. (Score 3, Informative) 33

I'm an outsider, so I've just gotta be misunderstanding something.

Well, like pretty much all press coverage of the Oxford Nanopore sequencer, there is a ton of hype but questionable value. I'll give the nanopore portability. It is an incredible feat compared to the large sequencers (even the benchtop MiSeq). But here's the thing:
    1) The accuracy is terrible. This is especially important when you are looking at SNP variants. You need accuracy.
    2) The sequencer may be portable, but data analysis in this version currently uses a cloud service that (obviously) requires an Internet connection, so I'm not sure the hype about service in rural areas is really that great.
    3) The throughput is ok, but not great. For virus genomes this might be fine, but for bacterial and larger genomes, it's a no go.
    4) The speed isn't all that great. It's around 24 hrs. to complete a 2D run. That is right in line with what is offered by other sequencers.
    5) Yes, you DO need a library prep, contrary to what the proponents claim. It might be a little bit simpler than some conventional protocols, but you cannot just drop DNA into the pore.

All of this, in my mind, comes down to two features that matter most for any sequencer: cost and speed.

The best cost/bp currently, by far, comes from Illumina technologies. This will never compete with that. That said, Oxford Nanopore has an advantage in read length that Illumina will never have. However, PacBio has been competing in the read length niche for a while now and is well-established. So is the cost of Oxford Nanopore better than PacBio? Cost is mostly a function of yield x read length. For PacBio, the cost of a sequencing unit (a SMRT cell) is ~$600 (the library prep cost is ~$400, but is a one-time cost for each sample). One SMRT cell yields ~0.5 - 1 Gb per run, so roughly $0.0000006 - $0.0000012/bp (Note: this is with the older RS II system. With the newer Sequel system the throughput is better). The Oxford Nanopore site claims up to 1 Gb per chip, at $900/chip, but the reality is a bit less. Based on a recent paper where they assembled the E. coli genome with nanopore data, the amount of actual usable data is closer to 150 Mb. That works out to about $0.000006/bp, so roughly 5 to 10 times the cost of PacBio sequencing.
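Here is the back-of-the-envelope math as a small sketch, using only the figures quoted above ($600 per SMRT cell at 0.5 - 1 Gb, $900 per nanopore chip at ~150 Mb usable). The function name is just illustrative, and library prep cost is left out.

```python
def cost_per_bp(unit_cost_usd, usable_yield_bp):
    """Consumable cost per sequenced base, ignoring library prep."""
    return unit_cost_usd / usable_yield_bp

pacbio_best = cost_per_bp(600, 1e9)      # SMRT cell at the 1 Gb end
pacbio_worst = cost_per_bp(600, 0.5e9)   # SMRT cell at the 0.5 Gb end
nanopore = cost_per_bp(900, 150e6)       # chip with ~150 Mb usable data

ratio_vs_best = nanopore / pacbio_best
ratio_vs_worst = nanopore / pacbio_worst

print(f"PacBio:   ${pacbio_best:.7f} to ${pacbio_worst:.7f} per bp")
print(f"Nanopore: ${nanopore:.7f} per bp")
print(f"Nanopore costs {ratio_vs_worst:.0f}x to {ratio_vs_best:.0f}x more per bp")
```

The 10x figure comes from comparing against a SMRT cell at the 1 Gb end of its range; against a 0.5 Gb cell the gap narrows to about 5x.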

The Oxford Nanopore site claims they are fast, but to get the higher quality 2D reads that you need for assembly, the run times are typically about 18-24 hrs. For a MiSeq, the run time can be as low as 12 hrs, and for PacBio it is 3 hrs. So the nanopore is not really winning with speed either.

It seems to me that portability is the biggest strength of the nanopore, but the majority of groups are still going to get their sequencing done at core facilities, so I have doubts about how that will play out in the market. What they really need to focus on is cost. But everyone is doing that at the same time, so it is a hard race to keep up with.

Comment Re:Because Docker uses a Linux container (Score 1) 131

FreeBSD Jails and Linux Containers are really different beasts. Jails are great if security is your primary consideration. Hence the name: Jails effectively isolate processes and go to great lengths to prevent them from accessing anything outside the jail. Containers use separate kernel namespaces to give groups of processes separate views of kernel global variables. Security (especially with user namespaces) is a bonus, but the primary goal is efficient OS-level virtualization and isolation of resources. A more apt comparison is with the BSD VPS project rather than Jails.

At a guess the jail support didn't make it.

Correct. XNU does not have support for Jails, and it likely won't because it requires some pretty severe changes to kernel data structures to make them work.

Comment Re:RAID 0 is not for anything you don't want to lo (Score 2) 73

Ummm...ok. So when your SMART detects a failing drive in your RAID0 array and you decide you want to replace it, how do you do that exactly? Oh, that's right, wipe the entire array and restore from backup, which, depending on the size of your array, can take anywhere from several hours to days, more if you decided to use your array to run the OS as well. RAID0 is just a plain terrible idea, period. It doesn't matter if you don't think you need uptime: an N-disk RAID0 is roughly N times more likely to fail catastrophically than a standalone hard disk (assuming the failure rates on all of the hard disks are equal), and without redundancy getting back up and running is a long process.
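A quick sketch of where the "N times" figure comes from, assuming disks fail independently with the same probability. The 3% annual failure rate is a made-up number for illustration. The exact figure is 1 - (1 - p)^N, which is close to N x p when p is small.

```python
def raid0_failure_probability(p_disk, n_disks):
    """Probability that at least one of n independent disks fails,
    which for RAID0 means losing the whole array."""
    return 1 - (1 - p_disk) ** n_disks

p = 0.03  # hypothetical annual failure rate for a single disk
for n in (1, 2, 4, 8):
    exact = raid0_failure_probability(p, n)
    print(f"{n} disks: exact {exact:.4f}, N x p approximation {n * p:.4f}")
```

The approximation slightly overstates the risk as N grows, but the conclusion is the same: every disk you add to the stripe makes losing everything more likely.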

Comment Re:I've made my peace with systemd (Score 1) 242

Did you actually read that report? The very first comment:

The only way to deal with journal corruptions, currently, is to ignore them: when a corruption is detected, journald will rename the file to .journal~, and journalctl will try to do its best reading it.

Further down in comment #3, some more detail:

Journal files are mostly append-only files. We keep adding to the end as we go, only updating minimal indexes and bookkeeping in the front earlier parts of the files. These files are rotated (rotation = renamed and replaced by a new one) from time to time, based on certain conditions, such as time, file size, and also when we find the files to be corrupted. As soon as they rotate they are entirely read-only, never modified again. When you use a tool like "journalctl" to read the journal files both the active and the rotated files are implicitly merged, so that they appear as a single stream again.

Now, our strategy to rotate-on-corruption is the safest thing we can do, as we make sure that the internal corruption is frozen in time, and not attempted to be "fixed" by a tool, that might end up making things worse. After all, in the case the often-run writing code really fucks something up, then it is not necessarily a good idea to try to make it better by running a tool on it that tries to fix it up again, a tool that is necessarily a lot more complex, and also less tested.

Now, of course, having corrupted files isn't great, and we should make sure the files even when corrupted stay as accessible as possible. Hence: the code that reads the journal files is actually written in a way that tries to make the best of corrupted files, and tries to read of them as much as possible, with the subset of the file that is still valid. We do this implicitly on every access.

(emphasis added)

So, in other words, there can be journal corruption, as there can be corruption in any data file. The way journald deals with that is by rotating the log, so that a new log can start in its place, and the contents of the old log preceding the corruption are preserved. The journalctl utility knows how to read the corrupted files and extract useful information from them.

Google shows up plenty with little effort.

Google shows up people ranting about something they don't understand. The way journald handles corruption is probably the best possible way any software can handle corruption.

Comment Re:Thank you. (Score 1) 112

But Perl, well, let me just leave this here, []
or Perl Jam 2, the Camel Strikes Back

You're kidding right? That presentation is based entirely on developers not sanitizing user input, from the Internet no less, which will of course lead to vulnerabilities. I don't buy that it is an inherent weakness of the language. If you blindly submit SQL queries passed around as variables, you deserve what you get. That is just really bad programming in any language. Did they do a PHP jam?

Comment Re:I've made my peace with systemd (Score 1) 242

They could have added binary logging to any existing syslogd. Instead, NIH, because Poettering.

If binary logging was the sole intention, then yes. But, as you know, the journal does a lot more than just a binary output of rsyslog, and would not be easily achieved by patching rsyslog. One can argue that those features are unnecessary, but that is a different subject.

The normal adoption process for something like systemd would be to roll a new distribution using it, and then once it is tested, debugged, and proven, for it to propagate throughout the community.

Well, systemd was being used by Fedora and Arch for quite some time. RHEL was on the way to adopting it already. Ubuntu already had Upstart, so maybe they were forced into systemd by the upstream changes, but if you ask me, I think systemd is an improvement over Upstart. It sounds to me that enough people liked it that distributions started adopting it. Maybe it was a faster change than you would have liked, but it is not a conspiracy. It comes down to who is doing the work to maintain the distributions and dependencies, and they decided that systemd was the way forward. An alternate way may be shown to us by Gentoo. We'll see, but I doubt there will be much traction.

Comment Re:I've made my peace with systemd (Score 2) 242

Basically, you need to be able to read the logs with the most minimal of tools, because you are going to be diagnosing it in a downed state most likely-- You cannot bank on having a full suite of binary manipulation tools on hand. You will be lucky if you have more than vi.

This is the biggest load of canned bulls**t. Where do you guys come up with this crap? We don't work on 1970s mainframes anymore. Why would vi be the only thing available? Why would an emergency shell not have the journalctl utility? Why would there not be serial logging capability? Why would you not be able to boot with a rescue disk (in case all other things fail)? The answer is, there is no reason. It is a made-up scenario that doesn't exist.

Also, text based logs compress REAALLY well for long term storage for audit purposes! Binary logs? Probably not so much.

More bulls**t. There is no reason to think the journal won't compress well. Being binary, or not, has little if anything to do with compressibility. Remember that ASCII is itself a binary format! Also, auditing is one of the primary strengths of journald. It is designed specifically with auditing in mind.
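If you want to see the compressibility point for yourself, here is a toy sketch. It does NOT use the real journal file format: the record layout (a 4-byte timestamp plus a 2-byte length prefix), the log message, and the record count are all made up for illustration. It writes the same 1000 records once as text lines and once as packed binary records, then runs both through zlib.

```python
import struct
import zlib

# Hypothetical log message and arbitrary starting timestamp.
message = b"sshd[1234]: Accepted publickey for admin from 10.0.0.5"
start = 1_700_000_000

# Same records, two encodings: plain text lines vs. a toy binary layout.
text_log = b"".join(b"%d %s\n" % (start + i, message) for i in range(1000))
binary_log = b"".join(
    struct.pack("<IH", start + i, len(message)) + message for i in range(1000)
)

text_ratio = len(zlib.compress(text_log)) / len(text_log)
binary_ratio = len(zlib.compress(binary_log)) / len(binary_log)
print(f"text: {text_ratio:.2f} of original, binary: {binary_ratio:.2f} of original")
```

What compresses well is redundancy in the data, and log records are redundant whether you store them as ASCII lines or as structured binary.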

if the binary logs are in some stupid "easily damaged" format, then having a process suddenly die horribly from abnormal termination will result in corrupt logs

That makes no sense at all. A process dying does not result in a corrupt log. There would be an error message, or at the very least a SIGTERM message, printed in the log just as it would be on the console if it had been run from an interactive terminal.

Hell, the file chain can be damaged from FS corruption, and parts of the log will still be readable.

The journal is designed to preserve as much of the log as possible when corruption happens. Anything written before or after the corruption event will be accessible by journalctl.

Comment Re:I've made my peace with systemd (Score 2) 242

systemd notices that the log is corrupt, and... deletes it

That's not what happens.

See: the long and angry (and unfixed) bugzilla tickets.

Which one? The one that says journald will rotate the log on detection of corruption, thereby preserving it in an untampered state while still allowing the uncorrupted portion to be read?

Comment Re:I've made my peace with systemd (Score 2, Insightful) 242

NFS filesystems fail to be mounted at boot. Why? I have no bloody clue. There's nothing in the logs.

Unlikely. If there was an error running the mount command, it was definitely recorded in the log. Did you use journalctl -u nfs or journalctl -b?

With sysvinit I could boot to an interactive shell in the late initramfs and step through every script by hand, and pinpoint exactly where something was failing

You can get a debug shell in systemd. The process is just a little bit different.

With systemd, even with correctly configured systems, I still experience occasional unrecoverable hangs at boot as it screws up its nondeterministic boot ordering and waits forever on something or other (who knows what it might be).

I'm sorry, but that's bulls**t. If your system is randomly hanging, then it's not configured correctly. While the boot sequence of systemd can be characterized as non-deterministic, it is not random. If services are hanging because dependencies have not been met, then you need to specify your dependencies properly.

Another thing, is the broken compatibility with what came before. Example: I edit /etc/nfsidmap.conf to configure NFSv4 id mapping. Previously I'd reload/restart the nfs-common service with 'service' or 'invoke-rc.d'. Job done. What's the systemd equivalent?

Since you should be using a >3.5 kernel, nothing. The rpc idmapper has been replaced on the client with the nfsidmap helper, which the kernel invokes on demand. So there is no need to start/restart an idmapper service.

Yes, it's partly me lacking familiarity with the new way of doing things;

Understatement. Have you read any of the systemd docs or transition/setup guides available? There are a ton. Just do it.
