
Comment Re:Is the science repeatable? (Score 1) 69

Not to mention: imaging a planet doesn't affect the planet. Extracting DNA without contamination is a huge challenge for ancient DNA. It's hilarious how many NCBI sequences of mammal specimens turn out to be matches for fish or insects (lab assistant's lunch? Did a fly get smooshed into a vial?). Even if you do successfully extract, isolate and amplify some ancient DNA, how do you know you amplified actual DNA of the specimen and not something living in it (a nematode, etc.)? In any case, I was just speculating that the "6.8M" year figure was perhaps the limit for the stability of the basic chemicals making up the GATCs under ideal conditions. IIRC they lose their structure, and lose context from their neighbours, much more quickly than that though. Disclaimer: not a scientist :-)

Comment Re:Is the science repeatable? (Score 1) 69

I'd go for that. It doesn't seem implausible at all, and DNA is much more simple in construction than you might think - which gives fewer combinations but more tricky fitting together. Get enough fragments, though, and you can throw it through a computer and get something useful out of the other end.

But that's the whole problem! It doesn't matter if you image a lonely letter 'A' on a shred of paper at 72dpi, 300dpi or 60000dpi - it's still a letter A, and you're never going to know what its neighbours were :-) Take those 10,000 image sources you mentioned, and imagine they're 10,000px each. But instead of working from whole frames neatly arranged into 10,000 frames of 100x100 pixels, all you have are 100,000,000 apparently random, individual pixels. How would you begin the task of assembling them into a single picture? As you grow the fragment size into 2x2, 4x4, 10x10 etc. squares which randomly cover different pieces of the subject, you can eventually come up with a single compelling assemblage with a strong consensus that "this is what the subject must have looked like". But if the fragments get too small, especially without any idea of what the subject should look like... you suddenly get a worthlessly large number of equally valid contigs.

Comment Re:Is the science repeatable? (Score 4, Interesting) 69

To cut a long story short, at "6.8 million years old" I assume they mean "the longest read (maximum number of consecutive GATC 'letters' in a row) you're possibly going to get is one". Imagine having a pile of letters which were once arranged into the collected works of William Shakespeare: could you re-assemble the original work? No. But what if you had 4-letter fragments? You might be able to learn something about the English language, indirectly, but you probably won't be able to reverse-engineer the complete original work. Now what if you had slightly longer fragments? That would help. What if the garbled pile of letters/fragments actually consisted of multiple, similarly (randomly!) shredded copies of Shakespeare? Well, as long as they're randomly fragmented in different ways, you can imagine that where we guess two fragments might join each other, if we have a fragment from that same region of another copy which spans that join, we can become more and more confident about forming a plausible assembly. So we can take advantage of this redundancy and randomised fragmentation to attempt recovery of the original work.
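To make the idea concrete, here's a toy, deliberately naive greedy overlap-assembler in Python. It's nothing like a real assembler (those use far smarter data structures and error models), but it shows how suffix/prefix overlaps let you glue fragments back together:

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(fragments: list) -> str:
    """Repeatedly merge the pair of fragments with the biggest overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best = (0, 0, 1)  # (overlap length, i, j)
        for i in range(len(frags)):
            for j in range(len(frags)):
                if i != j:
                    n = overlap(frags[i], frags[j])
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:  # no overlaps left: remaining fragments are ambiguous/disjoint
            break
        merged = frags[i] + frags[j][n:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
    return max(frags, key=len)

reads = ["to be or not", "or not to be,", "to be, that is", "that is the question"]
print(greedy_assemble(reads))  # → "to be or not to be, that is the question"
```

Notice that if the reads get much shorter the overlaps become ambiguous and the greedy merge starts producing multiple, equally plausible answers - exactly the problem with very old, very fragmented DNA.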

In other words, the more degraded the DNA, the shorter the fragments and the harder it is to come up with an assembly. At some point the fragmentation might be so bad that the only way you can attempt to achieve anything is to use a relevant, well-understood reference sequence from a modern-day specimen/consensus for comparison (or for clues, or to fill in the blanks)... if one exists. I'm no geneticist, but I think in those circumstances the confidence in the results starts to go from "hey, that's cool!" to "interesting" to, eventually, an artist's rendition of what an ancient genome might have looked like - drawing from long-lost cousins which are still alive today.

Happily, re-assembling short, fragmented DNA happens to be how commodity high-speed, high-throughput, low-cost sequencing works these days. DNA is split into small lengths, e.g. 500-ish basepairs, and then, depending on the experiment/purpose/targets, it's all (or partially) re-assembled by finding enough overlapping bits (hopefully beginning and ending with the proprietary markers used in the splitting process), with statistical tricks to qualify whether the data is sufficient, which areas are problematic in coverage/confidence, etc. And it helps enormously if you're working on an organism that's already been sequenced to death for comparison.

So there are many well-developed tools for coming up with contiguous DNA from a pile of short reads.

IIRC, the other trick with ancient DNA is, first of all, extracting enough useful material to begin with, without damage. As reads get shorter, increased redundancy helps - more randomly overlapping regions ease the task of re-assembly - but very short reads might mean that a number of different assemblies are possible. Not to mention delicate amplification methods which might amplify the noise as well as the signal...
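To put rough numbers on the redundancy point, here's a quick simulation (all figures invented for illustration) of random reads over a 1,000-letter "genome", showing how read length and read count translate into coverage depth at each position:

```python
import random

def coverage_depth(genome_len: int, read_len: int, n_reads: int,
                   seed: int = 42) -> list:
    """Simulate n_reads random reads of read_len over a genome;
    return the per-position coverage depth."""
    rng = random.Random(seed)
    depth = [0] * genome_len
    for _ in range(n_reads):
        start = rng.randrange(genome_len - read_len + 1)
        for pos in range(start, start + read_len):
            depth[pos] += 1
    return depth

depth = coverage_depth(genome_len=1000, read_len=50, n_reads=100)
print(sum(depth) / len(depth))  # mean depth = 100 * 50 / 1000 = 5.0
print(min(depth))               # low/zero-coverage positions become assembly gaps
```

The mean depth is just reads × length ÷ genome size, but the *minimum* depth is what bites you: any position covered by zero (or one, error-prone) reads is a gap or an ambiguity, and shorter reads need disproportionately more redundancy to keep those regions covered.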

Comment Re:This is FUD (Score 1) 115

That's cool. I understand the original article is about human genome research, but I still think you're thinking rather narrowly - though what do I know; so far my involvement in bioinformatics has only been accidental, I'm an engineer really. But just as an anecdote, I worked with a couple of unrelated teams - sponsored by pharma companies - doing basic "alpha taxonomy" and biology research on scientifically neglected organisms (they're not cute or furry!). They make themselves out of (or secrete) interesting compounds potentially useful for cancer treatments. But because the biology/population dynamics of these things are so poorly understood, simply knowing where the populations exist and how diverse they are (sometimes "same species" individuals are chemically different in important ways - due to life cycle? Are they just different forms of the same species? Or do the taxonomists need to split the species up? How are they interbreeding? What role does the compound of interest play in them? Etc.) makes repeatability of these chemical assays on subsequent individuals really quite difficult.

And in any case, I'm sure you're aware of all the interesting arguments for biosecurity/invasive species/food security etc... but that's getting off topic :)

Comment Re:This is FUD (Score 1) 115

I think you, honestly, misread. My point was that sequencing random organisms is not medically useful; it's focusing on diseases (to divine means of attack) or some carefully-selected model organism (to understand a simplified version of ourselves) that brings us important information.

I have to say that, as somebody tangentially involved in evolutionary biology research (boring computational stuff), I appreciate and agree with most of your input in this discussion. However, it has to be said that you're being a bit too dismissive of studies on non-human, non-model species. There is much to be learned about some very fundamental questions in molecular biology, not all of which will necessarily be answered by studying inbred lab rats. It's my belief that there is a mountain of data (sadly of poor quality, either in controls/methods or provenance/curation) which could lead to questions and further studies of these "alien" species (sea slugs, insects, plants), which hold answers to important, basic fundamentals that wouldn't be as obvious by sticking to the utter desert of homogeneous specimens which medical research relies upon today.

That's not to say inbred lab rats are the wrong tool for the job, but if that's all we're limited to, our discoveries will be similarly limited.

Comment Re:Large Format PC Tablets (Score 1) 141

There are apps to make your tablet act as a second display (at least on Android, anyway), but they're almost exclusively for Windows. I've hacked up some scripts with VNC/x2x/etc. over WiFi in Debian with my 10" Galaxy Tab, but it was too clunky, and I rarely want to travel with a tablet anyway. So I went for a Lenovo LT1421 USB DisplayLink screen instead.

Comment I use a Lenovo ThinkVision LT1421 (Score 1) 141

Which is one of the 1366x768 resolution monitors you said you didn't want: http://www.lenovo.com/products/us/monitor/lt1421/. Given that portable productivity is my main concern though, I thought I'd share my experience with it. I use this display with a maxed-out i5 Lenovo x230 which itself is only 1366x768 - something that nearly put me off buying this brilliant little machine in the first place; but in the end I knew I'd be docking into a proper monitor for any serious work.

I take the display with me if I'm away for more than a day or two and expect to get some serious work hours in somewhere. It sits quite comfortably in my backpack, which goes everywhere with me, next to the notebook. Setup is quick and painless (after some custom udev scripts, at least), but in Linux don't expect to (easily) get a shared clipboard or window-dragging across screens: I've only ever been able to make this DisplayLink stuff work as a separate X11 server (with some extra bits like x2x to make it nicer).

Surprisingly, it's not the extra real-estate that I've come to appreciate most: it's the ergonomics. I position the USB display above my notebook, resting on whatever I can find up and away from the keyboard so I can look straight forward at it rather than spending hours hunched down over a little 12" notebook screen where the keyboard is.

At my home office I dock into a decent workstation setup with a 27" WQHD 2560x1440 IPS display. As an almost-30-year-old, I'm regretting all the terrible posture/ergonomics I've inflicted on myself over the years, so I make sure I'm set up properly for any work which stretches for more than an hour or so.

I run the USB display at 16-bit colour depth to improve responsiveness over the USB 2.0 connection. This is just fine for coding/browsing/email/project-management stuff, but any full-screen multimedia (movies/games/etc.) is going to happen on your main laptop screen - unless you find a USB 3.0 DisplayLink screen, perhaps. The LT1421 also isn't IPS, so it's not quite as nice to look at, but to be honest any time I find myself setting it up for a decent coding session it's in an appropriately lit/quiet area anyway.

Comment Re:It's the infrastructure, stupid! Not the .debs. (Score 1) 302

You pretty much entirely misunderstood what I was saying

... after misunderstanding the GP yourself

And therefore, when someone says something bad about Linux package management, you interpret it as an attack on the only thing which can provide sustainability, sharing of effort, etc. You need to take into account that people might be disagreeing with your axioms instead...

We're having a disagreement, not a brawl. Just because I haven't adopted your point of view based on a few casual, wishful remarks doesn't mean that I am stubbornly clinging to something for the sheer fun of it. I have responded because it seems like you're trying to say something interesting; I just can't figure out what it is. I certainly can't relate it to any actual experience of using, supporting, maintaining and developing open source software. So I'm trying to understand: have you arrived at your conclusions based on something real, or did you just like the sound of it?

You really, really don't get me. I think that, rather than helping, the existing Linux model is actually hindering.

I got that, but you have to offer some sort of reasoning or justification for it. I completely fail to see how ditching shared libraries wouldn't result in a net increase in burden. I'm trying to imagine this world, and it's a forgotten pre-internet era which seems far more tedious for users and developers alike. How would you address the concerns I raised? Can you address them, or am I supposed to simply accept that your scenario is better?

I just want to use my computer, code, and share my code. I don't want to babysit my computer in every excruciating detail. I wonder if what you actually have an issue with is more abstract - fragmentation from competing ecosystems? Community/contributor organisation? OSS collaboration/release practices? Development priorities? Policies?

And the reason for that is that when you really look into it, the things you say about how it helps all contain paradoxes which mean they actually hinder.

And yet, the very things you're saying are hindering us are prominent features in the platforms Ingo wants us to reproduce!

For example, build and test infrastructure isn't actually shared, it's duplicated -- each distro does build and test on its own, because each distro is trying to tweak thousands of applications.

Again, more misunderstanding. Distros do not run build & test infrastructure because they're tweaking applications. Yes, that happens, but the vast majority of packages are completely unmodified, built with distro-specific build parameters which are supported by the toolchain and are *not* upstream's concern. In fact (especially in the case of libraries), the human involvement in syncing a package with upstream is often just running a tool which automates it!

The reason distros run build & test infra is to confirm that upstream has released something sane which behaves correctly in the distro's environment. Which is exactly what the platforms Ingo advocates - Android, iOS, Windows Mobile - do as well.

You haven't shown me any technical challenge yet. And I don't buy that "use of packages or package management systems" equates to "OMFG what a waste of unnecessary extra work for everybody". Nobody is forcing anyone to package anything, and if you look carefully the "too many packages" argument can be re-cast as "move stuff out of main and into contrib/universe" - or abandon the latter entirely, which is what PPAs and vendor/project-specific repos are all about. Hell, I can't be the only one using Oracle, MongoDB, and other project/vendor-specific repos, can I?

You really, really don't get me. I think that, rather than helping, the existing Linux model is actually hindering.

You said that; I know you're saying that. But mere statements don't convey meaning or understanding, or in fact any actionable information at all. Do distros package too much? Yes, but I have to say things have already been quietly changing for many years now: there are heaps of places to get packages other than the distro's official archives. What about the assertion that packages are bad, that build/test infra is bad? Ingo advocates mobile platforms which:

  • Have their own package format. The horror!
  • Have their own duplicate build/test/validation infrastructure (arbitrarily) gating releases
  • Have a centralized, curated repository for distribution

I am trying to understand, but I have yet to see any technical challenge. About the only real difference I see is chucking most stuff out of the "main" repositories to focus on a core set of a few hundred projects (1,000+ packages or so), and asking all software authors to drop everything and handle the packaging of the chucked-out stuff for us (on all architectures).

Which means each distro and each upstream pretending that all the other distros don't exist; hence my comment about consolidation.

Comment It's the infrastructure, stupid! Not the .debs... (Score 1) 302

Er, what?! Distributions aren't a great fault line for segregating users of different CPU types. There are many distributions which support multiple CPU architectures, after all. And there's no "risk" in attempting to run a PPC binary on an x86 computer, aside from lost time downloading it. It simply won't work.

What an odd interpretation - surely he was referring to the fact that distros have made it so you don't have to care about what your CPU arch is: whether you're on one of the ARMs, MIPSes, SPARCs, PPC or otherwise. The distro's infrastructure/ecosystem maintains continuous, automated builds and test reports of all packages for all supported CPU architectures, rather than hoping that the author of your favourite text editor has up-to-date binaries for your particular CPU on his blog (which is where, now?).

It only sucks if you're super anal about saving every last bit of disk space, but even so it's possible to find software that'll scan all binaries and strip alien architectures.

...

For whatever reason nobody in the Linux world wants to do this, even though I'd guess it wouldn't be all that hard to extend ELF to support it.

That's because saving disk space is just about the *last* reason to argue for shared libraries! If I were asked "why bother with shared libraries", disk/memory usage wouldn't even make the list. But it would have things like:

  • Performance improvements for free.
  • Feature enhancements for free.
  • Bugfixes for free.
  • Security updates for free.

Contrast with not using shared libs. The author must now stay on top of all dependencies themselves. And how should it be decided when it's worth cranking out a new release? Perhaps build day comes around (probably a manual affair - most authors have day jobs, after all) and one of the dependencies has a current stable version that's only two days old. Now there's a decision about whether to use it or go with the previous version, which has had a few months in the wild. Also: what CPU architectures am I going to build for? Do I even have a cross-compiler set up for them? Is my build environment at the right version/config to support them? Do I even have an example of the hardware to test on?

What about from the user's perspective? Say there's a libpng bug on the loose which enables arbitrary code execution. How do I go about discovering which of my fat-binary, statically-linked executables use libpng, let alone what versions they're running? Okay, let's pretend we have a decent tool which can scan the system and reverse-engineer this. What can I do about it? Nothing, unless you want to uninstall each affected piece of software, or hang around hitting 14 separate websites waiting for each one of them to do a new release - which may or may not have a fixed libpng built in (you're hoping each release manager for each piece of software could be bothered to stay on top of updates for something as benign, boring and stable as libpng, and that they will have bothered to update, test and release with the fixed version).
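For contrast: with dynamic linking this audit is almost trivial, because the dependency is recorded in the binary rather than baked into it. A rough sketch (the scanned directory and library name are just examples; assumes a Linux box with `ldd` available):

```python
import pathlib
import subprocess

def links_against(binary: pathlib.Path, libname: str) -> bool:
    """True if `ldd` reports the binary as dynamically linked against libname."""
    try:
        out = subprocess.run(["ldd", str(binary)],
                             capture_output=True, text=True, timeout=5)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return libname in out.stdout

# Which executables in /usr/bin would a hypothetical libpng flaw hit?
affected = [p for p in pathlib.Path("/usr/bin").iterdir()
            if p.is_file() and links_against(p, "libpng")]
print(f"{len(affected)} binaries link against libpng")
```

With static fat binaries there's no equivalent to inspect, which is the point: the package manager's dependency metadata does this bookkeeping for you, and one library update fixes every consumer at once.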

These are all things distros take care of for you - automated build farms continuously test and build your software on machines and configurations you've never considered, much less have time to actively support.

This is all about sustainability and sharing of maintenance effort, the very reason package management systems were invented in the first place. It's not just the package format, the tool, or the assumptions/"visions" they implement - it's the infrastructure, ecosystem, and shared curation, so that we can strive for consistent software quality in a way that scales all the way up and all the way down, across the largest and smallest codebases, userbases and development teams.

Which is truly the core of this argument. Dismissing package management systems means dismissing the ecosystems which needed them.

This is not a real problem for 99.999% of users, and for the 0.001%, I would advise they build everything themselves anyways.

And I do - using the package management system, and reaping the benefits of doing so. E.g. cpan2deb is my friend.

Well, I think you can tell where I stand on the issue. All the "diversity" arguments I've ever heard in favor of Linux style package managers and "distros" boil down to prioritizing difference for the sake of being different over getting useful things done. (And your post hasn't done anything to move that needle for me, fwiw.)

I doubt we'll see consolidation of Linux (desktop, server, mobile or otherwise) to the extent necessary to make package management and shared build/testing infrastructure the waste of time you seem to think it is...

Comment Re: It's The American Drean (Score 4, Insightful) 1313

Just how many teachers have tenure? Honest question - I thought it was quite rare. Here in Australia, we're spending more than ever on education (iPads, sporting stuff, school halls) and yet my cousin's school last year could not afford high-school maths textbooks (poorly OCR'd PDFs of painfully substandard material don't count). We have far worse education outcomes than 10 years ago. Our neighbours are kicking our arses at educating high-school kids, and one of the biggest differences is the totally opposite spending priorities: fewer computers and iPads, better-paid (relative to median wage) teachers.

Comment Re:Meds by mail (Score 1) 564

Wait. You get insulin shipped to your door? How is it packaged? I sometimes have trouble with heat-damaged insulin at the best of times. Just this December, over in Western Australia, I left my backpack in an unairconditioned building, which coincided with the onset of a nasty cold. It took a few (big) ineffective doses and a fresh cartridge to realise that my staggeringly high BSLs were due to heat-damaged insulin rather than the virus... so I'm curious how they package insulin for mailing.

Comment Re:I get the impression that (Score 1) 180

Perhaps he means it's well funded in the sense that they have dedicated programmers at all. "Run of the mill" science is done by the investigating scientists or their jack-of-all-trades research assistants, collaborators, grads/post-docs, etc., most of whom are unlikely to have substantial software engineering experience or training in their background.

Nonetheless, they write code - very useful, productive code - but it's in whatever tool or high-level language is popular among their peers/discipline (Matlab, R, Python, Perl, Fortran... each corner of science has its favourite things, and if you want to leverage the work of others you run with whatever everyone else is using, unless you have funding and good reasons not to).
