Comment: Work with your company's legal team (Score 1) 431

I work under a similar, very restrictive IP agreement. I raised the issue of side projects with the corporate lawyer in charge of IP and explained the types of projects I do on the side for fun and profit. While the company does not grant blanket exclusions, they were happy to review them on a project by project basis and grant exceptions.

Their goal was to protect the company's business using standard legal tools. Just as my job requires me to use my skills to the fullest, so does theirs. However, talking through it made it clear that there was no malicious intent.

One important thing to know when doing this: the lawyers represent the company and are ethically bound to put the company's interests first. They won't be able to give you any legal advice. You may want to talk to a lawyer first, just so you have outside counsel.

Also, this is just business for the company. The more you treat it as business (and not good vs. evil), the better chance you'll have of success.

-Chris

Comment: TEDx Talk on the Subject (Score 3, Informative) 239

by rockmuelle (#38244100) Attached to: Genome Researchers Have Too Much Data

I did a talk on this a few years back at TEDx Austin (shameless self promotion): http://www.youtube.com/watch?v=8C-8j4Zhxlc

I still deal with this on a daily basis and it's a real challenge. Next-generation sequencing instruments are amazing tools and are truly transforming biology. However, the basic science of genomics will always be data intensive. Sequencing depth (the amount of data that needs to be collected) is driven primarily by the fact that genomes are large (E. coli has around 5 M bases in its genome, humans have around 3 billion) and biology is noisy. Genomes must be over-sampled to produce useful results. For example, detecting variants in a genome requires 15-30x coverage. For a human, this equates to 45-90 Gbases of raw sequence data, which is roughly 45-90 GB of stored data for a single experiment.
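The arithmetic behind those numbers is just genome size times coverage. A minimal sketch in Python, assuming roughly one byte of storage per base (my simplification; real fastq files also carry quality scores and read headers), with function names made up for illustration:

```python
# Rough estimate of raw sequence volume for a sequencing experiment.
# Assumption: about 1 byte of storage per base, as in the estimate above.

HUMAN_GENOME_BASES = 3_000_000_000   # ~3 billion bases
ECOLI_GENOME_BASES = 5_000_000       # ~5 M bases


def raw_sequence_gbases(genome_size_bases, coverage):
    """Return the amount of raw sequence, in Gbases, at a given coverage."""
    return genome_size_bases * coverage / 1e9


def approx_storage_gb(genome_size_bases, coverage, bytes_per_base=1.0):
    """Return approximate storage in GB (bytes_per_base is an assumption)."""
    return raw_sequence_gbases(genome_size_bases, coverage) * bytes_per_base


if __name__ == "__main__":
    for coverage in (15, 30):
        gbases = raw_sequence_gbases(HUMAN_GENOME_BASES, coverage)
        print(f"human at {coverage}x: {gbases:.0f} Gbases")
```

The same functions with ECOLI_GENOME_BASES show why bacterial genomes are so much easier to handle: 30x coverage is a fraction of a Gbase.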

The two common solutions I've noticed mentioned often in this thread, compression and clouds, are promising, but not yet practical in all situations. Compression helps save on storage, but almost every tool works on ASCII data, so there's always a time penalty when accessing the data. The formats of record for genomic sequences are also all ASCII (fasta, and more recently fastq), so it will be a while, if ever, before binary formats become standard.

The grid/cloud is a promising future solution, but there are still some barriers. Moving a few hundred gigs of data to the cloud is non-trivial over most networks (yes, those lucky enough to have Internet2 connections can do it better, assuming the bio building has a line running to it) and, despite the marketing hype, Amazon does not like it when you send disks. It's also cheaper to host your own hardware if you're generating tens or hundreds of terabytes. 40 TB on Amazon costs roughly $80k a year whereas 40 TB on an HPC storage system is roughly $60k total (assuming you're buying 200+ TB, which is not uncommon). Even adding an admin and using 3 years' depreciation, it's cheaper to have your own storage. The compute needs are rather modest as most sequencing applications are I/O bound - a few high memory (64 GB) nodes are all that's usually needed.
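To make the storage comparison concrete, here is a small Python sketch using only the figures above ($80k a year for 40 TB on Amazon, $60k total for 40 TB of owned storage, 3 years' depreciation). The function names are made up for illustration, and the admin cost is left as a parameter since I'm not quoting a number for it:

```python
# Back-of-the-envelope comparison of cloud vs. owned storage for 40 TB.
# Dollar figures are the rough ones quoted above; the admin cost is a
# free parameter, not a quoted number.

CLOUD_COST_PER_YEAR = 80_000     # 40 TB on Amazon, per year
OWNED_HARDWARE_TOTAL = 60_000    # 40 TB share of an HPC storage system
DEPRECIATION_YEARS = 3


def cloud_cost(years, annual_cost=CLOUD_COST_PER_YEAR):
    """Total cloud storage cost over the given number of years."""
    return annual_cost * years


def owned_cost(years, annual_admin_cost, hardware_total=OWNED_HARDWARE_TOTAL):
    """Total cost of owned storage: one-time hardware plus admin time."""
    return hardware_total + annual_admin_cost * years


def break_even_admin_cost(years=DEPRECIATION_YEARS):
    """Annual admin cost at which owned storage costs the same as cloud."""
    return (cloud_cost(years) - OWNED_HARDWARE_TOTAL) / years


if __name__ == "__main__":
    print(f"cloud over 3 years: ${cloud_cost(3):,}")
    print(f"break-even admin cost: ${break_even_admin_cost():,.0f} per year")
```

With these numbers, owned storage stays cheaper as long as the admin time attributable to those 40 TB costs less than the break-even figure per year.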

Keep in mind, too, that we're asking biologists to do this. Many biologists got into biology because they didn't like math and computers. Prior to next-generation sequencing, most biological computation happened in calculators and lab notebooks.

Needless to say, this is a very fun time to be a computer scientist working in the field.

-Chris

Comment: Queasy... (Score 1) 332

by rockmuelle (#38208512) Attached to: A Floating Home For Tech Start-ups

I get queasy just thinking of coding on a ship for an hour, let alone a few months or years. Maybe the Caribbean or the Gulf of Mexico might work, but anywhere at sea subject to swells that have had thousands of miles to mature can't be that conducive to coding. And, if you can tolerate it, you'll make more money on an oil rig.

-Chris

Comment: Re:Possible use... (Score 2) 412

by rockmuelle (#38054294) Attached to: China Building Gigantic Structures In the Desert

That's more likely the slice of images taken from the satellite's path. I suspect the satellite imaged that region when the channels/roads/whatever had a layer of water on top and were reflecting the sunlight. If you look at the adjoining tiles, there's still a channel structure, it's just not reflective.

-Chris

Comment: Phidgets (Score 2) 147

by rockmuelle (#38032344) Attached to: Ask Slashdot: Physical Input Devices For Developers?

http://www.phidgets.com/

I've used Phidgets in the past for exactly this application (research into UIs for large data). Lots of premade USB controls are available and it's easy to hook up most analog controls to their IO boards. I went to the local electronics shop and bought a slew of buttons, knobs and slides and had no problem hooking them up with Phidgets.

For programming, I wrapped the C library in Python using SWIG.

-Chris

Comment: HPC Applications (Score 1) 314

by rockmuelle (#37527480) Attached to: Ask Slashdot: Successful Software From Academia?

A number of HPC applications funded by NSF/DARPA/DOE grants are able to provide a continued source of new research while maintaining and improving the applications.

One example is OpenMPI. BLAS/LINPACK/LAPACK are also examples. Some of the C++/Boost libraries are also maintained in academia, such as the Boost Graph Library.

-Chris

Comment: DNA, RNA, and Genes (Score 5, Informative) 173

by rockmuelle (#36932572) Attached to: Ruling Upholds Gene Patent In Cancer Test

The judge's reasoning in the ruling hinges on the fact that the BRCA1/2 genes do not appear in nature as isolated, unmodified DNA and instead only appear in DNA form as part of a (much) larger chromosome. While technically true, it ignores an important fact of genomics: while the BRCA genes do not appear in vivo as isolated _DNA_, they do appear as isolated _RNA_. The RNA counterpart of the DNA sequence is slightly modified - it is the 'reverse-complement' of the DNA with the T's replaced with U's (for example, AACC - (reverse complement) -> GGTT - (sub U for T) -> GGUU).
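For the programmers in the thread, the two-step transformation in that example looks like this in Python. It's a minimal sketch of the string manipulation only, following the description above; it assumes an uppercase A/C/G/T input and doesn't model any of the actual biology (strand conventions, splicing, etc.), and the function names are my own:

```python
# The example transformation: reverse-complement the DNA, then swap T for U.
# Assumption: input is an uppercase string containing only A, C, G, T.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complement each base (A<->T, C<->G), then reverse the sequence."""
    return dna.translate(_COMPLEMENT)[::-1]


def substitute_u_for_t(dna):
    """Replace every T with U to get the RNA alphabet."""
    return dna.replace("T", "U")


def rna_counterpart(dna):
    """Apply both steps in the order given in the example."""
    return substitute_u_for_t(reverse_complement(dna))


if __name__ == "__main__":
    print(reverse_complement("AACC"))  # GGTT
    print(rna_counterpart("AACC"))     # GGUU
```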

So, in a very perverse way, the judge is correct. The isolated, unmodified DNA does not appear in nature.

There is a natural mechanism for converting RNA back into DNA called reverse transcription (RT). RT-based methods are how we sequence genes. RNA from genes is isolated and converted back into DNA for sequencing. This is a standard lab method and is used for all gene sequencing. (Interestingly, if someone were to find RT at work in a cell converting BRCA genes back to DNA, the patent could be invalidated.)

The gene itself, in RNA form, appears isolated in nature. The RNA sequence cannot be patented. But, sequencing methods all rely on converting RNA back to DNA for sequencing. The sequence is read as DNA. But, that's not really the gene, that's just a modified representation of the gene. The functioning gene is the RNA version, not the DNA copy of it.

What's frustrating is that Myriad is using a technical aspect of how gene/RNA sequencing works to claim a patent on a gene itself.

-Chris

Comment: Re:Depends... (Score 1) 244

by rockmuelle (#35338316) Attached to: Is Attending a CS Conference Worth the Time?

"BTW, a conference publication isn't considered a "journal" publication, and doesn't confer the same status. Conferences are where the work gets done: people present developing ideas and get feedback on them."

This is true for all fields except computer science. In CS, conference publications form the basis of a publication record and journals tend to be more for 'archival' bodies of work. CS conferences are peer reviewed at the same standards as most journals in other fields.

CS is lacking journals with quick turnaround times and journals that accept incremental work. Articles submitted to CS journals often go through a year or more of reviews and revisions whereas the entire cycle for a conference from submission to publication is typically around 8 months. In CS, incremental work, which forms the bulk of most publishing records, is communicated through conferences, not journals. There is no equivalent of Physical Review Letters in CS.

Of course, this presents challenges and opportunities for CS academics. The main challenge is that few tenure committees (not departmental, but once the tenure case is presented to the larger committee) understand this and it always takes some explanation when making the case for a strong CS professor whose only pubs are at conferences such as POPL or SIGGRAPH (for non-CS people, publishing in those venues is right up there with Science or Nature in terms of importance). The opportunity for clever tenure track computer scientists is to partner with a physicist or biologist and rack up a number of co-authorships on papers in incremental journals. Oddly, CS people tend to think that _all_ journals are archival and difficult to publish in, so pubs in journals with collaborators help your CS case.

It's a silly world...

-Chris

Comment: Re:Too small.... (Score 1) 243

by rockmuelle (#34017086) Attached to: The World's Smallest Full HD Display

I had regular access to a T221/Big Bertha for years. While it was a great display, it had some problems and ended up being overkill for most applications.

Its biggest challenge was that there were very few cards that could drive it at a decent framerate. ~10 fps was the most it ever got, which made it just barely useful for interactive applications.

Most applications and windowing systems had a number of UI elements that were rendered based on pixel values (think 32x32 pixel icons). This made most GUIs unusable as the fixed pixel elements were too small (too small to even see in some cases).

200 dpi was also too fine grained for anyone to notice without the aid of external magnification. We kept a magnifying glass next to the monitor to help.

Despite those limitations, as a toy and a testbed for visualization ideas, it was a lot of fun. I used it regularly for GIS applications, often with the aid of the magnifying glass. I also developed large graph visualizations for it and it could easily display much more detail than a standard display.

For rendering, it was possible to render simple primitives and not worry about anti-aliasing or sub-pixel rendering techniques. Large scatter plots, in particular, benefited from the resolution. Sparse areas that would appear dense on a normal display (due to one pixel being used per data point) actually appeared sparse, helping people interpret their data more easily.

-Chris

