
Comment Agile Manifesto (Score 2) 491

In reading the comments, it's clear that many people don't know the roots of agile software development. In short, agile (note the lower case) development is basically a set of principles laid out by a group of very talented developers in 2001 in their agile manifesto:

http://agilemanifesto.org/

Note that the manifesto makes no mention of Extreme Programming, Scrum, or any of the other capital-A Agile methods. Instead, it focuses on observations about what made their software projects successful. It deliberately doesn't prescribe any particular methodology, but rather encourages communication, iteration, and excellence in design and engineering. The last two points come from this section of the manifesto:

"Continuous attention to technical excellence
and good design enhances agility."

The manifesto very much allows for, and even encourages, design. It also assumes that the practitioners are already experienced developers who know how to design software and know how much design is needed before coding. Unfortunately, most Agile methods traded experience for certified training and the 'technical excellence' portion was lost.

I've worked with many talented teams and have seen agile work time and again. Of course, all of those projects did have design, documentation, and tests. But, all those artifacts were developed using the same principles in the manifesto.

-Chris

Comment iPhone + BT keyboard + HDMI/DVI adapter (Score 2) 339

I've been using that combo more often for conferences and business meetings. If you want more screen, an iPad or Galaxy tablet would work.

I like the iPhone approach since it limits me to a single device for everything (except coding). Keynote works great for presenting (I usually author in PowerPoint).

-Chris

Comment Re:This is what I like about Microsoft (Score 5, Insightful) 118

The big difference is that Microsoft Research is one of the last large corporate research labs focused on pure research. That is, research done for the sake of the research, not to drive product development. Research done at MSR doesn't have to be product driven (it has to be in the general space of software and computers, but that's about the only requirement). MSR is well funded by Microsoft and an integral part of the company's culture.

Sure, IBM, HP, and Intel all have research labs, but their charters have been re-written over the last ten years to focus more on product-centric research. Most research projects at these companies must start with a business plan that shows how the work will be commercialized within 5 years before being approved. This is not the pure research these labs were once known for.

Google, Facebook, Yahoo, and many other internet companies have some interesting projects (self-driving cars, for instance), but these tend to be one-off projects and aren't part of a larger, long-lived research organization.

Another interesting aspect of MSR is that they encourage all MS developers to take a stint in the organization, not just specially recruited Ph.D.s. It's not uncommon for someone to work on a product for a few years, spend some time in MSR, then go back to product work.

I've worked directly with many of the research groups mentioned in this post over the last 20 years. Based on my experiences, MSR is truly the last real corporate research group (in the spirit of 20th century PARC/Watson/et al). The others are just part of the product funnels or whims of the founders.

-Chris

Submission + - Just what is 'Big Data'?

rockmuelle writes: I work in a 'Big Data' space (genome sequencing) and routinely operate on tera-scale data sets in a high-performance computing environment (high-memory (64-200 GB) nodes, 10 GigE/IB networks, peta-scale high-performance storage systems). However, the more people I chat with professionally on the topic, the more I realize everyone has a different definition of what constitutes big data and what the best solutions for working with large data are. If you call yourself a 'big data' user, what do you consider 'big data'? Do you measure data in mega-, giga-, tera-, or petabytes? What is a typical data set you work with? What are the main algorithms you use for analysis? What turn-around times are typical for analyses? What infrastructure software do you use? What system architectures work best for your problem (and which have you tried that don't work well)?

Comment Work with your company's legal team (Score 1) 467

I work under a similar, very restrictive IP agreement. I raised the issue of side projects with the corporate lawyer in charge of IP and explained the types of projects I do on the side for fun and profit. While the company does not grant blanket exclusions, they were happy to review projects on a project-by-project basis and grant exceptions.

Their goal was to protect the company's business using standard legal tools. Just like my job requires me to use my skills to the fullest, so does theirs. However, talking through it made it clear that there was no malicious intent.

One important thing to know when doing this: the lawyers represent the company and are ethically bound to put the company's interests first. They won't be able to give you any legal advice. You may want to talk to a lawyer first, just so you have outside counsel.

Also, this is just business for the company. The more you treat it as business (and not good vs. evil), the better chance you'll have of success.

-Chris

Comment TEDx Talk on the Subject (Score 3, Informative) 239

I did a talk on this a few years back at TEDx Austin (shameless self promotion): http://www.youtube.com/watch?v=8C-8j4Zhxlc

I still deal with this on a daily basis and it's a real challenge. Next-generation sequencing instruments are amazing tools and are truly transforming biology. However, the basic science of genomics will always be data intensive. Sequencing depth (the amount of data that needs to be collected) is driven primarily by the fact that genomes are large (E. coli has around 5 million bases in its genome, humans have around 3 billion) and biology is noisy. Genomes must be over-sampled to produce useful results. For example, detecting variants in a genome requires 15-30x coverage. For a human, this equates to 45-90 Gbases of raw sequence data, which is roughly 45-90 GB of stored data for a single experiment.
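
If you want to check that arithmetic, it's a one-liner. A minimal Python sketch (genome size and coverage are the approximate figures above):

    # Back-of-the-envelope: raw data needed to sequence a genome at a given depth.
    HUMAN_GENOME = 3 * 10**9  # ~3 billion bases

    def raw_bases(genome_size, coverage):
        return genome_size * coverage

    for cov in (15, 30):
        gbases = raw_bases(HUMAN_GENOME, cov) / 1e9
        # ASCII formats store roughly one byte per base, so Gbases ~ GB on disk.
        print("%2dx coverage: ~%.0f Gbases (~%.0f GB stored)" % (cov, gbases, gbases))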

The two solutions mentioned most often in this thread, compression and clouds, are promising, but not yet practical in all situations. Compression helps save on storage, but almost every tool works on ASCII data, so there's always a time penalty when accessing the data. The formats of record for genomic sequences are also all ASCII (fasta and, more recently, fastq), so it will be a while, if ever, before binary formats become standard.
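
To make the ASCII point concrete: a FASTQ record is just four text lines (header, sequence, '+' separator, per-base qualities), and a compressed file has to be decompressed on every pass. A minimal Python sketch (the filename is a placeholder):

    import gzip

    # FASTQ is plain ASCII; gzip saves disk, but every read pays the
    # decompression cost -- that's the time penalty mentioned above.
    with gzip.open("reads.fastq.gz", "rt") as f:  # placeholder filename
        while True:
            header = f.readline().rstrip()
            if not header:
                break  # end of file
            seq = f.readline().rstrip()
            f.readline()  # the '+' separator line
            quals = f.readline().rstrip()
            # ... analysis on (header, seq, quals) would go here ...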

The grid/cloud is a promising future solution, but there are still some barriers. Moving a few hundred gigs of data to the cloud is non-trivial over most networks (yes, those lucky enough to have Internet2 connections can do it better, assuming the bio building has a line running to it) and, despite the marketing hype, Amazon does not like it when you send disks. It's also cheaper to host your own hardware if you're generating tens or hundreds of terabytes. 40 TB on Amazon costs roughly $80k a year whereas 40 TB on an HPC storage system is roughly $60k total (assuming you're buying 200+ TB, which is not uncommon). Even adding an admin and using 3 years' depreciation, it's cheaper to have your own storage. The compute needs are rather modest as most sequencing applications are I/O bound - a few high memory (64 GB) nodes are all that's usually needed.
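
Plugging the ballpark figures above into a trivial sketch (assumed rates for illustration, not vendor quotes; the admin cost in particular is a guess):

    # Rough 3-year cost comparison for 40 TB, using the ballpark numbers above.
    years = 3
    cloud_per_year = 80000   # ~$80k/year for 40 TB hosted
    own_hardware = 60000     # ~$60k one-time for 40 TB of HPC storage
    admin_per_year = 30000   # assumed fraction of an admin's salary

    cloud_total = cloud_per_year * years
    own_total = own_hardware + admin_per_year * years
    print("cloud: $%d over %d years" % (cloud_total, years))  # $240000
    print("own:   $%d over %d years" % (own_total, years))    # $150000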

Keep in mind, too, that we're asking biologists to do this. Many biologists got into biology because they didn't like math and computers. Prior to next-generation sequencing, most biological computation happened in calculators and lab notebooks.

Needless to say, this is a very fun time to be a computer scientist working in the field.

-Chris

Comment Queasy... (Score 1) 332

I get queasy just thinking of coding on a ship for an hour, let alone a few months or years. The Caribbean or the Gulf of Mexico might work, but anywhere at sea subject to swells that have had thousands of miles to mature can't be that conducive to coding. And, if you can tolerate it, you'll make more money on an oil rig.

-Chris

Comment Re:Possible use... (Score 2) 412

That's more likely the slice of images taken from the satellite's path. I suspect the satellite imaged that region when the channels/roads/whatever had a layer of water on top and were reflecting the sunlight. If you look at the adjoining tiles, there's still a channel structure, it's just not reflective.

-Chris

Comment Phidgets (Score 2) 147

http://www.phidgets.com/

I've used Phidgets in the past for exactly this application (research into UIs for large data). Lots of premade USB controls are available and it's easy to hook up most analog controls to their IO boards. I went to the local electronics shop and bought a slew of buttons, knobs, and sliders and had no problem hooking them up with Phidgets.

For programming, I wrapped the C library in Python using SWIG.
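
The SWIG recipe is short: write a small interface file that pulls in the vendor header, run 'swig -python' on it, and compile the generated wrapper against the C library. Once that's done, the Python side looks roughly like this (module and function names here are hypothetical stand-ins for whatever your wrapper exposes, not the actual Phidgets API):

    import time
    import phidgets  # hypothetical name for the SWIG-generated module

    # All names below are illustrative placeholders, not the real Phidgets API.
    kit = phidgets.interface_kit_open()
    try:
        while True:
            value = phidgets.get_sensor_value(kit, 0)  # poll analog input 0
            print("control 0 reads %d" % value)
            time.sleep(0.05)  # ~20 Hz polling is plenty for physical UI controls
    finally:
        phidgets.close(kit)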

-Chris

Comment HPC Applications (Score 1) 314

A number of HPC applications funded by NSF/DARPA/DOE grants provide a continuing source of new research while the applications themselves are maintained and improved.

One example is OpenMPI; BLAS/LINPACK/LAPACK are others. Some of the C++/Boost libraries, such as the Boost Graph Library, are also maintained in academia.

-Chris

Comment DNA, RNA, and Genes (Score 5, Informative) 173

The judge's reasoning in the ruling hinges on the fact that the BRCA1/2 genes do not appear in nature as isolated, unmodified DNA and instead only appear in DNA form as part of a (much) larger chromosome. While technically true, it ignores an important fact of genomics: while the BRCA genes do not appear in vivo as isolated _DNA_, they do appear as isolated _RNA_. The RNA counterpart of the DNA sequence is slightly modified - it is the 'reverse-complement' of the DNA with the T's replaced with U's (for example, AACC -> (reverse complement) -> GGTT -> (substitute U for T) -> GGUU).
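
In code, the transformation is tiny. A minimal sketch of the mapping just described:

    # DNA -> RNA as described above: reverse complement, then substitute U for T.
    COMPLEMENT = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}

    def dna_to_rna(dna):
        rev_comp = ''.join(COMPLEMENT[base] for base in reversed(dna))
        return rev_comp.replace('T', 'U')

    print(dna_to_rna('AACC'))  # -> GGUU, matching the example above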

So, in a very perverse way, the judge is correct. The isolated, unmodified DNA does not appear in nature.

There is a natural mechanism for converting RNA back into DNA called reverse transcription (RT). RT-based methods are how we sequence genes. RNA from genes is isolated and converted back into DNA for sequencing. This is a standard lab method used for all gene sequencing. (Interestingly, if someone were to find RT at work in a cell converting BRCA genes back to DNA, the patent could be invalidated.)

The gene itself, in RNA form, appears isolated in nature. The RNA sequence cannot be patented. But, sequencing methods all rely on converting RNA back to DNA for sequencing. The sequence is read as DNA. But, that's not really the gene, that's just a modified representation of the gene. The functioning gene is the RNA version, not the DNA copy of it.

What's frustrating is that Myriad is using a technical aspect of how gene/RNA sequencing works to claim a patent on a gene itself.

-Chris

Comment Re:Depends... (Score 1) 244

"BTW, a conference publication isn't considered a "journal" publication, and doesn't confer the same status. Conferences are where the work gets done: people present developing ideas and get feedback on them."

This is true for all fields except computer science. In CS, conference publications form the basis of a publication record and journals tend to be more for 'archival' bodies of work. CS conferences are peer reviewed to the same standards as most journals in other fields.

CS is lacking journals with quick turnaround times and journals that accept incremental work. Articles submitted to CS journals often go through a year or more of reviews and revisions whereas the entire cycle for a conference from submission to publication is typically around 8 months. In CS, incremental work, which forms the bulk of most publishing records, is communicated through conferences, not journals. There is no equivalent of Physical Review Letters in CS.

Of course, this presents challenges and opportunities for CS academics. The main challenge is that few tenure committees (not departmental, but once the tenure case is presented to the larger committee) understand this, and it always takes some explaining when making the case for a strong CS professor whose only pubs are at conferences such as POPL or SIGGRAPH (for non-CS people, publishing in those venues is right up there with Science or Nature in terms of importance). The opportunity for clever tenure-track computer scientists is to partner with a physicist or biologist and rack up a number of co-authorships on papers in incremental journals. Oddly, CS people tend to think that _all_ journals are archival and difficult to publish in, so pubs in journals with collaborators help your CS case.

It's a silly world...

-Chris

Comment Re:Too small.... (Score 1) 243

I had regular access to a T221/Big Bertha for years. While it was a great display, it had some problems and ended up being overkill for most applications.

Its biggest challenge was that there were very few video cards that could drive it at a decent framerate. ~10 fps was the most it ever got, which made it only marginally useful for interactive applications.

Most applications and windowing systems had a number of UI elements that were rendered based on pixel values (think 32x32 pixel icons). This made most GUIs unusable as the fixed pixel elements were too small (too small to even see in some cases).

200 dpi was also too fine-grained for anyone to appreciate without the aid of external magnification. We kept a magnifying glass next to the monitor to help.

Despite those limitations, as a toy and a testbed for visualization ideas, it was a lot of fun. I used it regularly for GIS applications, often with the aid of the magnifying glass. I also developed large graph visualizations for it and it could easily display much more detail than a standard display.

For rendering, it was possible to render simple primitives and not worry about anti-aliasing or sub-pixel rendering techniques. Large scatter plots, in particular, benefited from the resolution. Sparse areas that would appear dense on a normal display (due to one pixel being used per data point) actually appeared sparse, helping people interpret their data more easily.

-Chris
