This is a free ebook download from Microsoft in which a variety of leaders in data-driven science write chapters about a range of scientific disciplines and what "big data" means to them. The first chapter is especially enlightening! Blurb about the book:
Increasingly, scientific breakthroughs will be powered by advanced computing capabilities that help researchers manipulate and explore massive datasets.
The speed at which any given scientific discipline advances will depend on how well its researchers collaborate with one another, and with technologists, in areas of eScience such as databases, workflow management, visualization, and cloud computing technologies.
In The Fourth Paradigm: Data-Intensive Scientific Discovery, the collection of essays expands on the vision of pioneering computer scientist Jim Gray for a new, fourth paradigm of discovery based on data-intensive science and offers insights into how it can be fully realized.
My dad is also an academic. Watching his path was quite inspiring for me, though I didn't appreciate all he had done until I was set on doing the same. He worked wherever he could that would allow him to write grants and do the work he wanted to do. It wasn't until he was 50 that he landed his first profship. He's now been a prof for over 15 years, works harder than before due to department responsibilities, graduate students and post docs, and loves every minute (almost). That showed me that if you keep at it long enough, eventually things will work out.
If you decide to go for the academic life, good luck and enjoy every step along the way. Just don't worry too much about the sad state of affairs for doing basic research.
I'm actually involved with a large US National Science Foundation project, the iPlant Collaborative (http://iplantcollaborative.org/), which is building the cyberinfrastructure to handle these data and analyses. In addition, I maintain a set of web-based software for comparative genomics: CoGe (http://genomevolution.org/). From the standpoint of genomes, I adopted the philosophy of building a system that can easily accommodate new versions of existing genomes as well as new genomes. Thus, as new data become available, they are quickly loaded into the system and made available for analysis by any of the existing tools or for comparison to any of the already loaded genomes. So far, the system has scaled quite well and it is storing over 16,000 genomes from over 12,500 organisms. The science is a lot of fun (sort of like the ultimate video game except no one knows the rules and there are no pre-built user interfaces), and it is awesome to see how quickly the number of sequenced genomes has grown over such a short period of time. This is driven by how cheap the technology has become to use and the quantity of data that can be produced. For those interested, the National Human Genome Research Institute keeps track of this and has some very informative graphs: http://www.genome.gov/SequencingCosts/.
As has been said before, the analysis and interpretation of these data are the rate-limiting step. There is lots of opportunity for folks with programming, algorithm, data visualization, web, and user interface experience.
August 12. The Moon is just a couple of days past new at the [Perseid] shower's peak, so there will be no moonlight to interfere with the faint meteors. The shower should reach its peak in the hours after midnight (before dawn on August 13), with a maximum of a few dozen meteors visible per hour.
Absolutely stunning piece of work:
Our current understanding of how dynamic a genome is, the types of changes that occur, and the factors that limit these changes is very limited. Much of this is because getting a genome of an organism can be expensive and laborious, depending on the size of the genome (RNA virus: 15,000 nt, DNA virus: 150,000 nt, bacteria: 5,000,000 nt, yeast: 20,000,000 nt, multicellular organisms: 100,000,000-10,000,000,000 nt). Since our understanding of how genomes evolve depends on getting genomes sequenced that are appropriately related to one another (e.g. populations of organisms versus diversity of organisms), we can only get answers for those genomes we currently have (currently ~8000 for all viruses, bacteria, archaea, and eukaryotes). Fortunately, there is currently a major technological revolution happening in biology: generating DNA sequences quickly and cheaply. For example, the first human genome was approximately a 10-year project and cost ~$1,000,000,000. Now, the record for a human genome is less than a week, at a cost of ~$15,000.
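To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python using only the approximate figures quoted above. The variable names and the 52-weeks-per-year rounding are my own assumptions, and "less than a week" is treated as exactly one week, so the speed-up is a lower bound.

```python
# Approximate figures quoted in the paragraph above.
FIRST_GENOME_COST_USD = 1_000_000_000
FIRST_GENOME_YEARS = 10
CURRENT_GENOME_COST_USD = 15_000
CURRENT_GENOME_WEEKS = 1  # "less than a week", so this is an upper bound

WEEKS_PER_YEAR = 52  # assumption: rounded


def fold_change(before, after):
    """Return how many times larger `before` is than `after`."""
    return before / after


cost_fold = fold_change(FIRST_GENOME_COST_USD, CURRENT_GENOME_COST_USD)
time_fold = fold_change(FIRST_GENOME_YEARS * WEEKS_PER_YEAR, CURRENT_GENOME_WEEKS)

print(f"cost: roughly {cost_fold:,.0f}-fold cheaper")
print(f"time: at least {time_fold:,.0f}-fold faster")
```

By these rough numbers, the cost has dropped by more than four orders of magnitude and the time by more than two.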
This project is a major milestone as the authors sequenced 6 plant genomes (a mustard known as Arabidopsis thaliana) that are related to one another by 30 generations. Because of the close evolutionary relationships of these organisms, the authors can characterize the types of genomic change happening over very short time periods.
The emerging picture is that genomes, the fundamental genetic blueprint for a lineage of organisms, are much more dynamic than we had previously thought.