
Comment Data Scientists are this bubble's Web Masters (Score 2, Interesting) 139

I've been working with big data since before it was a term and currently run a scientific software company that touches on many aspects of "data science". Many of my colleagues also work in the field. I've seen many fads come and go. Data Science as a profession is one of those.

Most people who call themselves data scientists are really just doing "big data" processing using tools such as Hadoop. They are delivering results to managers who have jumped on the big data bandwagon and, not knowing any better, have asked for these skills. In 99% of cases, the processing amounts to haphazardly looking for patterns or running basic statistics on data that really isn't that big. However, there is a lot of low-hanging fruit in data that hasn't been analyzed before, and most practitioners who've suddenly become data analysis experts are rewarded for trivial findings. A tiny bit of statistics, programming, and data presentation skills goes a long way.
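To put that in perspective, here's a minimal sketch of the kind of "analysis" being described, written in Python with pandas; the file name and column names are hypothetical placeholders, but the point stands: a laptop and a few lines of code cover it, no Hadoop cluster required.

    # A sketch of "basic statistics on data that really isn't that big".
    # The CSV file and its columns are made-up examples.
    import pandas as pd

    df = pd.read_csv("sales.csv")                  # fits comfortably in memory
    print(df.describe())                           # summary statistics per column
    print(df.groupby("region")["revenue"].mean())  # a trivial but presentable "finding"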

Compare this to the Web Masters of the late 1990s. The Web was new and managers knew that they needed Web sites. HTML and CGI were techie things but also fairly easy to learn. A group of people quickly figured out that they could be very important to a company by doing very little work and created the position of Web Master. A tiny bit of programming, sys admin, and design skills went a long way.

Web Masters disappeared when IT departments realized that you actually needed real software developers, real designers, and real sys admins to run a corporate Web site. Sure, the bar is still low, but expertise beyond a 'For Dummies' book is still needed. And, few people can be experts in each area, hence the need for teams.

Real data science has actually been around for a long time. Statisticians and data analysts have been performing this role for decades and have built up a lot of rigor around it. It's a tough skill set to develop, but a very useful one to have. "Big Data" distracted people a bit and let the current generation of data scientists jump in and pretend everything was new and the old methods could be thrown out. As the field evolves, data science will necessarily transition back to the experts (statisticians) and become a team effort that includes people skilled in programming, IT, and the target domain (analysts).

That said, there's good money to be made right now, so if you have Web Master on your resume, you might as well be a data scientist while you can. ;)

-Chris

Comment Re:Classic Slashdot (Score 1) 463

Just sent this to feedback@slashdot.org ... please everyone do the same with your comments. Spam them into compliance with our demands! ;)

I have been reading and participating in discussions on Slashdot since 1996.* Over the years, there have been few times when I have gone more than a few days without reading /. /. is my connection to the broader geek community and the discussions on the site have been influential in shaping me as a geek. As an older geek, /. gives me a chance to share what I’ve learned and also keep up with how people are thinking about new technologies.

I have yet to find a site that is as comprehensive in its community and coverage as /. Let me restate that: there is no other place on the internet as essential to the geek community as /. Nothing comes close. /. works because of its community. The community values /. because it’s given us a place to discuss a broad range of topics in a civilized manner. Without the community, /. will simply be another discussion board on a news site, with loud voices screaming their opinions and very little civilized discourse.

If you haven’t noticed from the comments on many of the recent stories, the community will disperse if the beta is rolled out. I won’t go into the reasons why (there are thousands of great comments explaining the shortcomings of the new site) other than to say that the new format discourages discourse and community and encourages quick comments with little context.

Deep, threaded, moderated discussions that are easy to read and inviting to participate in are what make the current site work. The conventions for discourse that have evolved over the last 17 or so years, many enabled by the design of the site, are what allow the community to function. Change it too much, and the community will scatter, with people going their separate ways - off to Reddit and 4chan and the comment sections of Ars, Wired, Toms, and other tech sites.

The managers in charge of this redesign are facing probably the biggest decision of their careers. By going forward with the beta, they will surely meet their quarterly goals. But, in doing so, they will be directly responsible for the destruction of one of the most important communities on the internet.

Please make the right decision and don’t go forward with the beta as it currently exists. (and note: redesigns are fine, just understand why your site works and don’t destroy its soul in the process)

-Chris aka rockmuelle

*My 6 digit ID is only due to the fact that I valued privacy in the early days of the internet and was reluctant to sign up for accounts anywhere.

Comment Re:Ya pretty much (Score 2) 299

More to the point, as a professional programmer and musician, as much as I'd love to write music software, I spend my free time writing music instead. The commercial tools are really good, and the parent is correct in saying that it will take a lot of effort to catch up.

Fwiw, I use Ableton for recording and production and hardware synths for sounds. If you love hacking and music, check out the latest generation of affordable analog synths (Korg Volca/Monotribe, Arturia MicroBrute, etc). Designing sounds from scratch is highly satisfying.

Comment Re:Data Scientist for mass mail company says... (Score 1) 124

I've argued that it's more like "web master" from the 90s. A trendy job that will soon be replaced by actual experts.

For data scientists, the experts are the traditional analysts and statisticians who were already doing these jobs before Hadoop experience became the only job requirement.

-Chris

Comment Re:Okay, I'll say it. (Score 1) 45

The rectangle with rounded edges was a design patent (http://en.wikipedia.org/wiki/Design_patent), not a utility patent, which is what the Google patent is.

I'm not saying the Google patent isn't bad, especially given the clear prior art with MS Comic Chat, but just that it's important to distinguish the types of patents when pointing out the inanity of the system. Design patents are a little easier to accept since they're closer to copyright on physical objects.

-Chris

Comment Spending investors money vs. their own (Score 2) 229

It will be interesting to see if they keep this up when they're spending customers' money rather than investors'. A blank-slate business with a set amount of money to spend is easy to model this way. Once you start to find the real value in your offering and determine how revenue is actually made, things get trickier. One or two stellar salespeople or engineers can be responsible for an outsize portion of the business. They need to be compensated appropriately.

-Chris

Comment Re:Do Some Homework Allison (Score 2) 545

Luckily for this discussion, the data actually exists. Indiana recently went from not changing time to changing time. It turns out energy costs are 1-3% higher under daylight saving time than without it.

Here's the citation:

Matthew J. Kotchen and Laura E. Grant, "Does Daylight Saving Time Save Energy? Evidence from a Natural Experiment in Indiana," NBER Working Paper 14429, http://www.nber.org/papers/w14429

-Chris

Comment Re:Peer review isn't about validation (Score 2) 197

But the problem with this model is that there's no way for a grad student to publish a negative result if they fail to replicate the results. To compound the problem, if a student starts getting negative results, they will quickly change their course of research to something that may produce results. PhDs are not granted for negative results - there is little incentive to pursue research paths that aren't fruitful.

In the end, the student will know the original result is questionable, but the scientific community will not.

-Chris

Comment Open Source is not a Panacea (Score 5, Insightful) 307

Look, I use open source all the time, have contributed to many projects, and have run a few. I love open source just as much as the next slashdotter.

BUT, broad statements like "open source will fix healthcare.gov" don't add anything to the conversation. What if it had been built on open source and still failed? Would we be making the same claims about commercial software? "If only they had used WebSphere and DB2!! Everything would have been wonderful!"

No. No. And. No.

As many people have already pointed out, the problems with healthcare.gov are mostly the same ones that plague many large scale IT projects. Insufficient testing, complex interactions between many existing complex systems (which are hard to get right), consultants that get paid for code delivered, working or not, and so on.

Now, TFA actually makes the argument that healthcare.gov as an _open platform_ would be a good idea. It goes on to point out that that's one thing that makes some of the bigger web apps successful: they are platforms for building apps rather than apps themselves. How much of that is true is open for debate (is Google really a beautiful platform, or is it a bunch of hacks held together by duct tape? Only Google engineering knows for sure...), but as a goal, healthcare.gov as a platform isn't a bad idea.

However, platforms don't just materialize from thin air. In fact, building a platform before you have apps is a recipe for failure. It's usually only after the third or fourth app that the patterns emerge that make a platform possible. It takes time for good platforms to evolve.

Given that, designing healthcare.gov from the beginning as a platform would probably have failed, too. The developers would have created a wonderful platform for some vague requirements that likely didn't actually meet the needs of an insurance exchange at all.

From a pure software engineering perspective, what's happening right now isn't that bad. Version 1.0 launched, it had problems. Let's get working on Version 2.0 and maybe try out some new ideas. Then for Version 3.0 and 4.0, we can start thinking of a platform. The other important point here is that you have to plan for multiple versions and long term maintenance/evolution for software. The suggestion that healthcare.gov should have been run as a startup in the government rather than outsourced is probably the best idea for fixing the problem.

-Chris

Comment Re: Lord Forgive me, but (Score 1) 316

"According to Microsoft/Apple/etc. software development costs large amounts of money and equipment, yet Linux and the open source community exist and flourish. How many scientists would risk their own money in their own experiments? If not, what does that say about the experiments?"

Whoa there... Linux and most open source tools cost large amounts of money to develop. Look at the list of top contributors to Linux:

http://arstechnica.com/information-technology/2013/09/google-and-samsung-soar-into-list-of-top-10-linux-contributors/

Most of those are companies that are paying their employees to work on Linux. The sum of their salaries and the resources they require is a good part of what it costs to develop Linux.

Just for fun, let's estimate what the Linux kernel costs to develop each year. The actual report from the Linux Foundation lists the number of changes each organization made to the kernel. If you sum the number of changes from commercial entities, you get 55,604 changes committed by paid developers. Assuming each developer contributes one change a day on average and assuming they work hard, that's about 200 changes per developer per year. Dividing the number of changes by the changes per developer suggests around 278 full-time developers are contributing to the Linux kernel. Assuming the average fully burdened cost for a kernel developer is $250k, the cost for those developers is $69.5M/year.
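For anyone who wants to check the arithmetic, here's the same back-of-the-envelope estimate as a few lines of Python; the changes-per-developer rate and the salary figure are the assumptions stated above, not measured data.

    # Rough estimate of annual Linux kernel development cost.
    paid_changes = 55604         # changes committed by paid developers (per the report)
    changes_per_dev_year = 200   # assumption: ~1 change per working day
    cost_per_dev = 250000        # assumption: fully burdened cost, USD/year

    devs = paid_changes / changes_per_dev_year  # ~278 full-time developers
    annual_cost = devs * cost_per_dev           # ~$69.5M/year
    print(round(devs), round(annual_cost / 1e6, 1))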

tl;dr: The Linux kernel costs somewhere in the ballpark of $70M a year to develop. And that's just the kernel, not the rest of the Linux ecosystem.

If those companies stop contributing to Linux, Linux goes away.

-Chris

Comment Re:not entirely false (Score 1) 394

I have to disagree with this.

Bugs in open source software can be fixed by _developers_, not any user. If you use open source software and are not a competent developer, you can't fix it. You can _pay_ someone to fix it, but at that point, there's a good chance your fix will cost more than a commercial alternative. If you do provide a fix, there's no guarantee that your fix will be accepted into the codebase. When this happens, you now have to maintain that fix with every new release, further adding to the cost.

I'd also like to get real data on the claim that lots of people look at bug fixes and vouch for them. I've been involved in a lot of open source projects and have found that this just isn't true. Even heavily curated projects like Boost don't necessarily get the scrutiny they deserve (I dare you to read the source for the Boost Graph Library or Spirit and say with confidence that they're bug free and secure, or even evaluate a patch submitted by one of the hard-core Boosters).

Just last week, I spent a few days going through the source code for Galaxy (a bioinformatics tool) since the documentation was almost non-existent. I'm pretty sure I passed over a number of bugs and security vulnerabilities without catching them. For the problems I did see, I don't have the time or resources to propose and execute fixes. And this is a tool that people are using in clinical applications.

It's easy to repeat sweeping claims about open source, but I'd like to see some real data to back up the common claims made to support open source over commercial tools.

-Chris

Comment Re:not entirely false (Score 1) 394

I don't have mod points today, so I'll reply and add some more supporting material. The parent's point on reliability is dead on.

When it comes to cost, we've done a lot of market research and internal analysis on the actual costs of basing a business on open source. When properly accounted for, open source can be much costlier than closed source alternatives. The basic reason is simple: open source software stacks take time to maintain.

Most organizations that use open source software have full-time people dedicated to maintaining the software, just like organizations that use closed source software. However, in open source shops, the internal developers/analysts/etc. (the _users_ of the software) also must maintain the software. This is where the hidden costs of open source lie. In many cases, using open source software forces everyone to become a developer, or at least a sys admin, whether they want to or not.

If someone's primary job is to analyze data for a business, they should spend most of their time either performing analysis, sharing results, or furthering their analysis skills. Instead, we've seen analysts (I work in genomics) that use open source software spend up to 80% of their time just maintaining their tools and working around limitations imposed by them. When commercial tools are available that perform the same function but without the hassle, few open source advocates will even consider them, even if the cost is significantly less than the cost of the time they spend messing with open source tools.

Oracle's probably not the best company to be leading this conversation, but it's important enough that the software community should engage in it. There was a time when commercial and open source solutions coexisted peacefully. It'd be nice to see some balance return.

-Chris
