Cloud Virtualization

VMware's Serengeti Brings Hadoop To Virtual, Cloud Environments 28

Posted by Soulskill
from the lions-to-be-added-in-future-patch dept.
Nerval's Lobster writes "VMware's Serengeti is a new open-source project for deploying Apache Hadoop in virtual and cloud environments. Serengeti 0.5 is available as a free download under the Apache 2.0 license. It has been designed as distro-neutral, with support for Apache Hadoop 1.0, CDH3, Hortonworks 1.0 and Greenplum HD 1.0. Of course, VMware isn't the only company seeking to leverage the increased interest in Hadoop. In June alone, midsize IT vendors such as Datameer, Karmasphere, and Hortonworks have all announced platforms that utilize the framework in some way. Research firm IDC recently predicted that worldwide revenues from Hadoop and MapReduce will hit $812.8 million in 2016, up from $77 million in 2011."

  • by busyqth (2566075) on Wednesday June 13, 2012 @03:09PM (#40312793)
    So, if I've got only one server, then up to now, I would have to just run an application on that server.
    But now, with only a little overhead, I can pretend to be running the same application in a distributed manner on a cluster, even though it's actually still running on the single server.
    I have to admit this is pretty awesome.
    • by ArsonSmith (13997)

      Or you have a smaller cluster of big boxes running a big cluster of smaller boxes (VMs).

    • by Anonymous Coward

      Just imagine, you can build a beowulf cluster with just one computer, instead of dozens!

    • by tom17 (659054)

      So there are a lot of new names and jargon in the summary that I am not yet familiar with, but could you not just do this before using virtual machines?

      I am sure I am missing the bigger picture here somehow...

      • by abigor (540274) on Wednesday June 13, 2012 @04:26PM (#40314011)

        Yes, of course you can manually set up Hadoop in whatever environment, but it's a pain and generally speaking management is annoying. This new project appears to alleviate at least some of that, making it easy to remotely deploy and manage a Hadoop cluster. At least, that's what I got from the demo video - there's probably more to it.

        Regarding Hadoop, I'm always surprised by its popularity given the relative fragility of HDFS (the NameNode is a single point of failure; other distributed filesystems have beaten this problem) and the dubious, beta-like quality of the tools built on top of it (Pig, etc.)

        • by Rakishi (759894)

          Regarding Hadoop, I'm always surprised by its popularity given the relative fragility of HDFS (the NameNode is a single point of failure; other distributed filesystems have beaten this problem) and the dubious, beta-like quality of the tools built on top of it (Pig, etc.)

          So what's the alternative you recommend?

  • by Lord Grey (463613) on Wednesday June 13, 2012 @04:25PM (#40313977)

    From TFS:

    Research firm IDC recently predicted that worldwide revenues from Hadoop and MapReduce will hit $812.8 million in 2016, up from $77 million in 2011.

    Notice that the revenue is directed toward the few companies supporting and extending Hadoop. If you're working for one of those companies, congratulations. If you're working for one of the companies that is spending its money on this new shiny thing, you're probably in for a ride (one way or another). The technology is definitely good, I'll grant you that. But it is not the solution (or, not a very good solution) for many of the problems IT/data shops have. It really seems that a lot of people are jumping on the Hadoop bandwagon because "everyone else is getting it" and not because it will solve particular, concrete, existing problems. Or, it will solve exactly one relatively small, concrete, existing problem while erecting a complex infrastructure that must be supported for several years, making it more of a PITA than a solution.

    Anyway, back to my original point: I think this revenue citation is more of an indication of a technology bubble and successful marketing than anything else. The price IT will pay for that bubble will probably far exceed the original cost.

    • by Rakishi (759894) on Wednesday June 13, 2012 @04:57PM (#40314433)

      As someone who actually uses Hadoop, you're so far off the mark you've hit a bystander in the head. Dealing with large amounts of data is a major PITA. If you don't understand that then you must never have worked with anything but trivial data sets. Hadoop fixes much of it, period, without having to spend insane amounts of money on databases and DBAs and still not being able to scale properly. It's not optimal but it works, it scales and it's flexible.

      That's why companies are moving to it.

      • by Anonymous Coward

        +1

        Once you have collections of hundreds of millions of objects and need to work with a trillion properties, everything that you know about working with data stops working.

        Hadoop is not being adopted because it is fun and trendy, but because it addresses real needs now with resources that are attainable.

      • by Lord Grey (463613)

        As someone who also works with large amounts of data every day, I know exactly what I'm talking about. You may want to reread what I actually wrote.

        Hadoop is a decent technology and is one approach to dealing with "Big Data" problems. There are other products out there, and for the most part they have all been around a lot longer than Hadoop. The problems all these products address have been around for quite some time, as most people know.

        So what is the difference at this point in time? Did everyon

        • by Rakishi (759894)

          Hadoop is a decent technology and is one approach to dealing with "Big Data" problems. There are other products out there, and for the most part they have all been around a lot longer than Hadoop. The problems all these products address have been around for quite some time, as most people know.

          So what are these alternatives? I like how people keep mentioning "alternatives" but never state them by name. Afraid of their actual flaws being ripped apart I guess. Always a vague "other options" statement.

          Hadoop is inexpensive, flexible and well supported. It's cheaper overall than paying for some silly clustered RDBMS licence which is optimized to solve a problem you don't actually care about. If you don't realize the specific set of problems Hadoop excels at solving then, frankly, you really don't unde

        • by Rakishi (759894)

          Also, a million rows of data? Most any decent web startup that does data is probably running at a million rows of data per day. Minimum. Maybe closer to 100 million once they get around to collecting everything and get a few companies or users on board. Especially once you remove silly, idiotically low restrictions on scaling and storage (unless you spend $$$$$). Got more data? Add more nodes, problem solved, get on with running the company. And they want to run complex analysis over the last year of data
