
Comment Re:The Fuck? (Score 1) 175

And then what? How do you get them to work together? You could add a coordination system between them so you have a bunch of slave nodes getting dispatches from the master node. Well, to make that work you need to design your data to be partitioned and your workload so that results can be combined at the coordination level with little contact between the coordinator and the slave nodes. That's a big data engine.
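To make the partition-and-combine point concrete, here is a minimal sketch in Python. It assumes threads stand in for slave nodes and a plain list stands in for the data set; the function names (`partition`, `worker_partial_sum`, `coordinator_mean`) are made up for illustration. The point is only that each worker touches its own partition and sends back a tiny partial result.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n_workers):
    """Split rows into n_workers slices (round-robin)."""
    return [rows[i::n_workers] for i in range(n_workers)]

def worker_partial_sum(chunk):
    """Runs on one worker: reads only its own partition."""
    return sum(chunk), len(chunk)

def coordinator_mean(rows, n_workers=4):
    """Coordinator: dispatch once, then combine the small partial results."""
    chunks = partition(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(worker_partial_sum, chunks))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None
```

A mean works here because it decomposes into per-partition (sum, count) pairs. A workload that needs rows from other partitions at every step would not fit this shape without a lot more coordinator traffic.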

Comment Re:The Fuck? (Score 0) 175

Your comment about set theory is nonsense. This is computer science, not math. In math (and especially set theory) a problem has a finite solution if, given any finite amount of computation power C, there exists a finite amount of time T such that the algorithm will arrive at an answer. Computer science is all about making C and T small.

If you want to have a long conversation get an account.

Comment Re:The Fuck? (Score 1) 175

Absolutely true. SQL technology is much more mature. But just as SQL made sense for many client-server workloads in a world where non-SQL, COBOL-based systems were more mature, big data makes sense for non-client-server workloads, which from a hardware standpoint are more similar to the old non-SQL workloads.

Comment Re:The Fuck? (Score 4, Informative) 175

SQL engines are often slower than what?

Than engines designed for massive parallelism when dealing with workloads that can be effectively processed in parallel.

Operating on what hypothetical database schema with how many records spread across how many tables?

Generally NoSQL engines use schema-on-read techniques, not schema-on-write. The table structure comes during the read. For some sort of fair comparison, take something like a typical star schema with a much too large fact table (think billions or trillions of rows) and half a dozen dimension tables.

Or, if you really want to make it worse: the same query where the table is getting 1m writes / second and you want an accurate stream.
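For anyone unfamiliar with schema on read, a rough sketch in Python follows. The raw lines, the field names, and the `read_with_schema` helper are all invented for the example. The records are stored as-is, and the "table structure" only appears when a reader supplies a schema.

```python
import json

# Raw records are written without any enforced structure.
RAW_LINES = [
    '{"user": "a", "amount": "12.50", "ts": 1}',
    '{"user": "b", "amount": 3, "extra": "ignored"}',
    '{"user": "a"}',
]

def read_with_schema(lines, schema):
    """Apply a schema while reading: pick fields, coerce types, default missing ones to None."""
    for line in lines:
        record = json.loads(line)
        yield {
            name: (cast(record[name]) if name in record else None)
            for name, cast in schema.items()
        }

# The schema is chosen by the reader, not at write time.
rows = list(read_with_schema(RAW_LINES, {"user": str, "amount": float}))
```

A different reader could pass a different schema over the same raw lines, which is the trade being made: writes stay cheap, and every read pays for the parsing.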

SQL engines have problems with massive parallelism? Why? Which ones?

Because SQL by its nature operates on the table, not the individual rows. Older database technologies that were row oriented, like what you see on a mainframe or in SAS, work better when the ratio of table size to computation speed is low. Today, because disk storage size per dollar has gone up so fast, with disk we face many of the same problems systems in the 1980s faced with tape.

As for which ones: pretty much all of them. The big data SQL engines have the fewest problems, though, and their execution plans turning into map-reduces might present a viable long-term solution.

How well do you *really* know SQL in general and the capabilities of different database engines in particular?

Assume I don't know anything. Oracle, which has the best engine and SQL people on the planet, has a guide for hybridization to handle things their engine can't handle well. IBM, which probably comes in second and invented the relational database, produces their own Hadoop / R to handle queries that DB2 (which is BTW far better than Oracle at streaming) can't handle. Teradata's engine, which was originally written specifically for larger amounts of data, has for a decade had specific features of another subsystem to do enhanced big data; they also have guides for hybridization for things even their enhanced engine can't handle. And Microsoft, which writes the 3rd most popular engine, has spent many millions on hybridization strategies. EnterpriseDB (Postgres) fully supports the IBM strategy.

I don't know anyone in the space who does agree with the /. "SQL can do everything" attitude.

but that portion of the article was ridiculous, and thus far all of the comments in support of it have demonstrated a similar lack of familiarity with actual databases, their operation, or performance tuning.

The article was ridiculous; I said as much in another response. However, the comment I was responding to went much too far in the other direction. As for performance tuning: it is designed to avoid full table scans and expensive joins. The goal of many hybridization strategies is to use a big data engine to take a raw data flow and convert it via ETL into relational data, which can then take advantage of indexing and a better execution plan. Tuning doesn't do much good when the initial goal is to do a full table scan.

Comment Re:The Fuck? (Score 4, Informative) 175

I know SQL pretty well. I agree with you it handles most stuff. That doesn't mean it handles everything.

SQL engines are often slower.
SQL engines have problems with massive parallelism (i.e. often at around 12 CPUs you stop gaining much at all by adding additional CPUs).
SQL engines have problems with complex in-document (i.e. in-blob) searches.
etc...

Comment Terrible arguments for Big Data (Score 1) 175

I'm a big data advocate. I like the idea of engines designed for unstructured data. But the examples in the article barely even register as difficulties of relational databases: "What if two people share the same address but not the same account? What if you want to have three lines to the address instead of two? Who hasn’t tried to fix a relational database by shoehorning too much data into a single column? Or else you end up adding yet another column, and the table grows unbounded."

As for his comments on denormalizing, I'm wondering if he has ever heard of a data warehouse and a star / snowflake schema, both of which handle the "I want cheaper joins" problem without having to denormalize the dimension tables.

Comment Re:Causes of hording. (Score 1) 107

The department of defense runs servers out of house. Lockheed Martin runs a cloud provider. Many of the country's banks handle it. There is no question you can buy better security than any company has internally.

As for running an internal cloud, that's pretty easy, and they could ask a vendor to run the financial IT while keeping all the servers physically on their prem.

Comment Re:Causes of hording. (Score 1) 107

One way to handle that is to not own your infrastructure and just rent month to month from a vendor who provides a pool of servers. What you are likely facing is the problem of how to prevent the administrative cost from going above X% by preventing the IT administrative cost from going above Y% by slowing down acquisitions... Better yet is just to guarantee Y and save the labor.

Comment Re:Money (Score 2) 107

At this point, for almost all companies good quality colo space is infinite. Most times a company isn't even using a meaningful fraction of their colo's space, and so they could double or triple instantly without hassle, much less add an extra 33%. And even if their colo doesn't have extra space, others directly connected to it do... So consider space infinite once you are willing to rent.

That being said, I have problems believing the 1/3rd of servers figure from the article. That's not my experience at all.

Comment Re:NAN (Not a Number) (Score 1) 1067

It can in better type systems, and those exist today (example: "safe division via the Maybe monad"). You can have a NaN added to the integers, but then your integral math code can't execute directly on the arithmetic logic unit in the CPU; it becomes an abstract type. For higher level languages that often won't matter anyway.
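A rough illustration of the safe division idea, sketched in Python rather than Haskell: `Optional[int]` with `None` stands in for Maybe's `Nothing`, and `bind` is a hand-written helper for this example, not a standard library function. Once a division by zero produces `None`, the rest of the chain is skipped instead of raising.

```python
from typing import Callable, Optional

def safe_div(a: int, b: int) -> Optional[int]:
    """Integer division that returns None instead of raising when b == 0."""
    return None if b == 0 else a // b

def bind(value: Optional[int], f: Callable[[int], Optional[int]]) -> Optional[int]:
    """Maybe-style bind: skip f once a failure (None) has occurred."""
    return None if value is None else f(value)

# (100 / 5) / 2 succeeds at every step.
ok = bind(safe_div(100, 5), lambda x: safe_div(x, 2))

# The first division fails, so the later steps never run.
bad = bind(bind(safe_div(100, 0), lambda x: safe_div(x, 2)), lambda x: safe_div(x, 3))
```

This also shows the cost mentioned above: the result is no longer a bare machine integer, so every use goes through a check instead of straight to the ALU.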
