
Comment Re:Poor Quality Assurance does not boost confidenc (Score 0, Flamebait) 183

The problem is that we have one peephole and one organization looking through it and crying wolf. Under normal scientific circumstances, if you say something preposterous, I go spend my $20 replicating your experiment and prove you're a fool. But when your instrument costs us billions of dollars and you use it to make absurd, demonstrably insane claims, only to admit you were operating the instrument wrong, and you do this repeatedly, you make us all look like fools for spending the money on you. Which is a shame, because the world would be a better place with more research.

Comment Re:When programming tools and databases meet.. (Score 1) 29

Map data structure (in my programming language) direct to data storage (in my database), with little in between, even if my data structure is on one physical machine, and my storage is on another.

That's the crux of the problem. You see the programming language as having the actual structure, and that dusty databasey thingie over there as having some sort of optimized-for-storage representation of the data. In reality, that database has a much richer notion of your data than your program does, and your program has an optimized-for-this-calculation representation.

The proof WaffleMonster submits is the CREATE VIEW statement. This little feature isn't a way of making your SELECTs shorter. It's a way of allowing your applications to restructure the data being stored without introducing copies. It's a way to ensure that when the database is refined (i.e. storing more in more complex ways) the applications you already have don't need to be rewritten to participate.
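To make that concrete, here's a minimal sketch of the idea using Python's built-in sqlite3 module. The table and column names are invented for illustration; the point is that the view reconstructs the old shape of the data without copying anything.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- The storage schema was refined: names are now split into two columns.
    CREATE TABLE people (first_name TEXT, last_name TEXT);
    INSERT INTO people VALUES ('Edgar', 'Codd');

    -- A view presents the old single-column shape to legacy applications,
    -- without introducing a copy of the data.
    CREATE VIEW people_legacy AS
        SELECT first_name || ' ' || last_name AS full_name FROM people;
""")

row = con.execute("SELECT full_name FROM people_legacy").fetchone()
print(row[0])  # old queries keep working against the refined schema
```

Applications written against `people_legacy` don't need to be rewritten when the storage underneath changes again.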

It depends on how you define ACID

Not really. Every distributed database forgoes some aspect of ACID compliance. Take my shitty iTunes Match service. Just today, I had to delete 28,000 copies of a playlist from iTunes. Why? Because iTunes Match doesn't know whom to believe when two devices disagree about whether a playlist was made, so it defaults to resolving the conflict by creating a playlist rather than deleting one. In a distributed scenario, you are going to have to decide how to handle conflicting versions of the truth. MySQL Cluster is marketed as if you don't need to do that, but as someone who has run MySQL with replication, I can tell you it's extremely fragile. The PostgreSQL equivalent is much harder to set up. Why? Because there are hard questions there that you have to be able to answer, but if you are essentially a front-end engineer who's really excited about Node.js, you have no fucking idea what a hard problem is.

The absolute irony about this whole situation is that all this effort is being made to enable distributed data storage why? So we can have one domain with all our data on it: Facebook, Twitter or whatever. Wouldn't it be neat if, instead of having these massive centralized systems we had some kind of distributed way of hooking computers together?

Plenty of apps try to use offline storage as a cache when disconnected from the main data store, for example.

Yes, and how many of them get it right? Using a local store as a cache is explicitly giving up on ACID compliance: you have to be able to anticipate the situation where your changes can't be applied to the server, and what to do if that happens. This is an application-level problem that can't be swallowed by the database. In some apps, the server will always be right, in some apps the client will always be right, but in most apps, it will be a nasty ad-hoc combination of the two that will demand mental effort to reconcile.
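A hedged sketch of that application-level choice, in plain Python with invented field names: the same conflict resolves differently depending on which side you decide to trust, and no database can make that decision for you.

```python
# Three reconciliation policies for the same conflict. Field names are
# hypothetical; real apps usually end up with the third, ad-hoc kind.

def reconcile(local, server, policy):
    if policy == "server-wins":
        return server
    if policy == "client-wins":
        return local
    # The "nasty ad-hoc combination": server wins in general, but the
    # user's offline edits to the draft text are preserved.
    merged = dict(server)
    merged.update({k: v for k, v in local.items() if k == "draft_text"})
    return merged

local  = {"title": "My note",           "draft_text": "edited offline"}
server = {"title": "My note (renamed)", "draft_text": "old"}

print(reconcile(local, server, "server-wins")["title"])
print(reconcile(local, server, "ad-hoc")["draft_text"])
```

Each branch is trivial on its own; the mental effort is in deciding, per field and per app, which branch is correct.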

That's not necessarily true if your code is the interface to your database and if your code enforces the necessary constraints so that your database can play dumb.

If you're doing this, you're wasting your database and you're lying to yourself. You're lying to yourself because you think you can shoulder the burden of writing integrity into your application. You're also lying to yourself if you think you can manage concurrency correctly, and you're lying to yourself if you think that your application or library is the only one that's going to access your data and can be the gatekeeper. If any of these things seems reasonable to you, come back here in a decade and let's talk about how it worked out for you.
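For contrast, here's a small sketch (Python's sqlite3, hypothetical schema) of letting the database be the gatekeeper: a constraint the database enforces holds no matter which application, library, or stray script touches the data, even one that forgot to check.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    );
""")
con.execute("INSERT INTO accounts VALUES (1, 100)")

try:
    # Some other application "forgot" its integrity logic. The database
    # does not play dumb: the write is rejected at the gate.
    con.execute("UPDATE accounts SET balance = -50 WHERE id = 1")
except sqlite3.IntegrityError:
    print("rejected by the database")

balance = con.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(balance)  # still 100
```

Integrity written into one application protects you from that application; integrity written into the schema protects you from everything.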

Relational databases, the good ones anyway, have 30-40 years of debugging and optimization poured into them. If you want to take a year and do your little "database" project, I believe you could make something that could compete with or even beat a traditional relational database in any one area: speed, integrity, storage performance, feature set, or whatever dimension you like. But it will suck eggs on every other parameter. And indeed, if you look at the popular NoSQL databases, that is what you see: a fast one that loses data, a slow one that doesn't, one that has great conflict resolution, and so forth. You choose one of these and you are essentially saying, "yeah, we're never going to care about that other stuff." At least not for 5, 10 or 15 years. But how often do your requirements change? More often than that, perhaps? And how many of these databases are going to be around in that length of time? CouchDB didn't even make it to its 6th birthday before being forked and losing its primary author. If you're in it for glory, you're not in it for the long haul.

Using a relational database, you're not just subscribing to some sort of death by SQL. You're buying into decades of work by truly hardcore engineers who did not do it to get famous (oh, don't worry, I'm not talking about Monty!) You're dealing with really solid software that has been proven in production for ages.

all progress depends on the unreasonable man!

You know, it's sort of true. After all, E. F. Codd was once that guy, and it was literally more than a decade before there were databases that performed adequately. The difference is that Codd started with the math, showed that it could be possible and that it would be a win, and then, after enough time and energy was invested, it became feasible, and then awesome. NoSQL is the opposite: a rejection of what came before by front-end developers who probably couldn't take a derivative if their lives depended on it; a complete lack of acknowledgement that they are reimplementing the very network and hierarchical databases that preceded relational ones; all of the emphasis on performance and almost none on reliability, expressiveness, or any other mathematical property that data might have. No knowledge of what came before, and no idea which mistakes they might be repeating.

I'd find it a lot easier to put stock in these dreams if I thought they weren't, fundamentally, about undermining forty years of progress for the sake of a little venture capital today, but that's exactly what I see: everybody talking about scalability is either a scalability engineer at one of the big N where it actually matters, or, more likely, some moron who thinks they're going to take over the internet in six months with their stupid knockoff variation. And of course they want the answer to be simple, because if the answer is complex they don't stand a chance. Unfortunately, the answer is very subtle. Simple and subtle look related but are actually quite different.

Comment I call BS (Score 1) 29

You can't call it an ORM solution if the backend isn't a relational database. Where are the relations? There aren't any. There's just documents. Not an ORM. This is not 'Nam, there are rules.

Moreover, this has nothing to do with the actual problem. The reason MongoDB et al. are gaining steam is precisely because there are these "web scale" morons running around ignoring real problems like data validity in favor of starry-eyed dreamland problems like scalability. You can write nice happy OO-like bindings to any document store and make the same claims.

The actual problem underlying the O/R mismatch is that the relational database is a high-level, declarative tool. How do you wrap something high-level in something low-level? You don't, or you reimplement 90% of it. The ORMs that make egregious assumptions and overcomplicate things are the problem (I'm talking about ActiveRecord and Hibernate). There are other, better ORMs that work because they appreciate the difficulty of the task and actually attempt to bridge the gap (yeah, SQLAlchemy).

The problem is overhyped and so is the solution.

Comment Re:Haskell !! (Score 1) 278

You obviously haven't heard of functional reactive programming, which greatly simplifies writing interactive apps. You don't wind up needing state monads for everything.

I agree that it isn't the right tool for this situation, but the main problem is the state of the GUI libraries. Getting GTK installed is not easy (though it's getting easier), and it's not something I have desperately wanted to do. It certainly can be done, but it would be a tremendous effort, and since the OP is asking here for a recommendation, he probably isn't swift enough with Haskell for it not to be a huge effort on top of his GUI experimentation. Learning Haskell is hard, and it will be an impediment.

Comment Re:Dead trees == outdated as soon as printed (Score 3, Interesting) 160

They're not going to throw out the JVM and rewrite it from scratch between releases. If there are 60 options now, there may be 66 in the next release. That means 90% of the book is still useful and the other 10% is just missing.

On top of that, as the reviewer clearly states "Unlike most computer books, there's a lot of actual discussion in Java Performance, as opposed to just documentation of features.... there are pages upon pages of imposing text, indicating that you actually need to sit down and read it...". So this book is already the kind of book that isn't going to be overturned by one more JVM release. It may contain actual wisdom rather than a list of flags.

Comment Re:Am I the first to call BS? (Score 2) 354

I think you can actually make some tentative links. For example, suppose some product sells very rarely. Take the intersection of the sets of cars that are in the parking lot whenever that product is sold. If that intersection narrows to one car, the probability that its driver is the buyer is higher than what you'd get by averaging the product's sales over all the cars that were ever present during a purchase. After all, if product X is only purchased a few times a year and car Y was the only car there each time, the probability that the guy driving car Y just "happens" to be there every time the purchase happens becomes lower too.
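The intersection idea is one line of Python. The plate values below are made up, but the shape of the computation is exactly as described: intersect the sets of plates present at each sale of the rare product.

```python
# Plates observed in the lot at each sale of a rarely-sold product.
sightings_per_sale = [
    {"ABC123", "XYZ789", "QQQ111"},  # sale 1
    {"ABC123", "LMN456"},            # sale 2
    {"ABC123", "XYZ789"},            # sale 3
]

# Cars present at every single sale: the probable buyer's car,
# modulo false positives.
candidates = set.intersection(*sightings_per_sale)
print(candidates)  # {'ABC123'}
```

With few sales and large lots, the intersection shrinks slowly, which is why you'd want thresholds before acting on it.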

I'm no statistician, but it seems like you could calculate the probability of it being a coincidence versus the probability of there being a relationship, and when the probability of there being a relationship is high enough, you could take the leap and make the assumption. Of course, you'll get false positives, probably many of them, but if you crank your thresholds up high enough it may be a net win.

A simpler way to improve your data would be to ferret out whatever public information you can about the owner of a given license plate. I wouldn't be too shocked if there were ways of getting this information in bulk. After all, you could do the same sort of subset thing with credit card purchases. If I see person A, B, and X on day 1 and person X, Y and Z on day 2, and I see cars a, b and x on day 1 and x, y and z on day 2, the same sort of subsetting operation could get you a bunch of single-element sets. You'd still probably have to have lots of days of information, but when you have 24-hour monitoring times thousands of stores nationwide times tens of thousands of customers per day per store, you quickly develop a pool of information you could sift through like this. And once I know your car is the one with plate X, I know it for keeps: you can stop paying with your cards all you want, I only needed so many repeat instances to figure it out.
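The person-to-plate matching described above is the same subsetting operation, just run per person. A sketch, with all identifiers invented: each day constrains which car a cardholder could own to the cars seen that day, and intersecting across days narrows the possibilities toward singletons.

```python
from functools import reduce

# Per day: (cardholders seen making purchases, plates seen in the lot).
days = [
    ({"A", "B", "X"}, {"car_a", "car_b", "car_x"}),  # day 1
    ({"X", "Y", "Z"}, {"car_x", "car_y", "car_z"}),  # day 2
]

def possible_cars(person):
    """Intersect the lot contents over every day this person was present."""
    return reduce(set.intersection,
                  (cars for people, cars in days if person in people))

print(possible_cars("X"))  # {'car_x'}: two days suffice to pin X down
print(possible_cars("A"))  # still ambiguous: A was only seen once
```

Two days of data already pin down X; with thousands of stores and months of footage, most regulars would collapse to single-element sets.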

Ultimately, it would be easy to get freaked out by all this, but let's remember what this information is used for: to send you coupons you'd actually want to use. That's the whole thing. Dial back the paranoia a bit.

Comment Multiple Projects Planned (Score 1) 57

I have multiple projects planned already. The first is to use it as a very cheap, simple router. I have a Zyxel wireless AP, but it won't accept USB-tethered cell phones as WAN connections. So I'm going to use the cell phones as USB modems on the Raspberry Pi and use the Pi as an Ethernet gateway to the Zyxel.
The next project is to use the Raspberry Pi plus old monitors as thin clients to my servers. That way I can monitor them from my desk without going through a full computer. (The other option is to buy cheap Android tablets to do it.)

Comment Re:Next step (Score 1) 308

I hate to ruin your argument by pointing out an obvious fallacy, but an iBooks "textbook" stretches the definition of "book" way past the breaking point. I also doubt there are going to be competing implementations of the iBook textbook reader or other bookstores from which to distribute them. You'd certainly miss out on the iBooks marketplace, which one can reasonably assume will be the only meaningful distributor of iBooks books and therefore iPad books.

Complaining about this note in the EULA while ignoring the overall ecosystem is picking the pepper out of the fly shit. If you have a problem with this, you probably have lots of other issues with Apple or iBooks that aren't going to be resolved by fixing this detail. Likewise, if you don't care about those details, you probably don't care about this one either.
