No, no, no, no, NO. DO NOT use regular expressions to validate context-free languages. The only way to validate SQL using regular expressions is to use Perl-style extended regular expressions (backreferences, recursive patterns), and those aren't regular expressions in the formal sense at all; they behave more like pushdown automata. They're also next to impossible to read.
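To make the point concrete, here's a minimal sketch of the one thing a true regular expression can't do: match arbitrarily nested parentheses, as in nested subqueries. The function name `parens_balanced` is made up for illustration, and the sketch ignores parentheses inside string literals and comments, so it is nowhere near a real SQL validator. It only shows that you need a stack (or at least an unbounded counter), which is exactly what a finite automaton lacks.

```python
def parens_balanced(sql: str) -> bool:
    """Return True if every '(' in sql is closed by a later ')'.

    Hypothetical helper for illustration only: it does not understand
    quoting or comments, so it is not a SQL validator.
    """
    stack = []  # positions of '(' characters that are still open
    for pos, ch in enumerate(sql):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                return False  # a ')' with nothing open to close
            stack.pop()
    return not stack  # anything left on the stack was never closed


nested = "SELECT a FROM (SELECT b FROM (SELECT c FROM t))"
print(parens_balanced(nested))
print(parens_balanced("SELECT a FROM (SELECT b FROM t"))
```

The stack can grow as deep as the input nests, which is why this takes a pushdown automaton (or a real parser) and not a finite-state pattern.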
I'll toot my own horn and point out that context-free validation can be done sanely.
However, I don't know a real language where it's easy enough to write new types to make this feasible.
Ada, but it's unlikely you'll be using Ada unless you're working for DoD or something.
Yep, that would be me. It's been a long strange trip, but eventually I ended up in the CS department at the University of Iowa.
My Query By Example project uses a support vector machine (a type of machine learning algorithm) to learn classification rules based on the set of examples you specify. Those rules then get applied to the rest of the data points in whatever table you're looking at. So, yes, there's a lot of big nasty math -- at its core it's a quadratic programming problem. I didn't want to get into that in the interview because I figured nobody would get it.
How would it work for a site like OKCupid? Their matching algorithm is based on users' responses to multiple-choice questions -- assume each response has some numeric (enumerated) value. Throw all those values into a table, probably via a join, such that each row is a user and each field corresponds to a question. (Let NULL values correspond to questions a user hasn't answered.) You in front of your computer will be looking at people's profiles, but the system operates under the assumption that the person will answer questions in a manner consistent with their profile, so if you mark several people that you're interested in and several that you're not interested in, based on their profiles, the system can train a classifier based on their answers to questions and find people whose responses are similar.
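For anyone who wants to see the shape of that idea, here's a rough sketch in plain Python. It is not the project's code: it swaps the quadratic programming solver for simple subgradient descent on the soft-margin (hinge loss) objective of a linear SVM, which is the same kind of classifier but a much cruder way to fit it. Everything here is an assumption made for illustration: the names `encode`, `train_linear_svm` and `predict`, the 1-to-4 answer scale, the made-up answer table, and the choice to encode a NULL (unanswered) question as 0.0 after centering.

```python
from typing import List, Optional, Sequence, Tuple

Row = Sequence[Optional[int]]  # one user's answers; None = unanswered (NULL)


def encode(row: Row, midpoint: float = 2.5) -> List[float]:
    """Center each answer on the assumed 1..4 scale; NULL becomes 0.0."""
    return [0.0 if a is None else a - midpoint for a in row]


def train_linear_svm(rows: Sequence[Row], labels: Sequence[int],
                     lam: float = 0.01, lr: float = 0.1,
                     epochs: int = 200) -> Tuple[List[float], float]:
    """Fit w, b by batch subgradient descent on L2-regularized hinge loss.

    labels are +1 (interested) or -1 (not interested).
    """
    xs = [encode(r) for r in rows]
    n = len(xs)
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(xs, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1.0:  # only margin violators contribute
                for j, xj in enumerate(x):
                    grad_w[j] -= y * xj / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, row: Row) -> bool:
    """True if the classifier puts this user on the 'interested' side."""
    x = encode(row)
    return sum(wj * xj for wj, xj in zip(w, x)) + b > 0.0


# Made-up table: each row is a user, each column a question.
marked_rows = [[4, 4, 1], [4, 3, None], [1, 1, 4], [1, 2, 3]]
marked_labels = [+1, +1, -1, -1]  # the examples you marked by hand

w, b = train_linear_svm(marked_rows, marked_labels)
print(predict(w, b, [4, None, 1]))  # an unmarked user with similar answers
print(predict(w, b, [1, 1, None]))  # an unmarked user with opposite answers
```

The marked users play the role of the examples you specify, and `predict` is what gets applied to every other row in the table. A real system would want a kernel, a better treatment of missing answers, and a proper solver.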
I don't think OKCupid is using the same math I'm using, but their approach is probably pretty similar.
//GO.SYSIN DD *, DOODAH, DOODAH