
Comment Re:Doesn't do C++x17 (Score 1) 195

They are actively working on full C++ language support. But "2017" doesn't mean C++17 - the release year says nothing about which standard it supports. The actual internal version is 15 (VS 2015 was internally version 14).

MS is working on language support in two ways. First, they're trying to retrofit two-phase lookup into their own frontend, but this has been very slow going because that frontend doesn't even build an AST. Secondly, they're working on a Clang-based frontend, which already has all the goodies. You can already install the Clang preview right from within VS itself.

(And what's with "C++x17"? You say either C++1z or C++17 - you do not say C++x17 - the letters are only for draft versions before the publication year is fixed. The naming goes: C++0x, C++11, C++1y, C++14, C++1z, C++17 ... C++2x, C++20.)

Comment Re:I actually feel for NATO (Score 2) 134

Fix your culture?

Here in Denmark, nobody talks during the movie, nobody uses their cellphone - people are just there to watch the film. The theater is cleaned after each screening. Oh, and we have assigned seating: when you order your ticket, you also pick which seats you want. Before the feature there are both still and motion ads and 1-3 trailers for other movies.

It's been this way for at least 30 years, and it works great. If it doesn't work in whatever country you're in, fix it.

Comment Re:Suspicious (Score 1) 91

"Chinese and back" is not a valid metric. Translation-party sites are fun and all, but translation engines are not symmetric, because languages are not symmetric. A translation is often going to be imperfect, so feeding a raw translation back in as input just amplifies the errors.

The metric you want is: how much effort would a human have to put in to make the translation output idiomatic in the target language? And that answer is dropping rapidly with modern high-quality rule-based translation engines.

Comment Re:Interesting name. (Score 1) 34

MS marketing can't come up with unique names to save their lives, or they just prefer to take generic terms and slap "Microsoft" in front. Either way, it's getting truly annoying. MS Surface, MS Edge, MS Stream, Windows Phone - all horrendous names. And when they do come up with an original one, we get Zune.

MS marketing hated the original Xbox name and tried to get it changed, but was shot down by popular vote. They've never been able to figure out what names would resonate with people.

Comment Re:Rule-based still easily best (Score 1) 56

> ...the problem with rule-based grammars that lack any statistical weights is that they come up with an unbelievably large number of parses for many real-world sentences.

Generative grammars suffer from that problem and scale very poorly; they may indeed be impractical for real-world text. Our constraint grammars and finite-state analysers do not have that problem. With CG, we inject all the possible ambiguity in the very first analysis phase, then use contextual constraints to whittle the readings down, where the context is the whole sentence or even multiple sentences. This means performance scales linearly with the number of rules.

> So the 96% accuracy claim is suspect, not to mention that a comparison with the Google system is already difficult because Spanish ≠ English. (Spanish has more morphology on verbs, it's pro-drop, it has relatively free word order compared to English, ...)

The paper is for Spanish because that's what I could find. Our other parsers, including English, are also at the 96%-or-better stage, but since it's mind-bogglingly boring to do a formal evaluation, we don't have up-to-date numbers.

> So I don't believe you can say that "Google is hopelessly behind the state of the art."

Given that we had 96% in 2006, 10 years ago, and Google has only now reached 94% (90% on other domains), I feel confident in saying Google is very far behind.

Comment Re:Rule-based still easily best (Score 1) 56

> Who said they're giving away their best stuff?

The nature of machine learning does. All they're giving away is an algorithm and a system trained with that algorithm. Linguistic machine learning is a field where even a 0.5% improvement takes years to achieve and is worth a paper. So even if this isn't their top algorithm, their best can't be much better.

Comment Re:Rule-based still easily best (Score 2) 56

> which seems much more expensive than

It'd seem that way, but it really isn't once you factor in the whole chain.

Machine learning needs high-quality annotated treebanks to train on. Creating those treebanks takes many, many years - it's newsworthy when a new treebank of a mere 50k words is published. Add to that the fact that each treebank likely uses different annotations, so you need to adjust your machine learner or add a conversion filter. Plus, each treebank covers a specific domain, so your finished parser is domain-specific: if you want to handle other kinds of text, you need to produce a treebank for that domain and then train on it.

Thus the bulk of the work is in annotation and mathematical models. Google skipped the step of creating a treebank and instead uses available ones. There aren't any usable treebanks for smaller languages, which makes the whole machine-learning endeavour useless for all but the large languages.

Rule-based parsers are the opposite. You can put the same number of man-hours into writing rules as you otherwise would into a treebank plus a mathematical model, but you can do it on any old laptop with almost zero data to work from - you just need to know the language. A parser produced this way is not domain-specific, but can easily be specialized for a domain if needed. And a rule-based parser can be used as a bootstrap engine for creating high-quality treebanks: because the rules are upwards of 99% accurate, humans only need to put a fraction of the work on top.

And as I wrote, rules are debuggable. You can figure out exactly why a word was misanalysed, and fix it; machine learning can't do that. The edit-compile-test loop of machine learning is measured in weeks or hours - with rules it's minutes or seconds.

Comment Rule-based still easily best (Score 2) 56

94% on syntax is definitely good - for a machine-learning parser. Come over to the land of rule-based parsers, though, and 94% is the norm.

Google loves machine learning, and it's easy to see why: that's how they built their whole stack. They have the huge amounts of data to train on, and the hardware to do it. It's seductive to just throw a mathematical model at a mountain of data and let it run for a few weeks.

Rule-based systems don't need any data to work with - they just need a computational linguist to spend a year writing down a few thousand rules. But the end result is vastly better: fully debuggable, easily updatable, understandable, and domain-independent. That last bit is really important. A system trained on legalese won't work on newspapers, but a rule-based system usually works equally well across domains.

In 2006, VISL had a rule-based parser doing 96% on syntax for Spanish (PDF) - our other parsers are also in that range, and have naturally improved since then. Google is hopelessly behind the state of the art.
