Data dominates methods (Score 1)

I work in natural language processing/computational linguistics, and I think I can see where this article is coming from.

Look at the case of Machine Translation (MT). For a long time, the approach to getting better MT was to develop better models (alignment models, syntactic models, semantic models, etc.). The idea was that if we used more sophisticated methods to model language phenomena, we would capture more nuance and produce better results. Well, that worked for a while. But as people started collecting larger and larger amounts of data, to the point where meaningful statistics could be calculated over a collection of text, simpler statistical models started beating out complex models created by trained experts. One of the big names in MT from IBM made the controversial, but accurate, statement that every time he fired a linguist, his accuracy went up.

Fast forward to Google and their petabytes of text data. At this point, some very sophisticated models, developed by some very intelligent people, are being beaten hands-down by systems Google is working on that use very simple methods trained on LOTS of data. The point is, as long as your models/techniques are reasonable, having more data seems to dominate using better methods.
