Open Source Grammar Checkers?

DaveBarr asks: "Maybe I'm more sensitive to this than most, but after continuing to see "it's" instead of "its" and "loose" instead of "lose" everywhere in the media and on web sites of supposedly reputable origin, I began to wonder. Are there any Open Source projects trying to develop a reliable grammar checker -- one that would catch these common foibles? Are all these algorithms proprietary? Are there any University research projects which could be used as a basis for even a halfway-decent grammar checker?"
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward
    I'm always amazed when a Slashdot article
    is posted *without* any grammar mistakes.

    I've often wondered what would happen if
    the "preview" function for submitting an
    article included something like
    s/([iI]t)'s/$1 is/g

    Can we force geeks to recognize "it's"
    for what it is through technology?

    --kyler
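
    The preview filter kyler describes could be sketched in a few lines of Python (a hypothetical site-side filter, not anything Slashdot actually runs); it is the direct equivalent of the Perl one-liner above:

```python
import re

def expand_its(text):
    """Expand every "it's"/"It's" to "it is"/"It is", mirroring the
    Perl substitution s/([iI]t)'s/$1 is/g.  Plain "its" is untouched."""
    return re.sub(r"([iI]t)'s", r"\1 is", text)

print(expand_its("It's a trap, and it's its own fault."))
# -> It is a trap, and it is its own fault.
```

    Seeing the expanded text in preview would make a wrong "it's" read as an obviously wrong "it is".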
  • by Anonymous Coward
    It is interesting to note that Grammatik was written by a very big Open Source author and advocate, Dr. Bruce E. Wampler [objectcentral.com]. Dr. Wampler is also the author of the Open Source, LGPL V C++ Portable GUI [objectcentral.com], a very easy-to-learn C++ GUI toolkit that is portable between Windows, OS/2, and Linux. It was designed that way from the start and is quite a robust and clean product.

    Hmmm, I wonder if Dr. Bruce [mailto] has any thoughts on designing an Open Source grammar checker? He probably could offer a lot of guidance to any group who wanted to start such a product.

  • First, what's a homophone? It's the preferred term for words (like "sale" and "sail") that sound identical but mean different things. Our present system of teaching reading fails to set up the neural pathways to recognize and remember which of several possible spellings was used in some text. It seems to be enough to obtain the correct sound, and to forget about the spelling. It's not enough. We seem to be getting away from communicating via text, and only sound seems to matter to many people. It's really sad when "they're" ( = "they are") is used for "there".

    Building upon spelling checker code, a fairly small dictionary could provide all the data needed to identify most homophones. At the user's choice, each homophone could be flagged with alternate spellings shown in a dialog box, with really-concise meanings for each. The user would select the intended meaning.

    So far, this idea seems to have generated little interest, but it would help cut down on ridiculous bodies of text.

    Far more ambitious would be a lexical analyzer that would try to deduce whether a given homophone seemed appropriate for the meanings of the words (a bottomless pit?) in the surrounding text. (Bloatware, anyone?)

    Nicholas Bodley // nbodley@world.std.com
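
    A minimal sketch of the homophone-flagging idea, assuming a small hand-built table (the word list and glosses below are illustrative, not a real lexicon):

```python
# Tiny homophone table: each word maps to its alternates,
# each with a really-concise meaning for the dialog box.
HOMOPHONES = {
    "their": {"there": "in that place", "they're": "they are"},
    "there": {"their": "belonging to them", "they're": "they are"},
    "they're": {"their": "belonging to them", "there": "in that place"},
    "sail": {"sale": "act of selling"},
    "sale": {"sail": "to travel by boat; part of a boat"},
}

def flag_homophones(text):
    """Yield (word, alternates) for every word with known homophones,
    so a UI could offer the spellings and let the user pick."""
    for word in text.lower().split():
        word = word.strip(".,!?;:\"'")
        if word in HOMOPHONES:
            yield word, HOMOPHONES[word]

for word, alts in flag_homophones("Their going to the sale."):
    print(word, "->", alts)
```

    A real version would sit on top of the spelling checker's tokenizer and dictionary rather than a literal table like this, but the data needed per homophone set really is this small.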

  • It's true that most AI programs have a necessarily limited semantic model, based on a few logic predicates and deductive rules. Logic itself is a philosophical construct, derived from observations of how people reason and solve problems, but it's not really a model of how the brain works, and efforts to get computers to assemble their own sets of rules and facts have been largely unsuccessful.

    "When people try to get computers to learn, the people do and the computers don't" - Alan Perlis

  • WordPerfect for Linux comes with a grammar checker, Grammatik, licensed from Novell.
  • Forgive me if I'm wrong here, but won't KOffice come with a grammar checker?
  • Squad helps dog bite victim (a classic indeed).

    Or:

    The boy is hungry
    The boy is a toad

    Or:

    The boy carried a sandwich to the playground and ate it. (the playground? Note that conjunctions are the most ambiguous words in the English language.)

    It's easy for us to tell how to parse those, but a computer would have to maintain a database of the following:

    playground is big
    sandwich is small
    people normally eat small things
    when dogs bite, they harm humans
    a noun indicating [a] human[s] (squad) would not harm humans.

    One can argue that the purpose of learning is to fill in those pieces of knowledge, but:

    1) The amount of knowledge that would have to be stored and recalled is *huge*.
    2) Even if we have the storage and recall capacity, computers need to be able to interpret everything and know that, among other things, squad can be a group of people, "normally" may not always apply, etc. etc.

    void recursion (void)
    {
    recursion();
    }
    while(1) printf ("infinite loop");
    if (true) printf ("Stupid sig quote");
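
    To make the knowledge-base point concrete, a toy version of that fact store and a plausibility check might look like this (the facts and the query are invented for illustration, not a real commonsense database):

```python
# Toy commonsense facts as (subject, relation, object) triples.
FACTS = {
    ("playground", "size", "big"),
    ("sandwich", "size", "small"),
    ("person", "eats", "small"),
    ("squad", "is_a", "person"),
}

def plausible_object_of_eat(noun):
    """Is `noun` a plausible thing to eat?  True only if the noun is
    small and we know people normally eat small things."""
    return (noun, "size", "small") in FACTS and \
           ("person", "eats", "small") in FACTS

# Resolving "ate it" in "carried a sandwich to the playground and ate it":
for candidate in ("sandwich", "playground"):
    print(candidate, plausible_object_of_eat(candidate))
```

    Even this trivial check needs four hand-entered facts to disambiguate one pronoun, which is exactly the scaling problem described above.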
  • I'm very interested in the project of writing such a beast. I have been interested in natural language processing for years. I'm also a C coder (under *nix). Anyone interested, please email me at joshr@netspace.net.au

  • "s/([iI]t)'s/$1 is/" is (ugh) Perl for "substitute `it is' or `It is' for every instance of `it's' or `It's'." I don't know why people expect everyone on /. to understand Perl. I only use it for fixing other people's broken Perl code.
  • The problem with parsing English lies in the nature of English itself. English was not designed to be parsed; it was not designed with a logical structure that has been consistently implemented.

    The question is what you mean by a grammar checker. If you simply mean a program that reads text and tries to find obvious errors, you do not need to parse English completely. To extend the example from above, you do not need to know exactly what "The cow is brown" means, only whether the tenses agree. Such a program would just need to recognize certain patterns as wrong, and that is not impossible.

    As for the other side of it, a program that actually understands what you are writing and figures out the best way to communicate it: that is much more complex. It would be a very cool program if it could be completed. Besides, what better than OSS to harness the immense mindshare such a project would require?

    That being said, my grammar is so horrible I would love to see either one working as soon as possible.

    Nate Custer
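
    The pattern-matching approach (flag locally recognizable errors without fully parsing the sentence) can be sketched with a handful of regexes; the rules below are a tiny, made-up sample, not a serious rule set:

```python
import re

# A few local error patterns: no parse tree needed, just surface checks.
RULES = [
    (re.compile(r"\byour welcome\b", re.I), 'probably "you\'re welcome"'),
    (re.compile(r"\bcould of\b", re.I), 'probably "could have"'),
    (re.compile(r"\bthe \w+s is\b", re.I), "plural subject with singular verb?"),
]

def check(text):
    """Return (matched text, hint) pairs for every rule that fires."""
    hits = []
    for pattern, hint in RULES:
        for m in pattern.finditer(text):
            hits.append((m.group(0), hint))
    return hits

print(check("The cows is brown, and you could of noticed."))
```

    This is essentially how tools like GNU diction work: a list of suspect patterns with canned advice, no understanding required.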
  • There is a program called diction.

    This is a GNU program still in development. It's available at:
    this link [moria.de]

    I've played with diction and it's not bad, not great but not bad. :)
  • Frankly, I'm surprised that I haven't seen a program that understands a spoken human language. The rules are codified in millions of textbooks, and semantics should be parsable from WordNet, the OED, or various other sources. And there are plenty of 'M-x doctor'-like programs that try to emulate conversation; some of them, like MegaHAL, can 'learn' well enough to fool some people.

    I've even played with coding a C library that reads like English without proper writing mechanics. A natural language interpreter shouldn't be too hard, though it would be time consuming and would probably not produce a substantial return on investment to a financial sponsor.

    I am inclined to think that the problem is ideological. There are so many disagreements among philosophers, linguists, and computer scientists over the meaning of 'The cows are brown.' that unless one person is sufficiently versed in all three fields (and a few others besides), no consensus or plan will ever be implemented.
