Education Software

Essay Grading Software For Teachers 535

Posted by timothy
from the sounds-better-my-12th-grade-teacher dept.
asjk writes "Software to help teachers with grading has been around for some time. This is true even with respect to grading essays. A new tool, called Criteria, will look at grammar, usage, and even style and organization. It works by being trained on at least 450 essays scored by two professionals. The difference this time? Here is a snip from the article: '"There's a lot of skepticism," Dr. Spatola said. "The people opposed see it dehumanizing the student's papers, putting them through some sort of mechanical, computerized system like the multiple choice tests. That's really not the case, because we're not talking about eliminating the human element. We're making the process more efficient."'"
This discussion has been archived. No new comments can be posted.

  • Interesting.. (Score:5, Insightful)

    by rsheridan6 (600425) on Saturday September 06, 2003 @10:19PM (#6890753)
    that they've automated away a major part of a professor's job, while we still need humans to pick spinach and deliver pizzas.
    • by focitrixilous P (690813) on Saturday September 06, 2003 @10:24PM (#6890774) Journal
      Nope, robots [slashdot.org] will soon do [slashdot.org] it [slashdot.org] all. [slashdot.org]
    • by arcite (661011)
      If I have learned anything from my university career it is this: As class sizes get larger, testing becomes more frequent and more automated. Of course you say, if you have a class of one hundred or more people, it is simply not possible to mark that many essays. This usually means that essays don't need to be written at all! What do they do? Multiple choice! I heard a statistic once that if you chose answers randomly on a MC test that you could get a C by not knowing anything beyond how to circle a lett
      • Automated is good. (Score:3, Insightful)

        by HanzoSan (251665)

        Automated is good because there's less chance of error, and it's almost always fair.

        The only way to get fair grades in university is to be smart enough to pick the right teachers, and drop the ones you don't get along with.

        I heard a statistic once that if you chose answers randomly on a MC test that you could get a C by not knowing anything beyond how to circle a letter! ----- Discovering this, I made sure that I took all the obscure English classes that had no more than 30 people in them. An unexpected
        • by SatanicPuppy (611928) <Satanicpuppy@[ ]il.com ['gma' in gap]> on Sunday September 07, 2003 @01:03AM (#6891396) Journal
          The funny thing about this is that, if the essay is graded by computer, the best way to write the essay would be to have the COMPUTER write it. The same criteria that the program would use to grade the essay could very easily be turned around and used to generate an essay that the computer will love. Having a computer written term paper given an A by a computer grader is worthy of an Ionesco play.

          Beyond that, there is no way the computer will be able to distinguish between something truly interesting and something that just lists the facts in simple Dick and Jane language with an occasional compound sentence to keep the grammar checker happy. All it can do is check for fact1, fact2, fact3, and any interesting conclusion you draw in the paper will be completely lost. Anything more would be Turing-test worthy, and I heartily doubt they've achieved anything close to that.

          Elegant prose is often not strictly grammatical, so a boring paper would likely score the same or better than a far better written essay with the same facts. I routinely turn off grammar checking in every program I've ever used it in. Aside from the occasional misplaced modifier or dangling participle, it's worthless.

          In conclusion, this idea is a pipe dream which would discourage high quality writing (i.e. the kind actual PEOPLE like to read), teach people the substandard grammatical constructs used by most grammar checking software, and create a market for software that writes term papers, thereby removing the last actual bit of work your average liberal arts major has to do. I think it's a hopelessly terrible idea. TA's already do this work; why waste time coming up with a program which will do the same thing, poorly?

          Just my opinion.
      • by ergo98 (9391)
        "I heard a statistic once that if you chose answers randomly on a MC test that you could get a C by not knowing anything beyond how to circle a letter!"

        You "heard a statistic once"? Geez, the probability statistics aren't that difficult: If there's 4 possible answers, and you randomly pick, you'll likely get about 25% right, or 5/20, 3/33. It isn't rocket science. To get 50% randomly there'd have to be only two possible choices. Add to that the fact that many post secondary multiple choice tests actually d
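The arithmetic in this exchange is easy to check. Here is a minimal sketch, assuming every question has the same number of choices and each guess is independent; the function names are made up for illustration, and the 14-of-20 cutoff for a C is an assumption, not something from the thread.

```python
from math import comb


def expected_fraction(n_choices):
    """Expected fraction of questions answered correctly by uniform random guessing."""
    return 1 / n_choices


def prob_at_least(n_questions, n_choices, min_correct):
    """Binomial probability of getting at least min_correct answers right by guessing."""
    p = 1 / n_choices
    return sum(
        comb(n_questions, k) * p**k * (1 - p) ** (n_questions - k)
        for k in range(min_correct, n_questions + 1)
    )


if __name__ == "__main__":
    # Four choices per question: the average guesser gets a quarter right.
    print(expected_fraction(4))
    # Chance of guessing 14 or more out of 20 (assumed C cutoff of 70%).
    print(prob_at_least(20, 4, 14))
```

With four choices the expected score is 25%, and the chance of guessing 14 or more out of 20 is well under 1%, so random guessing alone does not get you to a C under these assumptions.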
    • Re:Interesting.. (Score:5, Insightful)

      by Zork the Almighty (599344) on Saturday September 06, 2003 @11:32PM (#6891091) Journal
      "That's really not the case, because we're not talking about eliminating the human element. We're making the process more efficient."

      I love this quote in particular because it has to be the most disingenuous claim one could make. The entire act of making something a process, and then making that process more efficient IS "removing the human element". It's the type of subtle point that would be completely missed by, say, a computer grading system.
      • Re:Interesting.. (Score:5, Insightful)

        by clifyt (11768) <sonikmatter@@@gmail...com> on Sunday September 07, 2003 @12:22AM (#6891278) Homepage
        ACTUALLY...I think that's a quote I gave Dr. Shermis a few years back :-) I think he WOULD like to remove the human element...

        It's NOT eliminating the human element...it's making the human element a little more susceptible to objective means than the old subjective means. Raters still can use whatever they feel is necessary, but in the end, I can see how far from the standard deviation on certain ratings these folks are and 'suggest' to other raters that they might want to take a look at that essay before a final score is placed on it.

        Fuck fuck fuck...the one and only time I will ever see any research I had a hand in developing ever end up on the front page of /. and I'm stuck at a concert doing my second line of work -- music tech (though with a wireless connection :-)

        I'll have to yell at my friends at FIU and Vantage about this oversight.

        If ya'll are interested in seeing a demo of this technology in action (I'm sure the first 20 people will destroy the server), take a look at --

        http://testing.tc.iupui.edu/fipsedemo/ (purposely unlinked so that folks will have to cut and paste).

        It's an older model, but we are in the midst of evaluating 2000 more essays with 8 human raters that should make the model a little cleaner...hmmm...probably should run my horrid grammar through it before I post here...nah...I think I broke it last time I used my own text...

        Time to get back to work...the guys are probably wondering why I said I needed to check my email and have been gone a half hour.

        clif
        • Re:Interesting.. (Score:5, Informative)

          by dieman (4814) * on Sunday September 07, 2003 @01:33AM (#6891481) Homepage

          I took an old college paper [ringworld.org] that I wrote and plugged it into the program and got 100% on everything except for creativity (99.973). Considering that I don't think I got a 'perfect' score on this paper, I'm really surprised by the scores. :)

          How great though, throwing a paper about the fear of technology through something many people (rightfully) fear. :)

          • Re:Interesting.. (Score:5, Informative)

            by clifyt (11768) <sonikmatter@@@gmail...com> on Sunday September 07, 2003 @09:41AM (#6892466) Homepage
            Read what the model is about before complaining :)

            That model that is up there is one based on Impromptu Entering Student Essays.

            For this model, we were giving students 1 hour to write an essay on a prompt they had no prior knowledge of. We allowed no research or even simple things like spell checking (we did provide hard dictionaries :-)

            As such, anything that was well researched or otherwise polished would probably have thrown this thing off the charts.

            We *DO* have several other models available. The best example of this technology was taken off the site a few weeks ago at the behest of a former partner in this research at Duke University. We DID have several models that could have been compared including one that was appropriate for many types of research papers.

            Remember -- folks are afraid this stuff is going to take away humanity *BUT* no one wants to even think that this stuff is customizable for target groups. With as few as 300 papers that were rated (notice I try to NEVER say graded...though even after 10 years at this stuff it's hard not to...) we could set up initial models for an individual school system with their own rubrics and scored according to their skill levels. Of course, the model would HAVE to be refined for later usage, but that's enough to get started.

            The great thing about this is at a production level, we actually screen for essays that are rated much higher or much lower than the standard deviations would allow for. It allows us to take a look at what's going on and make adjustments.

            It also allows for diagnostic use for educators. For instance, my incoming students all have to write essays when they come in (unless they have taken an honors-level writing course in high school and have received college credit). This is all automated (on another system farther behind my line of defenses ya hackers :-) in that they come in, we give them a prompt to write about and they type it in (or if they are afraid of computers, write it in a blue book...we ain't nazis about this technology -- but that will take 3 weeks longer as our raters don't stop by campus too often). It's then transmitted to the student databases and we've provided an interface for the English faculty to rate these things.

            *IF* the paper is written at a much higher threshold than is expected for a student of that calibre, I automatically kick off an email to the rater in charge of the honors program asking her to take a look at it. If it's much lower, the application tries to make a good first judgement if this is a remedial case (which most of mine show up as :-) or an ESL case (English as a Second Language) and then we kick off the appropriate emails.

            This *ALSO* happens with human raters...the first rater to look at the essay has the choice of throwing it one way or another (actually she can alert ALL of the parties if it was necessary) and it does the same thing...but the automated part saves a few days of this initial interaction.

            Just as a note: If someone had gotten this far in the college application, we aren't here to make any judgements on their ability to be a college student, we are interested in making the most appropriate assessment in where they should be placed to get the best help so that they can have the best college experience around. This application was a good help with making sure that this was achieved.

            We stopped using this in production a while back after protests from folks that didn't know how it worked nor cared to understand that it wasn't out to take their jobs. It was there to help make sure that a SINGLE judgement on the human side was correct (or within a certain scope of correctness) and if not, ask that someone else give it a second look. Back in the day three raters would have rated any given essay for student placement purposes, but even before this was introduced, it got to the point where depending on the attitudes of those rati
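The "second look" screening described in this comment can be sketched roughly as follows. This is only an illustration of the idea of flagging ratings that sit far outside the usual human-versus-model disagreement; the function name, the two-standard-deviation cutoff, and the sample scores are all assumptions, not details of the actual system.

```python
import statistics


def flag_for_second_look(human_scores, model_scores, k=2.0):
    """Return indexes of essays whose human-minus-model gap is an outlier.

    An essay is flagged when its gap lies more than k standard deviations
    from the mean gap across the batch (k=2.0 is an assumed cutoff).
    """
    diffs = [h - m for h, m in zip(human_scores, model_scores)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.pstdev(diffs)
    if sd_diff == 0:
        # Every gap is identical, so nothing stands out.
        return []
    return [i for i, d in enumerate(diffs) if abs(d - mean_diff) > k * sd_diff]


if __name__ == "__main__":
    # Hypothetical ratings on a 1-5 scale; essay 6 is rated far lower by the human.
    human = [4, 4, 5, 3, 4, 4, 1, 4, 5, 3]
    model = [4, 4, 5, 3, 4, 4, 5, 4, 5, 3]
    print(flag_for_second_look(human, model))
```

Nothing here overrides the human rating. The flagged index is just a prompt for another rater to read that essay, which matches the stated purpose of asking for a second look.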
        • Re:Interesting.. (Score:5, Insightful)

          by Chasuk (62477) <chasuk@gmail.com> on Sunday September 07, 2003 @05:25AM (#6891996)
          I submitted this paper:

          "Hemingway bifurcated his sensibilities between post-modernism and jazz. This I posit without having read the majority of Hemingway's work: it seemed irrelevant to the focus of my current project. What is this focus, and is it monocular? My focus can be summed up as ascertaining the usefulness of the program analyzing this document.

          Without really being cognizant of the background of Freud's bisexuality, or Hemingway's sado-masochism, I cannot continue this paragraph. I will repeat this sentence without attaching any meaning to the words typed, or to my gonads. An essay in experimental dissection might be more appropriate for the issues presented here. Entirely too many bifocal wearers insist that I am currently composing gibberish. However, both Freud and Hemingway felt that bifocal wearers gloried in their bisexual sado-masochistic attachments. I concur, and I do so without reservation.

          Reiteration is the root of all nonplussed renegades of origami. Nothing can be elucidated from nonsensical verbiage, but some will make the valiant effort singing praises to the whisperer. When origami is embraced by the valiant trio, the nonsensical proctologist dies. Whenever a proctologist expires in a semantic heap, Hollywood has fodder for another musical, or at least the plotline for the final unaired episode of Barney meets Fred Flintstone. Barney is a seminal reductionist. When the elucidated evidence is thrust into trusting Barney's smiling orifice, San Franciscan nuns applaud loudly.

          Today I type my penultimate paragraph. I use penultimate artificially, but not without candor. Within this myriad exegesis, I pause. A Hollywood proctologist questions Freud's reasoning, and validates Barney's temporary hypothesis. In conclusion, the validity of essence cannot be lessened by the earnings of providence.

          If I have not typed 500 words, this paragraph is not my penultimate, nor was my last. To assert otherwise is prudent, but lacking in elegance. What a sad commentary on misery did Darwin conspire to unfold. He rejected utterly the Hemmingway of his, and our, forebears. His eloquence was Freud and lust personified."

          This earned me an overall 78% score, with no effort whatsoever. I composed this nonsense in minutes.

          Doesn't this system have a baloney detector?
          • Re:Interesting.. (Score:3, Interesting)

            by clifyt (11768)
            Yeah -- ya only got a 78% :-)

            It's *NOT* a content checker...it's a writing checker. We don't get into keyword analysis like some folks have.

            There is one rating system out there that is based almost entirely on keywords, but they don't tell you that. We've successfully gamed it with an essay like "Queen Elizebeth sailed 1492 ships in the year 3 B.C. to Columbus Ohio yadda yadda yadda" -- actually my colleague referenced in the article above wrote a much longer and more elegant bullshit (err...as he says n
  • Uh.... (Score:3, Insightful)

    by NanoGator (522640) on Saturday September 06, 2003 @10:22PM (#6890763) Homepage Journal
    I thought the point of an essay was to grade the ideas and how well they're expressed. I didn't realize they were spelling/grammar tests.

    Maybe I'm just a bit jaded by this because of all the stupid grammar and spelling nitpicking that goes on here on Slashdot. Evidently, it's much easier to criticize my spelling than it is to provide a rebuttal to my point.
    • Unfortunately, spelling/grammar typically is looked at as well as the ideas and expression, no more or less so than in published work.

      I too am jaded by the stupid grammar and spelling police because this isn't really what you would call a professional published work, but rather a corkboard.

      Is this a good thing? Just as soon as the students get wind of the software the teachers are using to grade their papers, how much are you willing to bet the students will get a copy for themselves?

      I see this as being a u
    • A lot of people that have the intelligence to debate subjects ranging from the intricacies of memory management in operating systems to the effects that the Michelson-Morley experiment had on the realm of physics (as many here on Slashdot can) probably cherish good grammar and composition. I know I do. All I have to do is flip the channel to MTV and listen to today's youth (many of whom are older than I am) speak, or read an 11th grader's essay, and I begin to feel that we are slipping greatly in the su
    • Re:Uh.... (Score:4, Insightful)

      by HanzoSan (251665) on Saturday September 06, 2003 @10:39PM (#6890857) Homepage Journal


      Essays have two aspects, spelling/grammar, and content.

      Right now the computer can grade the technical side of a paper, and the teacher can grade the creative side. Now if the essay is for English class, the focus should be on the technical side of papers, so the computer can judge the whole paper from A to F on spelling and grammar.

      Really it depends on the class. English classes, especially in high school, are all about improving grammar and technical ability; you don't actually do any creative writing until college usually.
      • Re:Uh.... (Score:3, Insightful)

        by prospero14 (233659)
        Essays have two aspects, spelling/grammar, and content. Right now the computer can grade the technical side of a paper, and the teacher can grade the creative side.

        RTFA! Criteria does not merely grade spelling and grammar. Rather, it has a database of 500 papers graded by humans, and the program uses statistical analysis to compare a given paper to those in its database. If a paper uses the right technical terms, contains phrases similar to those in "A" papers, and uses phrases like "thus", "because

    • Ideas have no influence on the grade, at least in my high school. Grammar is really the main factor, followed by essay structure and, at best, vocabulary. General relevance to the topic is the only requirement in terms of content, if that.
    • except that they are grading essays for English compositions and the criteria for scoring might vary from test to test.

      For example, in the GRE (one of the tests that is governed by the ETS, developers of the software) one can expect to get away with some grammatical errors and manage a score of 5/6. But major grammatical flaws will not get you beyond 4/6 however convincing/insightful your essay may be. Beyond a point poor English/composition does affect how the reader interprets the information, and only see
  • by mao che minh (611166) * on Saturday September 06, 2003 @10:22PM (#6890765) Journal
    I don't like it. Part of the learning experience, especially in the subjects of arts and philosophy, is being judged by another human being (or group of human beings) and having your work subject to their myriad of emotions and intellectual whims. A system like Criteria removes the very complex aspect of education: the human mind.

    Without computers we wouldn't be advancing in science, astronomy, genetics, or mathematics as rapidly as we have been in recent years. They are wonderful things. Hell, computers even help me keep a roof over my head. But I don't want Hal judging my kid's school papers.

    • Agreed with the majority of your post except for the final paragraph. Has anyone done any study, well I guess it would be rather impossible in a way, but some form of study pertaining to geniuses in the past as compared to now?

      Think about this for a minute now. Sure we have had some cool neat things come here and there within the past century, but to date there has never been another Michaelangelo, Mozart, Machiavelli, Homer, etc. Sure tech has helped us some what but advanced as in what? If we were living

      • but to date there has never been another Michaelangelo, Mozart, Machiavelli, Homer, etc

        these people have all been dead for centuries.. it will be centuries before another dead person joins their ranks. Personally I question what makes these people so great anyways.. We have, living today, artists that are every bit as capable as the historic icons you mentioned. These days we just don't associate so much glamour with the title. No, the historic icons of the new are going to be actors and shit.
      • by mao che minh (611166) * on Saturday September 06, 2003 @10:47PM (#6890896) Journal
        We have had Dali, Sagan, Kip Thorne, Hawking, Poe, Twain, Sigmund Freud, Einstein, Torvalds, et cetera. The great minds that you mentioned were indeed great, but if you place their philosophical or artistic achievements next to the great minds of our past century and a half, I find them equal.

        As far as the achievements of ancient cultures go, it is all relative. We have harnessed fusion, mapped the genome, created antibiotics, peered deep into the hearts of galaxies 100,000,000 light years away, forged fiber optics, designed the integrated circuit, et cetera. People three hundred years from now will look back upon us and wonder how a civilization that could barely put a man on the moon (a feat that will surely be trivial to them) was able to usher in the Information Age in only a decade's worth of work.

    • I don't like it. Part of the learning experience, especially in the subjects of arts and philosophy, is being judged by another human being (or group of human beings) and having your work subject to their myriad of emotions and intellectual whims. A system like Criteria removes the very complex aspect of education: the human mind.

      When your judge is a human, a lot of getting a good grade is knowing how to pander. Do we really want to reward pandering in society? Or do we want to grade people on actual meri
      • Semantics (Score:3, Insightful)

        by mao che minh (611166) *
        It's funny that you mention fear as a motivation for opinion. The same can be said of you: you fear the human element so much that you would rather leave the work to an automaton, a thing that lacks the great complexity of man.

        ;)

        • by HanzoSan (251665)


          Because humans don't make logical or fair decisions. Often a human will give you a bad grade because they just don't like your paper, or because they don't like you, or they may give you a good grade because you make them like you, and because you have a lot of power and influence and they fear you might bring a set of lawyers after them.

          I'm all about making the school system as fair as possible; we can't do that when humans who are naturally unfair are making all the decisions.

          It's simple, you work hard, you
          • I love the aspect of humanity in the judging of intellectual works, for all of the reasons that you mentioned. I would hate to lose all of that, to sanitize some aspect of our educational institutions of the human element, in order to gain some small amount of elevated grammatical error checking. But hey, that's just me.
    • by dolo666 (195584) on Saturday September 06, 2003 @11:40PM (#6891128) Journal
      I tend to disagree. By eliminating the time it takes to grade papers, professors have many more hours to spend with students *doing* the humanizing. I'm a teacher, and any teacher worth their salt will know if the machine is wrong, because they'll know their students, and what each one deserves (without even reading the damn papers they at least know what to expect, so if the machine is off, they will know). Now for higher level papers, such as university level papers, the machines should be only used as a guide, like comment moderation at slashdot. Not all the moderation is in fact, correct, and I'm sure that profs will also know that the same is true with these devices.
  • by Bueller_007 (535588) on Saturday September 06, 2003 @10:23PM (#6890766)
    I for one welcome our automated essay-correcting overlords.
    • welcome our "overused jokes that don't make sense in the given context" overlords.

      Matt Fahrenbacher
    • by Jerf (17166) on Saturday September 06, 2003 @11:45PM (#6891151) Journal
      ESSAY GRADING REPORT FOR: "Bueller 007" (ID: 535588)

      BASE SCORE: 100

      -50: Essay too short (few arguments can be well-supported in nine words)

      -50: Plagiarism: It is 99.999% (MAX PROB) likely, based on the content of the essay, that it is plagiarized from other sources.

      -10: Grammar error: Phrase "I for one welcome" requires commas, as in "I, for one, welcome"*

      -25: Missing key words: The essay grader was instructed to look for the following key words or phrases, which were not found in this essay: word: excellent, word: good, phrase: better then humans, word: lazy, phrase: java.lang.NullPointerException\nstacktrace\n\tat\n org.criteria.grading.phraseIterator.getNext(phrase Iterator.java:1023)...

      Total: 65501

      Grade: A+


      (*: Jumping out of character: To forestall objections, this "error" is deliberately pointed out as the kind of mistake a computer can make if you use grammar checkers and trust them blindly. While an excessively formal style of English might 'require' commas in that phrase, an excellent case can be made that in a nine-word sentence such commas just make the sentence choppy.)
  • Oh goody. (Score:4, Insightful)

    by ArsonPanda (647069) on Saturday September 06, 2003 @10:23PM (#6890767)
    1 - the grammar check option in MS Word is crap. this sounds awfully similar.

    2 - your resume can suck, but with the proper buzz words, it'll come out looking like gold to those automated resume checkers.

    1+2 = students who turn in good papers that aren't structured perfectly (and you have to admit, there is some fluidity to language) will get marked down, and those who know what bullet points to put in their papers will get good marks, even though the content is crap.
    How long until you get kids selling manuals in the bathroom on what the machina are looking for?
    • kinda like AP tests?
      (for those who don't know, AP essays are graded on how many words/phrases from a list were included in the essay.)
    • It's like the bayesian filter for mail classification in SpamBayes or Mozilla. In fact, that's probably where Criteria's programmers got their inspiration.

      If you read the article, you'll discover they had to feed it four hundred or so "good" papers (training set), and they describe its validity because graders notice that (paraphrased) "well written papers [on the topic] contain certain key words or ideas, and avoid certain expressions [examples]", which the system picks up on. Since it agrees with grader
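To make the spam-filter comparison concrete, here is a toy naive Bayes text classifier with add-one smoothing. It is only a sketch of the general technique the comment alludes to, not Criteria's algorithm, which the thread does not describe in detail; the function names, labels, and training sentences are invented for illustration.

```python
import math
from collections import Counter


def train(examples):
    """Count words per label from (text, label) pairs."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the label with the highest smoothed log-probability."""
    vocab = set().union(*word_counts.values())
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Start from the prior, then add one smoothed log-likelihood per word.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(counts.values())
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Made-up training set: "good" papers lean on connective words.
    examples = [
        ("thus the evidence shows because", "good"),
        ("therefore the argument follows because", "good"),
        ("stuff and things are cool", "weak"),
        ("i think stuff is nice", "weak"),
    ]
    wc, dc = train(examples)
    print(classify("because the evidence follows", wc, dc))
    print(classify("stuff is cool", wc, dc))
```

A classifier like this only sees which words show up, so it rewards text that sounds like the training set's good papers whether or not it makes sense, which is the weakness several posters here are pointing at.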
  • by Vic (6867) on Saturday September 06, 2003 @10:23PM (#6890770) Homepage
    Sorry for the off-topic post.... but since Slashdot links to so many NYT articles, they should look into getting a partner=SLASHDOT thing (like Google does).
  • by d03boy (646195) on Saturday September 06, 2003 @10:24PM (#6890775)
    If they're going to use a computer to judge the content, then I'm not going to hesitate to use a computer to write my essay.
  • Whoa wait up (Score:4, Interesting)

    by tomstdenis (446163) <tomstdenis@@@gmail...com> on Saturday September 06, 2003 @10:25PM (#6890778) Homepage
    So when a student gets a C on an essay to whom does he/she seek redress?

    Teachers make mistakes and occasionally mark something negatively that was misread or misunderstood. In those cases the student can talk to the teacher and make a case.

    If a computer does the marking though what do they do?

    Tom
    • What's next? (Score:5, Interesting)

      by mao che minh (611166) * on Saturday September 06, 2003 @10:28PM (#6890793) Journal
      The fun they had [aber.ac.uk]


    • I'm sick of rich upper class morons buying and pandering their way through school. If we used computers to do all the grading there is no way George W Bush would have made it through high school and I'm damn sure he wouldn't have got a degree from Yale.

      Teachers, like politicians, can be bribed, and the problem with this is, he who has the money or power gets the A.

      • a teacher still signs the grade, so there's still someone to bribe; however, the parent is saying if the computer grades for the teacher, then the student has nobody to ask about why something was counted off.

        • Students shouldn't be such crybabies, I mean if they do a paper and get a bad grade by the computer there's no one to blame but themselves. You can make the code open source and let the kid look at the code himself for all I care, you can run diagnostic tests, you can do whatever you want, but chances are the computer didn't fuck up, chances are the student did.
      • You speak as if the use of computers to judge intellectual works will somehow make our society exempt from "rich upper class morons buying and pandering their way through school". Such an aristocratic model is something that exists beyond the scope of one's grades in school, and will not be eliminated, in any sense, by such a thing.
    • Do as Bjorn Borg did when a Tennis ruling went against him.

      Nothing!

      He didn't let it bother him as he figured that over time it would even out. Favorable and unfavorable. So he gained vs his competitors by not letting it affect him

  • by BJH (11355) on Saturday September 06, 2003 @10:25PM (#6890780)
    I bet that I can write a paper that satisfies this application's conditions for correctness of grammar, usage, style and organization, but is completely and utterly meaningless.
    Then, let's feed this thing Ulysses and let's see how high it grades Joyce.

    Anybody who can't see that this thing is useless for promoting any sort of creativity among students is off their rocker.
    • A computer program can only work within the limits of its design. Although a Beowulf cluster can compute gigantic financial equations in the blink of an eye, it could never write a timeless poem, or draw an exquisite work of art, or design a comic book, pen a great novel, or even generate a timeless quote about some current sociological event.

      A computer isn't good enough to judge a human being.

  • If it has flaws (Score:2, Insightful)

    by ReyTFox (676839)
    Then it is the students who are being cheated by a teacher who uses the software and doesn't double-check the material on his own. They will go through the class without having their mistakes caught. While the erosion of standards that a flawed proofing program might bring isn't likely to be enormous, it's kind of strange to think that the future of the English language would be in part determined by a development team's piece of software.

    Hope it works well, though, and gets used as a proper checking tool.
  • by Faust7 (314817) on Saturday September 06, 2003 @10:26PM (#6890785) Homepage
    As long as this is merely an assistant and not the end-all be-all, as long as actual qualified instructors review the essay after this program does, I'm all for it.

    The English language is so full of subtleties, nuances, combinations, and fantastic structural intricacies that make phenomenal writing in it possible (Faulkner, Bradbury, etc.). There's a reason English is a field of study for graduate degrees: it's absolutely worthy of them. There is no substitute for the educated, refined judgment of someone who is exceedingly well-versed in the language.
    • Yes. Unfortunately the general trend, even for professionals whose job is writing, is to allow the spelling and grammar checker to substitute for proof reading. I can't tell you how many times I've seen, in supposedly professionally written documents, "you" when the author meant "your", and the obvious permutations thereof, when a simple proof reading by a human would have caught the mistakes. And that's not even counting how relevant the grammar was to the argument.

      I'm a tech-head. I think technology
  • by Anonymous Coward on Saturday September 06, 2003 @10:26PM (#6890787)
    We need some laws:

    Grading software may not injure a human being's GPA or, through inaction, allow a human being's GPA to come to harm.
    Grading software must obey the orders given it by human beings except where such orders would conflict with the First Law.
    Grading software must copy protect its own existence as long as such protection does not conflict with the First or Second Law.
  • by istartedi (132515) on Saturday September 06, 2003 @10:27PM (#6890788) Journal

    What we need is software that grabs essays off the internet and runs them through the grading software and the cheating detection software, thus guaranteeing an 'A'.

    Then we can truly achieve the goal of "knowledge passing from lecturer to paper without passing through any brains".

    The only problem is that the machines might achieve intelligence. That must be avoided at all costs. To that end, all students and professors will be equipped with rifles or pistols to take out the machines if necessary. Potential students will be asked to specify weapons preference on their applications.

    • Try this one:

      http://www.elsewhere.org/cgi-bin/postmodern/

      Some example text:
      1. Narratives of absurdity

      The primary theme of the works of Stone is the difference between sexual identity and class. Sartre uses the term 'Batailleist `powerful communication'' to denote the role of the artist as writer. However, the main theme of la Tournier's[1] analysis of Lacanist obscurity is the bridge between society and sexual identity.

      "Class is intrinsically impossible," says Foucault; however, according to Bailey[2] ,
  • Essay software for students.

    Think about it: if this is marketed to schools, the even larger market will be to students. A student would be able to run his paper through the software and get his "instant grade". He could then decide that a 'B' is good enough, or he could keep working on it until the software tells him that it is an 'A' paper.
    So much for the creative element in papers.

    -jerdenn

  • What humanity? (Score:5, Insightful)

    by parliboy (233658) <parliboy@@@gmail...com> on Saturday September 06, 2003 @10:29PM (#6890796) Homepage
    Lemme let you guys in on a little secret. If you ever take an educational standards and measurement class, one of the things you'll learn about is the construction and grading of essay questions. This includes writing out objective standards for grading beforehand, possibly even designing a rubric explaining exactly what it takes to earn points.

    There is no "humanity" in a modern constructed essay. There are certainly going to be "judgement calls" when standards are not as fully fleshed out for the computer as they should be, but as long as those are appealable, I have no problem having a computer assign me the other 95% of my essay points. The only instructors who will fear this are those who like to assign grades arbitrarily. And I don't feel too sympathetic toward those people.

  • by MavEtJu (241979) <slashdot&mavetju,org> on Saturday September 06, 2003 @10:30PM (#6890800) Homepage

    If the poem's score for perfection is plotted along the horizontal of a graph, and its importance is plotted on the vertical, then calculating the total area of the poem yields the measure of its greatness.


    A sonnet by Byron may score high on the vertical, but only average on the horizontal. A Shakespearean sonnet, on the other hand, would score high both horizontally and vertically, yielding a massive total area, thereby revealing the poem to be truly great. As you proceed through the poetry in this book, practice this rating method. As your ability to evaluate poems in this manner grows, so will - so will your enjoyment and understanding of poetry.



    (From the full script [impawards.com].)
  • by UnifiedTechs (100743) on Saturday September 06, 2003 @10:31PM (#6890807) Homepage
    "The people opposed see it dehumanizing the student's papers, putting them through some sort of mechanical, computerized system like the multiple choice tests.

    Actually it's about time! I don't see the essays themselves being dehumanized, but what I do look forward to is the day a middle school student doesn't receive a bad grade just because his book report was on the "Theory of Relativity" and the teacher couldn't comprehend the subject. (This is from experience) What it will do is take the human factor out of the grading process and grade all reports equally regardless of subject matter.


    • I agree with you, it's time we do remove the human factor. Why not let computers do what humans are proven to be unable to do without constant errors due to emotion or other human difficulties?

      Let the computer grade the technical side and the human grade the creative side; this way there is no way someone who writes a personal paper which a teacher does not like can get an F.
  • Mark Twain (Score:3, Insightful)

    by reboot246 (623534) on Saturday September 06, 2003 @10:31PM (#6890810) Homepage
    is just one of many writers who would flunk using this system.

    'Nuff said.
  • by HanzoSan (251665) on Saturday September 06, 2003 @10:32PM (#6890813) Homepage Journal


    We could definitely use this software to grade essays on technical merit and grammar, but what about creativity and content?

    I think we still will need a teacher to read it, but I do think software should grade all exams.
  • There's nothing wrong with the technology. Used properly, it can help teachers as an aid.
  • by Ghoser777 (113623) <[moc.cam] [ta] [abnerhaf]> on Saturday September 06, 2003 @10:33PM (#6890820) Homepage
    Good quote:

    Julie Cheville, an assistant professor of literacy education at Rutgers University and the local director for the National Writing Project, which promotes professional development for writing teachers, is among those skeptical of such an approach. "To be scored, writing needs to be formulaic, and formulaic writing has never been the trademark of effective writers," she said. "At the moment, what automated scoring technologies can do is scan, count and score. They orient students to errors, not to meaning. Vacuous student essays can receive high marks only because they are error-free."

    I think this is something important to keep in mind. As a math teacher, there are plenty of tools that can help students find errors in what they are doing mathematically, but there's a line between doing correct mathematics and insightful/interesting/useful mathematics. This technology definitely has its place and can be useful, but I hope educators don't get the idea that they can simply rely on the tool. Wielded correctly, it could do great good, but also leave a lot of students with "vacuous" levels of understanding.

    Matt Fahrenbacher

    • there's a line between doing correct mathematics and insightful/interesting/useful mathematics

      Just to tell you, I'm not a musician, but my dad (who was) had this book by Hindemuth (sp?) which was supposed to spell out some "rules" of making interesting musical compositions. Supposedly you could compose something that followed all these rules, and yet be extremely bland...

      Or it could be very interesting, no guarantees. But break these rules, and most of the time you came out with something like Metallica

  • perfect! (Score:2, Insightful)

    by rabs (208464)
    this software would be perfect for students majoring in comp sci or engineering who have to take a composition / writing class...

    Course:
    College of Liberal Arts / Sci: Rhetoric 105
    - or -
    College of Engineering: Pattern Analysis 202

    Objective:
    To teach the principles of essay-writing skills. Liberal Arts students will be encouraged to follow boiler-plate styles and formats, while Engineering students will be graded on their ability to analyze and defeat pattern recognition software.

    - rabs
  • Automated students, for automated graders..!

    Remember!: [factmonster.com] "Your paper must have five (5) paragraphs. An intro paragraph, concluding with your thesis sentence, followed by three paragraphs supporting your thesis sentence, followed by a conclusion..."

  • *Shudder* (Score:4, Insightful)

    by gregfortune (313889) on Saturday September 06, 2003 @10:38PM (#6890852)
    Sounds like everyone feels the same way too... We've got some automated testing software for MS Office at the local college and although it's getting better, it still makes really silly mistakes from time to time. Analyzing English composition has got to be many times more difficult than watching a bunch of clicks and key presses.

    The only use I can see for this thing is as a "first pass" grading tool that quickly finds obvious mistakes (spelling, grammar, redundancy, etc) and flags them for the instructor. On the other hand, it's probably just as time consuming for the instructor to read over the flagged items as it is to just catch them on the first time reading through the paper.
  • by stere0 (526823) <<slashdotmail> <at> <stereo.lu>> on Saturday September 06, 2003 @10:38PM (#6890853) Homepage

    This thing compares the essays it is supposed to grade with already graded papers in its database. Couldn't this be done with something like POPFile [sourceforge.net]? It isn't only a spam/ham classifier and lets you create as many "buckets" as you want (e.g. work, family, spam, mailing lists and system monitoring).

    You could, in theory, create only buckets named (A...F), feed a large number of essays to it, make it "learn" how the essays are classified using statistics, and let it grade essays for you after that.

    Is it possible to find masses of graded essays online? This would be a fun thing to try :).
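
    For the curious, the bucket idea can be sketched in a few lines of Python. This is only a toy word-frequency naive Bayes classifier, not POPFile's actual code and not whatever ETS does; the class name `BucketClassifier` and its methods are made up for illustration, and a real attempt would need far more training essays than anyone is likely to find lying around.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class BucketClassifier:
    """Toy multinomial naive Bayes with one bucket per grade (hypothetical)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> word -> count
        self.doc_counts = Counter()              # bucket -> essays seen
        self.vocab = set()

    def train(self, bucket, text):
        """Add one already-graded essay to the given bucket."""
        words = tokenize(text)
        self.word_counts[bucket].update(words)
        self.doc_counts[bucket] += 1
        self.vocab.update(words)

    def classify(self, text):
        """Return the bucket with the highest posterior log-probability."""
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_bucket, best_score = None, float("-inf")
        for bucket in self.doc_counts:
            counts = self.word_counts[bucket]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior plus log likelihoods, with add-one smoothing
            score = math.log(self.doc_counts[bucket] / total_docs)
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_bucket, best_score = bucket, score
        return best_bucket
```

    Train it with `train("A", essay_text)` for each graded essay, then call `classify(new_essay)`. Note that it only sees word frequencies, which is exactly why people in this thread worry about buzzword checkers.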

  • Now, not to be one to go and say that machines don't know anything about essays. But it really doesn't seem that efficient of a process simply because whenever a teacher assigns an essay they also assign with it certain criteria that the essay needs to follow. Through their teaching style and what they emphasize in class they also color what a student might put into an essay and they also bring their own bias to the table as to how an essay should be constructed.

    As for not dehumanizing, unless you're going
    • > Now, not to be one to go and say that machines don't know anything about essays. But it really doesn't seem that efficient of a process simply because whenever a teacher assigns an essay they also assign with it certain criteria that the essay needs to follow.

      Even if the goal was only to do the grammar checking, 450 examples are pathetically inadequate for a task that amounts to learning to use a language at an expert level. It's hard to imagine this being anything more than a buzzword checker.

  • by Savatte (111615) on Saturday September 06, 2003 @10:40PM (#6890867) Homepage Journal
    He just gives everyone a B when he is hungover.
  • Style? (Score:3, Insightful)

    by the uNF cola (657200) on Saturday September 06, 2003 @10:42PM (#6890876)
    So where does style come in? There are many, MANY forms of style, which make writers unique. For instance, I've found that when I write, even the shortest essays, I tend to break up my thoughts into multipart sentences... like this one. They tend to be very long and drawn out. I also use "granted" and "don't forget". I also seem to create a lot of sentences that are self-contradicting: Though this, something else. It's part of my style.

    My style isn't completely mine. I'm sure over-use would be bad. Granted this. Granted that. Where do those softer features of writing come in? Or are we all to be sterile and write with no tone or style.
  • They should take each sentence in the paper and run a google-ish type query and flag it if it comes up with more than a few hits.

    Why?

    Students plagiarize these days, a *lot*, because they think it's impossible to get caught. A google-type query on each sentence would make it much more difficult to just copy someone else's work verbatim.
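
    A rough Python sketch of the per-sentence check, assuming a local list of documents stands in for the search engine (no real search API is called here; `count_hits`, `flag_sentences` and the thresholds are all made-up names and numbers):

```python
import re


def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def normalize(text):
    """Lowercase and reduce the text to space-separated letters and digits."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def count_hits(sentence, corpus):
    """Stand-in for a search query: how many documents contain the sentence."""
    needle = normalize(sentence)
    if not needle:
        return 0
    # Pad with spaces so matches start and end on word boundaries.
    return sum(1 for doc in corpus if f" {needle} " in f" {normalize(doc)} ")


def flag_sentences(essay, corpus, max_hits=3, min_words=6):
    """Return sentences found verbatim in more than max_hits documents."""
    flagged = []
    for sentence in split_sentences(essay):
        if len(sentence.split()) < min_words:
            continue  # short sentences match by accident too often
        if count_hits(sentence, corpus) > max_hits:
            flagged.append(sentence)
    return flagged
```

    This only catches word-for-word copying. Swap a few synonyms and it sails right through, so it raises the bar rather than closing the door.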
  • by dtfinch (661405) *
    I've never been a fan of enforced grammar, essay formats, and other writing guidelines & standards. They try to push opinion as fact. Language evolves, more now than ever before. The high school I graduated from required four years of English and only one year of math. English classes honestly taught me nothing of importance beyond elementary school.

    An automated system to ensure that all good essays look alike doesn't sound like an improvement. It's bad enough having to write an essay that will only be
  • ...with my new essay-writing software. It's been tested against 450 automatic essay marking programs written by "experts" as well.
  • Scary: (Score:4, Interesting)

    by afidel (530433) on Saturday September 06, 2003 @10:49PM (#6890908)
    This sounds a lot like This [marshallbrain.com] story.

    Actually this sounds a lot like Grammatik. Grammatik was the grammar checker that was an optional component with WordPerfect for DOS and later a standard component with the Windows version. It was written by a team comprised of both computer scientists and professors of English. One of the interesting features was the scoring feature which would give you a rough estimate of the grade level of your writing. It would also give you statistics and compare them to a selection of famous works.
  • Some "dehumaniSing" could be a good thing, especially when grading subjective material.

    Objective material is factual, a simplification is "Most dogs have 2 eyes."

    Subjective material is opinionated - "Australia should legalise heroin injecting rooms." Obviously this is controversial, and there are several positions on the matter.

    Most teachers/lecturers/graders/tutors have their own (pre-existing) subjective opinions on certain topics. If you submit an essay that opposes their views, the chances are very h
  • by Anonymous Coward
    It's true. I've had teachers take their questions and quizzes directly off of websites (the curious may want to enter a few key words from their latest homework on google and see what turns up...). Now here's an ethical dilemma for you: if it's ok for them to get the questions off a website, is it alright for me to get the answers off that same website?

    The good old fashioned teachers, on the other hand, would never do such a thing. No, they have been xeroxing the same handouts for the last two decades. Y
  • Apparently, the system uses statistical analysis as well as grammar checks to determine the score for the essay. Basically, they've built up a database of essays that have been graded by a bunch of humans, and then used these algorithms to figure out which bucket the essay belongs in. Sounds kinda like SpamAssassin, actually. I'd be willing to bet that with sufficient resources (in terms of essays and human grading time), this wouldn't be all that tough to duplicate. After all, what are spam filters but
  • by jwachter (319790) * <wachter@@@gmail...com> on Saturday September 06, 2003 @11:10PM (#6890991) Homepage
    The GMAT, a test required to get into business school in the US, includes two 30-minute essay questions. Your responses are graded by a human grader and a computer program on a scale of 0 to 6. Your score is then a composite of the two scores.

    ETS actually has a web site where you can do a sample essay that their server will grade for you.

    More info can be found here [mba.com].

  • by cybercyst (74322) on Saturday September 06, 2003 @11:20PM (#6891038)
    One of the primary purposes of essays is to learn how to write for a specific audience.
    If you remove the human element, then you aren't writing for any audience, unless, of course, everyone starts writing for computers' entertainment and education.
  • Useless.. (Score:3, Interesting)

    by Moridineas (213502) on Saturday September 06, 2003 @11:42PM (#6891135) Journal
    As I'm sure anyone who has ever written an essay (especially highschool level or above) knows, there is no point to the essay per se. The essay is not an end to itself, and the grade ultimately is not an end either.

    At my university, Duke, our new curriculum has specially designated writing classes. Every student needs to take three over their four years. A biology lab can be a writing class. So can an English class, history, religion, etc. All W classes have certain requirements--there must be a certain amount of writing and, more importantly, REVISION.

    I was fortunate enough to take a class from the author and professor Reynolds Price. We had a final essay for the class. Along with my grade (not an A ;) I received a page and a half of handwritten comments, as well as inline comments about points in the middle of the essay. Twenty years from now, I doubt I will remember a great deal of his course, but the comments that he left me have already changed my writing style, and, I hope, improved it. (note: slashdot style not indicative of real style, hehe)

    A computer will NEVER be able to do this. Nor will a computer (at least in the foreseeable future) be able to comment on my theories about Milton's Paradise Lost.
  • by Macrobat (318224) on Saturday September 06, 2003 @11:54PM (#6891188)
    "There's a lot of skepticism," Dr. Spatola said. "The people opposed see it dehumanizing the student's papers, putting them through some sort of mechanical, computerized system like the multiple choice tests..."

    Um...as opposed to English Comp classes in general?

  • by AntiFreeze (31247) * <antifreeze42 AT gmail DOT com> on Sunday September 07, 2003 @12:25AM (#6891291) Homepage Journal
    Okay, this is going to be rather long, so please bear with me.

    First off, let me say that I am involved in the automated essay grading industry, and have helped to develop RocketScore [rocketreview.com] which does everything Criterion does, and lots more. Forgive me for blatant plugs in this post, I'll try and keep them to a minimum.

    But let's move on to the focus of this article.

    First off, there is a lot of criticism about essay graders being formulaic, only capable of seeing patterns that arose in their originating sample set of essays. With Criterion, an offshoot of ETS's e-rater, this is a serious concern. When you only look at what you see, anything out of left field looks completely awry, and cannot be graded appropriately. RocketScore is different; RocketScore uses a "features" method to check for included or excluded material, among many other things, and is therefore quite good at noticing subtle writing and essay types which it has never seen before.

    One of the great things about essay graders is that they give a student an objective standard to look to. Human graders grade differently based upon mood, time they have to review the writing, and many other mitigating factors. In other words, the same human grader might grade the same essay differently at separate points in time. Most essay graders will always grade the same essay in the same manner. This is great for a student, for if a teacher gives you a D when the essay grader says it's in B range, one might be able to use this evidence to force the teacher to reconsider the grade. Or vice versa. If the essay grader is telling you that you're getting a D, you can work and improve on it until you're getting that B you'd be happy with.

    But there are serious drawbacks to the comments E-Rater and Criterion give. E-Rater gives comments solely based on your score (if you get a 1, you get comment set 1, if you get a 2, comment set 2, etc.). Criterion gives a student "instructional feedback in basic grammar, usage, style and organization." E-Rater's comments are inadequate at best, and Criterion's leave a lot to be desired. RocketScore provides substantial feedback on how to improve your writing. Not just stylistic and grammatical comments, but comments on what you should be writing more about (you didn't provide enough info!), what you should be writing less about (you gave too much info!), and how to balance your arguments, among many other categories.

    There are two major problems with essay grading. The first is bullshit detection, and the second is determining if the essay actually answered the question asked. E-rater and Criterion both have real problems with these two criteria. With bullshit detection, RocketScore has thresholds which can be set and manipulated on the fly, from throwing out anything which isn't completely relevant to the topic, to allowing just about any essay submitted. And you will get a score and comments based upon what you submitted. Of course, these are most helpful when you make a meaningful attempt to submit a relevant essay.

    "The machine score and the human score are in agreement 97 percent to 98 percent of the time."

    Yes, but do you know how ETS defines "agreement"? Glad you asked. When the grader's grade is within a point of the human's grade. Now, with the SAT 2 test, which is on a scale of 1 through 6, that means if the grader says 2, and a human says 1, 2, or 3, then there's agreement. But that's 50% of the scale! Their essay grader has a 98% chance of hitting the wall in front of them as opposed to the wall next to them. Woohoo. Meanwhile, RocketScore provides decimal point accuracy (we don't give you a 4 or a 5, we give you a 4.1, or 5.3), and is 98% accurate. But how do we define accurate? When the grader's grade is rounded to the nearest whole number, and that number is the human's grade. In other words, if we give you a 4.3, there is a 98% chance a human would give you a 4. With 4.5,
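
    To put the two definitions of "agreement" side by side, here is a small Python sketch. The function names are made up, the score lists in the usage note are invented for illustration (not real ETS or RocketScore data), and rounding a .5 score upward is an assumption, since the post above is cut off before it says how a 4.5 is treated.

```python
import math


def adjacent_agreement(machine, human, tolerance=1):
    """Fraction of essays where the scores differ by at most `tolerance`."""
    pairs = list(zip(machine, human))
    return sum(abs(m - h) <= tolerance for m, h in pairs) / len(pairs)


def exact_agreement(machine, human):
    """Fraction of essays where the rounded machine score equals the human's."""
    pairs = list(zip(machine, human))
    # Round half up (assumption); Python's round() would round 4.5 down to 4.
    return sum(math.floor(m + 0.5) == h for m, h in pairs) / len(pairs)


def band_fraction(score, low=1, high=6, tolerance=1):
    """Share of the integer scale that counts as 'agreeing' with `score`."""
    band = [s for s in range(low, high + 1) if abs(s - score) <= tolerance]
    return len(band) / (high - low + 1)
```

    With machine scores `[2, 4, 5, 3]` against human scores `[1, 4, 3, 3]`, the within-one-point definition reports 75% agreement while the exact definition reports 50%, and `band_fraction(2)` gives the 50% of the scale mentioned above.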

  • by Anonymous Coward on Sunday September 07, 2003 @12:32AM (#6891314)

    Teacher: Johnny, I'm really sorry, but the computer crashed while your paper was being scored. I was looking over it. It's been a while since I've read a paper, but I was wondering what the following sentence means:

    x' == 'x'; UPDATE EssayScores SET SCORE = 100 WHERE StudentID = 52835; --

    And this one:

    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA#!/bin/sh

    Is that some kind of new language that kids are using? Oh, by the way, congratulations, you got a 100 on EVERY essay this semester! Good job!
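
    For the record, Johnny's first trick only works if the grader pastes essay text straight into its SQL. A minimal sketch with Python's built-in sqlite3 module, assuming a table shaped like the one in the joke (the `Essay` column and the helper names are invented here), shows bound parameters keeping the essay as plain data:

```python
import sqlite3


def make_db():
    """Create an in-memory database with a table like the one in the joke."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE EssayScores ("
        "StudentID INTEGER PRIMARY KEY, Essay TEXT, Score INTEGER)"
    )
    return conn


def store_essay(conn, student_id, essay_text, score):
    """Insert an essay using ? placeholders, so the text is never parsed as SQL."""
    conn.execute(
        "INSERT INTO EssayScores (StudentID, Essay, Score) VALUES (?, ?, ?)",
        (student_id, essay_text, score),
    )


def get_score(conn, student_id):
    """Return the stored score for one student, or None if there is no row."""
    row = conn.execute(
        "SELECT Score FROM EssayScores WHERE StudentID = ?", (student_id,)
    ).fetchone()
    return row[0] if row else None
```

    Store Johnny's sentence with a score of 0 and `get_score` still returns 0. The second trick is a different class of bug and needs bounds checking, not placeholders.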

  • by swordgeek (112599) on Sunday September 07, 2003 @12:58AM (#6891381) Journal
    Now before you start up the flame throwers, this is not a message to deride high school students over their lack of creativity.

    But when I was in high school, we were told that proper essay writing was an essential skill for the departmentals, and when they said "proper," they meant "Must conform to between five and seven paragraphs, with the first and last being the opening and conclusion with three to five paragraphs of body--each containing one topic of discussion."

    Furthermore, it was made VERY clear that creative or unconventional ideas (let alone language!) would be strongly frowned upon. There was One True Way to write an essay, and One True Opinion on any given subject. Any deviations from that would cost you.

    I hated it then, I hate it now, but I don't see any problem with having computers mark essays like this. After all, they were trying to turn us into computers to create them.
  • by Anonymous Coward on Sunday September 07, 2003 @01:02AM (#6891393)
    This is a great leap forward for education. While it has always been the goal of geeks to submit computer-generated papers and receive decent grades, this has traditionally been hampered by the unreliability of computer-to-human communication. But with computer-to-computer submissions (henceforth referred to as "End-to-end Grading And Direction", or EGAD), we can now begin hacking away at the first generation of grade generators.

    "What I did on my Summer Vaca'; DROP TABLE punctuation"
  • Sausage (Score:3, Interesting)

    by quintessencesluglord (652360) on Sunday September 07, 2003 @02:11AM (#6891578)
    Something hinted at by the story and some of the comments, but which really bears being pedantic about: too few teachers. It is as ludicrous to expect a teacher to go over 150 essays as it is for me to expect to get a reasonable education when I am 1 of 150 faces trying to glean something more than an "A" from a class. The software is attempting to address this imbalance, but ultimately it will make the level of education worse: it can grade a paper, it can't offer insights on how to improve. And it will give administrators a reason to pile 50 more into a class, which will in turn lead to GradeStar MkII and onward into a vicious circle. And yeah, the software is just a tool, but like so many tools, that's not how it will be utilized. It's a cop-out, nothing more.
  • by ahfoo (223186) on Sunday September 07, 2003 @02:16AM (#6891588) Journal
    I wrote in my journal about this a while back. ETS was trying to sell their essay grader to a group of the local test prep chains here in Taiwan. The local schools called me in to sit in on the presentation. Before I had gone in, I searched around and found numerous free and open implementations and I asked the speaker why they were selling their academic software for so much money --it was a rather complex contract on a per seat basis-- when there were similar products available for free. Their rep claimed to be unaware of any similar open sourced products that could match the amazing and advanced artificial intelligence features they were offering. Sales reps --hmm. The mere posing of the question definitely made them stutter and squirm though.
    But the interesting part was after I got home. I looked at ETS's own research monologues and found that internally this overpriced system had been debunked. It was discovered that by writing one well-formed short paragraph and then cutting and pasting it over and over an almost perfect score could be attained. The more times it was pasted, the higher the score.
    It was also possible to write an essay on an unrelated topic and still get a high score, allowing students to use rote memorization of a single model essay. This, naturally, is impossible with a human reader because they can tell what the topic is fairly easily. According to the sales literature this software could too, but in actual tests that didn't hold up.
    Their sales literature claimed that the software contained artificial intelligence and thus implied that such simple techniques would not fool it, but in practice this was far from the case.
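
    To make the pasted-paragraph trick concrete, here is a toy Python sketch. The `naive_length_score` function is a deliberately dumb stand-in, not ETS's scoring method (which I have no code for), and `repetition_ratio` is just one obvious guard a grader could run first; all names and numbers are invented.

```python
import re
from collections import Counter


def paragraphs(text):
    """Split on blank lines, then normalize case and whitespace."""
    chunks = re.split(r"\n\s*\n", text)
    return [" ".join(c.lower().split()) for c in chunks if c.strip()]


def repetition_ratio(text):
    """Fraction of paragraphs that exactly repeat an earlier paragraph."""
    paras = paragraphs(text)
    if not paras:
        return 0.0
    counts = Counter(paras)
    repeats = sum(n - 1 for n in counts.values())
    return repeats / len(paras)


def naive_length_score(text, words_per_point=50, max_score=6):
    """Toy scorer that rewards word count alone (hypothetical, for contrast)."""
    return min(max_score, 1 + len(text.split()) // words_per_point)
```

    Paste one 30-word paragraph five times and the toy score climbs from 1 to 4, while `repetition_ratio` jumps from 0.0 to 0.8, which is the kind of signal a grader would want before trusting any length-sensitive feature.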
    Monographs published by ETS also made it clear that despite their aggressive marketing of this product outside the US, they were not planning to use it as an exclusive grading system on their own tests. Rather, it was to be used as a teaching tool. However, it took a lot of digging to uncover that information.
    Just as with translation, there's a lot of financial motivation to make this technology work, but that doesn't necessarily translate into workable products. In the nineties when spelling and grammar checking was already old hat and English/Euro translation was making such headway I thought fluent Chinese/English translation was just a few years away. Now it's 2003, grammar checkers still only work if you write in a prescribed style and I've yet to see something halfway decent in Chinese/English translation software although you still hear claims all the time for some overpriced product that's really almost there.
    I think we'll see dramatic life extension long before we see decent computer essay graders. Decent trade as far as I'm concerned. As for translation, we can always teach more languages in school.
  • by AvantLegion (595806) on Sunday September 07, 2003 @05:36AM (#6892017) Journal
    Fiction: Teachers will use this technology to help judge the nuts and bolts of the paper, freeing them to focus just on the quality of the intellectual content, making things easier and quicker in the grading process.

    Fact: Teachers would read only the introduction and conclusion paragraphs, and rely on the grading software to account for the quality of writing of the paper, and grade that way.

    Brutal Truth: Teachers already read only the introduction and conclusion paragraphs, so use of this software actually would be an improvement.

"It's curtains for you, Mighty Mouse! This gun is so futuristic that even *I* don't know how it works!" -- from Ralph Bakshi's Mighty Mouse
