Paraphrasing Sentences With Software 203

Posted by simoniker on Thursday December 04, 2003 @05:01AM from the university-paraphrase-progress dept.

prostoalex writes "Cornell University researchers are making progress in paraphrasing and "understanding" complete sentences in a software application. Analyzing sentences on the semantic level allows the software application to treat two sentences, expressing similar thoughts and ideas, but written in a different manner, as a single semantic unit. Significant achievements in this area could revolutionize the information searching field."

This discussion has been archived. No new comments can be posted.

This translation just got out (Score:2, Funny)

by Anonymous Coward writes:

Imagine a beowulf cluster of this
- Re:This translation just got out (Score:1, Redundant)
  
  by orthogonal ( 588627 ) writes:
  
  Imagine a beowulf cluster of this
  
  Imagine John Ashcroft, Admiral Poindexter, and the National Security Agency using a Beowulf cluster of these to scan everybody's email.
  
  I wonder if there's a Bayesian filter that picks out atheists, free-thinkers, commies, anti-war activists, and Democrats.
  
  Pass that list through a geo-locator, and the thought police can be at your door by midnight. (According to Solzhenitsyn, they always knock on your door at midnight.)
  - Re:This translation just got out (Score:1)
    
    by dk.r*nger ( 460754 ) writes:
    
    I wonder if there's a Bayesian filter that picks out atheists, free-thinkers, commies, anti-war activists, and Democrats.
    
    Hmm.. Why not just criminate anyone sending emails with a subject different from "FW: fw: fw: FW: READ THIS!!! FW: fw: Something cute"
The problem is... (Score:4, Insightful)

by Anonymous Coward writes: on Thursday December 04, 2003 @05:03AM (#7626714)

That there's absolutely nothing formulaic about idioms, which make up 80% or so of English conversation. A human learns them through years of experience; a computer has to be given programming for every idiom there is.

- Re:The problem is... (Score:1, Interesting)
  
  by Anonymous Coward writes:
  
  80% sounds a bit high. Did you make it up, or is there a source for it?
  
  I doubt that any system designed to deal with idioms would be programmed with every idiom. More likely, they would take a huge corpus of text and do tons of statistical manipulations to it, such that idioms would be roughly equivalent to non-idiomatic phrases expressing the same concept.
- Re:The problem is... (Score:1)
  
  by mirko ( 198274 ) writes:
  
  well, if you use a distributed web application with learning capabilities to fill it, I think this could easily be sorted out.
  - Re:The problem is... (Score:1)
    
    by mirko ( 198274 ) writes:
    
    This web site [20q.net] gives a nice example of what I meant in my above post...
- Re:The problem is... (Score:5, Informative)
  
  by ravydavygravy ( 230429 ) writes: on Thursday December 04, 2003 @05:35AM (#7626807) Homepage
  
  a computer has to be given programming for every idiom there is.
  
  Rubbish - Ever heard of Machine Learning?
  
  There has been much work on resolving coreferance and named-entity recognition problems has been onging for several years, with the aim being to lead onto full NLP. This research seems interesting in that it takes work from another field (genetic sequence matching) and applies it to an NLP problem. What links them all is that in almost every case, the research involves machine learning at some point... it makes no sense to hand-code millions of case-specific rules, when a machine can learn them faster and better...
  
  Read their paper [cornell.edu] and you'll see that indeed it's an unsupervised learning approach - even nicer in that it doesn't require you to label training examples for the algorithm...
  
  ~D
  
  - Re:The problem is... (Score:1)
    
    by ravydavygravy ( 230429 ) writes:
    
    There has been much work on resolving coreferance and named-entity recognition problems has been onging for several years,
    
    And if only I spent as much time on my english usage research.... :-)
    
    Obviously, I meant:
    
    There has been much work on resolving coreference and named-entity recognition problems in recent years,
    
    ~D
- - Yes. (Score:5, Funny)
    
    by Gordonjcp ( 186804 ) writes: on Thursday December 04, 2003 @07:18AM (#7627052) Homepage
    
    An American friend of mine was terribly confused by the expression "Crash us a fag, mate".
    
    - Re:Yes. (Score:2)
      
      by JoeBuck ( 7947 ) writes:
      
      Then there are the English women puzzled by the expressions they get from Americans when they say "Knock me up the next time you're in town".
      - Re:Yes. (Score:2)
        
        by Gordonjcp ( 186804 ) writes:
        
        Well, "knocked up" is a common British colloquialism for "pregnant", usually referring to an unwanted pregnancy. Fuck knows what it means to Americans.
First use of this technology (Score:5, Funny)

by mcrbids ( 148650 ) writes: on Thursday December 04, 2003 @05:03AM (#7626716) Journal

I think that the first and best use of this technology would be to help the editors of Slashdot find duplicate articles!

Think about the possibilities...

Of course, the biggest problem with that is that there wouldn't be nearly as many cool articles to read!

- Re:First use of this technology (Score:2)
  
  by Dreadlord ( 671979 ) writes:
  
  I know your comment is meant to be a joke, but after thinking about it, I guess a similar system could give false positives. Say a story about an event is posted, and then an update regarding the event is posted a while later: both will definitely contain many similar sentences.
  - - Re:First use of this technology (Score:2)
      
      by Dreadlord ( 671979 ) writes:
      
      Nope, a dupe is posting exactly the same story because 2 users submitted it with different words, at different times.
      example of dupes:
      id Says 60fps Is Enough For Doom III [slashdot.org]
      DOOM III to be capped at 60 fps [slashdot.org]
      an example of what I mean is:
      a story [slashdot.org] regarding Doom III at QuakeCon is posted, later, a story [slashdot.org] about a specific feature in Doom III is discussed.
      The second example may not be the best one, but it gives an idea of what I actually mean.
- Re:First use of this technology (Score:2)
  
  by jkrise ( 535370 ) writes:
  
  I think this technology should be used in the SCO case first. Find out how differently constructed programs achieve the same result!
- Re:First use of this technology (Score:2, Insightful)
  
  by Arleo ( 16712 ) writes:
  
  Or how about removing redundant comments?
  - Re:Second use of this technology (Score:2)
    
    by Thing 1 ( 178996 ) writes:
    
    Or how about removing redundant comments?
- Schoolkids (Score:3, Interesting)
  
  by Azghoul ( 25786 ) writes:
  
  My guess is any slick technology set up with this will let plagiarism run rampant.
  
  Google translator already let my sister-in-law "cheat" on a German paper, but the translation was "too good" so she got caught. Paraphrasing that's excellent (obviously would take a while, but what the hell, we can play Apple II games on a Palm not 20 years later....) could be real messy.
- Re:First use of this technology (Score:2)
  
  by Polo ( 30659 ) * writes:
  
  Funny, I was thinking it could read my mail, and when I say "this is spam", it would learn from that and help filter out these mortgage/viagra/etc. offers from then on.
This reminds me of the Infocom classics (Score:5, Interesting)

by chewtoy-11 ( 448560 ) writes: on Thursday December 04, 2003 @05:06AM (#7626720)

I always loved the text adventure games by Infocom. They were way ahead of their time, and I have been truly amazed on several occasions by the software's ability to 'understand' what I was asking it to do. Of course I'm sure this is leaps and bounds beyond what was available back then, but it's truly amazing how far ahead of their time they actually were.

There is a mailbox here.

- Re:This reminds me of the Infocom classics (Score:4, Insightful)
  
  by pubjames ( 468013 ) writes: on Thursday December 04, 2003 @05:50AM (#7626857)
  
  I always loved the text adventure games by Infocom. They were way ahead of their time, and I have been truly amazed on several occasions by the software's ability to 'understand' what I was asking it to do. Of course I'm sure this is leaps and bounds beyond what was available back then, but it's truly amazing how far ahead of their time they actually were.
  
  Yes. I can't be the only one that is disappointed that text adventure development essentially died. The great limiting factors always used to be memory (with no disc drives, the whole game had to be stored in a very limited amount of memory) and processing speed. Now that we have both of these in abundance it should be possible to write a real "interactive novel", but I guess that will never happen. Shame, it's a great format for cell phones and pdas.
  
  - Re:This reminds me of the Infocom classics (Score:5, Informative)
    
    by blancolioni ( 147353 ) writes: on Thursday December 04, 2003 @06:51AM (#7626997) Homepage
    
    Interactive fiction hasn't died, and you can certainly play it on your PDA. Furthermore, it's generally acknowledged that the quality of modern works has surpassed that of Infocom. Baf's guide [wurb.com] is probably a good place to dip your toes in, but there's resources all over the place and the annual competition [ifcomp.org] has just finished.
    
    An interactive novel, at least the kind you're probably thinking about with deeply implemented characters and so forth, is probably AI-complete. It's not about the disk space and processor speed, it's about the inherent trickiness.
    
    - Re:This reminds me of the Infocom classics (Score:2)
      
      by pubjames ( 468013 ) writes:
      
      Interactive fiction hasn't died
      
      Yes, I know about the stuff you are talking about.
      
      it's generally acknowledged that the quality of modern works has surpassed that of Infocom.
      
      That's the problem... The modern games have only just surpassed games that were created for machines of 12 years ago.
      
      It's not about the disk space and processor speed, it's about the inherent trickiness.
      
      Not today, but it was an extremely limiting factor when you are trying to get a whole game into 32Kb of memory.
      
      Yes, it is a
      - Re:This reminds me of the Infocom classics (Score:3, Informative)
        
        by Sargent1 ( 124354 ) writes:
        
        There are changes to the various interactive fiction languages to address various problems and shortcomings in the field. The trouble is, most of the easy stuff has been done. What's left now is trying to figure out what hard stuff can be done, or is even worth doing.
        
        For example, right now most of the languages accept sentences of the form [VERB] [DIRECT OBJECT] [PREPOSITION] [INDIRECT OBJECT]. Occasionally someone suggests, "Why not add adverbs?" The general consensus is that doing so suddenly requires th
  - Douglas Adams (Score:3, Funny)
    
    by squaretorus ( 459130 ) writes:
    
    Another area in which the world is poorer for the lack of a Douglas Adams wandering (or more likely flying first class) around it.
    
    I would have LOVED to see him tackle a 'text message adventure' along the lines of the old infocom classics. He has written a number of pieces (some of which are collected in salmon of doubt) about how much he enjoyed this marriage of writing and computing. The flexibility and restrictions of the medium would have led to something pretty neat I'm guessing. Of course - then he'd h
    - Re:Douglas Adams (Score:2)
      
      by thatguywhoiam ( 524290 ) writes:
      
      I would have LOVED to see him tackle a 'text message adventure' along the lines of the old infocom classics.
      He did - a game called Starship Titanic was written by Adams, in conjunction with a game developer (Simon & Schuster? can't remember...)
      It combined a text adventure interface with some nice 3D graphics that would move around above the text box, in a Mystian sort of way. The game itself was very funny, had some beautiful designs and ideas, and was almost totally impossible. In other words it w
      - Re:Douglas Adams (Score:2)
        
        by squaretorus ( 459130 ) writes:
        
        I've played Starship Titanic - excellent piece of kit! I meant 'text message adventure' as in text message to your mobile phone!
- Re:This reminds me of the Infocom classics (Score:3, Insightful)
  
  by TwistedGreen ( 80055 ) writes:
  
  Um, Infocom's text interface wasn't too complex. I mean, it was mostly simple commands in the form "verb + noun."
  
  > open mailbox
  - Re:This reminds me of the Infocom classics (Score:2, Informative)
    
    by Anonymous Coward writes:
    
    That'd be scott adams games.
    
    Infocom's parser was much better. "Put the big bunch of keys in the blue box under the table." can be parsed by it, for example.
    
    As the OP said, this isn't near the level of what's mentioned in the article, but it's certainly better than you imply.
    - Re:This reminds me of the Infocom classics (Score:2)
      
      by jafuser ( 112236 ) writes:
      
      I'm still amazed at how they were able to parse some things. I used to throw all kinds of stuff at it to try to make it look dumb, but more often than I expected it handled things quite well.
      
      Does anyone have any insight into the algorithm they used?
  - Re:This reminds me of the Infocom classics (Score:2)
    
    by JoeBuck ( 7947 ) writes:
    
    It could do a little bit better than this. It understood direct and indirect objects ("give him the orb"), as well as some particles (the difference between "put on" and "put down"), and could figure out some omitted words from context. But they could do this because the situation was so limited.
- Not a coincidence.. ? (Score:3, Funny)
  
  by Channard ( 693317 ) writes:
  
  What's the betting Infogrames code has in fact been reused for this application? Twenty years down the line...
  Auto Greeter Machine: I welcome you to our country, and greet you with open arms. Please enjoy your stay - we have a fine range of tourist facilities, restaurants, bars and so forth. And on a personal note, may I say that you are likely to be eaten by a grue.
comments? (Score:1, Interesting)

by mutagenman ( 721545 ) writes:

Will this get rid of the 10 people who get +5 informative from stealing the link out of the comment a few spots up.
- Re:comments? (Score:2, Funny)
  
  by mirko ( 198274 ) writes:
  
  if these people get an "informative" when they paraphrase the article, they should be metamodded to "insightful"...
  but the day the mods will be replaced by parsers, I think I'll get one to post instead of me.
google? (Score:4, Interesting)

by Anonymous Coward writes: on Thursday December 04, 2003 @05:07AM (#7626724)

so would this allow something like google to pick up a phrase and relate it to the results instead of just picking up keywords?

- Re:google? (Score:3, Interesting)
  
  by millette ( 56354 ) writes:
  
  Actually, google already does this a little. If I can find an example, I'll reply again. Excite, the old search engine, used to pick out synonyms (well, that's how I heard it explained once) by comparing pages and related content.
  - Re:google? (Score:2)
    
    by zerblat ( 785 ) writes:
    
    It seems like Google has started to use stemming [lancs.ac.uk] (or something similar). If you search for "linux print", it also finds pages containing "linux printing". IOW it considers words with certain suffixes (probably -ing, -s, -ed etc., depending on word class) to be equivalent to their stem, i.e. the word with the suffix stripped off. This isn't such an important thing in English, since there aren't so many different suffixes, but it can be very important for more inflective languages.
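The suffix stripping described above can be sketched roughly like this. It is only an illustration: the suffix list, the `min_stem` cutoff and the function names are assumptions, and it is not Google's method nor a real stemmer such as Porter or Lancaster.

```python
# Naive suffix stripping: an illustrative sketch only.
# The suffix list and the minimum stem length are assumptions.
SUFFIXES = ("ing", "ed", "s")  # checked in this order


def naive_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix if enough of the word remains."""
    lower = word.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix) and len(lower) - len(suffix) >= min_stem:
            return lower[: -len(suffix)]
    return lower


def same_stem(a: str, b: str) -> bool:
    """Treat two words as equivalent when their naive stems are equal."""
    return naive_stem(a) == naive_stem(b)


if __name__ == "__main__":
    for w in ("print", "prints", "printed", "printing"):
        print(w, "->", naive_stem(w))
```

The `min_stem` guard is there so that short words like "sing" or "bus" are left alone; a real stemmer uses far more careful rules than this.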
    - Re:google? (Score:2)
      
      by millette ( 56354 ) writes:
      
      Read about it first in slashdot *hehe* Thanks - didn't know that!
- Re:google? (Score:5, Informative)
  
  by millette ( 56354 ) writes: <robinNO@SPAMmillette.info> on Thursday December 04, 2003 @06:30AM (#7626956) Homepage Journal
  
  Just discovered this [geeklife.com]:
  Now when searching Google, you can use a ~ (tilde) to find pages using synonyms of the word you're searching for. For instance, search for:
  
  css ~help [google.com]
  
  and you'll get sites with tutorials, guides, support, etc.
  
  - Re:google? (Score:2, Interesting)
    
    by Frogg ( 27033 ) writes:
    
    ..also worth noting that Google have recently introduced a very powerful implementation of word stemming. (Yup, this is separate to the synonyms, but is still related)
    
    It's enabled by default - if you want exact match words (like it was a month ago) you need to search for: +keyword
  - Re:google? (Score:2)
    
    by Arslan ibn Da'ud ( 636514 ) writes:
    
    For instance, search for:
    css ~help
    and you'll get sites with tutorials, guides, support, etc.
    
    ...but you won't get DeCSS!
how it can be useful (Score:5, Interesting)

by Dreadlord ( 671979 ) writes: on Thursday December 04, 2003 @05:07AM (#7626728) Journal

one of the ways I can think of to use this technology is to improve search engine capabilities, instead of looking for exactly the same words, search engines then can look for similar sentences, giving more accurate results.
However, after reading the article, I wonder whether the research can be applied to Latin languages, as they did the research on semantic languages.

- Correctly paraphrasing is a difficult problem (Score:1)
  
  by Serious Simon ( 701084 ) writes:
  
  after reading the article, I wonder whether the research can be applied to Latin languages, as they did the research on semantic languages
  
  ...is a good example :)
Hrm (Score:4, Interesting)

by Auckerman ( 223266 ) writes: on Thursday December 04, 2003 @05:09AM (#7626732)

I was too lazy to read the article so I used the Summarize feature in OS X to parse the sentences down since it seems a bit wordy.

Okay, maybe I exaggerate a bit here, I did read the article and while the summarize isn't that far off from what these guys are doing...

- Re:Hrm (Score:2)
  
  by plumby ( 179557 ) writes:
  
  Word has a similar feature.
  
  So for the 1% summarisation of the article "The sentence-based paraphrasing system could improve machine translation, according to Barzilay".
Google News? (Score:5, Interesting)

by cryptor3 ( 572787 ) writes: on Thursday December 04, 2003 @05:09AM (#7626735) Journal

I'm curious as to whether Google News, since it draws from various news sources and groups articles by topic (similar to paraphrasing, perhaps), uses any of the same techniques.

- Re:Google News? (Score:1)
  
  by mghiggins ( 61851 ) writes:
  
  That's an interesting question: in the field of language comprehension, is the cutting edge of research in academia or in industry?
  
  Anyone know?
- Re:Google News? (Score:5, Informative)
  
  by Kappelmeister ( 464986 ) writes: on Thursday December 04, 2003 @10:12AM (#7627932)
  
  I'm curious as to whether Google News, since it draws from various news sources and groups articles by topic (similar to paraphrasing, perhaps), uses any of the same techniques.
  
  No, but Regina Barzilay, who is the researcher featured in the article, worked (with me) on the Newsblaster [columbia.edu] project at Columbia University, where she indeed applied these techniques to multidocument summarization. Newsblaster gathers and clusters news like Google News, but produces more sophisticated summaries.
  
Translation software? (Score:3, Informative)

by znaps ( 470170 ) writes: on Thursday December 04, 2003 @05:11AM (#7626740)

I'm sure this would improve translation software too, since a paraphrased sentence should be easier to translate into something sensible.

Fascinating read (Score:1)

by zhenlin ( 722930 ) writes:

But... I wonder, will it produce 'In Soviet Russia' pseudo-paraphrasing.

I wonder what its application could be, other than to detect duplicates... Perhaps, a tool to suggest ways of rewriting sentences? Or maybe part of a more advanced grammar check?
- Re:Fascinating read (Score:5, Insightful)
  
  by Jugalator ( 259273 ) writes: on Thursday December 04, 2003 @05:44AM (#7626836) Journal
  
  I wonder what its application could be, other than to detect duplicates... Perhaps, a tool to suggest ways of rewriting sentences? Or maybe part of a more advanced grammar check?
  
  My first thought was translation tools. GOOD translation tools that understand the grammar in the source language, and use the grammar in the destination language to form the resulting sentence.
  
  There has been some work on something to solve this problem, where a phrase in language A was translated to some special "universal" code, and then finally to language B. The developers would then need to make the translator translate all languages to the universal code, and vice versa. The universal code could be whatever necessary to make the software as easily as possible be able to preserve the "meaning" of the sentence.
  
  However, if this is done, the problem could change from this:
  
  Source: I love hot dogs.
  Destination: Ich liebe heiBe Hunde. (i.e. a literal translation, from Altavista Babelfish)
  
  ... to this:
  
  Source: I love hot dogs.
  Destination: Ich liebe Nahrung. ("I love food")
  
  That is, in case the universal language wasn't advanced enough and the English -> universal conversion was "lossy". So we might exchange our current problem of mangled grammar for one of lost information.
  
  Here's a web site [mundo-r.com] about it, and I'm sure there are many more.
  
  - Re:Fascinating read (Score:2)
    
    by Jugalator ( 259273 ) writes:
    
    Destination: Ich liebe heiBe Hunde
    
    Cool, /. doesn't understand Unicode, and not even Latin characters (!) like the german sharp s. Is it still living in the world of 7 bit characters or what? :-O
    - Re:Fascinating read (Score:1, Interesting)
      
      by Anonymous Coward writes:
      
      They didn't like people using some of the odder Unicode characters to do page widening tricks, and stuff. It's a shame, because some of these extra characters were quite pretty [slashdot.org].
  - Re:Fascinating read (Score:3, Interesting)
    
    by Trejkaz ( 615352 ) writes:
    
    I guess you could try using Esperanto or Lojban as your intermediary language. Lojban in particular is computer parseable *and* human understandable, so it would probably be the easiest to write translations for.
Fascinating (Score:3, Insightful)

by Raindance ( 680694 ) * writes: <johnsonmx&gmail,com> on Thursday December 04, 2003 @05:12AM (#7626747) Homepage Journal

Things like this are what make academic research Really Cool and allow useful things to come about. Go Cornell.

I'd note that this is a novel approach, and, for better or for worse, it goes about doing things much differently than our minds do.

Actually, though, it's closer to how humans understand writing (stringing together atomic words/phrases in an implicit context) than previous statistical methods.

... and I'd relate my 2nd and 3rd paragraph if it wasn't 3am here. Goodnight, slashdot. :)

RD

Paraphrased version (Score:1, Interesting)

by Anonymous Coward writes:

Maybe prostoalex could learn something from the Cornell researchers! How about this for an article summary, eh?

Cornell University researchers could revolutionize the information searching field by analyzing sentences on the semantic level to allow a software application to treat two sentences, expressing similar thoughts and ideas but written in a different manner, as a single semantic unit.
So... (Score:1)

by CyberSlugGump ( 609485 ) writes:

Who will be first to post the paraphrased article so I don't have to RTFA?
Does this mean... (Score:1)

by Powercntrl ( 458442 ) writes:

The days of "All your base are belong to us" Engrish may soon be over? A brand new AirSoft gun I just purchased has the phrase "No point at the creature" molded into the plastic. Don't get me started on the owners manuals for consumer electronics. Japan needs this software, bad. If it comes at a cost of no more "All your base" jokes, well, that's a cost I think society will have to bear.
- - Re:Does this mean... (Score:2)
    
    by Adam9 ( 93947 ) writes:
    
    Sounds like the Chinglish Files [silverladder.com]
It's been done (Score:3, Interesting)

by CanadaDave ( 544515 ) writes: on Thursday December 04, 2003 @05:20AM (#7626776) Homepage

Microsoft Word had AutoSummarize in Word 97, or was it 2000? Anyhow it seems to be absent in Word XP. It was the trashiest thing I'd ever seen. Actually I used to use it all the time to write my abstract. It provided a nice way for me to remember everything I talked about in my report, and I think it made an effort to use keywords which came up a lot in the report. But sometimes it did things which made no sense at all. Too bad Microsoft wasn't Open Source, their AutoSummarize feature might actually be half decent by the year 2003, but instead they abandoned it to work on other projects I guess.
I looked again and whaddayaknow? I asked the paperclip about AutoSummarize and it is still there in the Tools menu after all! Looks like I don't have that feature installed though.

- I have it installed (Score:2)
  
  by Inda ( 580031 ) writes:
  
  Summary
  It's been done by CanadaDave (544515) on Thu December 04, 9:20
  
  Microsoft Word had AutoSummarize in Word 97, or was it 2000? Anyhow it seems to be absent in Word XP.
  
  -----
  
  Fantastic bit of programming there, Bill.
  
  Not really the same thing Mr. Dave. :)
Who didn't think of Reginald Barclay? (Score:1)

by philipdl71 ( 160261 ) writes:

Two ideas led to the system, said Regina Barzilay...

Speaking of natural language recognition, I parsed this sentence from the article as reading, "Two ideas led to the system, said Reginald Barclay [startrek.com] ..." :)
Someone help me out here (Score:5, Funny)

by prockcore ( 543967 ) writes: on Thursday December 04, 2003 @05:25AM (#7626788)

I'm too lazy to read the article.. could someone write some software to paraphrase it for me?

My take on this (Score:2)

by product byproduct ( 628318 ) writes:

If strcmp says that two strings are different, but you say that they mean the same thing, then the problem is with your language, not with strcmp.
- Re:My take on this (Score:1)
  
  by ravydavygravy ( 230429 ) writes:
  
  You're absolutely right - however last time I checked, we all speak some form of natural language, right?
  
  Would you prefer if we all spoke some sort of language governed strictly by some computer-linguistic grammar? I'll get started on the Yacc code right away... :-)
  
  ~D
- Re:My take on this (Score:3, Interesting)
  
  by ideonode ( 163753 ) writes:
  
  Yes, but strcmp can say two strings are identical, yet they can convey different information. Big-endian vs. little-endian, anyone?
  
  Binary identity does not imply semantic equivalence. It all depends on how the data is interpreted.
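The point about interpretation can be sketched in a few lines of Python (used instead of C purely for brevity). The two-byte value and the `interpret` helper are made up for illustration.

```python
from typing import Tuple


def interpret(raw: bytes) -> Tuple[int, int]:
    """Read the same bytes as a big-endian and as a little-endian integer."""
    return int.from_bytes(raw, "big"), int.from_bytes(raw, "little")


if __name__ == "__main__":
    a = bytes([0x00, 0x01])
    b = bytes([0x00, 0x01])
    print(a == b)        # True: byte for byte identical
    print(interpret(a))  # (1, 256): two readings of the same bytes
```

A byte-wise comparison only tells you the encodings match; what they mean depends on the convention the reader applies.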
- - Re:My take on this (Score:2)
    
    by jrockway ( 229604 ) writes:
    
    > "It's enormous".
    > "It's immense".
    > "It's massive".
    > "It's huge".
    
    Damn! I have GOT to remember to close the shades before I undress :) But, uh, thanks :)
Japanese manuals (Score:3, Funny)

by Space cowboy ( 13680 ) writes: on Thursday December 04, 2003 @05:30AM (#7626799) Journal

Finally, auto-translate, then auto-parse can rid us of these "manuals" :-)

Simon

Goodbye, Cliff Notes... (Score:4, Funny)

by IvyMike ( 178408 ) writes: on Thursday December 04, 2003 @05:31AM (#7626800)

Hello, automatic paraphrasing of literature.

P.S. Just joking, kids. Stay in school!

What about... (Score:3, Funny)

by millette ( 56354 ) writes: <robinNO@SPAMmillette.info> on Thursday December 04, 2003 @05:38AM (#7626820) Homepage Journal

Let's see the srtwfaoe cut its tteeh anigist tihs lttilte puzzle! (blatant reference to an older article [slashdot.org])

Another Killer App (Score:5, Funny)

by varjag ( 415848 ) writes: on Thursday December 04, 2003 @05:45AM (#7626843)

They should use this technology to transcribe legalese into plain English and back. Like, you feed it with "Due to unanticipated circumstances as listed under the terms of the clause 17(a), we may be unable to comply with your request within this and successive fiscal year(s)", and it spits out "bugger off".

Of course, millions of lawyers worldwide would lose their jobs, but I, being bitten by them, just take it as an added benefit.

It has to be said ... (Score:1)

by B3ryllium ( 571199 ) writes:

Paraphrase THIS!

(from the I'll-Paraphrase-YOU! department)
I get it. (Score:1)

by yo303 ( 558777 ) writes:

Significant achievements in this area could revolutionize the information searching field.
Significant achievements [GOOD] in this area could revolutionize [IS] the information searching field. [THIS].
yo.
Finally ... (Score:5, Funny)

by makapuf ( 412290 ) writes: on Thursday December 04, 2003 @05:56AM (#7626870)

a "-1, redundant" generator.

Forget Research! (Score:1)

by eWarz ( 610883 ) writes:

What about true speech recognition? As I understand it, this could go a long way towards making speech recognition work effectively. Me: "Computer, I want to write an email." Computer: "One moment please."
Paraphrase of the article. (Score:5, Informative)

by fven ( 688358 ) writes: on Thursday December 04, 2003 @05:59AM (#7626878)

Without thinking too much about it, we paraphrase all the time. Trying to give a sentence to a computer to reword is a complicated task.

At Cornell University, researchers decided to avail themselves of two different sources of the same news and use computational biology methods to make it possible for computers to automatically paraphrase input sentences. Their first step was to compare the two different sources of the same news.

Eventually, it is hoped that this research will have benefits in computer processing of natural-language queries, translation engines, and in assisting people with certain types of reading disabilities.

The project began when two ideas came together, said one of the Cornell researchers, Regina Barzilay. Regina Barzilay is an assistant professor of computer science at the Massachusetts Institute of Technology.

The vast amount of duplicated content online is a valuable resource for computer systems learning to paraphrase. A number of reporters report the same news but using different wording. The redundant sources of news are able to assist in learning the different ways one piece of information can be paraphrased, as the same basic facts are reported in each. So with these multiple sources, you can sort out the noise and get the facts and then work out different ways of stating those facts.

Even with similar styles of writing, paraphrasing of sentences is more than just working out and substituting synonyms. The researchers provide a couple of common business phrases to illustrate this:

After the latest Fed rate cut, stocks rose across the board.
Winners strongly outpaced losers after Greenspan cut interest rates again.

The next step was to use computational biology techniques to determine how much two sentences had in common and how closely they were related. The technique is similar to the one biologists use to see how close two sets of genes are that may have started from the same seed but then evolved. They are different but have a degree of similarity.
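To make the gene-comparison idea a bit more concrete, here is a minimal sketch of global sequence alignment (Needleman-Wunsch style) applied to words instead of nucleotides. This is only an illustration, not the researchers' code: the function name `align_score` and the match/mismatch/gap values are assumptions picked for the example.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score between two sentences, compared word by word.

    The scoring values are illustrative assumptions, not taken from the article.
    """
    x, y = a.lower().split(), b.lower().split()
    # prev[j] is the best score for aligning the words of x seen so far with y[:j]
    prev = [j * gap for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [i * gap]
        for j in range(1, len(y) + 1):
            diag = prev[j - 1] + (match if x[i - 1] == y[j - 1] else mismatch)
            # best of: align the two words, skip a word of x, skip a word of y
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]
```

A higher score means the two word sequences line up with fewer substitutions and gaps. The two Fed rate cut sentences above would score poorly under this plain word match, which is one reason paraphrasing takes more than comparing surface words.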

The important thing was to compare news sources that were written differently but covered the same event. This generated a whole set of word patterns that were roughly equivalent. This was exactly the core data needed to inform a computer paraphrasing technique.

The Reuters and AFP news sources were used to test the system. News was selected from English articles produced between September 2000 and August 2002.

The system developed by the researchers performs two groupings; firstly comparing articles from the same source:

Word-based clustering methods were used to identify sets of text that had a high degree of overlapping words. This method identified articles that reported distinct acts of violence occurring in Israel and the Palestinian territories.
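As a rough idea of what clustering on overlapping words can look like, here is a small sketch using Jaccard overlap and a greedy grouping pass. The article does not say which clustering method or threshold was used, so `jaccard`, `cluster` and the 0.5 threshold are assumptions for illustration only.

```python
def jaccard(a, b):
    """Fraction of distinct words that two texts share."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def cluster(texts, threshold=0.5):
    """Greedy grouping: a text joins the first cluster whose first member overlaps enough."""
    clusters = []
    for text in texts:
        for group in clusters:
            if jaccard(text, group[0]) >= threshold:
                group.append(text)
                break
        else:
            # no existing cluster was close enough, so start a new one
            clusters.append([text])
    return clusters
```

Texts that share most of their vocabulary end up in the same group, which is the raw material the lattices are then built from.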

Computational biology techniques were then used on these sets of articles to generate lattices or sentence templates for the computer to use. Each lattice contains a number of sets of words that occur in parallel and empty slots where arguments, such as locations, number of fatalities, times and dates can be inserted.

The challenge was to sort out which lattices were indeed due to different events and which were due to writing variability.

The researchers were thus able to identify common templates used by journalists to describe similar events. I.e., journalists who take the same article and change or take out a word, add a detail, reverse the sentence and so on are hereby busted.

One of the templates, or lattices, read: Palestinian suicide bomber blew himself up in NAME on DATE killing NUMBER (other) people and injuring/maiming NUMBER. In addition to the injuring/maiming variable, there are several variables within the name argument: settlement of, coastal resort of, center of, southern city, or garden cafe.
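A template like this can be pictured as a pattern with named slots. The sketch below encodes the template above as a regular expression, pulls out the slot values, and drops them into a second wording. The second template (`OTHER`), the slot names and the sample sentence in the usage note are all made up for illustration; the real system learns its lattices from the news data.

```python
import re

# The template quoted above, written as a regex with named slots.
# The slot names (name, date, killed, hurt) are assumptions for this sketch.
PATTERN = re.compile(
    r"Palestinian suicide bomber blew himself up in (?P<name>.+?) on (?P<date>\w+) "
    r"killing (?P<killed>\d+) (?:other )?people and (?:injuring|maiming) (?P<hurt>\d+)"
)

# Hypothetical template standing in for the matching lattice from the other source.
OTHER = "{killed} people were killed and {hurt} hurt when a suicide bomber struck {name} on {date}"


def paraphrase(sentence):
    """Fill the second template with the slot values found in the sentence, or return None."""
    m = PATTERN.search(sentence)
    return OTHER.format(**m.groupdict()) if m else None
```

Given the invented sentence "Palestinian suicide bomber blew himself up in the center of Sometown on Monday killing 2 other people and injuring 5", `paraphrase` returns "2 people were killed and 5 hurt when a suicide bomber struck the center of Sometown on Monday". A sentence that fits no template simply returns `None`.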

43 AFP and 32 Reuters templates were thus discovered by the system. The researchers then cross-compared these lattices.

They compared the
Pleasure-ism (Score:2, Funny)

by DrewCapu ( 132301 ) writes:

The next generation of students sure will have it much easier than us. How is a teacher supposed to catch plagiarism with software like that?

Oh wait...

Mrs. G: Johnny, come here for a second.
Johnny: Yes Mrs. G?
Mrs. G: What did you mean by "Shrub claimed that Basket Hamper and the Hatchets of Sin will be blown out" in your current events report?
Johnny: Oh, whoops! What I meant to say there was, "Bush says Bin Laden and the Axes of Evil will be defeated." Sorry about that. Darn that defective spell-ch
Obligatory Paraphrases (Score:3, Funny)

by KoolDude ( 614134 ) writes: on Thursday December 04, 2003 @06:41AM (#7626975)

How do you paraphrase Slashdot ?
Ans : Dupes for nerds, stuff that matters again and again.

How do you paraphrase Microsoft Innovation ?
Ans :

- Re:Obligatory Paraphrases (Score:2)
  
  by Lord_Dweomer ( 648696 ) writes:
  
  "How do you paraphrase Microsoft Innovation ? Ans : "
  Apple?
But could it..... (Score:2, Funny)

by MegaHamsterX ( 635632 ) writes:

But could it understand Babelfish translations?
Better idea (Score:3, Insightful)

by richie2000 ( 159732 ) writes: <rickard.olsson@gmail.com> on Thursday December 04, 2003 @07:08AM (#7627033) Homepage Journal

Significant achievements in this area could revolutionize the information searching field.
Not to mention the increased ability to quickly spot "re-written" bought term papers.

Interesting (Score:5, Funny)

by Illserve ( 56215 ) writes: on Thursday December 04, 2003 @07:09AM (#7627036)

There's this algorithm called Latent Semantic Analysis [coloradu.edu] which has been under development for quite some time (freely available!). It's quite good at comparing the semantic content of two pieces of text based on its database of many thousands of books (in fact you can specify the education level by choosing different databases).

The output of LSA has been shown to be roughly equivalent to human scorers for examining summary essays produced in tests.

Point is, by combining this here paraphrasing algorithm with LSA, we can have computers summarizing text and other computers giving them grades on it. This takes students and teachers out of the equation entirely. Saves us big bucks and gets public education back on its feet!

- Re:Interesting (Score:2)
  
  by Illserve ( 56215 ) writes:
  
  silly Rabbit, messed up the link
  
  It's coloradO.edu
SCO Analysis (Score:5, Funny)

by richie2000 ( 159732 ) writes: <rickard.olsson@gmail.com> on Thursday December 04, 2003 @07:12AM (#7627040) Homepage Journal

I tried running this on all statements and press releases coming out of SCO and Darl McBride for the last six months and after a thorough semantic analysis, this is the resulting summary:
"Pass me the crackpipe, man!"
Proudly karma-whoring since the turn of the millennium

- - Re:SCO Analysis (Score:2)
    
    by richie2000 ( 159732 ) writes:
    
    Dunno really. Came to think of it (since I felt like I was just karma-whoring with that comment, even if I don't need it) and thought it'd look cool like that.
    Like all other disasters, it seemed like a good idea at the time. :-)
    Patenting fake .sigs since the turn of the millennium
extracting & searching on memes (Score:2)

by crovira ( 10242 ) writes:

and deep contextual dependency.

Neat trick if they can pull it off. Then Google results would really improve.
LOLITA? (Score:3, Interesting)

by spongman ( 182339 ) writes: on Thursday December 04, 2003 @07:47AM (#7627123)

Can anyone else shed any light on how far the LOLITA project (under Roberto Garigliano) got at Durham University? Yeah, it's a research project, but last I heard (10 years ago) it was able to parse complete texts (for example, newspaper articles) and answer simple questions based on them. I believe there was also work underway to make it understand/'speak' Chinese/Russian. There was also supposed to be some kind of 'script' support which would give it contextual information about certain situations (the common example was what contextual knowledge you need when you go into a restaurant, and how that knowledge can help you understand what is said there).

- Re:LOLITA? (Score:2)
  
  by QwkHyenA ( 207573 ) writes:
  
  Good job there bud. You just reminded 100k geeks they need to check out the Alt.Binaries tonight. And my download rate was just starting to creep up...
Spamfilter (Score:3, Interesting)

by Goodbyte ( 539941 ) writes: on Thursday December 04, 2003 @08:02AM (#7627180) Homepage

Shouldn't this make it possible to improve spam filters?

Already been done (Score:2)

by jez9999 ( 618189 ) writes:

Babelfish [altavista.com] already does this.
Advances in Automatic Text Summarization (Score:5, Informative)

by fingal ( 49160 ) writes: on Thursday December 04, 2003 @09:48AM (#7627668) Homepage

If anyone is interested in the history of this field then I would highly recommend the book with the above title, edited by Inderjeet Mani and Mark T. Maybury. amazon [amazon.com]. Lots of very interesting articles, including discourse trees and a brief bit of stuff about summarising non-textual assets such as diagrams, video streams etc etc

Call Infocom! (Score:3, Interesting)

by Hoi Polloi ( 522990 ) writes: on Thursday December 04, 2003 @10:43AM (#7628283) Journal

Just think of the ramifications this will have for Zork. Now I'll be able to say "Will you just open the damn egg?"

For the lazy, or interested, a summary via OS X! (Score:5, Informative)

by 2nd Post! ( 213333 ) writes: <gundbear@pacbel[ ]et ['l.n' in gap]> on Thursday December 04, 2003 @11:01AM (#7628476) Homepage

Set on the lowest setting, a summary of the article is:
The method could eventually allow computers to more easily process natural language, produce paraphrases that could be used in machine translation, and help people who have trouble reading certain types of sentences.
At a roughly 10% size:
The researchers used gene comparison techniques to identify word patterns from different news sources that described the same event.
The method could eventually allow computers to more easily process natural language, produce paraphrases that could be used in machine translation, and help people who have trouble reading certain types of sentences.

...When two reporters describe the same news event, for instance, they may use different details, but they tend to report about the same basic facts, said Barzilay.

...you have genes which started from the same kind of seed, and then they change during evolution [but] there is some similarity," said Barzilay.

...Given a sentence to paraphrase, the system finds the closest match among one set of lattices, then uses the matching lattice from the second source to fill in the argument values of the original sentence to create paraphrases.
At a quarter size:
The researchers used gene comparison techniques to identify word patterns from different news sources that described the same event.
The method could eventually allow computers to more easily process natural language, produce paraphrases that could be used in machine translation, and help people who have trouble reading certain types of sentences.

...When two reporters describe the same news event, for instance, they may use different details, but they tend to report about the same basic facts, said Barzilay.

...Second, to sort out sentence similarities, the researchers borrowed techniques from computational biology that determine how closely related organisms are by finding similarities among genes.... you have genes which started from the same kind of seed, and then they change during evolution [but] there is some similarity," said Barzilay.

...Lattices are made up of words or parallel sets of words that occur across several examples, and arguments, or slots, where names, dates or number of people hurt or killed occur.

...One pattern, or lattice, read: Palestinian suicide bomber blew himself up in NAME on DATE killing NUMBER (other) people and injuring/wounding NUMBER.

...Given a sentence to paraphrase, the system finds the closest match among one set of lattices, then uses the matching lattice from the second source to fill in the argument values of the original sentence to create paraphrases.

...The researchers' ultimate goal is to use the system to allow computers to be able to paraphrase like humans, and to understand paraphrases, "but that's very far [off]", said Barzilay.

...Barzilay's previous work, which used a different technique to paraphrase at the level of words and phrases rather than sentences, is part of the Columbia News Blaster project, which summarizes news stories.

...The researchers' system has the potential to accomplish the same thing by taking one human translation and creating 10 paraphrases of it automatically, she said.

...The system could be used to produce paraphrases based on a specific model, for example, for aphasic readers, who find it difficult to read certain types of phrases, she said.

...For example, the system learned incorrectly that "Palestinian suicide bomber" and "suicide bomber" were the same, and that "killing 20 people" is the same as "killing 20 Israelis", said Barzilay.
- - Re:For the lazy, or interested, a summary via OS X (Score:2)
    
    by bill_mcgonigle ( 4333 ) * writes:
    
    Get some text on your screen with a Cocoa app. Say, this post with Safari.
    
    Select the text.
    
    Choose from the Application (e.g. Safari) menu in the menubar Services...Summarize.
    
    The Summary tool pops up. Hooray! The sad part is they demoed it at MacWorld Boston '97, and released it in Jaguar, IIRC.
  - Re:For the lazy, or interested, a summary via OS X (Score:2)
    
    by 2nd Post! ( 213333 ) writes:
    
    Describes [mercury-soft.com] it a little, since it's written with Apple's Summarize Service.
    
    I think Apple uses the service internally in their file indexing and search feature, too!
Online Machinese syntax parser (Score:2)

by Jugalator ( 259273 ) writes:

Here's a site demo'ing the Machinese syntax parser. It can build parse trees for sentences you type in where the components in the sentence are separated and related to each other.

http://www.connexor.com/demos/syntax_en.html
How I do this in my product (Score:4, Interesting)

by MarkWatson ( 189759 ) writes: on Thursday December 04, 2003 @11:24AM (#7628680) Homepage

I use a fairly effective algorithm to do this in my product:
I first classify the text into a category, then weight every word in the text based on how much it contributed to this classification. I then output as a "summary" the one or two sentences in the original text that most contribute to the classification of the entire text.
Not really summarization, but useful.
-Mark
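For the curious, the approach described above can be sketched roughly like this. It is not the actual product code: the categories, the word weights and the function names are hypothetical, and a real classifier would learn its weights from training text.

```python
import re

# Hypothetical per-category word weights; a real classifier would learn these.
CATEGORIES = {
    "finance": {"stocks": 2.0, "rates": 1.5, "fed": 2.0, "market": 1.0},
    "sports": {"game": 1.5, "team": 1.5, "score": 1.0, "coach": 2.0},
}


def words(text):
    """Lowercased word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def classify(text):
    """Pick the category whose weighted words add up to the highest total."""
    totals = {c: sum(w.get(t, 0.0) for t in words(text)) for c, w in CATEGORIES.items()}
    return max(totals, key=totals.get)


def summarize(text, n=1):
    """Return the n sentences contributing most to the text's category, in original order."""
    weights = CATEGORIES[classify(text)]
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    ranked = sorted(
        sentences,
        key=lambda s: sum(weights.get(t, 0.0) for t in words(s)),
        reverse=True,
    )
    top = set(ranked[:n])
    return [s for s in sentences if s in top]
```

The output is extractive: it only picks existing sentences, which matches the "not really summarization, but useful" caveat.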

Translations (Score:2)

by z_gringo ( 452163 ) writes:

Depending on how that develops, it will have a great impact on translation software.

Imagine using a computer to translate from one language to another and ending up with a grammatically correct result. That would be amazing.
- Re:Finally! (Score:2)
  
  by WuphonsReach ( 684551 ) writes:
  
  Similar to one of the short stories at the start of the Foundation series where the foundationers are visited by a high-ranking official from the old empire.
  
  He says a lot while he's there, but after they run it through some sort of language processor they find out that he said exactly *zip*.
  
  Aren't weasel-words fun?
