Graphics Software

Words That Speak a Thousand Pictures

Posted by timothy
from the high-tech-meets-expired-copyrights dept.
venolius writes: "The New York Times (free registration required) has an article on TextArc (created by W.Bradford Paley), a site that "aids in the discovery of patterns and and concepts in arbitrary text" (from the detailed overview at TextArc). The site serves an applet that performs the task (texts on which analysis is available include Alice in Wonderland, Hamlet, and thousands of others -made available by Project Gutenberg-). The NYTimes article reports that Paley found that "Dracula", which relies on a strong storyline had a few keywords clustered hotly at the center, and that the metaphoric "Frankenstein" generated a circle of 50 words of modest intensity that faded towards the edges. "Portrait of the Artist as a Young Man" with evenly distributed key words produces tight and round lines and "Alice in Wonderland" produces loopier lines. Check it out! (the applet was tested on better hardware, but I did well enough with 98/IE6/550MHz/64MB)"
This discussion has been archived. No new comments can be posted.

  • Don't ever do that to my browser again...
    • that slashdot posting a link would indicate that it doesn't do horrible things(tm) to your system / browsing experience....

      guess not...
      • Worked pretty smoothly in Netscape 6.2 on my G3/450 with 384MB

        However, when I loaded the Bible, it chewed up just about all of the available memory. The machine started choking a little and swapping to the hard disk, something I haven't heard it do in a *long* time.

        See this [frontiernet.net].
    • yeh, it only works on those systems where you don't have things locked down, and even then it seems broken.

      probably not properly load tested or something.

      well we just took care of that.

    • Phew-uck! A perfect example of why Java needs to die...
    • I thought people with browsers with sucky Java VMs would do the smart thing and disable Java?

      Ran fine in JRE 1.4 on my Opera 6.01 at least, though I kinda failed to see the point of it.
    • Congratulations, morhoj! Your first post has been officially recognized as the true First Post

      Current Statistics:

      Logged in FPs: 4
      AC FPs: 0

      First Posters:

      1 - morhoj
      1 - Spanko
      1 - teambpsi
      1 - Tensor
    • And that's not even Konqi 3...

      Konqueror 2.2.2, Java 1.3.1, Linux.

    • Re:Please... (Score:2, Informative)

      by WBPaley (574192)
      Hi people,

      Thanks for all the discussion!
      Here are some notes from the perpetrator (Brad)...

      >by morhoj on Tuesday April 16, @07:29AM (#3349188)
      >Don't ever do that to my browser again...

      Valuable feedback; perhaps more gracefully put by

      >by Paradise Pete on Tuesday April 16, @09:34AM (#3349613)
      >I think his complaint was that it did it unexpectedly.

      I have put in a warning about the screen takeover; others say there are ample warnings about the research & speed issues, so I left that alone. I agree that /. should link to Alice.html and Hamlet.html and Thousands.html, where the warnings are, rather than directly to the page that opens the applets. Can this be changed now, so others don't have morhoj's problem?

      ---

      >by reo_kingu on Tuesday April 16, @07:33AM (#3349201)
      >is this really new? I think maybe some of my teachers having
      >been using this thing to grade papers.

      Don't know if it's new, but I haven't seen it before.

      >by big.ears on Tuesday April 16, @10:41AM (#3350110)
      >...factor analysis on text. It maps every word in a text into about a
      >100-dimensional space, based on how often they co-occur in similar
      >contexts. If you feed those factors into a clustering algorithm or a
      >multi-dimensional scaler in order to present it graphically, you probably
      >get something very close to this trick.

      Flattering, but I was trying to come up with something easier to write and explain. This trick uses arithmetic (each word is drawn at its average position) not math. Net pull of a bunch of rubberbands is easier to explain _and_ conceptualize for a lot of my audience.

      ---

      >by proxybyproxy on Tuesday April 16, @07:56AM (#3349250)
      >Once again Project Gutenberg shows its beautiful face. ...

      Hear, hear! Inspiring and generous work.

      ---

      >Just ran Slashdot through it (Score:2, Funny)
      >by Anonymous Coward on Tuesday April 16, @08:43AM (#3349375)

      ;)

      ---

      >by TheCrunch on Tuesday April 16, @09:28AM (#3349577)
      >(User #179188 Info | http://www.slippersandpipe.co.uk/) But a word
      >of warning to anyone else running Win98 on a P133 with 64MB RAM.
      >This thing nuts your machine. I can't get it off my desktop. I'm gonna
      >have to reboot again.. arg.

      Sorry... That warning's now on the intro pages to each applet

      ---

      >The Emperor Is Naked! (Score:1, Informative)
      >by robbway on Tuesday April 16, @09:47AM (#3349706)
      >I have to say it: I see no value in this. The mathematical algorithms do
      >more to shape the images than the words themselves. My opinion is
      >that this is rather unartistic, uninspiring, and doesn't reveal anything
      >about language at all.

      A damning observation, if it were true. I also have little respect for artsy code that doesn't express the variability in the data. In fact, the only "algorithm" here is the averaging, so any variation _must_ come from the language. They initially look similar, but so do leaves to people who don't get into the country a lot. For some people developing a feel for how different texts reveal themselves here might be worth the time. But I expect that will take more than a few minutes.

      As to unartistic--I'll weigh your opinion with Larry at the Whitney, Bruce at Columbia, Matt at the Times, Sara at Banff, and a few dozen others as I decide whether it's art. (I made it as an index/concordance).

      I agree that it doesn't say anything about language, but leaves don't say anything about biology. _You_ gotta provide the intelligence.

      Actually, it was built to tap into the human brain's pre-attentive processing abilities. (Oh no, do I need to provide a warning now that it'll take over your brain as well as your desktop? ;) You can actually read many more words than you are consciously aware of as your eye scans text. I hoped that as your eye jumps from word to word in a TextArc it wasn't jumping randomly, but to the next most "important" word, where "importance" is some function of brightness (frequency), position (distribution), and recency of concept activation, or level of interest (in your own head). It seems to work especially well in the 32" x 20" printed versions. Different people read different things.

      ---

      >Wishing I could see an example... (Score:1)
      >by BobTheJanitor on Tuesday April 16, @10:33AM (#3350032)

      Some screen shots are on the site, lower right button. (Guess I should make it more prominent.) http://textarc.org/Stills.html

      ---

      >Dark grey text on black background? (Score:1)
      >by an_mo on Tuesday April 16, @11:20AM (#3350462)
      >If textarc.org [textarc.org] continues to publish their stuff
      >with dark grey text on a black background they're not
      >reaching for the masses.

      Oops. Fixed, I think. (Do you?)
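
Paley's reply above says the only "algorithm" is averaging: each word is drawn at the average position of its occurrences, like the net pull of a bunch of rubber bands. A minimal Python sketch of that idea follows. The circular layout, the function name `average_positions`, and the returned tuple shape are assumptions made for illustration; they are not taken from the TextArc applet.

```python
import math
from collections import defaultdict

def average_positions(words):
    """Map each distinct word to (mean_x, mean_y, count) over its occurrences.

    Assumption: occurrence i of n sits on a unit circle at angle 2*pi*i/n,
    starting at the top and running clockwise.
    """
    n = len(words)
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for i, word in enumerate(words):
        angle = 2 * math.pi * i / n
        acc = sums[word]
        acc[0] += math.sin(angle)  # x coordinate of this occurrence
        acc[1] += math.cos(angle)  # y coordinate of this occurrence
        acc[2] += 1                # occurrence count (could drive brightness)
    return {w: (sx / c, sy / c, c) for w, (sx, sy, c) in sums.items()}
```

Under these assumptions, a word used only once stays on the rim next to its single occurrence, while a word spread evenly through the text is pulled toward the center.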
  • is this really new? I think maybe some of my teachers having been using this thing to grade papers.
    • there was never anything as sophisticated as that used for grading papers...

      I heard a rumour that grades were assigned by how close the teacher got to the target while holding the paper in her hand in a game of "pin the tail on the donkey"
    • Re:You know.. (Score:3, Interesting)

      by big.ears (136789)
      You are probably thinking of LSA (Latent Semantic Analysis) [colorado.edu], which was 'invented' by former Bell Labs researcher (and current U of Colorado psych. prof) Tom Landauer. He uses it to grade his papers, and others probably do as well. It uses the same principle that some search engines (e.g., excite) are based on, and essentially amounts to factor analysis on text. It maps every word in a text into about a 100-dimensional space, based on how often they co-occur in similar contexts. If you feed those factors into a clustering algorithm or a multi-dimensional scaler in order to present it graphically, you probably get something very close to this trick.
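
The comment above describes mapping words into a space based on how often they co-occur in similar contexts. The Python sketch below covers only the first part of that idea: it builds raw co-occurrence count vectors and compares them with cosine similarity. It is not LSA itself, since it omits the dimensionality reduction down to roughly 100 factors. The window size and the function names are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(words, window=2):
    """For each word, count the words appearing within `window` positions."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(words):
        lo = max(0, i - window)
        hi = min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[word][words[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Words that appear in the same surroundings end up with similar vectors even if they never appear next to each other, which is the property a clustering or scaling step would then work from.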
  • I wanted to generate one on this comment and see how it handled recursion...
  • alt.sex.stories.* And see what the results are!! :) What "patterns" might develop? The mind reels!!
  • by fabiolrs (536338)
    "All your base are belong to us"

    "Somebody set up us the bomb"
    Is there any pattern there?
  • by tessellation (133537) on Tuesday April 16, 2002 @07:46AM (#3349231)
    ...the one we already have, that is:

    map [lexfn.com] connections between two words, concepts, or famous names

    see [rhymezone.com] a word's rhymes, synonyms, definitions

    and I leave the rest to you.
  • Free reg. (Score:2, Informative)

    by k98sven (324383)
    As usual, one can change the www.nytimes to
    archive.nytimes to access the article without registration.
  • and and it even filters out extraneous conjunctions!

    Brought to you by the Associated Federation of Organizations.
  • Gutenberg (Score:5, Interesting)

    by proxybyproxy (561395) on Tuesday April 16, 2002 @07:56AM (#3349250)
    Once again Project Gutenberg shows its beautiful face. If you haven't heard about it before, then read a Wired feature here [wired.com]. Michael Hart started the project years ago and he wants to digitize anything which is out of copyright. The uses are infinite (think of the blind who can feed texts to tactile printers, for example), which this story also shows.

    Anyway, Hart is a big supporter of sensible copyrights (read the feature) and if you can spare the time, help him by digitizing your favourite book.
  • by kvn299 (472563) on Tuesday April 16, 2002 @08:05AM (#3349269)
    Although I only viewed one book, it came up with some interesting results. I'd be curious to know how similar an author's books are to one another... can this distinguish an author's style, or merely individual works?

    I also imagine that a college professor might be interested to run this against term papers!

    • I'd be curious to know how similar an author's books are to one another... can this distinguish an author's style, or merely individual works?


      This was exactly what I was thinking. I'm studying an authorship problem of a Sanskrit text, and would like to try this on the work and other works allegedly from the same author.

      By the way, the applet rendered properly on my iBook with OmniWeb. Didn't need Windows or Pentium III or 256MB RAM.

  • Just so you know (Score:2, Interesting)

    by sielwolf (246764)
    The Jargon File is out there and, oddly enough, it too looks pretty similar to the others described. I don't know whether that is speaking highly of the JF or poorly about the rest of the work out there.
  • Works great on my SGI... (250 MHz R10K Octane, 256 MB, Netscape 4.79, Java 1.3.1, IRIX64 6.5.15m).

    Got the latest versions from here:
    http://www.sgi.com/products/evaluation/

    Zipping thru some CS Lewis right now. Very, very cool!

    [snazzy sig here]
  • by wbg (566551)
    throws my library of books about web-usability at those bigots...
    nice try. but staring at white noise on my TV is more fun.
    or listening to "cat /boot/vmlinuz > /dev/audio"
  • Market trends. (Score:3, Interesting)

    by Faux_Pseudo (141152) <Faux,Pseudo&gmail,com> on Tuesday April 16, 2002 @08:33AM (#3349344) Homepage
    This is very nice looking.
    Would make a really cool screen saver if it were in C and not Java. Any volunteers?
    But now I must put on my "think like a corp." hat.
    Some publisher goes out and maps all the great books and compares them with current best sellers. They correlate the patterns and then decide that Format X creates the best sellers that people buy. Now they refuse to print any book that does not fit their demographic of what they consider to be the next best seller.

    It's only a matter of time before these kinds of things are used like a DNA test to see whether a book has good "genes" or bad "genes".

    I know it sounds like a conspiracy, but I have seen corps do stranger things in attempting to repeat past successes. Just look at the movies. We are about to release Star Wars -2 in the name of working on a tried and true formula that started with the release of Jaws II. Did anyone else catch the special on PBS (Frontline, I think) that talked about how Jaws was the birth of the end of original movies as we knew it?
  • by Anonymous Coward
    There was a dual explosive center around the words "FreeBSD" and "Linux", a tangled network emanating from the phrase "Alan Thicke" and complex sparkling array of connections between the words "Gates", "Microsoft", "monopoly", and "blue screen". There was also a massive weird lumping around the word "Stallman" but it crashed my browser.
  • Word linkages (Score:3, Interesting)

    by Blue23 (197186) on Tuesday April 16, 2002 @08:55AM (#3349420) Homepage
    This is exceedingly fascinating. I've worked with word associations for computer authoring, mostly Markov chains of various lengths and phrase-structure stuff. While this takes works for human authors and works out from that, there are some very interesting concepts in here which may be useful in the other direction.

    And on top, a wonderful way of displaying it, to catch the eye so the brain has time to engage. 8)

    =Blue(23)
  • by JJ (29711)
    This is certainly a very interesting tool for summarization and analysis. Viewing it thru an NLP perspective, it converts a text into a purely visual representation. It would be interesting to examine writing from different communication channel dominant authors and check for the pattern differences. It would also be helpful for checking consistency of translations.
  • Rosetta Stone (Score:3, Interesting)

    by JJ (29711) on Tuesday April 16, 2002 @09:05AM (#3349464) Homepage Journal
    TextArc would certainly be a useful tool for analysis of undeciphered languages and texts. Ventris certainly could have used this for Linear B. The only big limitations would be requiring a suitably sized text and having a consistent meaning to that text. As in, the Rosetta stone probably was not a long enough text to analyse this way.
  • by iotk (547332) on Tuesday April 16, 2002 @09:09AM (#3349483)
    Does anyone besides me think that this kind of technique could provide stronger protection in cases of source code piracy such as GPL violations, theft of codebase, etc.,? By generating visual patterns based on the occurrence of keywords (or even compiled bytecodes) a signature of a codebase could be generated that is still recognizable even after comments have been stripped out or subtle changes introduced. This could be immensely valuable in GPL infringement cases.
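
The signature idea in the comment above can be sketched numerically rather than visually. The Python example below counts language keywords in a source string, normalizes the counts into a fingerprint, and compares two fingerprints with cosine similarity. The keyword list, the function names, and the use of cosine similarity are all assumptions for illustration. This sketch does not strip comments, and nothing here suggests such a score would hold up as evidence of copying.

```python
import math
import re
from collections import Counter

# Hypothetical C-like keyword list; a real tool would use the language's full set.
KEYWORDS = {"if", "else", "for", "while", "return", "switch", "case", "break"}

def fingerprint(source):
    """Relative frequency of each keyword among all keyword occurrences."""
    tokens = re.findall(r"[A-Za-z_]\w*", source)
    counts = Counter(t for t in tokens if t in KEYWORDS)
    total = sum(counts.values())
    return {k: counts[k] / total for k in counts} if total else {}

def similarity(fp_a, fp_b):
    """Cosine similarity of two fingerprints; near 1.0 means the same keyword mix."""
    dot = sum(v * fp_b.get(k, 0.0) for k, v in fp_a.items())
    norm_a = math.sqrt(sum(v * v for v in fp_a.values()))
    norm_b = math.sqrt(sum(v * v for v in fp_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Because only keywords are counted, renaming variables leaves the fingerprint unchanged, which is the kind of "subtle change" the comment hopes to see through. Unrelated code with a similar keyword mix would also score high, so a match is at best a hint.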
  • But a word of warning to anyone else running Win98 on a P133 with 64MB RAM. This thing nuts your machine. I can't get it off my desktop. I'm gonna have to reboot again.. arg.
    • Go upgrade your OS to something good (Preferably non-bill-product), and upgrade your hw and try it. It is really fascinating. :) Kudos to TextArc team!
  • by robbway (200983)
    I have to say it: I see no value in this. The mathematical algorithms do more to shape the images than the words themselves. My opinion is that this is rather unartistic, uninspiring, and doesn't reveal anything about language at all.
    • Maybe because you read mathematical equations more than books. :) I read lots of books, I have 2 undergrads, 3 masters, etc. And I find it very inspiring. :) And it is nice that it is written with Java, so that MS, SGI, LINUX, MAC users are able to see this. :)
  • Amazing... (Score:3, Funny)

    by dcigary (221160) on Tuesday April 16, 2002 @10:05AM (#3349818) Homepage
    One of the coolest, and most useless things I've ever seen.

    I like it!
  • Could someone host a screenshot? I'd love to see this work, but the java on my browser craps out on me, so I only see an empty grey box.
  • This reminds me of the Virtual Thesaurus [plumbdesign.com]. Seems like it's doing about the same thing, but in a much simpler way.
  • Unpossible! (Score:2, Funny)

    by elphkotm (574063)
    Java is too slow to do such intense computationalizations! They should use something fast, like VBScript!
    • Whaaaat? VBScript? Hahahahahaha! Read the posts, people using Windows, Mac, Linux, SGI boxes are able to see it working. Thanks to Java. :)
  • Notice the "patent pending" notice on the site.

    While this is a delightful little entertainment, and quite fun to play with (though a bit of a hog while it's running, not to mention my difficulty in getting it to run in Mozilla on Win32), semantic networks have been around forever. Let's hope the patent application is meant to keep things like this in the public domain, rather than fencing in yet another area of the commons.

  • If textarc.org [textarc.org] continues to publish their stuff with dark grey text on a black background they're not reaching for the masses.

    I have to highlight the text in order to be able to read it.

    • erm. it's intentional that words fade away the longer it is since they were last used.

      least, that's how i interpreted it.

      The actual story (which appears along the bottom) is nicely highlighted as it goes along. If you can't read it, check your monitor contrast settings.

      personally i think it was interesting, and would (as someone else mentioned) make a classy screen-saver. But I can't actually see a decent use for it.

      ~Cederic
  • Current tally from W.Bradford Paley's Bio:
    Semi-interesting javascripts - 1
    Dates with real women in the last 6 months - 0
  • Just for fun, I loaded the Old and New Testaments into the thing just to see if there could be any interesting and or humorous relations in the text. One thing that I thought was mighty interesting was the fact that "God" was smack dab in the center of everything.

    Clicking on "God" linked to damn near everything. My screen lit up yellow like the sun. Well, I guess that's one book that knows its topic!

    Unfortunately, the text is so large that it really didn't render very beautifully. It was really jumbled. It might be time to crank up to super-res...

    -AP
  • beowulf
    /--------\
    |first post|
    \--------/
    grits

  • i do not know what this is exactly good for, but i do know that it is fscking beautiful!
    Carroll the old mathematician would have loved it too i believe.

  • One of my major complaints with English (North American bias here...I should say Literature) in high school was that we learned about creative prose or patterns in word choice from teachers. Basically, they told us who used what word choice and we repeated these thoughts on exams. I really like that this tool adds something more concrete to such statements...I'm waiting for some recent Tom Wolfe novels to appear in the Gutenberg database so I can confirm my suspicion that he overuses 'solar plexus'.
