Journal Journal: beat geek 5
-------------------
I've been working on a project (nicknamed "beat geek" in my head) that uses digital equivalents of the Dada/Beat cut-up techniques and other forms of randomness in, or artificial generation of, language.
For example, I have a program called autopoem (written by Bill Sethares) loosely based on an idea from Shannon's original paper on information theory.
Suppose you took all the words in the English language and calculated how often the character "s" is followed by the character "t", the character "e", and so on. You'd end up with a table of transition probabilities showing how often each letter is followed by any other letter (or punctuation mark or space), and starting with a single seed letter you could generate "English-like" words randomly. The output based on the probability that a single letter is followed by another letter doesn't actually resemble English much, and neither does the output based on two-letter combinations (how often is "th" followed by "e", by "a", and so on), but by the time you get to three-letter combinations (how often is "the" followed by "a" or by "s"?) the output starts to look a lot like "twas brillig and the slithy toves", like ye olde englishe with very creative spelling.
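The table scheme is easy to sketch in a few lines of Python (the function names here are mine, not from any real program). Storing, for each n-letter sequence, the list of characters observed to follow it, and then picking from that list at random, is equivalent to sampling by the transition probabilities:

```python
import random
from collections import defaultdict

def build_table(text, order):
    """Map each `order`-letter sequence to the list of characters
    observed to follow it; repeats in the list encode the probabilities."""
    table = defaultdict(list)
    for i in range(len(text) - order):
        table[text[i:i + order]].append(text[i + order])
    return table

def generate(text, order, length, seed=None):
    """Start from a seed sequence and repeatedly sample a next character
    from the table, sliding the window forward one letter each time."""
    table = build_table(text, order)
    state = seed or random.choice(list(table))
    out = state
    for _ in range(length):
        followers = table.get(state)
        if not followers:        # dead end: sequence only occurs at the end
            break
        out += random.choice(followers)
        state = out[-order:]
    return out
```

With order=1 you get the single-letter case, order=3 the "slithy toves" territory. Every (order+1)-letter window of the output necessarily occurs somewhere in the source, which is why longer orders reproduce ever-larger chunks of it.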
The scheme I described above is difficult to implement in practice, because the table of probabilities gets big fast as the number of letters used to determine the next letter grows. Autopoem uses a particular text as a source: instead of generating a table of probabilities, it scans the text looking for the next occurrence of the current letter sequence, say "the", and selects whatever letter or punctuation mark comes next, say "a". Then it continues scanning until it finds the next occurrence of "hea", selects the following letter, and so on. The longer the sequence of letters, the more likely it is that whole words or phrases from the original text will appear in the output. An alternative version, requiring a reasonably long text, applies the same principle at the word level: how often is the word "red" followed by the word "hat", or "dog", and so on?
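Autopoem itself is Bill Sethares's program and I haven't seen its source, but the scan-as-you-go idea (and the word-level variant) might look something like this rough sketch; the names and details are my own guesses:

```python
import random

def autopoem_scan(text, order, length, seed=None):
    """Letter-level sketch: no probability table is ever built.
    At each step, find every place the current letter sequence occurs
    in the source and take the character following a random occurrence."""
    state = seed or text[:order]
    out = state
    for _ in range(length):
        hits = []
        i = text.find(state)
        while i != -1:
            if i + order < len(text):   # must have a following character
                hits.append(i)
            i = text.find(state, i + 1)
        if not hits:
            break
        out += text[random.choice(hits) + order]
        state = out[-order:]            # slide the window one letter
    return out

def autopoem_words(text, order, length):
    """Word-level variant: how often is "red" followed by "hat" or "dog"?"""
    words = text.split()
    state = tuple(words[:order])
    out = list(state)
    for _ in range(length):
        followers = [words[i + order]
                     for i in range(len(words) - order)
                     if tuple(words[i:i + order]) == state]
        if not followers:
            break
        out.append(random.choice(followers))
        state = tuple(out[-order:])
    return " ".join(out)
```

Picking a random occurrence on each step trades the big table for repeated scans of the source, which is exactly the memory-for-time swap that makes the approach practical for longer sequences.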
Here's some autopoem output:
Your strip of entirely
tired witches scarecrow me at night
That reached the next
He witches at and glow in a cruel head
Done behind the mark
Nothing but the Land of blue
And the green wizard answer with sharp teeth
(anyone care to guess the source text?)
Other ideas/algorithms/programs that fall into the same genre are Dilbert's corporate values generator (now defunct?), Eliza (especially when she interacts with Zippy), Mad Libs (I don't know of a computer application), Scott Reynen's poetry and prose generators, Rob Malda's poetry generator (currently offline) & Googlism.
Any suggestions or links to related ideas or programs would be greatly appreciated --- anything having to do with language generated digitally would be of interest.