## Journal: Noddy and Mr. Miyamoto?

Tonight, Nintendo's new video game console enters the hands of the obsessive-compulsive Americans who have queued up for over 12 hours. I am not one of them. So to pass the time, I put "Wii" into Google image search. Twice. But one result disturbed me (see image): Why is Noddy with Shigeru Miyamoto?

Blyton will probably take me to court for this given an incident from 1999 where I compared Noddy (pics) to Pinocchio (article) on a web page and got a cease-and-desist.

## Journal: Bayesian Filtering: Is It Doomed?

Bayesian text classification is a statistical method of determining the probability that a message is in a given category. It works by making a database of how often each word occurs in messages from a corpus that are or aren't in the category, looking up this probability for each word in a given new message, and then using Bayes' theorem on the probabilities to predict how likely the message is to be in the category.
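
The procedure just described can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how any particular filter works: the function names `train` and `probability_in_category` are made up for this example, words are split on whitespace, each word counts once per message, add-one smoothing keeps unseen words from producing zero probabilities, and only the words present in the new message are considered.

```python
import math
from collections import Counter

def train(messages):
    """Build the word database from (text, in_category) pairs.

    Hypothetical helper: returns per-class word counts and message counts.
    """
    word_counts = {True: Counter(), False: Counter()}
    message_counts = {True: 0, False: 0}
    for text, in_category in messages:
        message_counts[in_category] += 1
        # Count each distinct word once per message.
        word_counts[in_category].update(set(text.lower().split()))
    return word_counts, message_counts

def probability_in_category(text, word_counts, message_counts):
    """Estimate P(in category | words) with Bayes' theorem.

    Assumes the corpus held at least one message of each class and treats
    words as independent (the "naive" assumption).
    """
    # Prior odds, from how common each class was in the corpus.
    log_odds = math.log(message_counts[True] / message_counts[False])
    for word in set(text.lower().split()):
        # Add-one smoothing: pretend every word was seen once more in,
        # and once more out of, each class.
        p_in = (word_counts[True][word] + 1) / (message_counts[True] + 2)
        p_out = (word_counts[False][word] + 1) / (message_counts[False] + 2)
        log_odds += math.log(p_in / p_out)
    # Convert log-odds back to a probability without overflowing.
    if log_odds >= 0:
        return 1 / (1 + math.exp(-log_odds))
    odds = math.exp(log_odds)
    return odds / (1 + odds)
```

Working in log-odds rather than multiplying raw probabilities is a design choice of this sketch: it avoids underflow when a message contains many words.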

When applied to the corpus "e-mail" and the category "unsolicited bulk e-mail", the method is called Bayesian spam filtering. For example, words such as "Viagra", "mortgage", "Rolex", "Nigeria", and the like are likely to occur in spam, but some other words are more likely not to occur in spam. Many e-mail service providers applied Bayesian filtering to their customers' incoming e-mail and moved likely spam into a separate folder. This worked ... for a while.

After several months, spammers discovered ingenious techniques to defeat filters. First they disguised the operative words by "creatively" spelling them. Some spammers just misspelled key words: "Ciallis", "mortagee". Others randomly replaced letters with near-homoglyphs from l33tsp34k or from foreign languages: "Viagra" became "Wla9ra" or "\/ 1 A G R @", or "porno" might use a Greek omicron or Cyrillic o or replace the 'p' with the Greek rho or Cyrillic er, both of which look like a Latin 'p'. Anti-spam filters eventually began to check for such techniques and flag them specifically.
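
As a rough illustration of what checking for such techniques might look like, here is a sketch that folds a few look-alike characters back to Latin letters and reports which scripts a word mixes. The mapping table, `normalize`, and `scripts_used` are assumptions invented for this example; a real filter would need a far larger table, and this one does not handle spaced-out spellings such as "\/ 1 A G R @".

```python
import unicodedata

# Hypothetical, deliberately tiny table of look-alikes mapped to Latin letters.
HOMOGLYPHS = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "9": "g",
    "@": "a", "$": "s",
    "\u03bf": "o",  # Greek small omicron
    "\u043e": "o",  # Cyrillic small o
    "\u03c1": "p",  # Greek small rho
    "\u0440": "p",  # Cyrillic small er
})

def normalize(token):
    """Lowercase a token and replace known look-alikes with Latin letters."""
    return token.lower().translate(HOMOGLYPHS)

def scripts_used(token):
    """Return the scripts of the token's letters, taken from the first word
    of each character's Unicode name (e.g. LATIN, GREEK, CYRILLIC)."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in token
        if ch.isalpha()
    }
```

A filter could look up `normalize(word)` in its database instead of the raw word, and could treat a single word that mixes scripts, as reported by `scripts_used`, as a warning sign in its own right.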

Later, spammers attacked the method by using innocuous words in e-mail in order to fool the filter into thinking that a message is not spam. First they used random sequences of letters. Filters blocked words with too many consonants for the target language. Then they used random dictionary words. Filters blocked too many long words in a row. Then they used sentences from literature, as seen in so-called Gutenberg spam and Hobbit spam. These techniques work in two ways: the innocuous words lower the spam score of the message that carries them, and once recipients mark such messages as spam, the spam probability of innocuous words rises, introducing noise into the database and causing the filter to misclassify messages.
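
The two countermeasures mentioned above, flagging consonant-heavy gibberish and flagging long runs of long words, might be sketched as follows. Everything here is an assumption for illustration: the function names, the English-only consonant set (with "y" treated as a vowel), and the thresholds of six consonants in a row and nine letters per "long" word are arbitrary choices, not values taken from any real filter.

```python
import re

# Six or more consecutive consonants; the threshold is an arbitrary guess
# chosen so that ordinary English words such as "strengths" still pass.
CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxz]{6,}")

def looks_like_gibberish(word):
    """Flag a word that contains an implausibly long run of consonants."""
    return bool(CONSONANT_RUN.search(word.lower()))

def longest_run_of_long_words(text, min_length=9):
    """Return the length of the longest streak of consecutive long words."""
    longest = current = 0
    for word in text.split():
        # Ignore punctuation stuck to the word when measuring its length.
        if len(word.strip(".,;:!?\"'()")) >= min_length:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest
```

A filter might raise a message's score when `looks_like_gibberish` fires on several words or when `longest_run_of_long_words` exceeds some limit. Neither check helps against sentences lifted from literature, which is the point the paragraph above makes.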

However, not all people have the same words marked as not-spam. For instance, people on a constructed language mailing list are more likely to have linguistic jargon marked as not-spam, while people on a video game mailing list may have video game terminology marked as not-spam. Thus, a spammer could collect addresses from a newsgroup, a public web board, a public mailing list, or the contact page of a public web site, associate each address with words that appear on the same page as the address, and pad the spam sent to each address with those words. How will Bayesian filters block this? Can it be blocked at all?

## Journal: Pie Lovers Wobble But They Don't Fall Down

The old

The new

Weebles + Bob + 355/113 = Wobbl and Bob, now on DVD.

## Journal: Yes, Copyright Infringement Is Theft.

At least in Indiana.

In the United States, federal law defines "copyright infringement" in Title 17, United States Code, and state law defines "theft". For example, in the State of Indiana, Indiana Code 35-43-4 defines the crime of "theft" as "knowingly or intentionally exert[ing] unauthorized control over property of another person, with intent to deprive the other person of any part of its value or use". In turn:

"a person's control over property of another person is "unauthorized" if it is exerted: [...] by transferring or reproducing:

• (A) recorded sounds; or
• (B) a live performance;

without consent of the owner of the master recording or the live performance, with intent to distribute the reproductions for a profit."

So yes, even pedants should recognize that some copyright infringements are considered theft. If you can come up with analogous laws in other U.S. states, please post the details in comments.

## Journal: Spreading Myself Too Thinly?

That does it.

People often discuss several clique web sites that require some sort of invitation before essential parts of the site become available. I'm not a Freemason; I'm not big on secret societies. I try to ignore those sites to the extent that I can because I don't want to jump in head-first without testing the waters.

Advogato

In my spare time, I maintain free software for PC and Game Boy Advance and am in the middle of writing an ambitious GBA programming tutorial. However, I'm not entirely sure that the projects I maintain have a high enough profile in the general interest community to attract the "certifications" that allow me to write anywhere but inside my own profile page. Free software advocates seem to prefer to certify people who design their software to run natively on popular free software operating systems that run on PC hardware. However, though I do make an effort to use cross-platform toolkits, I currently do not and cannot test my PC software on any platform but Microsoft Windows. I can't just switch to GNU/Linux because it has no drivers for peripherals that I own, and I cannot afford to purchase new compatible peripherals. I can't just dual-boot because I have processes that don't like to be started and stopped every hour with downtime. Therefore, it appears I'm not the model free software developer that Advogato is shooting for.

MetaFilter

MetaFilter doesn't accept new users because it wants the community to remain small, that is, not much over 17,000 members. The administrator discovered that not only does the MeFi system eat copious amounts of valuable traffic and computing resources, but also the MeFi format itself doesn't scale past that many members for at least two reasons: things would drop off a reasonably-sized front page too quickly, and it would take too much labor to clean up inevitable dupes. It appears that the administrator wants erroneous information to persist uncorrected on comment pages and wants prospective new users to migrate to competing sites such as MonkeyFilter. Likewise, people who found Kuro5hin locked-down for several months were driven to Hulver's site instead.

Orkut

This is the biggie. Orkut is a purportedly popular by-invitation-only social networking web site. First, from what I've gathered in comments to this Slashdot story, Orkut is just a big bulletin board, not much better than a Yahoo! Group and much slower and less stable. Second, it's said to be full of Brazilians who refuse to use English in communities designated as English-speaking. Third, be prepared to delete Portuguese spam from your internal private message mailbox. Finally, for all I can tell, it might not even exist; it could just be an elaborate hoax, as broad and deep from the outside as EA's old Majestic immersive game.

Oh, and the name "Orkut" means something not safe for work in Finnish.

Gmail

Other than the increased storage space, is there really anything significant that Gmail provides that other popular web mail doesn't? Does it warrant switching e-mail providers from SpamCop?

Here's an invite code. Or here's a site that doesn't need an invite code. Just try the site, and if you can't get the hang of it, quit.

I value my time. Between participating in online communities [S] [G] [D] [B] [N], exercising at a local gym, writing free software, and babysitting, I feel that I may already be spreading myself too thinly. In fact, I have had to become nearly inactive in several communities [K] [U] [P] [T] [R], to the point where some administrators have even deleted my account one or more times.

Perhaps when somebody decides that one of these communities wants me, by sending me a well-reasoned explanation of what I could get out of a membership, such as job leads in northeast Indiana, then I'll decide that I want the community. If you wish to contact me privately, feel free to do so.

## Journal: How the Drinking Age Cements the Record Cartel

The Constitution for the United States of America is the supreme written law of the United States. It lays out a set of powers for an elected legislature called the Congress, reserving power over everything else to the several states (50 at last count). The Congress regulates commerce across state lines, but each state regulates commerce within its borders. This would seem to allow each state to set its own drinking age.

However, the Constitution has more to say: "The Congress shall have power ... To establish post offices and post roads". Nobody would want young intoxicated drivers on the highways, running the risk of colliding with postal trucks. Thus, courts have interpreted this grant of power as letting the Congress dictate the conditions under which states can qualify for federal funds for improving their highways.

Each state has power to set its own minimum age to purchase and consume "drinks" (beverages containing ethanol), but the Congress will not grant highway funds to states whose drinking age is less than 21 years. To make it easier to enforce this law, states have established separate licensing for establishments that serve food: "restaurants" admit minors, and "bars" don't. States also limit the number of drinks that restaurants can serve.

A "rock band" is a group of people who routinely perform live rock music together in front of an audience. A rock band can choose to perform in any of several venues: a recording studio, a stadium, a theater, a restaurant, or a bar. The problem is that many people won't spend money on a record they've never heard, and radio stations charge an exorbitant "independent promotion" fee to get a record played. Stadiums and theaters also charge an exorbitant venue fee, which many local rock bands cannot afford. This leaves restaurants and bars, and very few restaurants find it profitable to let rock bands perform on their premises.

Local rock bands also have trouble getting their records heard on the radio.

Therefore, minors have nowhere to turn to see a local rock band perform. A captive audience of teenage listeners is exactly what the largest publishers of recorded music (hereinafter "major labels") want, as they find it easier to cultivate a Britney Spears or *NSYNC than to find real musical talent. Instead of buying records at shows, teenagers buy what they've heard on the radio, which the major labels control, or what they've seen in stadiums and theaters, which the major labels also control.

## Journal: Five Blockers to Linux

Conventional wisdom holds that at least the following five problems block the adoption of Free operating environments such as GNU/Linux on home computers. What steps have GNU/Linux advocates begun to take in order to fix these?

1. The only consistency among graphical applications for GNU/Linux is that they consistently ignore the GUIdelines of their desktop environment.
2. Best Buy carries no peripherals with a penguin on the front of the box. A penguin would indicate that the IHV has chosen to include working Linux drivers on the disc bundled with the hardware. "Print out your distribution's hardware compatibility list and carry it into the store" does not easily apply to gifts from relatives.
3. Best Buy carries virtually no recent-release proprietary 3D games designed for GNU/Linux, other than those few M-rated first-person shooters that include a Linux client binary on the CD alongside the Windows binary. Parents may find M-rated games unacceptable, or players may prefer MMORPGs or tactical simulations.
4. Best Buy carries no recent-release proprietary educational games designed for GNU/Linux. People buy computers to run Reader Rabbit.
5. GNU/Linux lacks a DVD Video player application licensed by DVD Forum and DVD CCA.

