The Internet

Submission: GeoCities shuts down

Lillesvin writes: Today is the day GeoCities shuts down. For those who didn't grow up with GeoCities banners and poorly formatted (at best) HTML, this might seem odd to even mention, but those of us who did grow up with it will remember it with mixed feelings. After all, in spite of the poorly written content and the sad HTML, some people seem to think that GeoCities and the like provided part of the basis for the social networks we know, love, and hate today.

Comment Re:Could have told you writing analysis was bogus. (Score 1) 96

That is true, but that's where the habitual aspect comes in. While you may be conscious of various aspects of your writing style, some areas are less prone to conscious manipulation, e.g. certain syntactic constructions or your active vocabulary. No one (i.e. no forensic linguist) will believe that you are Douglas Coupland if the frequency of certain prepositions in your text deviates wildly from that in his works. And yes, you can of course tamper with such frequencies, but the point is that most people don't. You don't dismiss fingerprints as evidence altogether just because some criminals wear gloves, do you?

It's also important to note that no court has ever based a verdict solely on stylometry. Stylometry will never give a definitive answer, but it can corroborate other evidence, which is kind of the whole idea. Stylometry may help eliminate a suspect as well as identify one, so while it may not be usable as the sole basis for a conviction, it's still very useful and should be acknowledged as such.

If you really want to know about stylometry (and forensic linguistics in general), I suggest taking a look at John Olsson's website, http://thetext.co.uk/, and/or reading his book Word Crime, which is easy to read even for people without linguistic training. John Olsson is one of the few full-time forensic linguists and has dealt with many different cases, some involving stylometry.

A final note: please stop referring to it as a "writing style fingerprint". No serious forensic linguist does that, since it's in no way similar to a fingerprint. Writing style doesn't rely on biometrics and is much more easily changed than the pattern of the ridges on your fingertips.

Comment Re:Could have told you writing analysis was bogus. (Score 2, Insightful) 96

It is completely subjective and there is no real hard science to support such tests.

I beg to differ. There's very little that is subjective in stylometrics: the subjective part is interpreting the results, definitely not producing them. Take a look at http://en.wikipedia.org/wiki/Stylometry and tell me which of the methods described there you think is "completely subjective".

The main problem with stylometry is not the methods, but the data. As TFA describes, changing one's writing style throws off the results, at least to some extent. Stylometrics relies on the fact that old habits die hard, but if someone is aware that the text they are producing might be subjected to stylometric analysis, they can employ various strategies to avoid identification and will probably have a better chance of succeeding than if writing casually. However, most texts used in court have been produced casually (letters, emails, text messages) and almost always have some traits unique to their author. Even in cases where people imitate a known author, they always miss some subtlety in his/her style that gives the imitation away. These subtle differences in style are usually caught somewhere in the stylometric analysis.

It occurs to me now that you may be talking about handwriting analysis, in which case my reply is completely irrelevant and you have completely missed the point of the summary and TFA.

Comment Re:Of course... (Score 4, Insightful) 241

I'm inclined to correct that, because there was piracy long before the internet, too. I remember copying the new Guns N' Roses album (Use Your Illusion I) and lots of other stuff to tape. Yeah, that was 1991, and the internet did technically exist, but let's be realistic: it wasn't a common thing to find in a household.

So how about we say, "as long as art exists, there will be piracy"?

Comment Re:Run Linux much? (Score 1) 655

I have never had to reinstall Ubuntu since I first installed Edgy Eft (Ubuntu 6.10) on my MacBook, and I've even been upgrading to alpha versions of Ubuntu every now and then just for the fun of it. Of course, there have been occasional problems (especially with the alpha versions), but nothing that made the system unusable. I guess, however, that I'd probably be considered a power user, since I've been messing with Linux for nearly 10 years now, so I know how to fix the occasional X crash, and I prefer working in a terminal and stuff.

My girlfriend has also been running Ubuntu since Edgy Eft, and I can't recall her ever having problems with upgrades. She's definitely not a power user, and I've never had to help her with anything upgrade-related except for stuff like, "it's asking me if I want it to foo; what do I say?"

I think calling Ubuntu "rather infamous for being nearly un-updateable without a fresh install" is an over-generalization (at best), since the only people I've heard of having to reinstall are those who really, really messed up their installation to the point where it actually became un-upgradable (e.g. by force-removing packages that other packages depend on, mixing repositories, etc.), in which case I'd say that it's the user who's at fault.

Comment Re:There are ~1,308,361 American dead... (Score 1) 164

I can't speak for everyone, but here in Denmark we don't celebrate Memorial Day, while there are numerous Towel Day celebrations going on. Last time I checked, /. wasn't "News for US-nerds only" (even though some insensitive clods seem to think so).

I'm sure a lot of Americans have died since 1776. In general, a lot of people die over a stretch of 233 years... I guess I'm trying to say that I don't really get your point. Besides, we're not celebrating a towel; 14 days after the day of his death, we're celebrating the guy who gave the towel its immense significance. (Granted, 42 days after would be more appropriate.)

Comment Re:Breaking News (Score 1) 334

I had some friends over and we listened to some music on their iPods (via iTunes on my Mac, which submits to last.fm). Am I going to jail now?

Seriously though, I fail to see what such lists can prove. I even scrobble CDs I've borrowed from the library or from friends. AFAIK that's not illegal (yet!)

Comment Re:1. Upload to Wikileaks with Xerobank 2. Link to (Score 1) 471

Wow! Great link! I scored 67%.

Someone please mod parent up!

However, while this is interesting, it's not really relevant to a stylometric analysis. Usually* you'd find the 50 most common words in the entire text, then split the text into chunks of 5000 words and compute the frequency of each of those 50 words in every chunk. That gives you a 50-dimensional description of each chunk, which you then process using principal component analysis. The linked test is more in the field of folk linguistics, which is quite different from forensic linguistics, but very, very interesting nonetheless.

*: Note, usually. The exact numbers vary depending on who you ask.
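A minimal sketch of that pipeline, under stated assumptions: the tokenizer, the function names and the power-iteration PCA are illustrative choices made here (to stay dependency-free), not a standard forensic tool; in practice you'd more likely use numpy or scikit-learn for the PCA step. The defaults follow the 50 words / 5000-word chunks mentioned above, which, as noted, vary depending on who you ask.

```python
import random
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def most_frequent_words(tokens, n=50):
    # The n most common words across the entire text.
    return [word for word, _ in Counter(tokens).most_common(n)]


def chunk(tokens, size=5000):
    # Full chunks only, so every vector is based on the same amount of text.
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]


def frequency_vectors(chunks, vocab):
    # One vector per chunk: relative frequency of each vocab word in it.
    vectors = []
    for c in chunks:
        counts = Counter(c)
        vectors.append([counts[w] / len(c) for w in vocab])
    return vectors


def pca(vectors, k=2, iterations=200, seed=0):
    """Project vectors onto their first k principal components.

    Power iteration with deflation on the covariance matrix; needs at
    least two vectors. Adequate for ~50 dimensions, not for large data.
    """
    n, d = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(d)]
    centred = [[v[j] - means[j] for j in range(d)] for v in vectors]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        vec = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iterations):
            w = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:  # no variance left in this direction
                vec = [0.0] * d
                break
            vec = [x / norm for x in w]
        # Rayleigh quotient of the unit vector = its eigenvalue.
        eigval = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(d))
                     for i in range(d))
        components.append(vec)
        # Deflate: remove this component before finding the next one.
        cov = [[cov[i][j] - eigval * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    # Scores: each centred chunk projected onto each component.
    return [[sum(r[j] * c[j] for j in range(d)) for c in components]
            for r in centred]
```

Typical use would be `scores = pca(frequency_vectors(chunk(tokens), most_frequent_words(tokens)))` with `tokens = tokenize(text)`, after which chunks from the questioned text and from a candidate author are plotted on the first two components to see whether they cluster together. Deciding what the clusters mean is still the human part.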

Comment Re:1. Upload to Wikileaks with Xerobank 2. Link to (Score 1) 471

Some will, but not all. I'm not sure exactly how Google Translate handles certain dialectal traits, and lexical choices may still remain. It will, without a doubt, make it harder to determine whether the suspect wrote the text.

Interesting idea, actually. I'm gonna look into that when time allows. Thanks!

Comment Re:1. Upload to Wikileaks with Xerobank 2. Link to (Score 1) 471

A.) Paranoid, or
B.) Legitimately concerned 'they' are actually out to get him.

C.) Not tech-savvy enough to publish it anonymously himself/herself.

While you and I would have no problem publishing something anonymously, I know, hmm, a lot of people who wouldn't even know where to start. Of course, I can't say that this is the case here; it's just a possibility. But yeah, you're right: a little extra security never hurts.

Comment Re:1. Upload to Wikileaks with Xerobank 2. Link to (Score 4, Informative) 471

Well, "they" didn't identify the Unabomber; his brother did, after recognizing Ted's writing style in the published manifesto, which in turn led to a forensic investigation comparing the manifesto to some earlier texts Ted had written. The method is called stylometry (or stylometrics) and is widely used in forensic linguistics, but it's still only an indicator of authorship, not proof.

In the Unabomber case, they had two sets of texts, the manifesto written by the Unabomber and the texts written by Ted Kaczynski, hence it was relatively easy to compare the two sets and see if there was reason to believe the author of both sets to be the same. In this case, you'll have a single text by an unknown author... What will you compare it to first when you have no suspect or suspected texts? Exactly... This document will have to mean the end of the world before they start trawling the web for random texts and comparing. Mind you, these stylometric comparisons must be verified by a human, even though a lot can be automated with principal component analysis.

I'd say that the author can feel pretty safe, as long as he/she isn't a well-known author and doesn't use linguistic constructions specific to his/her dialect or regiolect. Remove all metadata from the file (e.g. go with plain text or HTML as suggested (far) above) and publish to WikiLeaks through Tor from a public hotspot. At least, that's what I'd do. I don't know about Brian Boitano, Buddha, Muhammed or Jesus.

Oh, and yes, I am a linguist. :)

Comment Re:A matter of the environment? (Score 1) 508

Haha, you've got a point, but on the upside I can't miss a deadline no matter how much work they send my way. :-p

On the other hand, we are calculating coding_performance_level (CPL) and not amount_of_work_done (AWD), so I guess that means that when a deadline is reached, my CPL is infinitely high (and based on my caffeine and cigarette intake, so am I.) ;)
