User Journal

Journal: Any Jobs in Canada?

Journal by mshomphe

What's the tech job market like in Canada?

Anyone in Canada looking for a Computational Linguist?

I want to move to a different country for a while....

Programming

Journal: A new Python Enhancement Proposal?

Journal by mshomphe

I use Python at work. Gets a lot of stuff done and done well. I used to be nutz about Perl, now I'm nutz about Python. Not that I don't like Perl anymore (I still have my thinkgeek #! /usr/bin/perl shirt).

One of the functions that I use quite a bit is what I'll call striplist(). This takes a list of strings and returns it with the leading and trailing whitespace stripped from each string. I used to have a function that was something like:

def striplist(l):
    return(map(lambda x: x.strip(), l))

However, the processing overhead was INCREDIBLE for this function. Not only were you penalized for calling the function, there were two more layers inside it (map, plus a lambda call for every element), each of which had its own penalty. It was faster to call .strip() on each argument in the list that I was using (since they were never terribly long lists).

This solution seemed inelegant, so I've been rewriting the function. This is what I have now:

def striplist(l):
    return([x.strip() for x in l])

This is a lot faster. However, it got me thinking about how this would be an amazingly useful method to have on the list object. [].strip() would return the list with the whitespace removed.
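A rough way to check the speed claim on your own machine is sketched below. The names striplist_map, striplist_comp, striplist_unbound and time_variants are made up for this illustration, and the list() wrapper around map is an assumption for Python 3, where map returns an iterator rather than a list. No timings are quoted here because they vary by interpreter version and list size.

```python
import timeit


def striplist_map(l):
    # The original approach: map plus a lambda (one extra call per element).
    return list(map(lambda x: x.strip(), l))


def striplist_comp(l):
    # The rewritten approach: a list comprehension.
    return [x.strip() for x in l]


def striplist_unbound(l):
    # A third option: hand str.strip to map directly, with no lambda.
    # Only works when every element really is a str.
    return list(map(str.strip, l))


def time_variants(data, number=10000):
    # Returns {function name: total seconds for `number` runs}.
    variants = (striplist_map, striplist_comp, striplist_unbound)
    return {
        fn.__name__: timeit.timeit(lambda: fn(data), number=number)
        for fn in variants
    }


if __name__ == "__main__":
    sample = ["  a ", "b\n", "\tc"] * 10
    for name, seconds in time_variants(sample).items():
        print(name, round(seconds, 4))
```

All three variants return the same list, so whichever one wins the timing on your setup can be swapped in without changing callers.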

(As an aside, I took this to its logical conclusion in my head -- a general [].method(args=None) idiom. However, it seems to be too confusing. How do you know if the method is supposed to apply to the list or to the objects in the list? What if it's a list of lists and you call a method? Do you do it recursively? In other words, it got way too unwieldy. So I decided that since .strip() only applies to strings [theoretically], this would be a special case.

Of course, it might stand to reason to have a special class for a list of strings; that way all methods available on strings would be available to apply to the entire list.)
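The "special class for a list of strings" idea can be sketched in a few lines. This is only one possible reading of it: the class name StrList is hypothetical, and using __getattr__ to forward unknown method names to str is my assumption, not something the language provides out of the box.

```python
class StrList(list):
    """Hypothetical list-of-strings class.

    Any method name that list itself does not define is looked up on str
    and applied to every element.
    """

    def __getattr__(self, name):
        # __getattr__ only runs when normal lookup fails, so real list
        # methods (append, sort, count, index, ...) keep their list meaning.
        method = getattr(str, name, None)
        if not callable(method):
            raise AttributeError(name)

        def apply_to_each(*args, **kwargs):
            # Returns a plain list of the per-element results.
            return [method(s, *args, **kwargs) for s in self]

        return apply_to_each


if __name__ == "__main__":
    words = StrList(["  spam ", "eggs\n"])
    print(words.strip())
    print(words.upper())
```

Note the ambiguity mentioned above does not fully go away: names that exist on both list and str (count and index, for instance) resolve to the list version here.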

So my proposal is this: A new method on the list class, .strip(char=string.whitespace), that returns the list of objects with the method .strip() applied to them.

The reason for doing this is twofold:

  1. It's a nice idiom to use, and
  2. I would suggest pushing it as far down into C as possible to make it as fast as possible.

Here's example code:

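class sList(list):
    def __init__(self, args=None):
        # list.__init__ needs self, and None is not iterable.
        list.__init__(self, args if args is not None else [])
    def strip(self, c=None):
        # c=None strips whitespace, matching str.strip().
        return [x.strip(c) for x in self]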

If only GvR were reading this :)

I welcome your comments!

User Journal

Journal: Is this for real?

Journal by mshomphe

So the Onion has an article up about "Barry Ploegel". It's an old one, and I just noticed it again whilst flipping through "Dispatches From the Tenth Circle". It's still archived and available. "Ha, ha," I think, "they really nailed us geeks!". But then I did a Google search, and not only does his website come up, but a whole bunch of other stuff as well. Like he's on kuro5hin, and active.

What are we to believe? Does he exist?

Only one way to find out

User Journal

Journal: Generating strings (a cry for help)

Journal by mshomphe

So, I have this code at work that is used for figuring out if two sets of words are phonologically ambiguous. For example, 'eight o'clock' (where the 'o' is a schwa) and 'ate a clock'. The problem is that the number of possible pronunciations for a given string can get REALLY big really fast. For example, there is a 17 word string that has over 1 million different pronunciations.

I need a better algorithm for generating the strings. Right now, I'm just going through and for each string, generating all the possible prons. It takes an inordinate amount of time to do.

Any suggestions for algorithms for this kind of situation:

for each word in string
    generate all the pronunciations for word
    append each one to every string in the current set of possible strings

It's written in Python (which is probably half my problem).
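One modest improvement, sketched below, is to count the combinations with a multiplication and to generate them lazily with itertools.product instead of building every string up front. Everything here is assumed for illustration: the LEXICON dict, its made-up phoneme spellings, and the function names are hypothetical, not the code from work. It does not remove the exponential growth; it only avoids holding both full sets in memory and lets the comparison stop at the first match.

```python
from itertools import product
from math import prod

# Hypothetical toy lexicon: word -> list of pronunciations.
# The phoneme spellings are illustrative only.
LEXICON = {
    "eight": ["EY T"],
    "ate": ["EY T"],
    "o'clock": ["AH K L AA K"],
    "a": ["AH", "EY"],
    "clock": ["K L AA K"],
}


def count_pronunciations(words, lexicon=LEXICON):
    # The total is the product of the per-word counts; nothing is generated.
    return prod(len(lexicon[w]) for w in words)


def iter_pronunciations(words, lexicon=LEXICON):
    # Lazily yields one full pronunciation at a time.
    for combo in product(*(lexicon[w] for w in words)):
        yield " ".join(combo)


def is_ambiguous(words_a, words_b, lexicon=LEXICON):
    # Store the first phrase's pronunciations, then stream the second's
    # and stop as soon as one of them matches.
    seen = set(iter_pronunciations(words_a, lexicon))
    return any(p in seen for p in iter_pronunciations(words_b, lexicon))


if __name__ == "__main__":
    print(count_pronunciations(["ate", "a", "clock"]))
    print(is_ambiguous(["eight", "o'clock"], ["ate", "a", "clock"]))
```

Putting the phrase with fewer pronunciations first keeps the stored set small; count_pronunciations can tell you which one that is before anything is generated.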

User Journal

Journal: Calling all phonologists!

Journal by mshomphe

Any phonologists/linguists out there who know where I can get a complete list of English syllable onsets, offsets (codas) and nuclei? I want to make a random word generator, and I have the basics down, I just need some lists.

For example, 'ft' is a valid offset in English, but not a valid onset. In other words, you can have a word like 'aft', but not a word like 'fta'.

Help is appreciated.
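For anyone who wants to see what "the basics" might look like, here is a minimal sketch of the generator, assuming a syllable is simply onset + nucleus + coda. The three lists are tiny placeholders written in ordinary spelling, not a real inventory of English (that complete list is exactly what I'm asking for), and the function names are made up.

```python
import random

# Tiny illustrative samples only, NOT a complete inventory of English.
# The empty string means "no onset" or "no coda".
ONSETS = ["", "b", "k", "st", "tr", "pl"]
NUCLEI = ["a", "e", "i", "o", "u"]
CODAS = ["", "t", "n", "ft", "st", "mp"]  # 'ft' appears here, never in ONSETS


def random_syllable(rng=random):
    # One syllable = onset + nucleus + coda, each picked independently.
    return rng.choice(ONSETS) + rng.choice(NUCLEI) + rng.choice(CODAS)


def random_word(syllables=2, rng=random):
    # A word is just a run of syllables joined together.
    return "".join(random_syllable(rng) for _ in range(syllables))


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so a run can be repeated
    print([random_word(2, rng) for _ in range(5)])
```

With these lists the generator can produce 'aft' but never a word starting with 'ft', which is the constraint described above.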

User Journal

Journal: Self-reference getting outta hand?

Journal by mshomphe

What is it with current culture and self-reference? It seems that any witty medium eventually turns back on itself. Slashdot has numerous examples of this (from "Imagine a Beowulf cluster of..." to "And someone's going to say 'Imagine a Beowulf cluster of....'"). "The Simpsons" has this problem, too. I've noticed it of late in the newer episodes. This isn't to say that there was never self-referential humor in the show ("If there were any justice in the world, it would be my face on a bunch of worthless junk"), but it's different now. It's referencing episodes within the show.

Is this the natural progression of a creative outlet? Eventually you starve so much for material that you start to self-cannibalize? Isn't that the point where you should quit?

User Journal

Journal: $$ != Speech

Journal by mshomphe

I'll probably write more on this later, but I just need to get it out of my system.

It seems to be a common belief that money is speech. This is patently wrong. Money has no meaning, no semantics behind it. The purpose of having free speech is so that (1) people won't be persecuted for their thoughts and beliefs and (2) eventually, with all this free unimpeded discussion, we as a society would begin to learn the Truth about Life, the Universe, and Everything (apologies to Mr. Adams). However, money does not do either of those things. Just because I have more money doesn't mean I'm right. If I endorse political candidate X over Y, and I pump $3 million into X's campaign, does that mean that X is the better candidate? If X has more money than Y, does that make him more qualified?

Of course it doesn't. That's like saying we're going to choose the political candidate who weighs more. It's the same arbitrary measurement. What should be measured is their beliefs, actions, deeds. Whether what they believe is true, to a reasonable degree.

Too often we're won over by flash, platitudes and patronizing rhetoric. There is no meaning that goes on in public debate any more. Reclaim rationality for the public good!

User Journal

Journal: The Digital Genie

Journal by mshomphe

(Or, How to learn to stop worrying about copyright and begin to love digital distribution)

There are few things preoccupying the entertainment business cabals more than the looming threat of digital piracy. There have been some humorous takes on the current stance of the entertainment industry with respect to sharing of files. However, there are legitimate concerns that the RIAA/MPAA have about the future of filesharing.

Let's imagine a future where bandwidth and memory are virtually infinite. Digital reproduction is near perfect. So, with a few clicks of a mouse, I can download an entire album (or movie), liner notes and all, right onto my computer. And further suppose that I have a 'fabber', a device that can produce a CD/DVD of the media I have downloaded, that spits out the CD instantly. In 2 minutes, I have generated an exact duplicate of the latest pop sensation's album, without giving any money back to the artist and the group that created the album.

This is the fear of the music industry.

In order to prevent this future, they are on the frontlines now, forcing legislation and enforcement of laws in a manner that seems almost despotic and maniacal.

But this is not what will happen. The industry fails to see three key points:

1. This kind of instantaneous reproduction is not only an important advance in the dissemination of knowledge, it will reduce their distribution costs to near nothing.
2. Monetary compensation is not the intent of art.
3. Total "digital rights management" (the complete control over where/when/how a piece of digital media is used) is not possible.

The first point is something that the industry is not willing to examine because it involves a major move from existing infrastructure to a substantial investment in a new, untested infrastructure. Selling objects in stores is something that the industry not only knows how to do, it has been doing it for a very long time. Inertia, it seems, is one of the strongest forces in the universe. However, the industry needs to realize that new methods of distribution can complement, not just supplant, the existing methods.

The second point is more subtle. People want to be compensated for their art, investment, etc(1). They want to have some control over the use of their creation. If I write a song, I don't want it to be associated with or co-opted by some group that I strongly disagree with (meaning that I wouldn't want, through my art, to be associated with, say, the KKK). But an artist has to realize that to share your art is to inspire others -- by doing that, you've already lost some degree of control. A painting you create could inspire a madman to destroy indiscriminately; or it could inspire a whole new movement in painting. These are the risks associated with being a public artist.

The industry surrounding the artistic community is a business. It operates for monetary gain from assisting the artist in some way. The true center of the issue is that the industry is afraid that it will not be compensated in the future for its investment. One argument that could be made to quell this fear is that for an album to make it "into the wild" on a p2p network, someone would have to buy the initial copy. And one could further argue that the mathematics of altruism would take over, dictating that everyone would have to buy enough albums on their own to keep the network, and by extension the industry, alive. A more immediate response is the following: Just because something is offered for free doesn't mean someone won't pay to get it. There seems to be this overwhelming sense in the business community that if someone offers a product for free, people will go with that option every time. This is not the case. Brand loyalty is a strong drive (as ad people know). Also, people will compensate someone that they feel has done a good service for them, even if the service was free. As someone once pointed out, "You can get coffee for free at the office, but people still go to Starbucks".

The third point is a huge issue. Digital media means absolute fidelity in reproduction, ease of distribution, and the ability to manipulate that media. The digital genie has been let out of the bottle. This is a good thing -- we can encode vast amounts of data and easily make sense of it. It's a new dawn for science and analysis. However, many want to stuff that genie back into his bottle. With the supremacy of digital media already demonstrated, the genie will no longer fit back in. Furthermore, complete control over a piece of digital media once it has passed into the hands of the consumer is impossible. Here's why:

THE USER MUST BE ABLE TO LISTEN TO/SEE THE DIGITAL MEDIA THAT S/HE HAS PURCHASED

Art is communication, and once it starts travelling through the aether, it can be recaptured. To take a concrete example, if I buy some encrypted CD, and say it only works with my RIAA-endorsed player, I can always put a microphone up to the headphones and record it onto my computer. I now have an unencrypted form of the media I was listening to. There is no way to encrypt a piece of media meant to be sensed by a human being so that it cannot be recaptured in some other format.

This is my proposal for the entertainment industry.
(1) Embrace digital media. This is a huge advance; make the most of it.
(2) Revamp distribution. Incorporate digital transfer as a way to cheaply distribute product to customers.
(3) Give the people what they want. P2P networks are notoriously unreliable. If you make a network available that has high-quality recordings, excellent bandwidth, and reasonable prices, people will abandon free p2p services.
(4) Do not look at filesharing as piracy. This is free advertising. As long as you are charging unreasonable rates, screwing artists, and calling your customers criminals, free p2p will continue to grow.

If the industry fails to do this, it is incumbent on the artists themselves to take the digital distribution paradigm to the masses. Loose alliances of artists can wield as much power as these multinational conglomerates in this new space. The power shifts dramatically when the artist directly markets/interacts with his/her fans.

(1) Despite Marx's insistence on speaking of homo faber, most people would probably be more like the protagonist of "Office Space" and do nothing.
