Google

Journal: Google Treating Entity Codes Differently?

Journal by gbulmash
So, I've got a blog post about a company with an ampersand (&) in its name that's been generating some odd search results. If you search Google for the part of the company's name before the ampersand, my post is the #1 result. If you search for the whole company name but use & to specify the ampersand, my post is the #1 result. If you specify the ampersand as just plain & or &, the top result is the home page of a company with a slightly different name. The odd thing is that they use &amp; throughout their pages to specify their ampersand while my page is using &#038;. You'd think that Google would treat &amp; and &#038; the same, but it seems they're being treated differently. Or is it just me?
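
As a point of reference, the named reference &amp; and the numeric reference &#038; decode to the same character under standard HTML decoding. The sketch below checks that with Python's standard library; the company name "Smith & Jones" is made up for illustration, and this says nothing about how Google's indexer actually handles either form.

```python
import html

def decode_entities(text):
    """Decode HTML character references the way a browser would."""
    return html.unescape(text)

# Hypothetical company name, written three ways.
named = decode_entities("Smith &amp; Jones")     # named reference
numeric = decode_entities("Smith &#038; Jones")  # numeric reference, leading zero allowed
plain = "Smith & Jones"

print(named == numeric == plain)

# A reference missing its leading ampersand is not a reference at all,
# so it passes through unchanged.
print(decode_entities("Smith #038; Jones"))
```

If a crawler indexed the raw markup instead of the decoded text, the two pages would look different to it, which is one possible (unconfirmed) explanation for the behavior described above.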
User Journal

Journal: Thought Experiment: Selling The Digital Original

Journal by gbulmash
Recently I used my limited artistic skills to produce a cartoon of a giraffe's failed suicide attempt. In a flight of fancy, I thought about how I might sell it as fine art rather than mass merchandising it. AFAIK, one of the selling points for certain works of art is that you're buying the original.

But with digitally created pieces, how can you sell the original? How could you certify one digital file as an original vs. a copy? Would you sell the hard drive you stored it on while creating it? Would you buy a computer for each digital work and sell the computer? Do you sell the copyright with the piece, so you surrender the right to make copies?

Part of the value of art sold to collectors is in its scarcity, either through never copying it or allowing only a very few copies in a limited edition. How would you create that sense of uniqueness so that you can add that scarcity premium when selling a digital work?
Programming

Journal: Predicting How Much Slashdot Effect A Page Can Take?

Journal by gbulmash
What's the best way to measure a CGI script's usage of system resources (processor cycles, RAM) to predict how many users you can handle at once, or within a short time period, before your server is "slashdotted"? I figure there has to be a utility for stress testing a CGI script or even a plain web page and measuring what levels of use start degrading performance and to what degree. So, any recommendations?
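
One rough way to approach the question is to hit the page with increasing numbers of simultaneous requests and watch where latency starts to climb. The following is a minimal sketch of that idea, not a replacement for a dedicated load-testing tool. The names `timed_call` and `load_profile` are hypothetical, and `fetch` is any zero-argument callable you supply (for example, one that requests your own CGI URL). It measures client-side latency only, not server CPU or RAM.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

def timed_call(fetch):
    """Run fetch() once and return how long it took, in seconds."""
    start = time.perf_counter()
    fetch()
    return time.perf_counter() - start

def load_profile(fetch, concurrency_levels, requests_per_level=20):
    """Return {concurrency: (mean latency, worst latency)} for each level."""
    results = {}
    for workers in concurrency_levels:
        # Each level gets its own pool so that 'workers' requests run at once.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            latencies = list(
                pool.map(lambda _: timed_call(fetch), range(requests_per_level))
            )
        results[workers] = (mean(latencies), max(latencies))
    return results

if __name__ == "__main__":
    # Stand-in for a real request; swap in something like
    # lambda: urllib.request.urlopen(url).read() against a server you own.
    fake_fetch = lambda: time.sleep(0.01)
    for level, (avg, worst) in load_profile(fake_fetch, [1, 5, 10]).items():
        print(f"{level:3d} concurrent: mean {avg:.3f}s, worst {worst:.3f}s")
```

The concurrency level at which mean latency starts rising sharply gives a first estimate of what the page can take; server-side resource usage would still need to be watched separately while the test runs.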
Programming

Journal: Regular Expressions: A Bunch of Little Ones or One Big One?

Journal by gbulmash
So I have a piece on my blog about detecting mobile browsers and some PHP code to do it. One part of the code checks 67 text snippets against the User Agent string, and a visitor asked if the way I was doing it -- putting all the snippets in an array and regex matching them one at a time -- was more efficient than using one giant regex with all the snippets in it.

So I actually tested it. When a matching pattern was hit very early on in the text, the giant regex was faster. But if there was no match or the match was much deeper in the text, the batch of smaller regexes performed better. Essentially, the batch of smaller regexes took the same time to run whether or not there was a match or where it occurred in the examined text. The giant regex slowed down the farther the match happened in the examined text and was slowest when there was no match.

As the length of the examined text grew, particularly with no match, the disparity between the giant regex and the batch of regexes grew. At 760 characters and no matching text, the giant regex was taking around 4.25 times longer than the batch of regexes. At 1520 characters (around 250 words), the giant regex had slipped to 5.4 times longer. Yet if there was a match in the first 10 characters of the 1520-character string, the giant regex was the same speed as if the match had come in the first 10 characters of a string 1/8th the size.

So while the giant regex would be more computationally efficient against shorter strings or ones where you knew the match would come early, the batch of many smaller regexes is actually better as a rule of thumb. It's faster against larger blocks of text when the match is deeper or non-existent, and if the text grows, its computational needs don't grow as fast as those of the giant regex.
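
The comparison described above can be sketched as follows. This is an illustrative harness in Python, not the original PHP test: the snippet list is a made-up stand-in for the 67 real snippets, and Python's `re` engine is not the same as PHP's regex engine, so the timings it produces may not reproduce the ratios reported here.

```python
import re
import timeit

# Hypothetical stand-ins; the original script checks 67 snippets.
SNIPPETS = ["blackberry", "opera mini", "symbian", "windows ce", "palm", "nokia"]

# Approach 1: one small compiled pattern per snippet, tried in turn.
BATCH = [re.compile(re.escape(s), re.IGNORECASE) for s in SNIPPETS]

# Approach 2: one big alternation containing every snippet.
GIANT = re.compile("|".join(re.escape(s) for s in SNIPPETS), re.IGNORECASE)

def match_batch(text):
    return any(p.search(text) for p in BATCH)

def match_giant(text):
    return GIANT.search(text) is not None

def time_both(text, repeat=200):
    """Return (batch seconds, giant seconds) for 'repeat' runs on text."""
    batch = timeit.timeit(lambda: match_batch(text), number=repeat)
    giant = timeit.timeit(lambda: match_giant(text), number=repeat)
    return batch, giant

if __name__ == "__main__":
    filler = "lorem ipsum dolor sit amet " * 60
    cases = {
        "no match": filler,
        "early match": "nokia " + filler,
        "late match": filler + " nokia",
    }
    for label, text in cases.items():
        batch, giant = time_both(text)
        print(f"{label:12s} batch={batch:.4f}s giant={giant:.4f}s")
```

Varying the length of `filler` and the position of the matching snippet shows how each approach scales with text length and match depth on a given engine.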
User Journal

Journal: Google AdSense Reporting Still Broken

Journal by gbulmash
Many web site owners use Google AdSense to help cover the costs of their sites or earn some money from them. But for nearly a week now, Google's reporting of exposures and clicks has been at least partially broken. I've been tracking this on my blog since I noticed one of my ad channels was reporting around 7% of what it should despite my site traffic being normal.

Google acknowledged there were discrepancies after it became so bad you couldn't help but notice it, but they only did so on a thread in the Google Groups AdSense Troubleshooting group and have yet to offer an explanation or ETA for a fix. The next day numbers went down again, this time by a factor of 10 or so, and Google remained mum. As they started coming back up on Thursday, Google chimed back in with another "we're working on it" and that was it.

They say that this problem affects only specialized tracking, not aggregate reporting for a whole domain (basically, you're making all the money you should, but you can't track where it's coming from in as much detail). Yet many webmasters are complaining of lower revenues to go with the lowered channel numbers. I know my clickthrough rate was the lowest I'd seen in months on the same day this problem was at its worst, though that could be a coincidence. Despite this, Google remains tight-lipped, giving webmasters a minimum of information and providing a minimum of reassurance (two posts in three days that merely state they know it's happening and are working on it). Hopefully they'll be more forthcoming when this is solved and they can do a postmortem on it. But, in the meantime, they're making a lot of people nervous.
User Journal

Journal: Decoding The Mysterious Future - A Mini Slashdot Effect

Journal by gbulmash
If you're a /. subscriber, you get to see stories before they're publicly posted (i.e. open for comment). On the story's comments section, they like to say "Posting will only be possible in The Mysterious Future!"

One problem. Those stories go up on the Firehose with their post time labeled on them. It'll say something like "Posted by Zonk on Thursday October 18, @02:10PM", only it's 1:45 PM. "02:10PM" is "The Mysterious Future". Now you have some time to compose your post in a text editor and copy it to your clipboard. At 2:10, the "reply" button appears on the page. For the next minute or so, you'll get an error message instead of a form for posting. Then the system is ready and you get the posting form.

You type in your subject, paste in your response, count to "fifteen-one-thousand" to ensure you've waited the required 20 seconds from hitting the page to posting your comment... voila. First post.

And if you make it reasonably intelligent, it gets modded up enough so everyone visiting that discussion sees your post. If you've got a link in your .sig, you'll get a mini Slashdot effect (maybe 50 hits).

I wouldn't suggest marketing via the .sig in a first post as an effective marketing plan, but the traffic's a nice side benefit. Primarily the benefit is that just about everyone reads your post and you stop some idiot who would post GNAA stuff or "first post!" from getting there first.
Portables

Journal: Detecting Mobile Browsers

Journal by gbulmash
When I went to find out how to detect mobile browsers, I found out it wasn't simple or trivial. After looking at a number of scripts and a couple big lists of User Agent strings, I put together a free PHP script for detecting mobile browsers. It seems to do very well (no false positives on all 63 desktop browser versions tracked by BrowserCam, caught all the mobiles I've thrown at it so far), yet the code is very compact and easy to follow. There's also a quickie mobile browser detector using the code... if you want to see it in action.
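
The general technique, checking the User Agent string for tell-tale substrings, can be sketched in a few lines. This is a minimal illustration in Python, not the PHP script mentioned above: the token list is a small hypothetical sample, and a list this short will miss many devices.

```python
# Hypothetical sample of substrings that commonly appear in mobile User Agents.
MOBILE_TOKENS = ("iphone", "blackberry", "opera mini", "windows ce", "symbian", "nokia")

def is_mobile(user_agent):
    """Return True if the User Agent contains any known mobile token."""
    ua = (user_agent or "").lower()  # tolerate a missing header
    return any(token in ua for token in MOBILE_TOKENS)

if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420+",
        "Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0",
    ]
    for ua in samples:
        print(is_mobile(ua), ua)
```

Avoiding false positives on desktop browsers depends almost entirely on how carefully the token list is chosen and tested against real User Agent strings.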
Google

Journal: Is Google Page Rank Dead?

Journal by gbulmash
It's been 158 days since Google updated their Toolbar Page Rank (TBPR), which are the Page Rank numbers the public can see. That's 36 days longer than their prior record longest gap, and two months longer than their "quarterly" average. Furthermore, it seems Matt Cutts has been oddly silent or dismissive on the topic for months. Is Google going to terminate public page rank numbers outright, just let them die a slow death of obsolescence, replace them with something different, or are they just running really slow for some unknown reason? What's up with Google Page Rank?
User Journal

Journal: Does Content Really Want To Be Free?

Journal by gbulmash

To those who actually say "content wants to be free" with a straight face and seriousness of purpose...

"Content wants to be free" makes as much sense as "content wants to be a fireman when it grows up."

When you say "content wants to be free," you actually mean, "I don't want to pay for content." You're talking about your own desires, but the way a three-year-old does.

"Content wants to be free... and my teddy bear wants a chocolate chip cookie... and my shirt wants a hug."

Free content is a good thing, but let's stop talking like three-year-olds, please.

User Journal

Journal: Malware Using Fake Registration Confirmations

Journal by gbulmash
So, I've been getting a flood of fake registration confirmations and Symantec finally put out a detailed warning about them. Spammers have already just about ruined it for small e-card sites. Now small membership sites will likely have increased bounce rates as overzealous spam filters reject or eat legitimate membership confirmations. Will spam have the eventual side effect of forcing every site owner who wants to send mail from their site into a $1200+ a year e-mail accreditation program?
User Journal

Journal: To The Moon And Back... In Rush Hour Traffic

Journal by gbulmash
CafePress recently announced they would begin charging an extra $3 for printing on the back of shirts, to bring their pricing in line with their competitors. Many of their t-shirt sellers place logos on the back of their shirts and CafePress is refusing to provide even a simple tool to just clear the backs of all the shirts in a store. I've calculated that the lack of that tool will cost enough man hours to go to the moon and back in rush hour traffic.
User Journal

Journal: Protecting Our Children... From Ideology

Journal by gbulmash
I've seen a lot of people shouting about how we have to protect our kids from porn on the Internet. But what about protecting them from ideology? Why is it not okay for someone to show my kid a picture of a bare breast, but it is okay for them to try to convert my child to their religion or political party? So, is ideology as dangerous as porn? If so, shouldn't we be doing as much to protect our kids from ideology as we're doing to protect them from pornography?

