The Ham and Spam of Weblogs
An anonymous reader submits "Will the blogosphere become just as spammy as Usenet? There may be over 10M weblogs out there, but most of them seem to be fake spam blogs created to manipulate the search engines. Scott Johnson, CTO at Feedster, complained that 'at times we see upwards of 90% of the traffic from Blogspot being spam,' and the problem is likely to only get worse. Can blog search engines like Technorati, Feedster, and PubSub filter the signal from the torrent of noise? Or will we have to seek new approaches such as the social filtering used by Del.icio.us or collaborative filtering used by Findory to separate the ham from the spam?"
To me (most) blogs ARE spam (Score:4, Informative)
I'm not trying to be flamebait; it would be a nice option though.
Check out (Score:5, Informative)
It was a bit unintuitive how you add sites to the filter list though -- just cut and paste "http://*.whatever.com/*" into your extensions list and any search results from whatever.com will then be greyed out.
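For anyone wondering what a filter list like that amounts to, here is a minimal Python sketch of wildcard matching against result URLs. It is not the extension's actual code: the names `FILTER_PATTERNS`, `is_filtered` and `split_results` are made up for illustration, and it assumes the patterns behave like simple shell-style wildcards.

```python
from fnmatch import fnmatchcase

# Hypothetical filter list, written in the wildcard style quoted above.
FILTER_PATTERNS = ["http://*.whatever.com/*"]


def is_filtered(url, patterns=FILTER_PATTERNS):
    """Return True if the URL matches any wildcard pattern in the list."""
    return any(fnmatchcase(url, pattern) for pattern in patterns)


def split_results(urls, patterns=FILTER_PATTERNS):
    """Partition result URLs into (kept, greyed_out), preserving order."""
    kept = [u for u in urls if not is_filtered(u, patterns)]
    greyed_out = [u for u in urls if is_filtered(u, patterns)]
    return kept, greyed_out


kept, greyed_out = split_results(
    ["http://www.whatever.com/page", "http://example.org/post"]
)
```

Note that under this assumption "http://*.whatever.com/*" would not catch the bare host "http://whatever.com/", so a second pattern would be needed for that.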
Welcome to Slashdot. (Score:5, Interesting)
That being said, Google and the other large search engines have already taken stances on blogging, and are actively pursuing them. For most, this means creating their own blog service, and doing some shifting in their code to make sure blogs don't come out on top. But this isn't an absolute truth.
If you want these things, and Google doesn't offer them, make your own search engine, and do it better. No, seriously, don't look at me like I'm crazy; there have been over a dozen "major" search engines created after Google, some are only in serious use by geeky populations (AlltheWeb, as far as I can tell, fits this), some by the trendy, some by the "I hate Google"ites, etc. etc. It's as simple as that.
One reason I think Google's strayed from taking such a hard line on blogs is simply ease of use. Google doesn't want to complicate life with a million more search options, especially ones you can deal with yourself by subtracting out the majorly offensive sites (-livejournal -blogger -blogspot, etc).
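The subtraction trick is easy to automate if you get tired of typing it. A rough sketch in Python follows; the helper name `exclude_terms` is invented here, and it only builds the query string, it does not talk to any search engine.

```python
def exclude_terms(query, unwanted):
    """Append a '-term' exclusion to the query for each unwanted word."""
    exclusions = " ".join(f"-{term}" for term in unwanted)
    return f"{query} {exclusions}".strip()


# Example: the three exclusions mentioned in the comment above.
search = exclude_terms("kernel panic on boot", ["livejournal", "blogger", "blogspot"])
```

The result is a plain string you can paste into the search box, which is the whole point: no extra checkboxes needed.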
Re:Welcome to Slashdot. (Score:3, Insightful)
Make no mistake: Slashdot is not what people are talking about when they complain of the spam that blogs have dumped into Google.
Slashdot represents thousands of voices.
Most blogs represent one voice only.
Re:Welcome to Slashdot. (Score:5, Insightful)
Re:Welcome to Slashdot. (Score:2)
Re:Welcome to Slashdot. (Score:5, Interesting)
Secondly, haven't you ever heard of the Freedom of Speech, as guaranteed to us by the Second Amendment in the Bill of Rights of the Constitution of the United States of America? By your comment, I'll assume not.
Why should we quash out individuality so that one person can get to the content they want better? Why shouldn't we just solve the damned problem, instead of creating more?
Re:Welcome to Slashdot. (Score:2, Insightful)
Re:Welcome to Slashdot. (Score:2, Insightful)
If Google brings it up as a top match, USE ANOTHER SEARCH ENGINE. The problem is you think services should cater specifically to you, while the company that runs the services is trying to think of the greater good of everyone.
Freedom of Speech is all the Internet is. Audience is you. If you want to look at the site, go right on ahead, and if you don't, then you know how
Re:Welcome to Slashdot. (Score:2)
Plus there are many (and increasing) blogs about that do contain useful content. A lot of writers and grassroots pressure groups are using the blog format to present what they previously used a non-blog webpage format for. There's been a fair few times I've come across a gnarly technical problem, ended up Googling the error message/symptoms and found that the most useful result was a blog entry that was either someone blogging "I had this problem and this is how I fixed it." or "I've got this problem, h
Re:Welcome to Slashdot. (Score:4, Informative)
If what you say above were true, I'd be careful where you point that mouth. The safety is off.
The Second Amendment is our right to guns, not our right to free speech. Free speech is in the First Amendment. So
And be very careful. All the First Amendment guarantees is "Congress shall make no law..." [wikipedia.org] abridging freedom of speech.
If you want to go to a public park and preach religion or recite your political manifesto, the First Amendment guarantees your right to. But it's not absolute.
If you want to preach/recite on my front lawn, my property rights prevail and I can physically throw you off my property if you refuse to leave voluntarily. If you want to preach/recite at midnight and you're preaching/reciting too loud, city noise ordinances prevail, and the cops can arrest or ticket you if you refuse to quiet down or move along.
Slashdot is required to allow you a certain amount of leeway in exchange for safe harbor [wikipedia.org] protections covering public forums, but that is a matter of them trying to avoid getting sued over any libelous/defamatory content in your posts, not any First Amendment guarantee they are obligated to provide you. And if you go beyond that leeway, they can ban you from posting and erase your posts.
So if you want to argue in favor of blog spam, find another argument. The First Amendment has nothing to do with whether Google and other blog services should voluntarily clean up their act and put roadblocks/barriers in place to stem the flow of blog spam.
- Greg
Re:Welcome to Slashdot. (Score:3, Informative)
If you want to preach/recite on my front lawn, my property rights prevail and I can physically throw you off my property if you refuse to leave voluntarily.
Only the police can use physical force to remove a trespasser, as any landlord knows. According to Wikipedia [wikipedia.org]:
Re:Welcome to Slashdot. (Score:2)
Plus the US constitution only really holds sway in, well, the US. Slashdot may be hosted in the US (unless it's been outsourced to a datacentre in India by now) but an increasing portion of the web is outside of the US and US direct jurisdiction, as has been evidenced recently [slashdot.org].
Stephen
Is Google the government? (Score:2)
Re:Welcome to Slashdot. (Score:4, Insightful)
As I can see it, choice is on your side. They have the choice of posting or not posting. You have the choice of how you want to deal with it.
Re:Welcome to Slashdot. (Score:2)
"I feel a fluctuation in the net, it's like thousands of voices suddenly became silent"...
Re:Welcome to Slashdot. (Score:2)
I agree that this is not what people are talking about when they talk about spam, but it *is* a type of blogging, and not one that is entirely unique to Slashdot.
Slashdot Karma (Score:2)
Allow users to directly rate the worth of the sites Google returns in a search. Anything from "Not what I was looking for", "This is a crap site", "Nothing but advertising" to "This is probably illegal".
It would give Google direct stats on the worth of the sites. People marking competitors down could be made difficult through techniques like character recognition.
Re:Welcome to Slashdot. (Score:2)
So add it to the "Advanced Search" page. A handful of extra checkboxes there, like "No blogs" and "No commercial sites" (anything with lots of currency symbols) would be easy enough to deal with, and wouldn't affect the masses who just want the "one click" experience Amazon babbles about.
Re:To me (most) blogs ARE spam (Score:5, Insightful)
But blogs? Sure, much of the content is poorly written, or not applicable to what most people - or, well, rather, 90% of a given population - are interested in. But in searches especially, doesn't it make sense to list results that include those normal people so interested in a particular topic that they blog about it?
For example, blogs can be very helpful when facing computer troubles, provided you're dealing with bloggers who know how to write for Google. This is a good example [daringfireball.net]. I mean, this surely has to be more worthy of inclusion in Google than the lion's share of those web-based bulletin boards that get indexed - you know the ones, with the "Next in thread" and the replies that are typically out of date, or altogether unrelated to your original query.
Everyone's quick to dismiss things lately. Don't dismiss blogs, just because sometimes their content seems insular and not applicable to what you've searched for. That's a problem with the search engines, not the sites they index.
The web is not just blogs! (Score:2)
Blogs are fine. But 99% of the time, they are useless to me when I'm searching for something. I'm often after technical data or reviews, and blogs are not usually the best source, or the best venue, for such things.
Re:To me (most) blogs ARE spam (Score:4, Insightful)
I don't know how many times I have done a Google search, and the 3rd or 4th result comes back with my exact phrase... yay!
Then I go to some stupid, totally lame site advertising domain names, or listing other sites, or something like that.
I never have figured out how they get listed in Google the way they do, though, because my search phrase is not listed on the page... so evidently they know something I don't.
Re:To me (most) blogs ARE spam (Score:2)
The website side of that is easy... autogenerate the page based on the URL. The google side is something I've been trying to work out with no success yet.
Re:To me (most) blogs ARE spam (Score:2)
I think that there's a wildcard 'match everything' phrase that Google has allowed some advertisers to use. I've typed in some really wacky stuff and got the exact same pages back every time. Each time the title of the page is something like 'information about (my search phrase)' and the content is spam. The URL is something like http://spammer.com/(my search phrase).html [spammer.com]
The website side of that is easy... autogenerate the page based on the URL. The google side is something I've been tryi
Re:To me (most) blogs ARE spam (Score:2)
Parking Services and Search Ranking (Score:3, Informative)
Like email spam, these sites will continue to exist so long as people click on the links, thus supporting the business model.
city name spammers, too. (Score:3, Interesting)
The first link *is* relevant, and maybe 2 more on the first Google page are as well.
The rest? PURE CRAP. Lawyers in New Idria, CA? Job listings? Home appraisals? All just SPAM.
(FYI, New Idria, CA is a ghost town. It has a population of 3. There are no homes being sold, and thank god, no lawyers there either.)
So, I was looking for further history & photos and I was flooded with marketing garbage. Take a look at some of the URLs. It's clear that they're trying to boost their
Re:To me (most) blogs ARE spam (Score:3, Interesting)
Re:To me (most) blogs ARE spam (Score:2)
There are also a number of people (including me) who use blog software as simple CMSes even for sites that are not strictly speaking blogs - are you going to exclude us too?
Finally, how are you going to determine what software a site runs on? If you use generator meta tags, they can be excluded or faked.
Incidentally thanks for reminding me to remove them from my new theme (I had done it earlier, but forgot this time).
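To make the generator-tag point concrete, here is a small sketch of how a crawler might read that tag, using only Python's standard library. The class and function names are hypothetical, and as the comment says, the result proves nothing: a missing tag only means it was removed, and a present one can be faked.

```python
from html.parser import HTMLParser


class GeneratorFinder(HTMLParser):
    """Records the content of the first <meta name="generator"> tag seen."""

    def __init__(self):
        super().__init__()
        self.generator = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.generator is not None:
            return
        # Attribute values may be None for valueless attributes.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "generator":
            self.generator = attr.get("content", "")


def detect_generator(html):
    """Return the self-reported generator string, or None if there is none."""
    finder = GeneratorFinder()
    finder.feed(html)
    return finder.generator


page = '<html><head><meta name="generator" content="WordPress 1.5"></head></html>'
software = detect_generator(page)
```

Anything built on this would be a hint at best, which is exactly why excluding sites by blog software looks impractical.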
Re:To me (most) blogs ARE spam (Score:2)
Re:To me (most) blogs ARE spam (Score:2)
phpBB is a forum system, not a blog (though I've hacked it up to work like one in the past). And trust me, half of the most valuable information I've found online is in forums about specific topics where someone has asked the same question I had and received an answer.
- Greg
Re:To me (most) blogs ARE spam (Score:2)
90% rule in force. (Score:3, Funny)
Mal-2
Re:90% rule in force. (Score:3, Interesting)
Okay, great, so 90% of it is crap. It's a given, call it whatever you want. My personal favorite is the "long tail effect".
Built into blogs is a way to tell the crap from the good stuff -- they're linked together intelligently by people who can tell crap apart, and the people who don't write crap don't link to crap. So find one good blog, and you've found a hundred or more good ones just three levels deep in links. Go one more level, and there are a thousand. It's exponential. And chances are, most of them
Re:90% rule in force. (Score:2)
There's an easy answer to that (Score:2)
If people had to pay even a nominal fee ($12/year) the majority of the spam blogs would disappear. And probably 90% of the crap blogs, too. They'd either quit because it's not worth the cost, or (in a minority of cases) they'd actually start thinking more before blogging (which has to be one of the stupidest words of the last 100 years, right up there with "bling-bling").
Human validation (Score:5, Funny)
If it takes you less than 10 minutes to write in your dear diary--I mean blog--then it's probably a 1 liner to the effect of "i think she likez me omglolbbq!!!" and you need to get off my internet.
Problem solved. Next?
Re:Human validation (Score:3, Insightful)
Unless you happen to be a blind blogger. With all the effort people have put into accessibility there's got to be a validation method that can work for the blind as well.
Just mentioning this because I've seen this complaint several times by blind users on slashdot.
Re:Human validation (Score:2)
Re:Human validation (Score:3, Insightful)
I'm not entirely sure how many blind *and* deaf users there are browsing the web unassisted, but I suppose a broader solution would depend on what technology they're using to browse the web. Some form of braille reader, perhaps? If anyone knows the answer to this, I'
Re:Human validation (Score:3, Informative)
However, there are already some major sites with "sound" captchas for the blind -- Craigslist, for example.
Re:Human validation (Score:2, Funny)
You should probably get that fixed....
Shouldn't be too hard to filter (Score:5, Interesting)
Blog comment spam will remain a problem, of course.
Re:Shouldn't be too hard to filter (Score:2)
Re:Shouldn't be too hard to filter (Score:2)
Re:Shouldn't be too hard to filter (Score:2)
My blog (on LiveJournal) started to get a lot of comment spam when it started to appear in Google results, usually on entries that were at least a few days old. The spam was usually vaguely related to the content of the post, so if I wrote an entry about (say) seeing a pretty girl I'd get a spam comment advertising a pron site, but if I wrote about a piece of software then I'd get a spam comment advertising "CHEAP MICROSOFT OFFICE CDs **$10**" &c. I guessed they got my blog URL from search engine resu
Software is Not Social (Score:5, Insightful)
A prime example of software in a "social" context is the chatter that accompanies networked video games. This does not form real relationships between people. I heard a teenager recently say that his gaming buddies, whom he doesn't even know by name, are like family to him. Technology has helped a whole generation and then some to fail to learn what real relationships are. When a teenager can't distinguish between somebody he's only ever witnessed virtually shoot ze germans and the people who nurtured him before he was able to take care of himself, we have a problem, Houston.
And it's only getting worse. Now we've begun adding "social" in front of all kinds of new web applications. Anything that lets other users see your profile and the items you post and comment on them is seen as a valid replacement for real human contact.
There was a line from a movie I saw recently called Crash, where Don Cheadle's character says to his girlfriend "It's the sense of touch. Any real city you walk, you know. You brush past people, people bump into you. In L.A., nobody touches you. We're always behind this metal and glass. I think we miss that sense of touch so much, that we crash into each other just so we can feel something." The next time we use the word "social" to describe a new type of web application, I think we should give that some thought first.
Re:Software is Not Social (Score:2)
People have written letters, sent telegrams, and called each other for hundreds of years. Long-distance communication is part of human social behavior -- regardless of whet
Re:Software is Not Social (Score:2)
I tried to clarify my point a bit in this post:
http://slashdot.org/comments.pl?sid=154127&cid=12929098 [slashdot.org]
Hopefully that clears up any confusion I created in my first post.
Cheers,
Lux
Re:Software is Not Social (Score:4, Insightful)
Re:Software is Not Social (Score:4, Interesting)
While you did answer your own question ("Probably..."), I do like your response. You raise good questions. I definitely don't believe that only face-to-face communication is real social interaction, but I could have been clearer on this point. I'm not an absolutist, and I'm not pining for the dark ages or anything like that.
Anyway, my real point is that these online substitutes are serving more and more people as substitutes for the real thing, to the point where young'uns are being brought up not knowing that there is a difference. Instead of getting together (in cases that are actually able to) they go online and "chat". Mediated communication inherently encourages more mediation because we as human beings form habits. And while mediation can still produce relationships (I can't deny that), they are less rich than direct unmediated ones. And technology is inherently a mediator, no getting around it (pun slightly intended).
To be perfectly honest though, most face-to-face relationships are just as mediated as those maintained through technology. Real-world mediators include our political and religious views, our egos, etc. which inhibit our ability to relate directly and honestly with one another just as much as the inability to see facial expressions on a forum.
I definitely use technology where appropriate to augment relationships at distances. I only see my family twice a year, but I keep in touch via telephone all the time, and I post photos to flickr for them to see. My sisters email me once in a while, which is great too. These things definitely have value, but they are no substitute for being able to see and hug my family. They simply help make the time between visits bearable.
Cheers,
Lux
Re:Software is Not Social (Score:2)
Perhaps a little of both.
Have you seen the British comedy "The Office"? It's an hilarious example of the failures of communication (among many other failures too) in a professional setting. Highly recommended.
Offtopic, really. (Score:2)
Del.icio.us has none of these features, and the words "social filtering" are not used to imply any sort of substitution of human contact. It is a system where you can file bookmarks and can find the most popular bookmarks as tagged by other users. "Social filtering" is the phrase tha
Re:Offtopic, really. (Score:2)
I suspect you were also as disappointed as I was in school when I found out that "Social Studies" wasn't a place to talk to other students.
I never found out, but I was confused as to why I got kicked out for talking too much...
Re:Software is Not Social (Score:2)
Re:Software is Not Social (Score:2)
I have a hard time believing most myspace users are there to meet real-life people when I see the line "So-and-so has 44392 friends." I can't comment on xanga or the others, as I haven't used them.
One piece of software that is an interesting use of technology for real-world social purposes is http://www.dodgeball.com/ [dodgeball.com] (recently acquired by Google). Their slogan is "see your friends more" and since it's all cellphone-based, it's pretty much determined that their service only really works i
Already is a moderated Blog Index out there (Score:2)
Blogs and Forum Posts = Spammers (Score:2, Interesting)
Luckily, Google is one step ahead of the spammers, and has allowed only one link from each forum to contribute as a valid backlink. Therefore, having 100 forum signatures linking to www
rel=nofollow attribute (Score:2)
The real solution, at least as far as search engine rank goes is the new rel=nofollow attribute for links that Google started using a few months ago. The best link that I could find when I was looking at this a couple weeks ago is this one [haacked.com]. If it grows in popularity and the ma
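For the curious, here is a rough idea of what honoring rel=nofollow could look like on the crawler side, sketched in Python with the standard library. This is an illustration only, not how Google or any other engine actually implements it; `LinkCollector` and `classify_links` are made-up names.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Sorts <a href> links by whether their rel attribute says nofollow."""

    def __init__(self):
        super().__init__()
        self.followed = []   # links that could count toward ranking
        self.nofollow = []   # links the page author declined to vouch for

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel is a space-separated token list, e.g. rel="external nofollow".
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


def classify_links(html):
    """Return (followed, nofollow) lists of hrefs found in the HTML."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.followed, collector.nofollow


comment_html = '<a rel="nofollow" href="http://spam.example/">cheap stuff</a>'
followed, ignored = classify_links(comment_html)
```

The blog software's side of the deal is just as simple: it stamps rel="nofollow" on every link in user-submitted comments, so a spammer's link earns no ranking credit.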
ridiculous kvetching about blogger (Score:3, Interesting)
"Lately, it seems like almost every time you tune into your favorite Blogger-hosted blog to catch up on the latest gossip, meme, political diatribe or cybersnark, you find that the site is frozen in time. Or, there are multiple posts with identical content."
Uh, no, not as far as I can tell. "Frozen in time," perhaps, after someone decided to stop blogging, but I used blogger for six months and never had a single hitch. Apparently, googling "blogger sucks" gives you thousands of sites bitching about google's service.
Sometimes there are outages, when you can't get in to alter a post or something similar, but those were few and far between (at least they happened less than half a dozen times in six months, and it only lasted a few hours.)
I guess this is a sign of how popular blogger is. I mean, the only way to balance my experience (zero fatal errors in six months) with thousands of complaints is to assume that there are a HELL of a lot of bloggers out there.
Oh, and to those bitching in general about blogs: please shut up. Yes, there are annoying vanity blogs, but blogger -- and the blogging concept -- has been a godsend to specialists, as well as to political organizing.
Friends only (Score:2)
Blogspot (Score:4, Informative)
If you have a few minutes, click on the randomizer button at the top of the screen that reads "Next Blog" a couple of times. I'd be willing to say that at least 2 out of every 10 blogs is a spam farm.
It's just fucking sad.
Re:Blogspot (Score:2, Funny)
Re:Blogspot (Score:2)
Just a thought.
Re:Blogspot (Score:3, Interesting)
Still, I wish I could have studied that page for comparison. I found http://bobthebuilder123.blogspot.com/ [blogspot.com] one day in my blog referrer logs. I wondered why people interested in Bob the Builder had linked to me. They hadn't. The whole page is nothing but spam - all posted on one sunny day this month. If you can help me see what gonorrhea has to do with Bob the Builder I'd be very much obliged.
At any rate
Re:Blogspot (Score:2, Interesting)
After all, why run through the entire gamut of blog styles and presentation formats, when you can just examine content-only from your own servers?
Wrong approach (Score:2)
And since most bloggers simply recycle and rephrase current events you need a different approach.
Interesting insight. (Score:2)
My hopes for the 'blogosphere' (Score:2)
Usenet has improved substantially (Score:5, Informative)
Re:Usenet has improved substantially (Score:2)
Re:Usenet has improved substantially (Score:2)
This isn't rocket science... (Score:2)
The fact that I know how to enter the necessary information for Blogger to SFTP to my server demonstrates that I am not seeking to link farm.
Bottom line? Educate bloggers on how to integrate with alternative service providers, and eschew blogspot hosted blogs.
The legit will rise to the top, and the rest will be safely ignored.
Argh (Score:2)
Yeah, that happened to me. (Score:2)
Then some dweeb from canuckistan changed the password and uses it to boost the google ratings for his pathetic little torontine blog about getting drunk with ron jeremy.
The "random
Del.icio.us? (Score:2)
um, I already use "social filtering" (Score:2)
On sites that I already know and like (including some blogs), people mention and link to other sites, like, say, blogs. Since I then go over there for real content, well, guess what, it's not a link farm; it's good.
Problem solved
Blogosphere (Score:2)
Re:Blogosphere (Score:2)
No.
(Next someone will come up with the idea of filtering communications by the ratio of quoted to original text, no doubt...)
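Joke or not, that filter is about ten lines. A minimal sketch in Python, assuming Usenet-style quoting where quoted lines start with ">"; the function names and the 0.8 threshold are arbitrary choices for illustration, and it measures the quoted share of all non-blank lines rather than a strict quoted-to-original ratio.

```python
def quote_ratio(message):
    """Return the fraction of non-blank lines that are quoted (start with '>')."""
    lines = [line.strip() for line in message.splitlines() if line.strip()]
    if not lines:
        return 0.0
    quoted = sum(1 for line in lines if line.startswith(">"))
    return quoted / len(lines)


def mostly_quoted(message, threshold=0.8):
    """Flag messages whose quoted share exceeds the (arbitrary) threshold."""
    return quote_ratio(message) > threshold


post = "> first quoted line\n> second quoted line\n> third quoted line\nMe too!"
share = quote_ratio(post)  # 3 of 4 non-blank lines are quoted
```

Whether it would help against bloggers who rephrase rather than quote is another matter.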
Filter the signal? (Score:2)
Down with neologisms (Score:3, Funny)
Re:Down with neologisms (Score:2)
Re:Down with neologisms (Score:2)
Re:Quick Fix (Score:2, Insightful)
Anyone who has a blog that you have to pay to comment on (or to see) isn't going to get much traffic.
Re:Quick Fix (Score:2)
A *lot* of people... and a lot more than $1 too.
Soaps, Reality TV.... Big Brother is the ultimate proof of this - a dozen totally uninteresting people sitting in a room for 2 weeks - gets top rating and is simulcast on 3 channels simultaneously.
Re:Quick Fix (Score:2)
Re:Quick Fix (Score:2)
Well, yes, but you'll understand that we haven't raised the price of thoughts for nearly 200 years now (2c is the price of a thought at the moment, a penny is merely their redemption value after use -- thought recycling is important for the environment). This rise is merely in line with inflation, and you've benefited from cheap thoughts in the meantime.
Re:Personal blogs compete directly with spam blogs (Score:3, Insightful)
Blogs are a new form of communication. Before, we had "editorials" which were published in newspapers, where someone of stature made their opinion known, simply to spark debate and interest in the public's mind. Now it is everyone's turn to have their own editorial, and to foster debate and discussion. Welcome to Slashdot, by the way.
Secondly, they offer a form of sympathy to the author; normally someone either says "I like y
Re:Personal blogs compete directly with spam blogs (Score:2)
It's pretty easy actually. Write about something other than ****FREE SOFTWARE NOW FOR A GOOD PRICE WINXP WIN2K WIN98 FREE FREE FREE**** and I think you have a pretty good chance of people actually reading your material.
But go ahead, keep on not providing actual content, I'm sure that's a great way to get readers.
Re:Personal blogs compete directly with spam blogs (Score:2)
I think you're over-generalizing. Yes, many blogs are indeed vanity projects where people say whatever they feel like about subjects nobody cares about, but there are good, worthwhile blogs out there which have nothing to
Spam indeed! (Score:2)
The link defeats the Firefox pop-up blocker...
Re:Spam indeed! (Score:2)
That's not exactly hard....
Re:2 years and no one will care (Score:2)
Should a user need to know how to install their operating system in order to use one?
As slashdot geeks we sometimes get narrow-minded, but there are different types of intelligence. Just because someone isn't a computer scientist doesn't mean they don't have something interesting to say.
Re:2 years and no one will care (Score:2)
Why reply to my post with something that has nothing to do with what I said?
Did I say that someone needs to know how to code HTML to post a blog?
Re:2 years and no one will care (Score:2)
Re:2 years and no one will care (Score:5, Insightful)
Now, my blog isn't going to be popular. I cover mostly neurological problems and how to deal with them. But I've had some fascinating discussions with complete strangers because of my blog and I'll continue blogging into the foreseeable future. Because of Google many people find my blog despite it being a small fish in a big and noisy blog sea. Google is a great tool and I'm glad they index blogs. Now, I'm as upset as the next guy about spam blogs, but "crap" blogs are relative. You may read my blog and find it lame. Others, including myself, would disagree with you. But if you don't find the subjects I write about interesting or valuable, so what?
Slashdot cracks me up sometimes. What is it to some of you guys if somebody wants to blather on and on about their breakfast or their boyfriend? If the site is a bore move on, but you could tell that from the Google search, right? Seriously, I haven't found many blogs that come up in my searches that aren't related to my searches. Not as much as parked domain sites and adsense whores at any rate.
Not all bloggers can't be bothered to code a web page. In fact, because I do code I'm able to personalize my site. Every month I tinker and tinker with the code when I find some time. Blogging may be an exercise in vanity, but then so is hosting your own website. In fact, the whole web publishing scene is about personal expression, and what's wrong with that?
Re:2 years and no one will care (Score:2)
Well, yes. But Sturgeon's Law is that 99% of everything is crap, so blogs are only a tenth as likely to be any good. And given how many of them there are, that means that probably most of the useful information on the internet is on one...
Seriously, though, there are good blogs and bad blogs. Find the right blogs and you've got a lot of very interesting reading ahead of you.
Re:2 years and no one will care (Score:2)
That was and is mostly copy and paste.
If a person goes through the trouble of learning to code in a computer language, the likelihood that their content will be of higher quality is greater regardless of the ho
Re:2 years and no one will care (Score:2, Interesting)
But being able to program and being able to program well are two different things. And even if they become an expert programmer, that doesn't mean that they will know how to use it properly.
Someone had to design those Javascript butterflies that follow my cursor around.
And seriously. HTML is not hard at all to learn. Or at least not so hard to learn to be able to put up a web page.
Re:once again... (Score:2)
But I have a LiveJournal. So do a lot of my friends from high school and college. I can easily read up on their lives by reading my friends page, and they can do the same, and we can all comment on each other. It's put me back in touch with people I hadn't talked to in months.
I'm also a member of several LJ communities. These aren't much different from traditional message boards, but be