Slashdot
Greatest Task of Web 2.x: Meta-Validation
Posted by
kdawson
on Sun Dec 03, 2006 09:33 PM
from the vetting-the-metadata dept.
CexpTretical writes "This Technology Review article about Web 2.x problems fails to mention the 800 pound gorilla in the room when it comes to fulfilling the dreams of the Semantic Web — i.e., assumptions about the validity of metadata or tagging schemes. We can add all of the metadata and/or tags we want to web resources but that does not mean that the 'data about the data' honestly or accurately describe the resource or are 'about the data' at all. This is why Google does not place much importance on the metadata already contained in HTML document headers for search ranking, because it cannot be trusted. And to validate it would require more effort than to search and index that data from scratch. Ensuring or verifying the validity of metadata would be a task equal to that of initially creating it, but would have to be repeated on an ongoing basis. Hence all of the talk about 'trusted networks,' which then require trusting the gatekeepers of those networks. Talk about 'semantics.'" Slashdot's moderation and meta-moderation offer one example of getting useful metadata in a non-trusted environment.
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.

Meta data (Score:4, Interesting)
The tagging system might be a better example, or at least an example of mostly useless meta information.
Yep. No functionality aside from in-jokes (Score:5, Insightful)
"Is Linux ready for desktop?"
yes, no, fud, notfud -- and it would be marked omgponies, dupe, and thistagisfreakinguseless if any of those options weren't automatically stripped.
It's almost like tags are designed to be useless here, in a way that they're not with delicious (put the periods in wherever you want them -- I use www.delicious.com and I am so very glad it works). I can use delicious as a "Hmm, I want to read this later" bookmark-shared-across-machines, to categorize Java samples for my own use later, and to do things which are of use to *me*. The social aspect grows naturally from the personal uses, because when you mark Sun's whitepaper as being about Java or this photo on flickr as being of sakura, everyone else gets to piggyback on your diligence. But if there isn't any personal use possible, then tagging is just textual autoeroticism.
You can mark me fud and omgponies if you want.
Re: (Score:3, Informative)
You (sort of) can. Go to http://www.slashdot.org/tags/foo [slashdot.org]
Click the tags that are listed, rather than clicking the arrow. If the tags were meaningful you'd get similar articles.
Embrace the future. (Score:5, Funny)
For instance: "IT: Vista Designed to Make Malware Easy [slashdot.org]" is tagged "troll, fud, vista, notfud, microsoft". I mean -- that's it! That's the whole discussion right there. Point, spastic head-nodding, counterpoint, rehash of the original article. Thank you sir, may I have another.
I'm hopeful that on some future "Slashdot Mobile," they'll remove everything but the titles and tags, and display it as a feed. Maybe after that, they'll even get rid of the titles, so you can just see a constant stream of tags.
Forget a boot stamping on the face of humanity; that's the future for you: "microsoft fud notfud troll itsatrap google dupe evil internet hardware nvidia slashvertisement pigpile dupe sun esr fud ubuntu dupe microsoft dupe
Re:Meta data (Score:4, Interesting)
Ah, the value judgement rears its head again. I'm not sure how you can easily distinguish between invalid thoughts and thoughts you disagree with. For me, and I would hope for many people who get mod points, the points are expended on those comments that add something useful to the discussion.
A tired old argument that, for me, was debunked years ago -- however 'Interesting' or 'Insightful' it may have been the first few times I heard it -- simply isn't saying anything. Knowing why an argument or point is flawed, invalid, or deliberately vicious, am I obligated to spend my mod points on it just because its falsehood might be interesting or even insightful to an uninformed reader? Do I not, on the contrary, have a duty to remove information which I know to be false or misleading by downmodding?
Moderation isn't about feeding your own opinions back to you; it's about obtaining an aggregate value judgement from the community as a whole. If you want to browse sans value judgements, or if you disagree with the community's consensus and want wrong, invalid, or uninformed (per the moderation system, as judged by the community) opinions to be given equal or greater weight than those moderated up, use the 'Prefs' panel. That's what it's for.
Speaking of Slashdot's metadata... (Score:5, Insightful)
Re:Speaking of Slashdot's metadata... (Score:4, Funny)
Please move along folks, there's nothing to see here.
Re: (Score:3, Funny)
Re:Speaking of Slashdot's metadata... (Score:5, Funny)
You can't trust the moderation system either (Score:4, Insightful)
However since posts lower than zero do not get displayed automatically, views that are unappealing to the Slashdot community are relegated to obscurity regardless of their validity and correctness.
Linux sucks.
Re:You can't trust the moderation system either (Score:5, Interesting)
Mod Spam? (Score:5, Interesting)
Here on Slashdot, there is a selection process and a reputation system that determines who has the ability to moderate. How does this "Web 2.0" address the fact that anyone can attach and moderate tags?
Re:Mod Spam? (Score:4, Insightful)
Is that true? My understanding was that any registered user with an account older than X period of time was eligible to moderate.
If there really is some sort of reputation system, I'm not sure I approve of that. For example, I've been reading Slashdot for close to 10 years. Check out my account number. Presumably I have a pretty good "reputation." But then again, I love a really good troll.(*) I've been known to post a few, too. (Ssshh!) Based on those facts, should I really be allowed to moderate more than somebody else, just because my "reputation" is ostensibly more established?
Wait ... did I say that? Or only think it?
(*) It's a pity there are so few really good trolls anymore.
Re: (Score:3, Insightful)
Re:You can't trust the moderation system either (Score:4, Insightful)
You seem to be arguing against yourself. Moderators are chosen from a large pool according to rules described in moderation guidelines. It stands to reason that if these moderators come to consensus about a post, then that consensus would be descriptive of the post.
Re: (Score:3, Interesting)
No, it just means that their
Re: (Score:3, Insightful)
"Democracy is two wolves and a lamb voting on what to have for dinner."
You've picked out a particular failing of a republic, and
Re:You can't trust the moderation system either (Score:4, Interesting)
Re:You can't trust the moderation system either (Score:5, Insightful)
Here's a thought: Rather than indulging in self-satisfied name-calling, why not perform some analysis on the moderation system and actually try to provide some evidence for your facile assertion? It's pretty easy to do, precisely because the kind of abuse you claim is rampant here would also be completely transparent, if it were happening.
For my part, I have no inclination to agree with your assertion, because in the 2 years I've been meta-moderating daily, I haven't seen more than about 1% of posts[*] that show such symptoms. On the contrary, if my experience is any guide, there's a far more common tendency to mod content-free comments like yours upward than to mod unpopular, but well-argued, comments downward. The consistency of the data, and the fact that it's semi-randomly selected for me, leads me to believe that it's statistically significant, and that my experience doesn't differ significantly from anyone else's.
YMMV, but the burden of proof does lie with the accuser, so please back your assertion with evidence.
[*] I base that on viewing slightly less than 1 abusive down-mod a week, or 1 in 80-90 moderations.
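The footnote's arithmetic is easy to check. A minimal sketch, assuming only the figures quoted above (1 abusive down-mod per 80-90 moderations reviewed); the function name is made up for illustration:

```python
def abusive_rate_percent(abusive: int, reviewed: int) -> float:
    """Return abusive moderations as a percentage of those reviewed."""
    return 100.0 * abusive / reviewed

# Figures from the footnote: about 1 abusive down-mod in 80-90 moderations.
low = abusive_rate_percent(1, 90)
high = abusive_rate_percent(1, 80)
print(f"{low:.2f}% to {high:.2f}%")  # prints "1.11% to 1.25%"
```

So "about 1%" is a fair rounding of the quoted numbers, though it describes one meta-moderator's sample and nothing more.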
Re: (Score:3, Insightful)
Re: (Score:3, Interesting)
Digg is even worse. (Score:3, Interesting)
When it was first becoming popular, I used Digg for
Re:You can't trust the moderation system either (Score:4, Interesting)
When I moderate, I view all comments, even the ones with negative scores. That's the responsibility of moderators, yes? The moderators have to wade through the sewage so that you don't have to.
With that in mind, I have no idea why your message is rated as insightful.....
Re: (Score:3, Informative)
Anyway, you should be happy that we have Slashdot's moderation system. Here, content-free jokes and trolls get modded up and relatively anyone with a long, reasoned-
Re:You can't trust the moderation system either (Score:4, Insightful)
The fact that there's a general consensus viewpoint that tends to reinforce itself is just an artifact of human nature. Slashdot, not being any great exception to the human condition, does what it can to reduce this, and in my eyes does about as decent a job as you're going to have done when you let the mob moderate itself.
Hal Porter, what are your thoughts? (Score:2, Funny)
Re: (Score:2, Interesting)
Not just about the users... (Score:2)
Idioms (Score:5, Informative)
I thought it was "elephant in the room"? Googlefight! [googlefight.com]. We're talking orders of magnitude here... Please tell me that lame TV commercial that botched the idiom isn't starting a trend? I think 800 lb gorilla should remain as the Urban Dictionary's [urbandictionary.com] "an overbearing entity in a specific industry or sphere of activity" and not expand to the more abstract, from Wikipedia [wikipedia.org], "an obvious truth that is being ignored"
Metadata doesn't work (Score:3, Interesting)
It's a lazy shortcut to somebody with a brain doing the editing/moderating themselves. The masses are NOT always right and are often wrong, in fact (Wikipedia). Meta-validation is a way to let "the users" do the work, even though those users are generally not qualified to do so. The whole value in say, a web site, is offering useful, accurate information to other people who don't already know that information. Meta-validation is essentially mod rule, with no order or methodology. Meta-validation is a shortcut to profit, and as a result, it will never result in good, long-term information.
The difficulty: association is not relation (Score:5, Insightful)
Working with metadata from a non-trusted community is a few orders of difficulty harder than working with trusted metadata. All the examples from non-trusted user groups that I've seen are either 1) only able to track fairly simple data or 2) ambitious but disappointing. I'd put Slashdot's moderation and metamoderation in the first category. Relevance, quality, and a few kinds of description are possible, but these are fairly simple things to track. Most internet resources would require metadata that is much harder to validate to be useful.
A primary example of this that comes to my mind is the current crop of music recommendation services. The idea behind these sites is that they can, through one of various methods, recommend music to you based on what you like. I've experimented somewhat extensively with Pandora [pandora.com] and Last.fm [last.fm], and the difference in the quality of their suggestions is amazing.
Last.fm uses community data for recommendations. It tracks tags that users attach to songs and the collection of artists that each user listens to. Based on what artists you have listened to or which tags you select, it attempts to point out other artists you might like.
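To make the community-data approach concrete, here is a rough sketch of co-occurrence recommendation over listening histories. It is not Last.fm's actual algorithm, which the comment does not describe; the users, artists, and the `recommend` function are all made up for illustration.

```python
from collections import Counter

# Hypothetical listening histories: user -> set of artists (made-up data).
HISTORIES = {
    "alice": {"Artist A", "Artist B", "Artist C"},
    "bob": {"Artist A", "Artist B"},
    "carol": {"Artist B", "Artist D"},
}

def recommend(listened, histories, top_n=3):
    """Rank unheard artists by how often they co-occur with artists already heard."""
    scores = Counter()
    for artists in histories.values():
        overlap = len(listened & artists)
        if overlap == 0:
            continue  # this user shares nothing with us, so ignore their taste
        for artist in artists - listened:
            scores[artist] += overlap
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [artist for artist, _ in ranked[:top_n]]

print(recommend({"Artist A"}, HISTORIES))  # prints ['Artist B', 'Artist C']
```

Note that the artist the most users share floats to the top. In a scheme like this, popularity and association are the only signals available.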
Pandora makes recommendations based on musical qualities. The data the service uses comes from the Music Genome Project, which paid people who have studied music to catalogue the musical qualities of songs in their database. Employees listen to songs and select which attributes are applicable to the song from a list of hundreds of attributes. To use the service, you enter some songs and artists that you like, and based on the musical attributes of those songs and artists, it recommends other songs you might like.
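By contrast, an attribute-based approach compares the catalogued qualities of the songs themselves. A minimal sketch, assuming each song is reduced to a set of attributes and using Jaccard similarity as a stand-in measure; the songs and attributes are invented, and Pandora's real attribute list and matching method are not reproduced here.

```python
# Hypothetical expert-assigned attribute sets (made-up data).
SONGS = {
    "Song X": {"acoustic guitar", "minor key", "slow tempo"},
    "Song Y": {"acoustic guitar", "minor key", "fast tempo"},
    "Song Z": {"synth lead", "major key", "fast tempo"},
}

def jaccard(a, b):
    """Shared attributes divided by all attributes seen in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def most_similar(seed, songs):
    """Return the other song whose attribute set is closest to the seed's."""
    candidates = [
        (jaccard(songs[seed], attrs), name)
        for name, attrs in songs.items()
        if name != seed
    ]
    return max(candidates)[1]

print(most_similar("Song X", SONGS))  # prints "Song Y"
```

Nothing here depends on who listens to what, so a single seed song is enough to get a recommendation.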
The results that the services provide, at least in my case, are like night and day. Last.fm's recommendations are heavily influenced by what's popular and how a common user would categorize an artist or song. They sort-of hit the right areas, but it doesn't get much better than Amazon's recommendations. Pandora's recommendations always seem to be more on target, even though it uses only a few artists or songs that you enter at the start, in contrast to Last.fm, which can use my entire play history.
I guess a lot of this can be chalked up to the difference between association and relation - without some type of new innovation, it seems that community-based metadata can only be based on association, which is a far cry short of relation. Yes, it is a type of relation, but a set of data has qualities that a few simple tags from users are not going to be able to touch. It seems to me the next generation of metadata will only be possible when we can figure out a way to get the sort of data that Pandora uses from a community group. It's a daunting challenge that tagging and simple user activities like the Google Image Labeller have just started to slightly touch.
Screw the meta-data validation... (Score:2)
Solve the problems we all have, not the strawman problems that are created to justify the "solutions" being proposed.
If you really want to deal with meta-data validation, take that project up in Web 3
Re: (Score:3, Insightful)
One thing that I don't see mentioned (Score:3, Insightful)
Greatest Task of Web 2.0: Materialization (Score:3, Insightful)
Web 2.0 is an empty buzzword for the evolution of the internet. There is no single event that can unequivocally be called the start of "Web 2.0".
According to Daniel Glazman [glazman.org], Tim Berners-Lee has officially given up on XHTML as of last week's W3C Advisory Committee meeting in Tokyo, and then apparently explains what Web 3.0 is supposed to be.
TBL is apparently not the visionary we all thought he was. Apparently no one in the W3C can (or is willing to) figure out how to relegate HTML to the junk heap, like a 286 computer: it was a good idea at the time, but newer technology has come along. Eventually, someone will want to see one in a museum. Contrary to popular reports, the W3C has not fixed itself, but merely rolled back the clock on itself a decade or so.
After 8 years, what do all the developers who embraced XHTML get for our efforts? Our smorgasbord of web standards becomes a (tag) soup kitchen once again.
Web 2.0 is a fleeting concept with no substance; its existence can only be inferred by surreptitiously attributing semi-related events to its influence. Now that the inventor of the WWW has bought into this folly, and simultaneously abandoned one of the W3C's greatest achievements, how can anyone put any stock in what he or anyone else at W3C says?
I held out longer than most in my hopes that web standards could be straightened out, but now the W3C is dead by its own hand, after 6 or more years of atrophy, manic depression, and schizophrenia.
Re: (Score:3, Funny)
* Cure Cancer
* Solve World Hunger
* End All War
* Show Us the One True God
* Eliminate Spam
* Turn Janet Reno Into a Beautiful Woman
* Help Pandas Mate
* Prevent Dupe Articles on
As you can see, Web 2.0 is more than a buzzwor
Re: (Score:3, Funny)
The Great Google Metadata Myth (Score:4, Informative)
Google does, in fact, use metadata -- tons of it. Google uses explicit metadata built into headers (like the description, robot control); it uses the rel-license microformat; and it uses titles and h1 headers. It also uses some crucial metadata that's not self-reported by the Web site -- namely, the number and text of links inbound towards a page. It also uses metadata in HTTP headers.
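For readers who haven't looked at where this self-reported metadata lives, here is a small sketch that pulls it out of a page with Python's standard `html.parser`. It says nothing about how Google actually processes or weighs any of it; the `MetadataParser` class and the sample page are made up for illustration.

```python
from html.parser import HTMLParser

class MetadataParser(HTMLParser):
    """Collect <title>, <h1>, <meta name=...> and rel="license" links."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.h1 = []
        self.meta = {}
        self.licenses = []
        self._current = None  # "title" or "h1" while inside that element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "meta" and name:
            self.meta[name.lower()] = attrs.get("content") or ""
        elif tag in ("a", "link") and "license" in rel:
            self.licenses.append(attrs.get("href") or "")
        elif tag in ("title", "h1"):
            self._current = tag
            if tag == "h1":
                self.h1.append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "h1":
            self.h1[-1] += data

# A made-up page; every value in it is self-reported and could be a lie.
PAGE = """<html><head><title>Example page</title>
<meta name="description" content="A made-up page">
<meta name="robots" content="noindex">
</head><body><h1>Hello</h1>
<a rel="license" href="http://creativecommons.org/licenses/by/2.5/">CC BY</a>
</body></html>"""

parser = MetadataParser()
parser.feed(PAGE)
print(parser.title, parser.meta, parser.h1, parser.licenses)
```

The inbound-link data mentioned above is the one kind this cannot show, because it comes from other sites rather than from the page itself.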
Google also uses lots of data that is unreliable or could be dishonest. After all, there's a huge dark business of blackhat SEO whose sole intention is to trick Google's bots into thinking pages are more important (or are on a different subject) than they actually are. There is no particular part of an HTML page or any other Web resource that cannot be a lie. Web spiders have to deal with this all the time, and they have to balance the information they get from different data sources to determine what's true and what's not.
It's true that Google's search results don't depend as heavily on the specific meta keywords as many first-generation search engines did. But I think that's more a reflection of the remarkable naivete of early search engines than anything else.
Another 800 pound gorilla sighting? (Score:3, Funny)
Whether emitting gas or validating meta-information, this gorilla has maintained his importance and kept his mass steadily high. Are there larger gorillas? And if there were, would it matter?
Some thoughts to ponder while the pr0n is loading
Bad News, Folks... (Score:3, Funny)
Metadata is a great idea... (Score:3, Informative)
Despite all that has been said in the comments and elsewhere, there simply is no good implementation of metadata for the Internet that applies to all types of data and all instances of data sharing.
If you want to be a hero, figure this little problem out and the world will beat a path to your door... so to speak.
Just Asking For It (Score:3, Insightful)
Slashdot's moderation and meta-moderation offer one example of getting useful metadata in a non-trusted environment.
Why, oh why, would you include that at the end of the summary? Even if there weren't horrible issues with the moderation system (there are), this particular audience is going to rip that comment apart.
Tepid Moderation (Score:4, Interesting)
An example of unaccountable, gameable metadata that generates untrustworthy info that is almost as useless, through abuse, as it is useful.
Slashdot's moderation could:
Those few improvements could introduce some accountability and feedback into the now mostly abused meta/moderation system. Until then, Slashdot has little to teach the world about the right way to accumulate useful metadata in an untrustworthy environment.
Slashdot's moderation is pretty good (Score:3, Interesting)
I only recently started posting on Slashdot, but I find your claim that the moderation system is mostly abused pretty inaccurate. While your suggestions for improving the system seem like they would be useful, moderation, which is certainly not perfect,
Re: (Score:3, Insightful)
Which is reflective of the quality of discourse on Slashdot. Kinda fun, but far from rigorous enough to be taken seri
Re: (Score:3, Interesting)
That's why I introduced my post with "I only recently started posting on Slashdot" - to indicate that I would be giving the perspective of someone who is new to posting here, not to give the expertise of someone who's been posting for five years. There
The ~real~ Greatest Task of Web 2.x (Score:3, Funny)
This is the war that desktop-bound Redmond cannot afford to lose. One browser to rule them all, the men of Middle Cubicle, the Dvorves of Dvorak, the Geeks of Ajax, the Elvish and the rhinestone-laden Elvish Impersonators. Starring...
The rest of this roman à clef I leave to you, my fellow Slash Hobbits.
Metadata, Ajax and Trusted (Score:3, Insightful)
As for the issue of metadata on the web, it is a serious concern, and search engines can't continue to just ignore it. As ajax and other dynamic presentation technologies become more and more common, less and less of the content on the web will be encoded in simple HTML. Sure, everyone who writes up some fancy ajax site and isn't an idiot will leave some html files around for google to index, but this doesn't solve the problem. If everyone who visits the site sees something other than the info in the HTML, then the HTML itself has become the metadata.
This problem is solvable since, as the success of google itself indicates, if the data is being used by the end user for some significant purpose the authors stay honest. The reason websites sometimes give bogus meta tags is because it doesn't affect the user's experience in the least. If we get something like the semantic web where the users are actually making use of the metadata then things are no different than they are now.
I hope this is what happens, as the other option, where google starts learning to crawl through ajax calls, is much less pleasant. It was bad enough when all ruby actions were GETs and google would trigger all sorts of things to happen in your app. It will be far worse if they deliberately trigger all the JS scripts on your page in order to search effectively. And they *need* to be able to search effectively, as that is the heart of why the web works.
Alternatively maybe google could start incentivizing accurate metadata descriptions of *other* pages (via outgoing links) by giving your web page a boost in the rankings. Thus, like wikipedia, perhaps enough good contributions would outweigh the bad ones.
Web 2.0 is schizophrenic (Score:3, Insightful)
In the end, the whole thing is just marketing hype. Web 2.0 is just the haphazard collection of messy technologies people happen to be using on the web in 2006, and don't expect things to get any better in the next few years either: the W3C, Adobe, and Microsoft will see to it that things remain messy and complex, because, heck, if we actually made the technologies clean and simple, how would these companies and the swarm of overpaid and underqualified consultants make a living?
FIRST POST v 2.X (Score:2, Funny)