
Tim Berners-Lee's List 42
weink writes "Tim Berners-Lee has made a career out of resolving Internet pet peeves. Ten years after he invented the Web, making the Internet user friendly, he is still drafting lists of things that could work better. "
Errr... you're thinking of the Internet. (Score:2)
I never metadata I didn't like (Score:1)
Babelfish just has to look up words and phrases in a dictionary and replace them with their defined equivalents -- that's why it's not as good as a human translator.
Picking up the relevant topics in a page is a good deal harder, since it seems to require some degree of comprehension. I can't imagine a bot being able to distinguish a page about how much Bill Gates sucks from a page about how much pages about how Bill Gates sucks suck -- or even easier stuff.
The mapping is easy; the extraction is hard.
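To make the point concrete, here's a toy sketch in Python of the kind of dictionary substitution described above, and why it falls short of real translation. The word list is invented, and Babelfish obviously does more than this, but the shape of the problem is the same:

# Naive word-for-word substitution: no grammar, no word order, no context.
# The dictionary here is a made-up fragment, just for illustration.
EN_TO_FR = {"the": "le", "cat": "chat", "sat": "assis",
            "on": "sur", "mat": "tapis"}

def naive_translate(sentence):
    return " ".join(EN_TO_FR.get(w, w) for w in sentence.lower().split())

print(naive_translate("The cat sat on the mat"))
# -> "le chat assis sur le tapis" -- the mapping happened; comprehension never did.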
web even (Score:1)
e-commerce... blech (Score:1)
It makes it hard to argue against the anti-Internet/anti-technology people when so many people arguing for the Internet see it as nothing more than a way to sell products. People who read about the Internet from the mass media have every reason to be scared of it -- it sounds horrible from that perspective.
Irony - Bingo! (Score:1)
Clarification (Score:1)
Again, I apologize for any confusion I may have caused.
Ted Berners-Lee? (Score:2)
The Problem with Metadata (Score:2)
Consider the problem that search engines faced when they indexed solely on the basis of textual relevance: pr0n sites filled their pages with the same words, repeated over and over again: "teen sex xxx porn pictures teen lesbian sex erotic sex xxx porn porn xxx sex teen girl babe sex sex xxx" and so forth. This made their pages more likely to turn up at the top of a search, and thus garnered more eyeballs for their advertisers. Who suffered? Teen-age lesbians (etc.) looking for informative sites about issues related to their lives, not for hetero-oriented pr0n.
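Here's a toy sketch in Python of why pure term-frequency ranking is so gameable. The scorer below is my own strawman, not any real engine's algorithm:

def score(page_text, query):
    # Rank pages by raw occurrence counts of the query terms.
    words = page_text.lower().split()
    return sum(words.count(term) for term in query.lower().split())

informative = "Support, community, and health resources for teen lesbians."
stuffed = "teen lesbian " * 50  # keyword stuffing, as described above

print(score(informative, "teen lesbian"))  # tiny score
print(score(stuffed, "teen lesbian"))      # 100 -- the spam page wins on raw counts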
Metadata systems are just as exploitable. Anyone familiar with the Prisoner's Dilemma will recognize the following -- because these systems (like pure textual relevance search systems) reward "defecting" behaviors such as deliberately false labeling, they will not solve the problems that result therefrom.
Even notwithstanding the problem of dishonest behavior, there remains the problem of clueless or simply self-aggrandizing behavior: users labeling their pages as more relevant to a given topic than they really are, or not understanding distinctions among topics. A marketer at Dell might not know what "computer science" is, and insist that "computer science" be added to the metadata of Dell's e-commerce site. "After all, we sell very scientifically-designed computers. Isn't that what computer science means?" Cluelessness reigns supreme.
Until these problems can be solved, human-indexed sites like yahoo.com and dmoz.org will have some huge advantages over spider-powered search engines.
Not much news... (Score:1)
The topic of P3P alone should be enough to start whole flame wars over privacy issues.
Then again, the ICE seems more like a Pointy-Haired Expo anyway.
"We mean eBusiness" - why does that remind me of Dilbert?
My take on RDF (Score:2)
There are a lot of interesting things out there. In particular, I think XML and DOM could be the basis for a very good component framework in which powerful components would be easy to write, and would integrate nicely without a lot of hassle. I'm looking at RDF as a piece of this.
But, as far as I can tell, the problem that RDF solves is a bit different from the one mentioned in this article. RDF is a way of representing documents as graph structures, allowing individual files to contain both local and external pieces without everything getting tangled up.
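For the curious, the graph idea reduces to something very simple: a set of (subject, predicate, object) statements in which everything is named by URI. A minimal sketch in Python -- the URIs are examples and the predicates borrow Dublin Core names for flavor, not a real vocabulary binding:

triples = [
    ("http://example.org/page", "dc:title", "My Free Software Page"),
    ("http://example.org/page", "dc:creator", "http://example.org/me"),
    ("http://example.org/me", "dc:description", "Some hacker"),
]

def objects_of(subject, predicate):
    # Follow edges out of a node; local and external URIs mix freely.
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects_of("http://example.org/page", "dc:title"))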
The problem of representing metadata unambiguously is a tricky one, and it is not yet solved. The RDF spec presents an interesting outline of how this might be done, but it doesn't quite tell me what I need to do to get my own Web pages correctly meta'ed. If I were a library, then the Dublin Core [purl.org] would start to give me the specific markup I needed, but that's just for libraries. What do I use as metadata for my free software efforts?
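For what it's worth, here's roughly what Dublin Core metadata could look like for such a page, generated from Python. The element names are real Dublin Core; the values are invented, and I make no claim this is a spec-blessed profile for software projects:

# Real Dublin Core element names; hypothetical values for a project page.
dublin_core = {
    "DC.title": "FooLib -- a free widget library",
    "DC.creator": "Jane Hacker",
    "DC.subject": "free software; widgets",
    "DC.description": "A GPL'd library for drawing widgets.",
    "DC.date": "2000-02-14",
    "DC.type": "Software",
}
for name, content in dublin_core.items():
    print('<meta name="%s" content="%s">' % (name, content))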
It seems like the combination of XML, XML Namespaces, Dublin Core, and all the other recommendations, specifications, and standards analogous to the Dublin Core but for domains other than libraries might cohere into a workable metadata system for the Web. On the other hand, the complexity and fuzziness of the specifications could very easily prevent the beast from flying.
When you're dealing with software, precise specification is key. Some metadata standards have succeeded pretty well in this regard - take MIME content types, for example. If you have a JPEG image, you know that the content type should be "image/jpeg". But the XML crew hasn't even managed a consistent namespace name for HTML 4.0 (I've seen "urn:w3-org-ns:HTML", "http://www.w3.org/TR/REC-html40" and others).
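The MIME case really is that unambiguous -- Python's standard library even ships the mapping (this is the real mimetypes module, nothing invented):

import mimetypes

# File extensions map to one well-known content type label.
print(mimetypes.guess_type("photo.jpeg"))  # ('image/jpeg', None)
print(mimetypes.guess_type("page.html"))   # ('text/html', None)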
For those hoping for a more technical discussion of RDF, I recommend the Mozilla page on RDF [mozilla.org] and of course the specification itself [w3.org].
rdf? (Score:1)
It is supported in Mozilla, but not in Internet Explorer 5.
I never metadata I didn't like (Score:1)
What I would suggest is automated classification. There has been plenty of work over the years in AI and related technologies for parsing, digesting, and classifying raw, unmarked-up text. If Babelfish can read a page and map it to another language, it's reasonable enough to expect that the relevant topics in a page can be extracted and mapped to some kind of categorization. There are a number of companies out there selling the technology to do this right now. It's not perfect, but with human editing it can put together a web directory very quickly.
Ultimately I envision being able to start from a page anywhere on the web, push a button, and get a list of other sites on the same topic. If it were done right it would beat keyword search by a large margin.
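As a rough sketch of what that button might do under the hood: compare bag-of-words vectors by cosine similarity. This is Python, the pages and hostnames are invented, and real classifiers are far more sophisticated -- this just shows the shape of the approach:

import math
from collections import Counter

def cosine(a, b):
    # Similarity of two documents viewed as bag-of-words vectors.
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

current = "free software licensing and source code"
candidates = {
    "gnu.example.org": "free software philosophy and source code freedom",
    "cake.example.org": "chocolate cake recipes and baking tips",
}
# Sites most like the current page come first.
print(sorted(candidates, key=lambda u: cosine(current, candidates[u]),
             reverse=True))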
Everything could always work a little better (Score:1)
rdf? (Score:1)
RDF is an XML compliant structure (Score:2)
RDF Realistic? (Score:4)
In fact, using RDF in a fractured or improper way may even be more detrimental than good ol' heuristics. Malformed RDF will send syntactically correct but semantically incorrect metadata to a search engine equipped to handle it. This is a dangerous combination -- it makes bad search results more precisely wrong. I'd rather have a good guess than a precisely wrong answer.
It ultimately boils down to whether you trust users to be able to describe their own metadata. I don't. Perhaps a good approach is to have centralized servers attempt to create correct RDF files based on a set of common criteria. While this is still a flawed approach, I would rather have search results that are consistent (consistently wrong or consistently right) than try to get inside the psychology of each individual web designer's implementation of RDF metadata. This approach might also cut down on metadata abuse (trying to bump a page up in searches where it should not rank highly, etc.).
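A minimal sketch of that centralized idea in Python: the indexer derives keywords itself from page text by one shared, deterministic rule (plain term frequency here). Everything below is illustrative, not a real system:

import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "as"}

def derive_metadata(page_text, n=5):
    # Same criteria applied to every site, so results are at least consistent.
    words = re.findall(r"[a-z']+", page_text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [term for term, _ in counts.most_common(n)]

print(derive_metadata("RDF represents metadata as graphs. "
                      "Metadata about metadata is still metadata."))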
In other words, I think we're still a long, long way from solving the metadata/search issue. For now, the best answer seems to be human categorization (Yahoo) or smart heuristics (Google, Inktomi).
Ted Berners-Lee? (Score:1)
- NeuralAbyss
~^~~~^~~~^~~~^~~~^~~~~^^^~~~~~~~~~~~~~~~
Real programmers don't comment their code.
Internet is not the web (Score:1)
Do you newbies know nothing?
Samael
And is faked by people needing hits (Score:1)
RDF as XML (Score:1)
I'm not about to undermine RDF, but it should be noted that RDF is an application of XML, and there are many other languages based on XML that could have equal or even greater importance. Dozens of such languages already exist.
I'm looking at RDF as a part of XML, which is the big thing -- not any of the individual languages.
Ted Berners-Lee? (Score:1)
Not much news... (Score:1)
I guess they are in the business of "quick news" for people with short attention spans...
Invented the Web? (Score:1)
Clarification (Score:1)
Just imagine Al Gore as Prez of the US -- now that's going to be confusing...
Irony (Score:1)
Yack!!
Al Gore and confusion. (Score:1)
--
- Sean
This is why sane people use Opera... (Score:1)
--
- Sean
metadata on the web is unreliable (Score:4)
At one time, you could force AltaVista to show only pages containing certain text or URLs. While those options are still accepted by the engine, they are largely ignored. As a user I am annoyed when I ask a search engine to show me only pages that actually contain certain strings, only to navigate to the page and turn up empty on a find.
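The check the engine is skipping is trivial to state -- here's a sketch in Python of doing it client-side. The URLs are placeholders, and a real tool would need politeness delays, proper encoding handling, and so on:

import urllib.request

def page_contains(url, needle):
    # Fetch the page and confirm the required string actually appears.
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            page = resp.read().decode("utf-8", errors="replace")
    except OSError:
        return False
    return needle.lower() in page.lower()

results = ["http://example.com/", "http://example.org/"]
print([u for u in results if page_contains(u, "metadata")])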
I have actually gone from 'portal' surfing (early Yahoo) to search-based surfing, back to 'portal' surfing. The web is hairy enough that I actually *do* want someone to filter out the crap for me, unless I am looking for something extremely specific and unhappy with the portal-based results.
I do agree with some of his other thoughts, about form submission and URL changing...
And is faked by people needing hits (Score:2)
On the other hand, metatags make a lot of sense for huge commercial empires (Amazon, eBay, Buy.com, etc.) which will be willing to maintain reasonably accurate markings. I have a suspicion that in the not-so-distant future we will have a situation when the big search engines will accept (=believe) metatags from big commercial sites, but will ignore them from small fry. There may develop a "club" whose metatags Yahoo, AltaVista, Lycos, etc. will believe.
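The "club" could be as blunt as a whitelist. A sketch in Python -- the domains and helper names are hypothetical, just to show the policy:

TRUSTED = {"amazon.com", "ebay.com", "buy.com"}

def effective_keywords(domain, author_meta, derived_from_text):
    # Believe author-supplied metatags only from club members;
    # everyone else gets judged on their actual page text.
    return author_meta if domain in TRUSTED else derived_from_text

print(effective_keywords("amazon.com", ["books"], ["spam"]))       # ['books']
print(effective_keywords("smallfry.example", ["books"], ["spam"])) # ['spam']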
Invented the Web? (Score:2)
Will
Irony (Score:2)
yeah (Score:1)
actually (Score:1)
Invented the Web? (Score:1)