Slashdot
Tim Berners-Lee and the Semantic Web
Posted by
michael
on Mon Sep 27, 2004 12:50 PM
from the same-old-refrain dept.
An anonymous reader writes "As we all know, Tim Berners-Lee is the hero of the Web's creation story--he conjured up this system and chose not to capitalize on it commercially. It turns out that Sir Tim (he was knighted by Queen Elizabeth II in July) had a much grander plan in mind all along--a little something he calls the Semantic Web that would enable computers to extract meaning from far-flung information as easily as today's Internet links individual documents. In an interview with Technology Review, the Web-maestro explains his vision of 'a single Web of meaning, about everything and for everyone.'"
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
What Does 42 Mean for Privacy? (Score:4, Interesting)
So, once this is off the ground, who wants to bet that the answer really is, 42?
Seriously though, this could be really cool, but I imagine that this could have some very adverse effects on privacy given the amount of information that finds itself on the web. Items that are protected only by obscurity in disparate places would be easily linked into a single profile (if the stuff he's talking about isn't primarily smoke and mirrors). Either way, like any powerful technology, it will have both good and bad consequences. Here's hoping for the good...
Re:What Does 42 Mean for Privacy? (Score:3, Insightful)
Seriously though, this could be really cool, but I imagine that this could have some very adverse effects on privacy given the amount of information that finds itself on the web. Items that are protected only by obscurity in disparate places would be easily linked into a single profile (if the stuff he's talking about isn't primarily smoke and mirrors). Either way, like any powerful technology, it will have both good and bad consequences. Here's hoping for the good...
People would do well to note the principle:
Re:What Does 42 Mean for Privacy? (Score:5, Insightful)
That is to say, I may be an item scammer in online gaming realms, or in Diablo, but not in EverQuest. However, I may be one of the most honest people I know in the real world. Perhaps I have a second account that I use to Troll on Slashdot, but otherwise have this account where I try to post insightful information. You have the right to link these things, you may even have the right to link these to real world data like where I work and where I park my car. However, if I jilted someone in Diablo, do I want them to so easily find me and take it out on my car (as some people would)?
Do I want my employer having instant access to all of my online transactions, regardless of whether I'm on shift or off shift at the time? Individually, these are not things that anyone has considered worth 'securing', yet they may be valuable to someone.
Re:What Does 42 Mean for Privacy? (Score:3, Insightful)
Ah, but what constitutes privacy but an obscurity of your own behaviors in certain circles.
I would disagree. I would say privacy is more like cryptography in that privacy is the ability to control who knows certain information. So privacy is confidentiality.
That is to say, I may be an item scammer in online gaming realms, or in Diablo, but not in EverQuest. However, I may be one of the most honest people I know in the real world. Perhaps I have a second account that I use to Troll on Slashdot, but
Re:What Does 42 Mean for Privacy? (Score:3, Interesting)
My point in the virtual vs. real persona is that you cannot expect the same behavior patterns from the same people given totally different situations. My killing your character in an online death-match does not mean I would be unethical enough to kill you. Likewise, if I pick up trinkets from the monsters you have slain (clearly, they are not my spoils to take), this does not mean that I will take tips off of tables at a resta
'Twas a happy day on SemWebCentral... (Score:4, Interesting)
What is the semantic web? (Score:5, Informative)
You don't want a "single" web... (Score:4, Insightful)
This is to insure against a monoculture that is so disastrous in computer circles as demonstrated by the numerous security failings of Windows...
Re:You don't want a "single" web... (Score:4, Insightful)
Windows executes stuff. The semantic web is just data. Your warnings about a monoculture apply to the semantic web about as much as they apply to text files.
Re:You don't want a "single" web... (Score:3, Insightful)
Yes, and again, the problem is when the stuff that executes has a monoculture. It's not like you see Pine users or KMail users infected by emails with Outlook viruses in.
Re:No, there's something there (Score:3, Insightful)
Your declaring such functionality to be an error of logic does not (in my view) make it less likely.
Back to my very example... the 'scams and cheats' property assertion of an online gamer against my account number is, by definition, a semantic inferrenc
Duplicate Posting (Score:5, Funny)
Actually Slashdot posts this article over and over again every few months, with basically the same headline (sometimes "and" sometimes "on" sometimes "Tim" sometimes not). Kinda bizarre really.
Dang CERNopeans! (Score:4, Funny)
"Where's some semantic web software?" (Score:5, Informative)
Eclipse plugins [semwebcentral.org], visualization tools... there's some good stuff there.
about everything and for everyone... (Score:5, Funny)
Re:about everything and for everyone... (Score:5, Funny)
Opposing view (Score:5, Informative)
Re:Opposing view (Score:3, Interesting)
Having just read quite a lot of his article before becoming far too annoyed to go any further, I really wouldn't take him very seriously. The bulk of his complaint is that although the Semantic Web is about drawing conclusions from widely disparate pieces of data, people don't think like that. I have no complaint with this.
However, he attempts to illustrate his point with lots of syllogisms. Unfortunately, he doesn
Re:Opposing view (Score:5, Insightful)
His writings appear to have some uncorrected logical fallacies.
You can conclude the following from those statements:
- Count Dracula is not real
- Count Dracula lives in a region of Romania
I'd like to see the mystery step that combines these to conclude that Romania isn't real; at most, you could say that Romania houses something that isn't real. The conclusion he makes isn't supported by any logic.
More importantly, these are dumbed-down semantics. The assertion that a fictional character lives somewhere real needs to be qualified that this occurs in a certain set of fictional stories, not real life. The fact that these unqualified statements are represented in this example ontology means that the ontology is insufficient, not that this method isn't useful.
Another example in that article:
This is even factually incorrect. The First Amendment doesn't actually say anything about US citizens; it restricts the US Congress from certain actions, period, not for certain people.
Ignoring this, you can make one conclusion and reduce this to the following:
- the First Amendment covers the rights of people
- Nike is protected by the First Amendment
Concluding that Nike is a person from this is a logical fallacy [datanation.com]. (Nothing in these logical statements says the First Amendment might not also cover the disposition of small peanut butter sandwiches with blueberry jam, which set Nike might then be an element of.)
I find it hard to treat this article with much weight, given its fast-and-loose treatment of logic and ontological assertions.
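To make the Nike point concrete, here is a minimal sketch of a counter-model: a made-up world in which both premises hold and the conclusion is still false. The individuals and the set names are hypothetical and only serve to show that the inference is not valid.

```python
# Hypothetical toy world; none of these names come from the article.
people = {"alice", "bob"}

# Assumption: whatever the First Amendment covers may include
# things that are not people (the "sandwiches" possibility above).
covered_by_first_amendment = people | {"nike"}

# Premise 1: the First Amendment covers the rights of people.
premise_1 = people <= covered_by_first_amendment
# Premise 2: Nike is protected by the First Amendment.
premise_2 = "nike" in covered_by_first_amendment
# Claimed conclusion: Nike is a person.
conclusion = "nike" in people

print(premise_1, premise_2, conclusion)  # prints: True True False
```

One world where the premises are true and the conclusion false is enough to show the argument does not follow, whatever ontology language it is written in.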
Re:Opposing view (Score:3, Insightful)
All information that is subjective is a poor candidate for the Semantic Web. All information that is quickly subject to change is a poor candidate for the Semantic Web. When mixing subjective (verb) pointers to a given truth on a large scale, modified by objective pointers, where even one of many thousands is false (or mis-keyed), the overall meaning can quickly become subverted.
In other words, if I get
Re:Opposing view (Score:4, Insightful)
It's based on the assumption that all semantics can be explained by syntax. So far this has not been proven, and all attempts to get there got stuck somewhere and turned into something different, sometimes useful (Chomsky's grammars), sometimes not so useful.
The semantic web would have to deal with the laziness of people who can't be bothered to write meaningful ALT attributes to tags. It can try to guess at some of the semantics, but it can also easily be fooled. Everyone who has ever tried to use content filters for an internet connection knows what I am talking about. Lots of false positives get rejected and hundreds of questionable sites get through, because the syntax of a site alone doesn't help with evaluating the semantics (the meaning) of that site.
Re:Opposing view (Score:3, Insightful)
1. The Semantic Web (or rather, ontology construction and construction of relationships between your local ontology and other ontologies) is complicated and time-consuming, and requires you to decipher lots of other people's stuff to connect your stuff to it. Ultimately the success of any new technology, especially one that requires widespread adoption to be useful, must be easy enough
Semantic Web (Score:3, Informative)
Two major problems to a semantic web (Score:5, Insightful)
Re:Two major problems to a semantic web (Score:3, Insightful)
With data it is different, just look at how quickly RSS & ATOM are being adopted. There's an obvious advantage because having a feed on your site makes it easier for readers to lear
And the bigger problem: Trust (Score:3, Interesting)
This burns me up!!! (Score:5, Funny)
I'm so tired of Semantic trying to take over all the security tools. Are they now trying to take over the Internet? I mean really, Semantic Antivirus totally sucks ass big-time!!! And don't get me started on Semantic's SystemWorks tool and how bad it blows!
Oh, wait a minute...
Meanwhile... (Score:3, Funny)
Why is a hero? (Score:4, Interesting)
Statistical text analysis killed semweb (Score:5, Insightful)
Re:Statistical text analysis killed semweb (Score:3, Interesting)
Statistical methods excel at query relevance, not ontological interpretation. If the latter were the case, Google would be auto-constructing DMOZ instead of seeding page rank with it.
The next "web"? (Score:3, Informative)
...from the minds of Alan Kay, David Smith, David Reed, and others...
Ontology (Score:5, Interesting)
While this idea for more thorough, concise, and accurate searches is a good one, I would question whether embedding semantic tags into web pages is the way to go.
As outlined in Ontological Semantics [nmsu.edu], there is an automated system of semantic processing already underway. Basically, it takes a text, then runs it through a parser, which looks up meanings in a lexicon, then reduces whatever translation it comes up with to a text-meaning representation (TMR), by pushing the concepts from the lexicon through an ontology / onomasticon / world-knowledge library. The TMR is basically the "pulp" of the semantics of the article, web page, book, or whatever it's been fed. It just contains the ideas, the things involved, and other relevant concepts, stripped of all other linguistic information.
TMR is great, because it can then be used, by reversing the process and using the lexicon of another language, to translate a text from one language to another.
However, it seems to me that with the bits and pieces of the TMR stored in a search engine's index, this could be a huge boon for the search engine.
Instead of just trying to match keywords, by parsing the TMR of web pages and by parsing TMR of search strings, you no longer search for keywords, but keyconcepts.
The advantage to semantic searches / indexes by this implementation is manifold:
-Searches (and the web as a whole) will gain the richness Mr. Berners-Lee is advocating.
-Web authors will not be able to lie in their semantic tags, or otherwise misinform spiders what the page is about (remember meta tags?)
-No extra work is required in the actual construct of the web or *ML standards. The TMR is only generated and stored by the sites / processes that need it.
-Others?
Just an alternative solution, for fun
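A toy sketch of the "keyconcepts" idea, under heavy assumptions: the real pipeline described above involves a parser, an ontology and a full TMR, whereas here a small hypothetical lexicon (`LEXICON`) stands in for all of that and simply maps words to concept names. The page URLs and concept names are made up. The point is only that the index is keyed by concept rather than by keyword, so a query can match a page that shares none of its words.

```python
from collections import defaultdict

# Hypothetical lexicon: surface word -> concept identifier.
LEXICON = {
    "car": "VEHICLE", "automobile": "VEHICLE",
    "buy": "PURCHASE", "purchase": "PURCHASE",
    "hotel": "LODGING", "inn": "LODGING",
}

def to_concepts(text):
    """Reduce a text to the set of concepts its known words map to."""
    return {LEXICON[w] for w in text.lower().split() if w in LEXICON}

def build_index(pages):
    """Build a concept -> set-of-URLs index from a {url: text} mapping."""
    index = defaultdict(set)
    for url, text in pages.items():
        for concept in to_concepts(text):
            index[concept].add(url)
    return index

def search(index, query):
    """Return the URLs whose concepts include every concept in the query."""
    concepts = to_concepts(query)
    if not concepts:
        return set()
    return set.intersection(*(index.get(c, set()) for c in concepts))

pages = {
    "a.example": "buy an automobile today",
    "b.example": "a quiet inn by the sea",
}
index = build_index(pages)
print(search(index, "purchase car"))
```

Here the query "purchase car" finds `a.example` even though neither word appears on that page. Since the concepts are computed by the indexer, not declared by the author, this also matches the point above about authors not being able to lie in tags.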
Not doing it right (Score:4, Insightful)
Re:Not doing it right (Score:5, Insightful)
Um... he invented www and started the W3C. I'd say he's had some experience with companies as an advisor. Take a look at some of the W3C recommendations and look for corporate involvement.
But in practice, all he is doing is proposing and overseeing standards.
That's kinda what the W3C *does*.
Standards should follow successful technology, not vice versa.
XHTML, XML, XSLT and a lot of other recommendations started as standards that *later* had robust implementations. Technology that starts without standards is often not fully thought out and awkward, and at worst, proprietary. Waiting for technology before standards will only inhibit interoperability and adoption of the standard.
The fact that Tim has been trying for 15 years to sell this idea with little success indicates that his approach is insufficient.
I suppose that it has nothing to do with the fact that it's a tremendously difficult and ambitious project. You're right. Anything that takes 15 years to develop should be scrapped.
Google can leverage its search (Score:5, Informative)
Re:Google can leverage its search (Score:3, Interesting)
Will the "splash screen"... (Score:3, Funny)
Tagging vs. Understanding Context (Score:3, Interesting)
IMHO, the problem with the Semantic Web is the same problem that evolved the Web from a linked knowledge store to a commercial-driven directory.
Yes, it would be nice if all data were tagged and understandable, but let's be honest: the commercialization (and its result: exploitation by marketers) of the web would certainly spill into the Semantic Web, and so Berners-Lee's vision would be once again ruined by 1) incorrect/misleading tagging, 2) competing standards and 3) out and out fraud.
I assume what Berners-Lee really wants is for a machine to truly understand that, using his example: something is a calendar, and that you are interested in it, and that you should add the event to your schedule and then book a flight for it.
But the chances are -- one day -- machines will be able to understand how data is typed by understanding the context around it (just as a human would go through the aforementioned process manually).
Obviously, this type of reading "comprehension" is a long way off, but the "search engine wars" are resulting in a lot of mind power thrown at the problem of understanding context. And I'm guessing it'll be a reality before anything as pure as the vision for the Semantic Web is realized.
(and to throw in a plug for my own company's attempt at understanding web context: theConcept [mesadynamics.com]).
Second System Effect (Score:4, Insightful)
I've been hearing noise about the semantic web, RDF, and what not for years now, and every time I do, the first thing that pops into my head is "Second System Effect".
He got lucky once, because he put together some tools that were simple and straightforward enough for people to pick it up quickly, thereby avoiding the fate of the dozens of other hypertext systems going back to the late 1980's.
Now, like all second systems, he wants to "do it right", over-engineering away all of the things that made the first one take off ...
Just my opinionated rant ...
Why this is a bad idea - it's a taxonomy (Score:5, Insightful)
In the beginning, we had library card catalogs, with their painful attempts to index and cross-reference books. That works well in some areas, typically ones where names of people are significant. Attempts to apply the same approaches to technical papers worked less well.
There's a very elaborate classification system for patents. When you had to look through patents on paper or microfilm, it was essential. Now that we have full text search, it's used less and less.
A modern example of this approach is the ACM Taxonomy [computer.org], a structure into which all computer science can be fitted. (As an exercise, try to put the current Slashdot stories into that taxonomy.) Nobody actually uses that taxonomy to find anything.
As to data interchangeability, that's a separate issue, and more of a standards one. The big problem for publicly available data is that the cost of encoding the data is borne by different people than those who benefit from the encoding. Many companies don't like having all their product and pricing information easily searchable by price. (Froogle may change this, because Google has so much clout.)
I've spent some time dealing with public financial reporting [downside.com]. There's opposition to detailed disclosure in a standardized format [xbrl.org]. Many companies don't want their detailed information to be too easily analyzed. Embarrassing results show up.
The future is better search engines, not user-created indexing data. As we've painfully learned, a search engine must look at the same data a human reader would, or it will be lied to. Lied to to the point of uselessness.
The need for information management pops up again. (Score:5, Interesting)
Remember the CVS story a couple of days before? it's information management: http://slashdot.org/comments.pl?sid=123076&cid=10
WinFS is also about information management: http://slashdot.org/comments.pl?sid=121101&cid=10
The story that the Evolution e-mail client offers the e-mail data as a data model separate from the application? another information management issue.
The web? information management issue.
Distributed databases? information management issue.
Web search engines? information management issue.
Windows search tool? information management issue.
The Windows registry? information management issue.
The unix etc directory? information management issue.
Enterprise workflows? again, an information management issue. That's why there is no general workflow solution accepted and used worldwide.
Dynamic web site contents? information management issue.
The semantic web? another information management issue!
As you can see from the numerous examples given above, what an operating system should do, but none does, is manage information instead of files. If that were coupled with a distributed networked environment, 90% of the world's software would be considered obsolete overnight, and the productivity and fun of using computers would increase tenfold.
If any open source developer is reading this, you may contact me for a private discussion on the idea. THIS IS OPEN SOURCE'S BIGGEST CHANCE TO LEAD THE TECHNOLOGICAL RACE!
Nice Try, Tim (Score:3, Insightful)
Still, every little bit helps. Certainly a "Semantic Web" would be more useful than the current one.
Re:The rest of us call this... (Score:3, Insightful)
And here is the problem: what are "the rest of us" going to do when Google goes south? When it either collapses under its own weight or is finally broken by its corporate overlords?
Can't put all the eggs in one basket. The only sane future is the one with unified, object-driven search and retrieval methods distributed amongst information consumers and producers.
Re:The rest of us call this... (Score:3, Informative)
Google identifies relationships between data using only the links between pages containing the data.
The Semantic web represents relationships between data based on metadata [w3.org] (i.e. data about data). This is a far more powerful way to describe the meaning of data.
works for me.
Maybe, but that doesn't mean it's the best way to accomplish what you are trying to do.
Re:The rest of us call this... (Score:5, Insightful)
And this is what makes me wonder if this will amount to much more than an interesting research project for grad students. In order for the SemWeb to amount to anything useful, everyone is going to have to include the metadata necessary to integrate their data into the Semantic Web. How's that going to work? Who's going to make it work?
Re:The rest of us call this... (Score:3, Informative)
The Semantic Web just provides a method for expressing metadata. Maintaining the integrity of those expressions involves a different set of problems. Some of the solutions include trust metrics [moloko.itc.it] like Slashdot's own distributed moderation [umich.edu] (PDF) or Advogato [advogato.org].
Re:The rest of us call this... (Score:4, Interesting)
This is an important point. Google computes the pagerank [wikipedia.org] of a page based on the eigenvector of the web link matrix, which is a clever and usually effective approach. Unfortunately, each link only conveys a little bit of information. A link from page A to page B is assumed to be an endorsement of page B's relevance by page A. But what if you could add extra metadata to the links? Not just a URL and a human readable text label, but a machine readable label as well, like this?
If you could apply arbitrary attributes to web pages, google would have much better information to go on, and a user could specify the importance of certain attributes depending on what he/she is looking for.
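A rough sketch of how labeled links could feed a PageRank-style score. This is a simplified power iteration written for illustration, not Google's actual algorithm; the link labels ("endorses", "criticizes"), the per-label weights, and the function name `pagerank` are all assumptions made up for the example.

```python
def pagerank(links, weights=None, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration over labeled links.

    links:   list of (source, target, label) tuples.
    weights: optional {label: non-negative weight}; unlisted labels count 1.0.
    """
    weights = weights or {}
    pages = sorted({p for s, t, _ in links for p in (s, t)})
    n = len(pages)

    # Total outgoing weight from each page to each of its targets.
    out = {p: {} for p in pages}
    for s, t, label in links:
        out[s][t] = out[s].get(t, 0.0) + weights.get(label, 1.0)

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for s in pages:
            total = sum(out[s].values())
            if total == 0:
                # No usable outgoing links: spread this page's rank evenly.
                for p in pages:
                    new[p] += damping * rank[s] / n
            else:
                for t, w in out[s].items():
                    new[t] += damping * rank[s] * w / total
        rank = new
    return rank

# Hypothetical labeled link graph.
links = [
    ("A", "B", "endorses"),
    ("A", "C", "criticizes"),
    ("B", "C", "endorses"),
    ("C", "A", "endorses"),
]
plain = pagerank(links)                           # every link counts equally
weighted = pagerank(links, {"criticizes": 0.0})   # ignore critical links
print(plain)
print(weighted)
```

With all links equal, page C benefits from A's critical link; once "criticizes" links are weighted to zero, that share of A's rank flows to B instead. A user-chosen weight table is one way to read "specify the importance of certain attributes".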
-jim
Re:The rest of us call this... (Score:4, Interesting)
Google's a hack. No, really, it tries to extract meaning from web pages that really aren't engineered to store that kind of information.
Google is also an application. The Semantic Web is all about building the infrastructure so applications like Google don't have to chase the holy grail of AI to become more than a hack. Think of the Semantic Web as the layer underneath Google.
Actually, Google is a search engine (Score:5, Informative)
The rest of us call this... GOOGLE.
Google searches undifferentiated text. In contrast, the semantic web is all about differentiating text by adding meta tags.
For example, the word "Hilton" on a web page is ambiguous. It could be a hotel, or a celebrity. Which is it? With the semantic web we'd know:
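As a rough illustration of that kind of disambiguation, here is a sketch using plain subject/predicate/object tuples in the spirit of RDF. The URLs, predicate names and type names are hypothetical placeholders, not a real vocabulary.

```python
# Hypothetical statements about two different things both called "Hilton".
triples = [
    ("http://example.org/page1#Hilton", "type", "Hotel"),
    ("http://example.org/page1#Hilton", "locatedIn", "Chicago"),
    ("http://example.org/page2#Hilton", "type", "Person"),
    ("http://example.org/page2#Hilton", "occupation", "Celebrity"),
]

def subjects_of_type(triples, wanted):
    """Return the sorted subjects that are declared to be of the given type."""
    return sorted(s for s, p, o in triples if p == "type" and o == wanted)

# A search for hotels only sees the first "Hilton".
print(subjects_of_type(triples, "Hotel"))
```

Both pages contain the same word, but a query restricted to the `Hotel` type returns only the first one.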
Of course, this is a fairly trivial example. A more meaningful example: