Tim Berners-Lee and the Semantic Web 250
An anonymous reader writes "As we all know, Tim Berners-Lee is the hero of the Web's creation story--he conjured up this system and chose not to capitalize on it commercially. It turns out that Sir Tim (he was knighted by Queen Elizabeth II in July) had a much grander plan in mind all along--a little something he calls the Semantic Web that would enable computers to extract meaning from far-flung information as easily as today's Internet links individual documents. In an interview with Technology Review, the Web-maestro explains his vision of 'a single Web of meaning, about everything and for everyone.'"
What Does 42 Mean for Privacy? (Score:4, Interesting)
So, once this is off the ground, who wants to bet that the answer really is, 42?
Seriously though, this could be really cool, but I imagine it could have some very adverse effects on privacy, given the amount of information that finds its way onto the web. Items currently protected only by obscurity, scattered across disparate places, could easily be linked into a single profile (if what he's talking about isn't primarily smoke and mirrors). Either way, like any powerful technology, it will have both good and bad consequences. Here's hoping for the good...
'Twas a happy day on SemWebCentral... (Score:4, Interesting)
interesting technology... (Score:2, Interesting)
I wonder if this could be used for a computer's local file system as well. I know Microsoft is working on this (WinFS, or OFS, or whatever it's supposed to be called), but it would be damn awesome to apply this not just to the internet.
Why is he a hero? (Score:4, Interesting)
Two major problems for a semantic web: binaries. (Score:1, Interesting)
Ontology (Score:5, Interesting)
While this idea for more thorough, concise, and accurate searches is a good one, I would question whether embedding semantic tags into web pages is the way to go.
As outlined in Ontological Semantics [nmsu.edu], there is an automated system of semantic processing already underway. Basically, it takes a text and runs it through a parser, which looks up meanings in a lexicon, then reduces whatever translation it comes up with to a text-meaning representation (TMR) by pushing the concepts from the lexicon through an ontology / onomasticon / world-knowledge library. The TMR is basically the "pulp" of the semantics of the article, web page, book, or whatever it's been fed. It contains just the ideas, the things involved, and other relevant concepts, stripped of all other linguistic information.
The TMR is great because, by reversing the process with the lexicon of another language, it can be used to translate a text from one language to another.
However, it seems to me that with the bits and pieces of the TMR stored in a search engine's index, this could be a huge boon for the search engine.
Instead of just trying to match keywords, by computing the TMR of web pages and of search strings, you no longer search for keywords but for key concepts.
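To make the key-concept idea concrete, here's a minimal sketch of concept-level matching. The lexicon entries and concept names are invented stand-ins for illustration, nothing like the real Ontological Semantics lexicon or its ontology lookup:

```python
# Toy lexicon mapping surface words to concepts -- hypothetical entries,
# standing in for the parser + lexicon + ontology pipeline described above.
LEXICON = {
    "car": "VEHICLE", "automobile": "VEHICLE", "auto": "VEHICLE",
    "buy": "PURCHASE", "purchase": "PURCHASE",
}

def to_tmr(text):
    """Reduce a text to its bag of concepts (a crude text-meaning rep)."""
    return {LEXICON[w] for w in text.lower().split() if w in LEXICON}

# A keyword match between these two strings fails (no words in common),
# but their TMRs are identical, so a concept search finds the page.
query = to_tmr("buy car")
page = to_tmr("purchase automobile")
assert query == page == {"PURCHASE", "VEHICLE"}
```

The point is only that matching happens in concept space, so synonyms and paraphrases collapse to the same index entries.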
The advantage to semantic searches / indexes by this implementation is manifold:
-Searches (and the web as a whole) will gain the richness Mr. Berners-Lee is advocating.
-Web authors will not be able to lie in their semantic tags, or otherwise misinform spiders about what the page is about (remember <meta> keyword tags?)
-No extra work is required in the actual construction of the web or *ML standards. The TMR is generated and stored only by the sites / processes that need it.
-Others?
Just an alternative solution, for fun
Re:Opposing view (Score:3, Interesting)
Having just read quite a lot of his article before becoming far too annoyed to go any further, I really wouldn't take him very seriously. The bulk of his complaint is that although the Semantic Web is about drawing conclusions from widely disparate pieces of data, people don't think like that. I have no complaint with this.
However, he attempts to illustrate his point with lots of syllogisms. Unfortunately, he doesn't seem to understand them. For example, he uses this one:
...to illustrate that despite the fact that all the above statements are correct, the only conclusion you can draw is that Romania is not real.
Huh?
The only way you can come to that conclusion is if you assume that statement 2 implies that, if X lives in Y and X is not real, then Y is not real. Which is an invalid assumption. Therefore his conclusion is not valid.
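The parent's point -- that the conclusion only follows if you smuggle in an extra, invalid rule -- can be sketched with a toy forward-chaining loop. The facts below use the X/Y placeholders from the paragraph above, not the essay's actual statements:

```python
def forward_chain(facts, rules):
    """Naive forward chaining: apply rules until no new facts appear."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for fact in rule(derived):
                if fact not in derived:
                    derived.add(fact)
                    changed = True
    return derived

# Placeholder facts in the shape the parent describes.
facts = {("lives_in", "X", "Y"), ("not_real", "X")}

# The *invalid* extra rule needed to reach the essay's conclusion:
# if something unreal lives in a place, the place is unreal too.
def bogus_rule(fs):
    out = set()
    for f in fs:
        if f[0] == "lives_in" and ("not_real", f[1]) in fs:
            out.add(("not_real", f[2]))
    return out

# Without the bogus rule, nothing new follows from the premises;
# only by adding it do you get the absurd ("not_real", "Y").
assert forward_chain(facts, []) == facts
assert ("not_real", "Y") in forward_chain(facts, [bogus_rule])
```

A sound reasoner over the premises alone simply never derives the absurd conclusion; you have to assert the broken rule yourself.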
The entire essay is full of things like this. When he's talking in generalities, he makes a small amount of sense, but as soon as he starts using specifics, he stops making sense. There may be something to his basic point, but I'm not inclined to trust someone's opinions on a fundamentally logic-based concept who seems to be so inept at using logic. Treat with caution.
Tim didn't "invent" anything new with the web. (Score:1, Interesting)
Yawn yawn.
Incidentally, if you weren't on the net before NCSA mosaic, or have never used GOPHER, you don't need to bother replying to this post. Trust me, "surfing" gopherspace was trivially different from "surfing" the web, until the web went commercial.
This is like how Darwin constantly gets credited with "inventing" the theory of evolution (when Patrick Matthew actually published the idea in 1831, nearly 30 years earlier, as acknowledged by Darwin himself), or the way Uda's name always gets left off his invention (Uda was the principal inventor of the Yagi-Uda antenna).
Give Tim credit for helping develop the first web browser, he deserves that recognition. But calling him the "inventor" of the web is like calling Sir Isaac Newton the "inventor" of gravity!
Tagging vs. Understanding Context (Score:3, Interesting)
IMHO, the problem with the Semantic Web is the same problem that turned the Web from a linked knowledge store into a commerce-driven directory.
Yes, it would be nice if all data were tagged and understandable, but let's be honest: the commercialization (and its result: exploitation by marketers) of the web would certainly spill into the Semantic Web, and so Berners-Lee's vision would be once again ruined by 1) incorrect/misleading tagging, 2) competing standards and 3) out and out fraud.
I assume what Berners-Lee really wants is for a machine to truly understand that, to use his example, something is a calendar, that you are interested in it, and that it should add the event to your schedule and then book a flight for it.
But the chances are -- one day -- machines will be able to understand how data is typed by understanding the context around it (just as a human would go through the aforementioned process manually).
Obviously, this type of reading "comprehension" is a long way off, but the "search engine wars" are throwing a lot of mind power at the problem of understanding context. And I'm guessing it'll be a reality before anything as pure as the vision for the Semantic Web is realized.
(And to throw in a plug for my own company's attempt at understanding web context: theConcept [mesadynamics.com].)
Re:The rest of us call this... (Score:4, Interesting)
This is an important point. Google computes the PageRank [wikipedia.org] of a page based on the eigenvector of the web link matrix, which is a clever and usually effective approach. Unfortunately, each link conveys only a little bit of information: a link from page A to page B is assumed to be an endorsement of page B's relevance by page A. But what if you could add extra metadata to the links? Not just a URL and a human-readable text label, but a machine-readable label as well.
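The eigenvector computation mentioned above can be sketched with power iteration. The three-page "web" here is entirely hypothetical, and this is the textbook damped-PageRank scheme, not Google's actual implementation:

```python
# Toy 3-page web (hypothetical): out_links[p] = pages that p links to.
out_links = {0: [1, 2], 1: [2], 2: [0, 1]}
n = len(out_links)
d = 0.85  # the usual damping factor

# Power iteration: repeatedly push each page's rank along its out-links.
# The fixed point is the dominant eigenvector of the damped link matrix.
rank = [1.0 / n] * n
for _ in range(100):
    new = [(1 - d) / n] * n
    for page, targets in out_links.items():
        share = d * rank[page] / len(targets)
        for t in targets:
            new[t] += share
    rank = new

print([round(r, 3) for r in rank])
```

Note how coarse the input is: each link contributes one anonymous unit of endorsement. Attaching typed, machine-readable labels to links would let the same machinery weight edges by what the user actually cares about.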
If you could apply arbitrary attributes to web pages, Google would have much better information to go on, and a user could specify the importance of certain attributes depending on what he/she is looking for.
-jim
Re:The rest of us call this... (Score:4, Interesting)
Google's a hack. No, really: it tries to extract meaning from web pages that aren't engineered to store that kind of information.
Google is also an application. The Semantic Web is all about building the infrastructure so applications like Google don't have to chase the holy grail of AI to become more than a hack. Think of the Semantic Web as the layer underneath Google.
Re:Opposing view (Score:2, Interesting)
In order to prove that syllogisms are flawed, Shirky presents examples of common English statements and attempts to arrive at flawed deductions. Such flaws only work for Shirky because of the ambiguity of the English language.
In reality, a semantic web would neither store nor organize data according to the loose ambiguities of English. Rather, such information would need to be highly structured, using a formal system, in order for the accuracy of syllogisms to work.
As an example, let me examine a sentence that appears within a technical specification of a project I'm working on:
If this sentence were to be placed on the semantic web, it would be useless, given the ambiguity of several words and contexts. Instead, the meaning of each phrase, clause, and word would need to be made fully explicit using a formal semantic representation. Such a representation might be based on a hierarchical data structure such as XML.
If the above sentence were to be fully clarified, it would appear as:
Obviously this structure is much larger, but it contains all of the information necessary to resolve the sentence's ambiguities.
The above structure could also be expressed simply in XML. To examine a fragment of the above structure:
This would most likely appear using a structured representation such as:
<target>
  <set>
    <scope>entire</scope>
    <members>
      <membertype>financial institution</membertype>
      <instancetypes>
        <type>actual</type>
        <type>theoretical</type>
      </instancetypes>
    </members>
  </set>
</target>
Isn't it funny how the English sentence fragment is so much easier for humans to understand, even though both representations contain the same information? It's amazing what our brains do "automatically" by operating within certain contexts. Conversely, a machine will have a much easier time understanding and processing the formalized structure, in cases where it wouldn't even be able to guess at the corresponding English fragment. (Well, it would be able to guess, but with hilarious results. What's that, a piece of toast rules over Utah?)
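To show just how trivial the structured fragment is for a machine, here is the same structure as literal angle-bracket XML, fed to Python's standard-library parser (a minimal demonstration; element names are the ones from the fragment above):

```python
import xml.etree.ElementTree as ET

# The fragment from the parent comment, written as real XML.
fragment = """
<target>
  <set>
    <scope>entire</scope>
    <members>
      <membertype>financial institution</membertype>
      <instancetypes>
        <type>actual</type>
        <type>theoretical</type>
      </instancetypes>
    </members>
  </set>
</target>
"""

root = ET.fromstring(fragment)

# Extracting the "meaning" is a couple of unambiguous path lookups --
# no guessing about toast or Utah required.
member = root.find("./set/members/membertype").text
kinds = [t.text for t in root.iter("type")]
print(member, kinds)
```

Every query has exactly one answer; the ambiguity resolution that humans do from context is pre-baked into the structure.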
No doubt, translating normal human English sentences into a semantic web will be a lengthy and complicated process. But some mitigating factors:
Re:Statistical text analysis killed semweb (Score:3, Interesting)
Statistical methods excel at query relevance, not ontological interpretation. If they excelled at the latter, Google would be auto-constructing DMOZ instead of seeding PageRank with it.
And the bigger problem: Trust (Score:3, Interesting)
Trust is one of the major stumbling blocks of semantic applications and automatic knowledge management issues.
The need for information management pops up again. (Score:5, Interesting)
Remember the CVS story from a couple of days ago? It's information management: http://slashdot.org/comments.pl?sid=123076&cid=10
WinFS is also about information management: http://slashdot.org/comments.pl?sid=121101&cid=10
The story that the Evolution e-mail client offers the e-mail data as a data model separate from the application? another information management issue.
The web? information management issue.
Distributed databases? information management issue.
Web search engines? information management issue.
Windows search tool? information management issue.
The Windows registry? information management issue.
The unix etc directory? information management issue.
Enterprise workflows? again, an information management issue. That's why there is no general workflow solution accepted and used worldwide.
Dynamic web site contents? information management issue.
The semantic web? another information management issue!
As you can see from the numerous examples above, what an operating system should do, but none does, is manage information instead of files. If that were coupled with a distributed networked environment, 90% of the world's software would become obsolete overnight, and the productivity and fun of using computers would increase tenfold.
If any open source developer is reading this, you may contact me for a private discussion on the idea. THIS IS OPEN SOURCE'S BIGGEST CHANCE TO LEAD THE TECHNOLOGICAL RACE!
Re:interesting technology... (Score:2, Interesting)
Re:What Does 42 Mean for Privacy? (Score:3, Interesting)
My point in the virtual vs. real persona is that you cannot expect the same behavior patterns from the same people given totally different situations. My killing your character in an online death-match does not mean I would be unethical enough to kill you. Likewise, if I pick up trinkets from the monsters you have slain (clearly, they are not my spoils to take), this does not mean that I will take tips off of tables at a restaurant.
Similarly, most of my 'online' activity is done from home. That does not mean a semantic web is designed to tell the difference. In fact, just the opposite: it's designed to merge all the data available on me into a single profile. Again, this could be misleading. If I spend 3 hours (average) per day gaming, does that make me less capable of doing my job? Maybe, maybe not. Would it change the way my employer perceives my performance? Probably, yes.
The other point I think you are trying to make is that if the data is out there, it can already be found by other means. This may be the case, but not necessarily.
To give a much more personal example: if my cross-identity is posted by a friend on an obscure site, Google may pick that up. If you then trace my cross-identity into the online world, you will find many, many postings, as well as political views (mostly under the name you see me posting under now). My politics definitely don't agree with those of the people who pay my salary. Would they hold these politics against me if they were easily traced? I don't know. I honestly don't want to find out. Point being, the semantic web (if it worked) would quickly link me with my politics.
My greater fear: it would be just as easy for an advertiser to do this (not that they don't already, to some extent); it would just be even easier. The only benefit? I may stop getting ads for things I don't need.
Re:Google can leverage its search (Score:3, Interesting)
Any form of information found on the web, from whatever trusted source, needs to be evaluated on the likelihood that it is true. From this likelihood, you can start reasoning and finally come up with a conclusion plus a degree of belief in that conclusion, but you will not be able to state that an assumption is absolutely true or false. As crisp logic only leads to valid conclusions assuming absolute truth or falsehood of its assumptions, any conclusion drawn from that meta-assumption is invalid, or at best unqualified.
No, the aberration called fuzzy logic is not the solution.
Enter the world of Bayesian reasoning. Here a proposition is never absolutely true or false; there are only degrees of belief, plus a systematic and consistent calculus for deriving the likelihood of conclusions in the presence of uncertainty, and a method for adding new evidence to the calculations.

Take a simple crisp assumption: 'The sun always comes up in the morning.' For a semantic webber this statement is either true or false, and whenever two trusted sites claim opposing views on the matter, a human operator needs to fix the inconsistency. A Bayesian webber might reason like this instead: first, in the absence of any information, I assign one observation of true and one observation of false to the assertion. This is my informationless prior, and it makes the likelihood 50%. Then I start counting: every time the sun comes up in the morning, I count one for the truth of the assertion; if it doesn't, I count one for falsehood. As I don't remember it ever not happening (and I would have noticed!), I can add about the number of days I have lived to the truth count. That's about four 9's of truth. Now I can ask someone else if they ever saw the sun not come up; assuming I trust them 95% to give me the correct answer, I can easily add a few extra nines to my belief in the assertion. Reading some physics books adds further to my belief, up to the point where it would take quite a lot of conflicting evidence to make me doubt that particular assertion.
It might be interesting to note that from this strong belief in the assertion I can actually deduce that somebody who tells me otherwise is very likely lying to me, and I should be wary of whatever else that person says. A semantic web will fall flat on its face when there are conflicting pieces of information or outright lies on 'trusted' web pages.
Note that the two approaches are completely at odds: for the crisp logic approach everything is either true or false; for the Bayesian approach nothing is purely true or false (*). The Bayesian approach is well known, but can easily lead to computational explosions. However, it seems to be the only way to reason in a world where evidence can (and will) be contradictory and assumptions cannot be trusted. Without a consistent framework for reasoning under uncertainty (and the Bayesian framework is provably the only consistent one), the semantic web will be yet another failure of AI.
(*) Bayesian probabilities can be set to completely true or false (1 or 0), but no one in his right mind would do that, because from there no amount of evidence can mathematically change your belief. Twenty 9's should be enough for anybody.
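The counting scheme the parent describes can be sketched in a few lines. This is just beta-style evidence counting under the stated prior, not a full Bayesian network, and the 10,000-morning figure is a stand-in for "the number of days I have lived":

```python
from fractions import Fraction

def update(t, f, observation):
    """Fold one true/false observation into the evidence counts."""
    return (t + 1, f) if observation else (t, f + 1)

# Informationless prior: one pseudo-observation each way -> 50% belief.
t, f = 1, 1
assert Fraction(t, t + f) == Fraction(1, 2)

# ~10,000 mornings of the sun coming up, none of it failing to.
for _ in range(10_000):
    t, f = update(t, f, True)

belief = Fraction(t, t + f)
print(float(belief))  # roughly four 9's of truth, as described above
```

Because belief never reaches exactly 1, a single piece of conflicting testimony shifts the count a little instead of producing the hard contradiction a crisp reasoner would choke on.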
SemWeb == Huge Prolog program (Score:2, Interesting)
Re:Actually, Google is a search engine (Score:3, Interesting)
JVN quote: that was his exact point. You can model very restricted subsets successfully, but the whole thing is too much to encode. I've no problem with designing data structures; it's just when someone says data structures solve the grand AI problem that I have an issue.
Sure, you want to do an XML schema for books -- go ahead. For CDs, sure. In fact, for any domain. But bear in mind the documentation of the API is going to get bigger and bigger until it is unmanageable (or you end up with natural language, and we are back where we started, using IR techniques...).
Re:"Where's some semantic web software?" (Score:3, Interesting)
The God Emperor of XML, Tim Bray, doesn't seem to know of any such software so he posted a challenge [tbray.org].