Broken Links No More? 212

johndoejersey writes "Students in England have developed a tool which could bring an end to broken links. Peridot, developed by UK intern students at IBM, scans company weblinks and replaces outdated information with other relevant documents and links. IBM have already filed 2 patents for the project. The students said Peridot could protect companies by spotting links to sites that have been removed, or which point to wholly unsuitable content. 'Peridot could lead to a world where there are no more broken links,' James Bell, computer science student at the University of Warwick, told BBC News Online. Here is another story on it." See also the BBC story.
  • by Anonymous Coward on Friday September 24, 2004 @09:36AM (#10339306)
    There are two parts to this tool, one of which is quite bad and one of which is quite good.

    First, replacing links. This is a rather bad idea. Here's why, with an example.

    In general, we can all agree that the technology behind Google is pretty impressive. It has its own "More Pages Like This" feature, which we can assume is at least somewhat similar to this one: complex content analysis among billions of pages, to determine which are similar and which are different.

    So, suppose we had a link to Major League Baseball, [] on our page. And suppose, for whatever reason, that their site went away (perhaps a few more players' strikes?).

    Well, what does Google suggest as a replacement? Check it out here [].

    First the National Football League (NFL), then the National Basketball Association (NBA), and then the National Hockey League (NHL). Followed by the ESPN sports network, and NASCAR racing.

    Obviously, if we wanted to link to a site about baseball, all of those (other than ESPN) are really entirely irrelevant.

    But if we wanted to link to a site about professional sports organizations, all of those (other than ESPN) are QUITE relevant.

    Can this software know our intent?


    You really have to question the ability of machines to select relevant links.

    The situation is this: If someone goes to the trouble to manually create links in the first place, those should not be automatically changed to other sites that some computer program thinks may be related. Links shouldn't be inserted automatically; if someone needs more information on something you haven't linked to, they can use a search engine. And then your company isn't liable to look idiotic by linking to irrelevant sites.

    Now, the other aspect of this product.

    Removing dead or changed links is quite another matter. Automated removal of links is a great idea and quite useful. For example, consider when someone's domain name expires and it is taken over by a porn site. It'd be great to have a program that automatically removes links to it from your site. Like this tool, this could be based on a percentage of changed content: if the content changes significantly, remove the link quickly and automatically. If the content changes by some intermediate amount, flag the link as needing review by the webmaster.

    But in both cases, the software should present the webmaster with a list of such questionable links, including those it has removed from the site temporarily, and then allow the webmaster to select replacement links.

    Manually. With relevance.
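A minimal sketch of the triage this comment describes, assuming the old and new text of each linked page has already been fetched. The two cut-off values are hypothetical (the comment gives none), and word-level similarity from `difflib.SequenceMatcher` is just one stand-in for "percentage of changed content".

```python
from difflib import SequenceMatcher

# Hypothetical cut-offs; tune them for your own site.
REVIEW_BELOW = 0.8  # similarity under this: flag the link for the webmaster
REMOVE_BELOW = 0.3  # similarity under this: pull the link right away

def similarity(old_text: str, new_text: str) -> float:
    """Word-level similarity in [0, 1]; 1.0 means the page is unchanged."""
    return SequenceMatcher(None, old_text.split(), new_text.split()).ratio()

def triage_link(old_text: str, new_text: str) -> str:
    """Return 'keep', 'review' or 'remove' for one outbound link."""
    score = similarity(old_text, new_text)
    if score < REMOVE_BELOW:
        return "remove"
    if score < REVIEW_BELOW:
        return "review"
    return "keep"
```

Anything marked `remove` or `review` would then go onto the webmaster's list, where replacements are still chosen by hand.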
    • by GigsVT ( 208848 ) on Friday September 24, 2004 @09:42AM (#10339360) Journal
      The "related" search isn't what you should be looking at.

      Try this. []
    • by TheRealMindChild ( 743925 ) on Friday September 24, 2004 @09:48AM (#10339444) Homepage Journal
      Actually, I think something pretty simple that should happen (in the context of search engine links, not links in general) is that when a visitor searches Google and clicks one of the returned links, Google should queue that particular page for respidering. Make it so a page can't enter that queue more than, say, once a week, and you will find that we come up with fewer and fewer broken links.

      While this might seem like a LOT for Google to be doing on the backend, I would have to think that a majority of the public ends up visiting the same 5-10% of the internet each day (number pulled from my ass, but an educated guess at least).
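A minimal sketch of the once-a-week rule, assuming a single process sees every click. `RespiderQueue` and its methods are hypothetical names, not any search engine's real API.

```python
import time
from collections import deque
from typing import Optional

ONE_WEEK = 7 * 24 * 60 * 60  # seconds

class RespiderQueue:
    """Queue clicked search results for re-crawling, at most once per cooldown."""

    def __init__(self, cooldown: float = ONE_WEEK) -> None:
        self.cooldown = cooldown
        self.pending: deque = deque()
        self.last_queued: dict = {}  # url -> time it was last queued

    def on_click(self, url: str, now: Optional[float] = None) -> bool:
        """Record a click on url; return True if it was queued for respidering."""
        now = time.time() if now is None else now
        last = self.last_queued.get(url)
        if last is not None and now - last < self.cooldown:
            return False  # queued too recently, ignore this click
        self.last_queued[url] = now
        self.pending.append(url)
        return True

    def next_url(self) -> Optional[str]:
        """Pop the next URL for the crawler, or None if the queue is empty."""
        return self.pending.popleft() if self.pending else None
```

Popular pages get rechecked about weekly, while pages nobody clicks cost nothing, which matches the commenter's bet that most traffic hits a small slice of the web.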
    • by Anonymous Coward
      heh.... and what if "" for whatever reason doesn't respond ... the third thing on the list is ...
    • Removing dead or changed links is quite another matter.

      Here's another proposal: a tool that finds linked redirects and updates the link to the new URL. Even then, requiring individual approval for each change seems like a sensible precaution.

      Although, there would be something elegant about seeing all the old links in the /. archives change to wherever the new site is now.

    • Corporate web sites rarely link to the outside for fear of implied endorsement. I imagine the best replacement links would also come from within the company's site. Then, even if the material is unrelated, at least you can attest that it doesn't break any policies or (worse) promote the competition.

      Regardless, it would be a simple adjustment to have the tool replace only links to the same site.
    • by dwerg ( 58450 ) on Friday September 24, 2004 @10:01AM (#10339538) Homepage Journal
      Sadly I can't get into details, but they're not using technology like the 'related' functionality in Google. They try to find the document that was previously on the other end of the dead link, so the link will never be rewritten to something only vaguely related to the original document.

      The reason they want to replace the links automatically is that some webmasters have to manage thousands of pages and don't want to press the 'ok' button every time the system detects a change.
      • by shawn(at)fsu ( 447153 ) on Friday September 24, 2004 @10:21AM (#10339699) Homepage
        After RTFA it doesn't seem like a fair comparison to say it's like Google's "related" or Verisign's "product"; this looks like a technology a webmaster would use on their own site. It also gives them the option to accept the suggestion or not. This could be really good for corporations with large intranet sites, where documents constantly get moved as webmasters leave, etc.

        I think had the original poster read the article they wouldn't have gone off half-cocked. IBM must also be somewhat confident that this is new technology, or else they wouldn't have filed two patents for it.
    • Obviously, if we wanted to link to a site about baseball, all of those (other than ESPN) are really entirely irrelevant.

      Yep. Because Major League Baseball has strong conceptual similarity to several other concepts: the game of baseball, professional sports, American culture, and others. Granted some are more specific than others, but that's a pretty tough judgment call that depends on the context in which the original link occurred. If I link to from a site about baseball, then it means something diff
    • That's Mr. Patented Bad Idea Jeans to you! (I'm afraid to RTFA to see how trivial their patents might be.)
    • This could be interesting. This is some kind of "autolinking", and I guess like language it would change and evolve over time. So instead of linking to hard URLs, one would link to abstract ideas. We do this today in our speech: when we talk in America today about "our president" we mean George W. Bush. But in another country, or even again in the US, the context could be different, meaning something like the president of our company or club. So in the future we will have these intelilinks like: Weapons o []
    • Well, what does Google suggest as a replacement? Check it out here. First the National Football League (NFL), then the National Basketball Association (NBA), and then the National Hockey League (NHL). Followed by the ESPN sports network, and NASCAR racing.

      Google worked perfectly, isn't it obvious? MLB is not a sport; it is the corporation that is related to a sport, that controls the major professional players, but it is *not* the sport itself. You wanted to find similar items, and so Google brought up
    • This goes off the assumption that it just uses the link name or address to find the page.

      Basically, when a page is indexed by a search engine such as Google, the first step is to create a document vector from the document based on the repetition of words (terms) and how common these words are (i.e. a list of TF-IDF values: term frequency × inverse document frequency).

      Anyway, this document vector is what the search engine compares against to find matches (which is how Google can return results in 0.14 s
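A minimal sketch of the TF-IDF document vectors described above, with cosine similarity as the comparison. It assumes naive whitespace tokenization and the plain `tf * log(N / df)` weighting; real engines add many refinements, and this is not a description of Google's actual pipeline.

```python
import math
from collections import Counter

def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """One sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A term that appears in every document gets weight `log(1) = 0`, which is why very common words stop mattering when pages are compared.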
  • Great (Score:5, Insightful)

    by gowen ( 141411 ) <> on Friday September 24, 2004 @09:38AM (#10339320) Homepage Journal
    Peridot could lead to a world where there are no more broken links,
    ... just links that don't go where the author intended. Gee, that's ... just great.

    Hang on. On similar lines, I've a great idea. Suppose I type a nonexistent hostname into my browser. Wouldn't it be good if the DNS server just gave me its best guess instead of an error message. Or some kind of Site Finding search engine. That'd be even better than ... :)
    • Re:Great (Score:5, Insightful)

      by rokzy ( 687636 ) on Friday September 24, 2004 @09:51AM (#10339466)
      I agree it's a bad idea and is imo looking in the wrong direction.

      I want things to be LESS tolerant of mistakes, not more. This is why the web is so fucked up. When people can get away with absolute shit, why produce anything better than shit?
    • Meanwhile at Verisign...

      "Hey guys, we have grass roots support, check out slashdot!"

    • Re:Great (Score:3, Informative)

      by tannnk ( 810257 )
      Hey... We had this [] kind of feature on the internet before
  • by underpar ( 792569 ) on Friday September 24, 2004 @09:39AM (#10339329) Homepage
    A team in the Netherlands built an application that listens to contact centre conversations, picks out relevant keywords and automatically prompts the call centre agent with possible answers.

    Does this app take the form of a paper clip? Because that would be a great idea!
    • scary enough... (Score:3, Informative)

      by Rev.LoveJoy ( 136856 )
      FrontPage has been able to "Scan your web site for broken links" since it first came out in ... what 1997?

      Clippy indeed, must be a slow news day,
      - RLJ

  • Semantic Web? (Score:5, Insightful)

    by jarich ( 733129 ) on Friday September 24, 2004 @09:39AM (#10339331) Homepage Journal
    Wouldn't this idea work a lot better with semantic web markup attached to links and also to intranet pages?
    • Parent poster is exactly right. The semantic web is designed for exactly this kind of thing, and would drastically reduce the amount of computing power needed to do it.

      For a good discussion of the semantic web, and why we need to get going and build it, read the relevant chapter in The Unfinished Revolution [] by Michael Dertouzos. I didn't quite understand what Tim Berners-Lee was getting at when he described the semantic web in Weaving the Web []. Dertouzos explains it better, I think.

      I had an idea

  • well (Score:5, Funny)

    by Anonymous Coward on Friday September 24, 2004 @09:39AM (#10339334)
    I think the link is broken... :)
  • by greppling ( 601175 ) on Friday September 24, 2004 @09:40AM (#10339337)
    ...would be good enough for me. I find it really annoying how many of the bookmarks I don't use often are broken after about a year or so.
  • by alta ( 1263 ) on Friday September 24, 2004 @09:40AM (#10339342) Homepage Journal
    My biggest problem is when I follow a link to a website that's no longer there. Yeah, moved pages happen, but I don't think they happen as often as deleted pages, expired domains, deleted websites, etc.
  • by blankman ( 94269 ) <blankman42@gma i l .com> on Friday September 24, 2004 @09:41AM (#10339347) Homepage
    This sounds a little like SiteFinder from Verisign. Click a broken link and instead of a helpful error message you get whatever content IBM thinks is appropriate. Certainly this could be useful, but it could also end up as just another vehicle for advertising.
    • The big difference here is that in the case of SiteFinder, Verisign had control over where you ended up for basically the entire internet. This seems like it would be the type of thing that would run as an Apache mod that would get invoked when a 404 gets returned, and so would only affect that particular site. There's a big difference between going to and getting redirected to like this would probably do, and going to
    • If you read the article (the BBC one, which is the only link in there with any relevant information) you'll find that's not how it works. It alerts the webmaster and suggests a replacement, rather than randomly "fixing" other people's pages.
  • Not Entirely New (Score:4, Informative)

    by terrencefw ( 605681 ) <slashdot.jamesholden@net> on Friday September 24, 2004 @09:41AM (#10339351) Homepage
    I've seen lots of sites that return search results based on bits of the broken link instead of 404s.

    Suppose you have a broken link; some sites return a list of search results from within '' matching 'foo' or 'bar'. Quite clever, and much more useful than a plain old 'page not found' error.

    This just takes that one step further by doing the searching at the referring end instead.
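A rough sketch of the server-side variant described here: split the broken path into words and rank the site's own pages by how many of those words they contain. The `index` dict (URL to page text) is a hypothetical stand-in for a real site search engine.

```python
import re
from typing import Dict, List

def terms_from_path(path: str) -> List[str]:
    """Turn a broken path such as '/foo/bar.html' into search terms."""
    stem = re.sub(r"\.\w+$", "", path)  # drop a trailing file extension
    return [t.lower() for t in re.split(r"[/_\-.+]+", stem) if t]

def suggest_pages(path: str, index: Dict[str, str], limit: int = 5) -> List[str]:
    """Rank pages by how many path terms appear in their text (best first)."""
    terms = terms_from_path(path)
    scored = []
    for url, text in index.items():
        words = set(text.lower().split())
        score = sum(1 for t in terms if t in words)
        if score:
            scored.append((score, url))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored[:limit]]
```

A 404 handler would render these suggestions as a list on the error page, so the visitor still knows the original page is missing.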

  • The Slashdot use? (Score:3, Interesting)

    by makomk ( 752139 ) on Friday September 24, 2004 @09:42AM (#10339361) Journal
    I actually considered whether it would be possible to write some code to detect if linked-to content has been replaced. The reason I was interested was to make it impossible for someone to put up a copy of a slashdotted page, link to it in a posting, and then substitute it for a copy of goatse once they'd been moderated up.

    I decided it'd be too hard for software to decide whether a change was significant. I wonder how this software does it - presumably, you can change the threshold?
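One way to make "significant change" tunable, as the comment wonders, is to compare word shingles (runs of k consecutive words) using Jaccard similarity and flag the link when the overlap drops below a threshold. This is a generic technique, not how Peridot works; `k=3` and `threshold=0.5` are arbitrary assumptions.

```python
from typing import Set, Tuple

def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """All runs of k consecutive words (one short run for tiny pages)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: Set, b: Set) -> float:
    """Size of the intersection over size of the union (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def looks_replaced(old: str, new: str, threshold: float = 0.5, k: int = 3) -> bool:
    """True when too little of the old page survives in the new one."""
    return jaccard(shingles(old, k), shingles(new, k)) < threshold
```

Small edits such as an appended sentence keep most shingles intact, while a wholesale swap (the goatse scenario) drops the overlap to near zero.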

  • You can read more information about this process here. []
  • worrying (Score:5, Insightful)

    by TwistedSpring ( 594284 ) on Friday September 24, 2004 @09:42AM (#10339367) Homepage
    "Peridot could lead to a world where there are no more broken links". Yes, it could. Peridot could also lead to a world where broken links are not manually and intelligently spotted and repaired, but automatically repaired. Automatic resolution of what a link "ought" to point to is never going to be accurate (look at search engines), and could make a company website a minefield of confusion and frustration for the user.

    Only time will tell, I suppose.
    • Imagine a case where a broken link is pointed to another link, which later itself becomes a broken link, and so on. It might even be possible for the chain to loop back on itself at some point. One thing I've realized in my career is that if you handle an error too gracefully, no one bothers to fix it. I prefer to have errors cause enough of a problem that there is feedback to fix it.
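If automatic replacement is used anyway, the chain problem can at least be made loud rather than silent. A sketch, assuming replacements are stored as a simple old-URL to new-URL map (a hypothetical structure): it follows the chain and raises an error on a loop or an overly long chain instead of quietly picking a target.

```python
from typing import Dict

def resolve_link(url: str, replacements: Dict[str, str], max_hops: int = 10) -> str:
    """Follow url through the replacement map to its final target."""
    chain = [url]
    current = url
    for _ in range(max_hops):
        nxt = replacements.get(current)
        if nxt is None:
            return current  # end of the chain
        if nxt in chain:
            raise ValueError("replacement loop: " + " -> ".join(chain + [nxt]))
        chain.append(nxt)
        current = nxt
    raise ValueError(f"more than {max_hops} replacements starting from {url}")
```

Raising instead of guessing is the "errors should cause enough of a problem" policy from the comment.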
    • You know what, none of that link stuff worries me one bit. Links are bound to be irrelevant/stupid/broken unless someone really cares about them and monitors them manually.

      No, the big worry for me is PATENTS. What the hell are they patenting? What is the Big Idea here that deserves a patent? This is scary stuff. What, do we have to find prior art for every stupid idea someone decides to patent? The answer is "yes." We are all out of business if we let this continue. Support the EFF! Kill this stup
      • I personally think there's no singular mind at work on it, it's just one IBMer trying to get a patent listed on their resume and their manager trying to look important.

        • I personally think there's no singular mind at work on it, it's just one IBMer trying to get a patent listed on their resume and their manager trying to look important.

          I actually think it's funny that people will brag about how many patents their division or company received in the last year. After seeing the kinds of crap that get patents over the last 10 years or so, I'm not likely to be impressed, regardless of their numbers. In fact, a high number is more likely to be indicative of a large number o

  • Websites need to be useful before I start caring about broken links. I can think of any number of sites that started off with the best of intentions, but never quite live up to being useful.

    From bad layout, to missing options, to obscure names for common links, it seems that people are actively trying to hide crap from the end user, making their website utterly worthless.

    Can we devise a tool that fixes this problem first?
  • by Illserve ( 56215 ) on Friday September 24, 2004 @09:42AM (#10339374)
    Some algorithm cruising through my website, rearranging files as it sees fit?

    Sounds like a recipe for utter disaster in the worst case, and a source of mildly embarrassing incidents at best.

    How about this algorithm just report dead links to a human instead of trying too hard to be clever?

    This sounds like someone had to come up with a final project, and settled on this one.

  • yawn (Score:2, Insightful)

    Look, I'm not being a troll or flamebait or whatever, but seriously, I've had enough of this fucking pipedream-chasing crap that gets posted to BBC News and then swiftly chucked up on Slashdot.

    The whole BBC News Technology section reminds me of the 'Tomorrow's World' program when it was in full swing, saying how everything could be 'the next big thing' and that we'd likely see it on shop shelves and in every home 'in a year'. Why do these people never learn that so much of this is just press release bullshi
    • Yeah, because the science and technology sections of so many other news gathering organisations are so superior to the BBC's, aren't they?

      Listen, let me explain this in simple terms: BBC News caters to a wide audience made up mainly of lay people and, as such, it pitches its articles accordingly. It's not New Scientist, Nature, The Lancet or whatever academic publication that's on your reading list and it doesn't pretend to be. It doesn't try to blind its readers with science because its readers aren't al
      • I agree wholeheartedly... It wasn't a rant at the BBC... it was a rant at the Slashdot 'ooo let's post it and not change a thing' attitude... I don't see why someone can't put a more realistic headline on what is supposed to be a more technically apt and trained website...
  • And... ? (Score:5, Insightful)

    by Doesn't_Comment_Code ( 692510 ) on Friday September 24, 2004 @09:43AM (#10339385)
    Maybe I'm being overly naive, but checking for broken links doesn't seem all that spectacular to me. It wouldn't take long to write a script to find all the broken links on a page.

    The only parts that seemed worthwhile are replacing the links automatically, and testing if links are relevant.

    I'm not so sure I'd trust a computer to do those things though. I'd much rather have the links flagged and checked by a human.
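As the comment says, the flag-for-a-human part is a short script. A sketch using only the Python standard library, assuming a HEAD request is an acceptable probe (some servers answer HEAD with 405, so a real checker might retry with GET). The `check` parameter lets another status lookup be swapped in.

```python
from html.parser import HTMLParser
from typing import Callable, List, Optional
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import Request, urlopen

class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and not href.startswith(("#", "mailto:", "javascript:")):
            self.links.append(urljoin(self.base_url, href))

def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Final HTTP status for url (redirects followed), or None if unreachable."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        return err.code
    except OSError:  # DNS failure, refused connection, timeout, etc.
        return None

def broken_links(html: str, base_url: str,
                 check: Callable[[str], Optional[int]] = http_status) -> List[str]:
    """URLs on the page that are unreachable or answer with a 4xx/5xx status."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    bad = []
    for url in parser.links:
        status = check(url)
        if status is None or status >= 400:
            bad.append(url)
    return bad
```

Printing that list for a person to work through is exactly the "flagged and checked by a human" workflow; nothing on the page is rewritten.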
  • CMS (Score:5, Insightful)

    by Anonymous Coward on Friday September 24, 2004 @09:45AM (#10339404)
    Any good Content Management System should already take care of any internal broken links automatically, or notify the webmaster so he'll be able to take care of it manually (in the case of page deletion, etc).

    The only kind of people who'd go out of their way to use this software probably already use some sort of CMS.
  • by tod_miller ( 792541 ) on Friday September 24, 2004 @09:45AM (#10339408) Journal
    A link points to document X.

    If document X moves, and the link is invalid, a search for the link might actually find document X, and therefore, you have your benefit, and you would have saved a 404.

    However - if a document becomes deprecated and deleted, then how can you assume the link is valid?

    Or indeed, if the document has no relevant substitute.

    A genealogy page providing a link to another William Wallace wouldn't be good news if the original page went missing.

    A better system is automated 404 alerting to the web server's administrator.

    A bad link gets hit, bam, what document, from where. You can work things out intelligently, not automatically.

    I think this is silly, perhaps grasping at straws. I see no reason why we would replace all our links with Google 'I'm Feeling Lucky' searches, so why do something like this?

    This is the essence of what they have, and all they have done is clouded the search IP field (which is important) with 2 more patents, again increasing costs and endangering open source innovation, the true innovative playing field.

    Of course, I could be wrong.
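The "what document, from where" alert is mostly a matter of reading the access log. A sketch, assuming Apache's "combined" log format (which records the Referer header): it groups every path that returned 404 with the pages that linked to it, so an administrator can fix them by hand.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Request, status, size and referrer fields of a "combined" format log line.
LOG_LINE = re.compile(
    r'"[A-Z]+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)"'
)

def dead_link_report(log_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each path that returned 404 to the set of pages that linked to it."""
    report: Dict[str, Set[str]] = defaultdict(set)
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            referrer = match.group("referrer")
            report[match.group("path")].add(referrer if referrer != "-" else "(no referrer)")
    return dict(report)
```

Mailing this report daily gives the intelligent-not-automatic loop the comment asks for.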
    • This is the essence of what they have, and all they have done is clouded the search IP field (which is important) with 2 more patents

      If anyone knows the numbers of those patents, I'd actually be interested in reading them... to see whether what my company is developing right now will be infringing...

    • Ideally, cool URIs don't change [], but in the real world they do.

      If document X moves and the link is invalid, you should be serving an HTTP 301 Moved Permanently redirect, and well-behaved user agents will update their bookmarks, and well-behaved content management systems will update their code. If document X is gone, you should be serving an HTTP 410 Gone.

      Ideally, 404 is supposed to mean that the web server has never heard of the file in question before, but in the real world...
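A sketch of how a link checker could act on those status codes. It treats 308 Permanent Redirect like 301 and leaves 404 to a human, since in practice 404 is ambiguous; the action names are hypothetical.

```python
def link_action(status: int) -> str:
    """Suggested action for a link, given the HTTP status its target returns."""
    if status in (301, 308):    # permanent redirect: rewrite to the Location URL
        return "update"
    if status == 410:           # Gone: the server says it was deliberately removed
        return "remove"
    if 200 <= status < 400:     # OK, or a temporary redirect such as 302/303/307
        return "keep"
    return "review"             # 404, 403, 5xx...: ambiguous, ask a human
```

Temporary redirects are kept as-is because rewriting them would pin the link to a location the server says may change.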
    • This isn't a one-time, forever-and-ever-amen technology. You start with an automatic link-checker and link-fixer. Then you add features like "list all the changes so an editor can filter the results", then you move to "direct potential changes to a team of experts", and so forth and so on. The idea is pretty good. When you're a Big Company with a huge website with thousands of links, having this automagic tool is a lot better than having (unprofessional) dead links.

      I, personally, hate dead links with a p

  • Obligatory RTFA (Score:4, Insightful)

    by acvh ( 120205 ) <> on Friday September 24, 2004 @09:45AM (#10339410) Homepage
    this isn't about replacing links on the internet as a whole... it's about replacing links on your company website, or at least reviewing those links.

    not everything that happens in the world is an attempt by big brother to steer internet traffic to verisign or microsoft.

    • From the fucking summary.

      Broken Links No More? "Students in England have developed a tool which could bring the end to broken links. Peridot............Peridot could lead to a world where there are no more broken links,'"

      I'll troll to hell for this, but I could care less, and I have no problems standing up for what I say. This is terribly irresponsible journalism. No fucking where in the summary does it mention intranet or corporate websites. A world would be pretty global, would it not? Again, the hea

  • by Nazmun ( 590998 ) on Friday September 24, 2004 @09:46AM (#10339413) Homepage
    Spyware/Adware and IE already give you search results and links. The only difference is that this automatically places you at a different link without a choice.
  • Slashdotted (Score:4, Funny)

    by genneth ( 649285 ) on Friday September 24, 2004 @09:48AM (#10339431) Homepage
    Damn you slashdotters!!! I work at IBM and the intranet server is down! I can't believe you've managed to cause the automatic load-balancer to kill the intranet in favour of a slashdot article.

    Damn you!!

    And purple hatstands
  • by tod_miller ( 792541 ) on Friday September 24, 2004 @09:50AM (#10339460) Journal
    some over funded jumped up interns have developed a high tech, method and software and system to stop the slashdot effect.

    Each web server will return a redirect to a Google cache lookup for itself if the load ever gets too high.

    1: Stupid idea
    2: Patent
    3: Wait 'til someone nudges at your generously worded patent
    4: Happily license this unrelated technology to keep their VC peeps in the green.
  • Simple solution (Score:3, Insightful)

    by mirko ( 198274 ) on Friday September 24, 2004 @09:50AM (#10339464) Journal
    ErrorDocument 404
    Where the handler parses the requested URL and asks an indexing engine to find the most relevant page for the query...
    • ErrorDocument 404

      That would be very useful if I could persuade everyone I link to to do it. However, since I can't, a solution that runs on the server where the links reside, not the linked content, is much more useful.
  • Vulnerability? (Score:3, Informative)

    by darkmeridian ( 119044 ) <william.chuang@gm[ ].com ['ail' in gap]> on Friday September 24, 2004 @09:55AM (#10339496) Homepage
    Remember Google-hacks at []? Basically, since Google effectively snoops millions of servers, you can use this information to break into servers and get information. Having an internal feature that connects broken links to real pages may be orders of magnitude worse. What if I imaginatively "linked" to a made-up URL to see what's on your servers? This could be bad news if it's effectively done.
  • Prior Art (Score:2, Insightful)

    by Bill Dimm ( 463823 )
    They think there is something to patent here? Seems like there should be prior art all over the place. At [] we have been using software to repair our links to articles for years.
  • How about this: let's find a better way to eliminate bad links. Have a bot scan your company's web site, and every time it finds a link to an outside source, save that page to your servers. If the link gets screwed up, you can replace it with a link to the saved web page on your server until you can do something about it.

    This would not work with large web sites, but if it is just a link to a how-to guide or something small like that this would work.
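A sketch of that fallback, assuming pages are saved as plain HTML files in a directory that the web server publishes at `/mirror/` (a hypothetical path); `SnapshotStore` is an illustrative name, and deciding whether a link is alive is left to the caller.

```python
import hashlib
from pathlib import Path
from typing import Optional

class SnapshotStore:
    """Keep local copies of linked pages and fall back to them when a link dies."""

    def __init__(self, root: str, public_prefix: str = "/mirror/") -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.public_prefix = public_prefix  # where the web server exposes root

    def _file_for(self, url: str) -> Path:
        # Hash the URL so any address maps to a safe, fixed-length file name.
        return self.root / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html")

    def save(self, url: str, html: str) -> None:
        self._file_for(url).write_text(html, encoding="utf-8")

    def href_for(self, url: str, is_alive: bool) -> Optional[str]:
        """Original URL while it works, else the local copy, else None."""
        if is_alive:
            return url
        path = self._file_for(url)
        return self.public_prefix + path.name if path.exists() else None
```

Only the HTML is stored here, so images and stylesheets on the saved page would still point at the dead site; that fits the comment's "small how-to guide" use case rather than large sites.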
  • How can you tell if the link's changed content is an update that's OK, or an update that's "not ok"? If the tool could do that, it could create a site of links related to whatever, kinda like google, but it sounds like it would be a whole level smarter than google somehow.
  • No thanks (Score:3, Insightful)

    by stratjakt ( 596332 ) on Friday September 24, 2004 @10:09AM (#10339605) Journal
    I'd prefer a more helpful 404 page, maybe with some links to the homepage or main sections of the site on it.

    Sort of a "cannot find hello.jpg, click here to go back to the main page".

    My point being, if the document I'm looking for is not there, I want to know it's not there. I don't want to read something else, thinking it's what I meant to read.

    Usually when I'm googling around and clicking stuff I'm looking for the answer to some coding or computer related problem. I don't want to click on a link for "configuring Samba 3.0 with AD support", and wind up on a "Configuring Samba 2.2 with LDAP" and waste my time following bad advice.
    • My point being, if the document I'm looking for is not there, I want to know it's not there. I don't want to read something else, thinking it's what I meant to read.

      Exactly. If I put up a web page with links to other sites, I want people to email me saying "hey this link is broken".

      Does a computer understand satire? If I've linked to a satirical, subtle, pro-abortion piece, is the algorithm smart enough to know that, or will it relink to an anti-abortion site? If I link to a serious anti-war speech mad
  • German readers... (Score:5, Informative)

    by dukoids ( 194954 ) on Friday September 24, 2004 @10:12AM (#10339630)
    may want to take a look at the master's thesis of Nils Malzahn (from 2003, in German) to see (in detail) how this actually can work: _2003a.pdf []

    Basically, the thesis evaluates different methods to build a kind of "finger-print" of a page. The finger print is used to find the page with google if it is gone, or has changed significantly.

    The Internet Archive's Wayback Machine was used to learn to distinguish pages that disappeared from pages that changed slightly over time.
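The thesis's own fingerprinting methods are not reproduced here. As a generic illustration of the idea, a "lexical signature" picks the few words that are frequent on the page but rare in a background collection; joined together they can be sent to a search engine as a query to relocate the page. The background documents and `k` are assumptions.

```python
import math
from collections import Counter
from typing import List

def lexical_signature(page: str, background: List[str], k: int = 5) -> List[str]:
    """Top-k terms of page by TF-IDF against a background document collection."""
    tf = Counter(page.lower().split())
    background_sets = [set(doc.lower().split()) for doc in background]
    n_docs = len(background_sets) + 1  # the page itself counts as a document

    def idf(term: str) -> float:
        df = 1 + sum(1 for words in background_sets if term in words)
        return math.log(n_docs / df)

    ranked = sorted(tf, key=lambda term: (-tf[term] * idf(term), term))
    return ranked[:k]
```

A query would then be `" ".join(lexical_signature(saved_text, background))`, where `saved_text` is a copy of the page taken while the link still worked.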

  • by theMerovingian ( 722983 ) on Friday September 24, 2004 @10:15AM (#10339654) Journal

    Wonder where it would send me if were down?

    *shudder* []

    (disclaimer: no, I didn't actually look to see what's on that site)

  • by tomhudson ( 43916 ) <`barbara.hudson' ...'> on Friday September 24, 2004 @10:17AM (#10339664) Journal
    The BOFH has a better solution isode_31/ []
    "Ladies und Gentlemans, I present to you... The Newsmaker!" the PFY chirps happily, waving his hand at his squid plug-in.

    "Which does?"

    "Give me a news headline, anything, no matter how ridiculous!"

    "Scientists discover intelligent life in Redmond!"

    >clickety< >clickety< >click< >clickety< >tap< >tap< >clickety< ... >click<

    "Right, now Google for it!"

    I dutifully fire up Google, bash in Redmond and Intelligence, and roger me senseless with a full height drive if the first 10 hits don't point show up the headline I've just created, pointing at Time Warner, Yahoo News, all the greats...

    "Interesting - injecting false links into Google to point at news sites. I like it!"

    "Ahem," the PFY interrupts. "Click on one of the links."

    I do so, and grab that hard drive for a second go if the site concerned doesn't come up with the headline in question!

    "You hacked the news site?"

    "Not at all! I used the base idea behind banner blocking to remove the lead headline of a news site and insert my headline instead. You can even add a picture if you want, but obviously only for things that are possible to prove."

    "So will this work for all the news sites listed?"

    "Oh yes. And more importantly, the various search sites as well. So no matter what common search engine you use, the proxy discards the first 100 matches and inserts 100 of its own 'matches' instead."
    Honestly, which of the two is more deserving of patent protection as an "innovative, non-obvious invention"?
  • SED? (Score:3, Informative)

    by Kenja ( 541830 ) on Friday September 24, 2004 @10:23AM (#10339722)
    So, they've invented SED? 'Cause that's what I've been using for years to replace old/broken links. A simple script using the netsaint/nagios service tests can check if a link is still good and then build a list of bad ones to be replaced by script number two using SED.
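For the second script, a Python sketch of the sed step, assuming the checker has produced a dict of old-to-new URLs. Unlike a bare `s/old/new/g`, it only swaps a URL when it is the entire `href` value, so a link that merely starts with the old URL is left alone.

```python
import re
from typing import Dict

# A quoted href attribute; group 1 is the quote character, group 2 the URL.
HREF = re.compile(r'href=(["\'])(.*?)\1')

def rewrite_links(html: str, replacements: Dict[str, str]) -> str:
    """Replace whole href values found in replacements; leave the rest untouched."""
    def swap(match):
        quote, url = match.group(1), match.group(2)
        return f"href={quote}{replacements.get(url, url)}{quote}"
    return HREF.sub(swap, html)
```

Unquoted `href=` attributes are not matched by this pattern; they would need a second expression or a real HTML parser.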
  • Perfectly good and useful technology that everyone can use, and some asshat company has to f'n jump on the bandwagon and patent it in hopes to make a fucking dime... give me a break.

    I weep for the future of technology if this is what it's gonna come down to.
  • Instead (Score:3, Informative)

    by Dr. Stavros ( 808432 ) on Friday September 24, 2004 @10:35AM (#10339835) Homepage
    Just use the W3C's link-checker [].
  • "IBM have already filed 2 patents for the project."

    More evidence that IBM isn't really committed to an open/free philosophy.
  • What if (and the chances are high) I store the URLs in some DB, let's add a proprietary format + compression for fun here, and then fetch these URLs by a script depending on user-entered parameters? Imagine full-text searching or stuff. How is it at all possible to write a universal checking tool here??? Man, I wish people would stop wasting energy on stuff like "automatic C++ to Java converter" and similar bullshit when every semi-knowledgeable person can instantly say that the thing is a no-go.
  • by Quickening ( 15069 ) on Friday September 24, 2004 @10:43AM (#10339897) Homepage
    For those running a real browser, just make this a link, preferably in your personal tool bar.

    javascript:Qr=document.URL;if(Qr=='about:blank'){void(Qr=prompt('Url...',''))};if(Qr)location.href='*/'+escape(Qr)

    Now when I click on a link that isn't there, I hit my Archive search button and it shows me the Wayback Machine's history of that link. Of course, it only works if the URL hasn't been modified by the server. If it has, it's another couple of steps (copy link, ^T, archive search, paste the URL into the pop-up dialog).
  • by Flamefly ( 816285 ) on Friday September 24, 2004 @10:50AM (#10339956)
    You could create your links using Google's "I'm Feeling Lucky" feature, assuming yours is just a generic link site pointing to interesting sites rather than to specific articles.

    e.g.: +For+Nerds&btnI=Google+Search

    And voila, your site will automagically take visitors to the most popular site related to news for nerds. If Slashdot died one day, another site would take its place in the Google rankings. FF.
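    The example link above lost its prefix when the page was archived. Here is a sketch of building such a link, assuming Google's `https://www.google.com/search` endpoint and the `q` and `btnI` parameters from the example; `luckyLink` is a hypothetical helper, and there is no guarantee Google still treats `btnI` as an "I'm Feeling Lucky" redirect.

```javascript
// Build an "I'm Feeling Lucky"-style search link for a phrase.
// Assumptions: the search endpoint below accepts q and btnI as query
// parameters; the btnI value mirrors the comment's example.
function luckyLink(phrase) {
  const url = new URL("https://www.google.com/search");
  url.searchParams.set("q", phrase); // spaces are serialized as "+"
  url.searchParams.set("btnI", "Google Search");
  return url.toString();
}

console.log(luckyLink("News For Nerds"));
// https://www.google.com/search?q=News+For+Nerds&btnI=Google+Search
```

    Letting `URLSearchParams` do the encoding avoids hand-escaping phrases that contain `&`, `?` or other reserved characters.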
  • It is, unsurprisingly, extremely easy to just write a script that checks whether links are working and ignores them if they are or, if they are not, reports them to the admin and turns them into non-links in the page that actually gets posted. Although that might leave a few gaps in navigation, at least the gaps don't lead people to dead ends. And, with the admin warned, they can be fixed promptly.
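    Here is a minimal sketch of that unlink-and-report step. It assumes the set of dead URLs has already been produced by a separate checking pass and that links use simple `<a href="...">text</a>` markup with no nested anchors; `unlinkDead` and its return shape are hypothetical.

```javascript
// Turn links whose target is known to be dead into plain text, and
// collect those targets so they can be reported to the admin.
// Assumptions: double-quoted href as the first attribute, no nested <a>,
// and deadUrls supplied by an earlier link-checking pass.
function unlinkDead(html, deadUrls) {
  const reported = [];
  const output = html.replace(
    /<a\s+href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi,
    (match, url, text) => {
      if (!deadUrls.has(url)) return match; // live link: keep as-is
      reported.push(url); // remember it for the admin report
      return text; // keep the link text, drop the anchor
    }
  );
  return { html: output, reported };
}

const unlinked = unlinkDead(
  '<p><a href="http://gone.example/">Old site</a> and <a href="http://live.example/">live site</a></p>',
  new Set(["http://gone.example/"])
);
console.log(unlinked.html);
// <p>Old site and <a href="http://live.example/">live site</a></p>
console.log(unlinked.reported); // [ 'http://gone.example/' ]
```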
  • Firefox extension (Score:4, Insightful)

    by malx ( 7723 ) on Friday September 24, 2004 @11:13AM (#10340181)
    On a slightly related note, a Firefox extension that checked links ahead of time and removed the link rendering for those that return a 404 might be handy (albeit fairly evil).

    On a less related note, I've long been disappointed that some 300-series status codes in HTTP are so under-exploited, both by clients (e.g. automated bookmark management) and by people running web sites.
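    To make that concrete, here is a hedged sketch of how an automated bookmark manager might react to a single response. The action names and `bookmarkAction` are made up. The status meanings are standard HTTP: 301 and 308 are permanent redirects, 302, 303 and 307 are temporary, 404 is Not Found and 410 is Gone.

```javascript
// Decide what an automated bookmark manager might do with one HTTP
// response. Action names are hypothetical; status meanings follow HTTP.
// Assumption: location is an absolute URL taken from the Location header.
function bookmarkAction(status, location) {
  if (status === 301 || status === 308) {
    // Permanent redirect: update the stored bookmark to the new URL.
    return { action: "update", url: location };
  }
  if (status === 302 || status === 303 || status === 307) {
    // Temporary redirect: keep the original bookmark.
    return { action: "keep" };
  }
  if (status === 404 || status === 410) {
    // Not Found / Gone: flag for a human instead of deleting outright.
    return { action: "flag" };
  }
  if (status >= 500) {
    // Server error: possibly a temporary outage, so check again later.
    return { action: "retry" };
  }
  return { action: "keep" };
}

console.log(bookmarkAction(301, "http://new.example/page"));
// { action: 'update', url: 'http://new.example/page' }
```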
  • Then they will get a very large problem with me... Large enough to get hardball tactics from my closet...
  • by smagruder ( 207953 ) <> on Friday September 24, 2004 @11:19AM (#10340227) Homepage
    I periodically run dead link checking software [] to perform this function with regards to my bookmarks, some of which I publish on my web sites.

    There are many things that happen to links, such as redirects, but concluding that a link is down just because you get a 4xx or 5xx HTTP response is extreme. Sometimes sites go down for a period of time for various reasons, so such a link replacement process would need some kind of forgiveness mechanism. Further, sometimes links move elsewhere without the benefit of redirects; this replacement process therefore shouldn't replace links with "related" content, but with the same content that has moved to another spot.

    The bottom line is that the replacement process requires a step in the process where a human being reviews link change recommendations.
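    One way to get both the forgiveness mechanism and the human review step is to count consecutive failures per link and only queue a link for review once it has failed several checks in a row. This is a minimal sketch; `LinkMonitor`, the threshold of 3 and the idea of one check per day are assumptions.

```javascript
// Forgiving link monitor: a link is only queued for human review after
// `threshold` consecutive failed checks, and one success resets its
// count. Nothing is replaced automatically; a person reviews the queue.
class LinkMonitor {
  constructor(threshold = 3) {
    this.threshold = threshold;
    this.failures = new Map(); // url -> consecutive failure count
  }

  // Record one check result; ok should be true when the link worked.
  record(url, ok) {
    if (ok) {
      this.failures.delete(url); // recovered: forgive past failures
    } else {
      this.failures.set(url, (this.failures.get(url) || 0) + 1);
    }
  }

  // URLs that have failed often enough to deserve a human's attention.
  reviewQueue() {
    return [...this.failures]
      .filter(([, count]) => count >= this.threshold)
      .map(([url]) => url);
  }
}

const monitor = new LinkMonitor(3);
for (let day = 0; day < 3; day++) monitor.record("http://flaky.example/", false);
monitor.record("http://blip.example/", false); // brief outage...
monitor.record("http://blip.example/", true); // ...then back up
console.log(monitor.reviewQueue()); // [ 'http://flaky.example/' ]
```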
  • If broken links are a problem, maybe the HTML/HTTP pair should have been shaped more along the lines of the original Xanadu project. []

  • by SuperKendall ( 25149 ) * on Friday September 24, 2004 @11:30AM (#10340319)
    The "I'm feeling lucky" link.
  • network down (Score:4, Insightful)

    by Pragmatix ( 688158 ) on Friday September 24, 2004 @11:41AM (#10340408)
    Of course, if your links happen to go to a network that is experiencing a temporary outage, this tool would wreak havoc.

    Soon the target network would be back up, but all your links would be lost and randomly changed to something less useful. Good invention!

  • A better idea (Score:2, Insightful)

    by pdamoc ( 771461 )
    Maybe it would be better if that smart program replaced the links with links from: []
    or maybe the Google cache.
    Then, of course, it has to be smart enough to know it did that and put the original links back if they come online again.
    Sometimes "broken links" can recover.
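    Here is a minimal sketch of that swap-and-restore idea. It assumes a Wayback Machine-style prefix (`https://web.archive.org/web/` followed by the original URL) as the fallback; the link record shape and `updateLink` are hypothetical. Keeping the original URL in the record is what lets the program "know it did that" and switch back.

```javascript
// Assumption: a Wayback Machine-style fallback formed by appending the
// original URL to this prefix.
const ARCHIVE_PREFIX = "https://web.archive.org/web/";

// Return the link record to publish after one check of the original.
// The original URL is never overwritten, so the archived copy can be
// swapped back out as soon as the original responds again.
function updateLink(link, originalIsUp) {
  if (originalIsUp) {
    return { ...link, href: link.original, usingArchive: false };
  }
  return { ...link, href: ARCHIVE_PREFIX + link.original, usingArchive: true };
}

let story = { original: "http://example.org/story.html" };
story = updateLink(story, false); // original down: point at the archive
console.log(story.href); // https://web.archive.org/web/http://example.org/story.html
story = updateLink(story, true); // original back: restore it
console.log(story.href); // http://example.org/story.html
```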
  • The first virus to modify this and replace all links to goatse in 5... 4.... 3...
  • Sounds like another fantasy too good to be true :-)
  • by pangloss ( 25315 ) on Friday September 24, 2004 @03:17PM (#10343174) Journal
    There were two fellows at UC Berkeley (Phelps and Wilensky) who implemented the idea of "fingerprinting" web pages at least as far back as 2000. It was non-trivial fingerprinting (i.e. not just an MD5 hash of a web page).

    As far as I know, they haven't done any more recent work on this and the software is only available via [].

    A paper []

    I gather that the IBM effort is different in significant respects, but it certainly employs ideas from Phelps & Wilensky.
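    Purely to illustrate why a content fingerprint can be more useful than an MD5 hash (which changes on any one-byte edit), here is a toy "lexical signature": the few words that best distinguish a page from a reference corpus. Such a signature could, for example, be used as a search query to relocate a page that has moved. The TF-IDF scoring with +1 smoothing, the helper names and the toy corpus are all assumptions; this is not Phelps & Wilensky's published algorithm.

```javascript
// Split text into lowercase alphabetic words.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z]+/g) || [];
}

// Toy lexical signature: the k words of `page` with the highest
// term-frequency x inverse-document-frequency score against `corpus`.
// Ties are broken alphabetically so the output is deterministic.
function lexicalSignature(page, corpus, k = 5) {
  const docs = corpus.map((doc) => new Set(tokenize(doc)));
  const counts = new Map(); // word -> occurrences in page
  for (const word of tokenize(page)) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  const scored = [...counts].map(([word, tf]) => {
    const df = docs.filter((d) => d.has(word)).length;
    // +1 smoothing keeps words absent from the corpus finite.
    const idf = Math.log((docs.length + 1) / (df + 1));
    return [word, tf * idf];
  });
  scored.sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
  return scored.slice(0, k).map(([word]) => word);
}

const corpus = ["the cat sat on the mat", "the dog ate the bone", "the bird sang"];
const article = "The Peridot tool fixes broken links; the broken links vanish.";
console.log(lexicalSignature(article, corpus, 4));
// [ 'broken', 'links', 'fixes', 'peridot' ]
```

    Common words like "the" appear in every corpus document and score zero, so a signature built this way tends to survive cosmetic edits to a page while still pointing at its distinctive content.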
