The Internet

Altavista - Open Sourced UPDATED 155

A lot of people have got the story at ZDNet about Altavista's latest move. In their continued bid to re-cast themselves as a portal, they've decided to open the source code to the search engine. They have created a network, calling it the Altavista Affiliate Network, for obvious reasons. Join the network, run the Altavista engine and be paid three cents per click-thru. 'Course I have to imagine it'll take some powerful machines to run this well, but we'll see. Update: 02/01 08:23 by H : ZD Net seems to have pulled the story - I did however get a letter from Altavista explaining what's going on. Click below to read more.

Hi there,

The new affiliate program is based on a syndicated model, where we are providing the HTML and search box interface to Web sites, large and small, to enable their users to access AltaVista's premier services including search, stock quotes, language translation, multimedia, news and discussion group content. Users can choose from an array of search boxes that fit their personal brand. The search box then acts as a gateway for users to tap into our robust index. Those Web sites that choose to participate in AltaVista's Affiliate Network will receive three cents per click-through when their users access AltaVista branded services. To learn more about the affiliate program visit http://doc.altavista.com/affiliate/.

This program is not to be confused with the other products we provide that do allow customers to access our source code and build their own search products. We provide an array of tools that allow customers to create their own customized engines and can be accessed at http://doc.altavista.com/business_solutions/search_products/search_intranet/intranet_intro.shtml.

This discussion has been archived. No new comments can be posted.

  • Yeah! Open source is getting more and more popular. I didn't see anything about the licensing agreement though. Are they going to go with the GPL?

    kwsNI
  • by Markee ( 72201 ) on Tuesday February 01, 2000 @05:55AM (#1315406)
    The article doesn't mention that Altavista is going to open source their source code. It says that the source code will be given to applicants who can present a "real" web site they are running.
    It would be great news if the source code would truly be GPL-licensed or whatever OS license model Altavista would choose, but I doubt they will do that. Also remember that the search engine that you can obtain from Alta Vista is not the same as the one that's running their web site. It used to be downloadable before, and my information is that it does not scale as well as the AltaVista.com web page search engine does.
  • An evolving search engine.. cool! They need some way of continuously verifying links. I used to use Altavista when it first came out years ago but I quickly started getting large amounts of 404's. I've heard they've improved, but that seems to be a common problem with search engines. Maybe Open Source can fix that.

  • by Rombuu ( 22914 ) on Tuesday February 01, 2000 @05:58AM (#1315408)
    ..I needed something to do with that huge pile of Alphas I had just sitting in the corner gathering dust...


  • by Wah ( 30840 ) on Tuesday February 01, 2000 @05:58AM (#1315409) Homepage Journal
    and language translation service

    Give a man a babelfish, he understands for a day. GPL the babelfish, then embed it, and he gets a real cool palm app next year.

    (Note: the letters GPL do not appear in the article, nor is app a real word)
  • How I love a good search engine, not wasting bandwidth on ads/affiliate programs/webmail/shopping/etc/etc

    Do I hear ... [google.com]?

  • I could not find the code in Altavista site. I guess the code is not open in the sense of the Open Source Definition. If you need to sign an agreement with Altavista to get the code, then it has a somewhat liberal license but not enough to be called open. I hope I am wrong and this is not just another company trying to capitalize on a trendy term without honoring it.
  • Wonder what the minimum spec will be?
    I remember Altavista in 1995 was running on a machine with 4GB of memory.
  • by twit ( 60210 ) on Tuesday February 01, 2000 @05:59AM (#1315413) Homepage
    I think that this, and the Netscape Communicator/Mozilla effort, mark a sea change when it comes to software development. When a body of code had no value to a company, they used to quietly bury the body. Now, more and more companies release the code to the community, gaining a huge investment in goodwill.

    Philosophically, it's a move away from the Marxist conception of value (which is paradoxically de rigueur in US business circles), where anything requiring work gained in value. This isn't obviously false, but false it is. Value, at least in the capitalist system, is based on the ability to sell at a profit. If you cannot sell at a profit, either using the technology or the technology itself, then it's valueless.

    Most businessmen stick to the assumption that all the work put into this or that piece has made it valuable and worthy of protection. New businessmen are thinking it through - not everything that takes work is valuable, and protecting something valueless is a waste of effort. By open-sourcing the work, they turn a loss into a gain.

    --
  • You get 404s in every search engine, not just AltaVista. I keep a list of dead URLs that I find, and then submit 'em en masse to my fave engines. It takes about 2 minutes, and it helps keep stuff clean.

    Yeah, it would be nice if every URL always worked, but hey, it's the Internet.

  • I somehow doubt you could store a fraction of Babelfish's dictionary files on any Pilot out there. Maybe in a few years...
  • by sql*kitten ( 1359 ) on Tuesday February 01, 2000 @06:01AM (#1315416)
    Keyword matching searches, even with Alta Vista's context database, are clumsy and commoditised. There is simply no business value in the company considering what is now a non strategic asset (i.e. very hard to prevent a rival duplicating) as a key piece of intellectual property, when products such as Autonomy [autonomy.com] are using AI and Bayesian Inference to perform searches on large document sets at an accuracy Alta Vista can't touch.

    Having said that, note that Alta Vista are keeping their actual database to themselves - this is the one real asset other than their brand which they possess. Taking these two together, we see a core competence (i.e. leveraging them provides a return disproportionate to effort in relation to the market sector), which is now the basis of their revenue plan.

  • Do you ever have one of those brain farts and type in altavista.digital.com? I did that the other day and started laughing when I realized what I typed.

    Sorry, slightly OT - but I thought it was funny. BTW, altavista.digital.com still works.

    kwsNI

  • While I'm psyched to see this happening, a few things pop into my mind.

    Isn't Alta Vista on track to go public shortly? Without seeing their licensing terms, it's difficult to tell if this is a sincere move, or if it's just so they can be an "open source dot com". I guess we'll find out soon.

    Not to be critical, because I've always wanted to have a commercial quality search engine, but Alta Vista's pretty advanced and just about the fastest search engine on the internet. What are they looking to gain by doing this? They don't need more developers, unless they're looking at laying off their team in hopes of the community shouldering the development effort for them.

    And lastly, I doubt that they'll go with the GPL. No major vendor that's released their core product as opensource has had the guts to go fully to the GPL. But then I guess AltaVista's quite different than an operating system or application, as in it will still take years and millions of dollars for anyone to catch up with them in terms of eyeballs, which is where their money comes from these days...

    Maybe slashdot could start a search engine though. Just a privatized one, picking up things like story links, member home pages, resume's etc... It could open some possibilities, but I still can't see how any of them would benefit alta vista.
  • Given how intrusive search engines can be
    (you want to download every single file
    in the [venona.com]
    Cypherpunks Archives? That's about 100k and
    growing!), and how similar a lot of what they're
    doing is, it would be really nice if the search
    engines banded together and shared their raw data
    over a private extranet, rather than every single
    spider anyone with a spare PC decides to run
    pillaging my website in turn. It's not such
    a big deal for a well connected site like mine,
    but for people on the end of a 9.6kbps link in
    the developing world, search engine hits can
    impose a high burden, but one which must be
    borne to have one's content searchable.

    The sites could still differentiate themselves
    in spider technology by using their own custom
    formats, analysis, etc., but ideally, whenever
    one downloaded a page via http from an end-user
    server, it would be available to the other
    search engines automatically over private, high
    speed links. By doing this, they'd all be able
    to update more frequently, yet reduce overall
    load on the net as a whole.

    I suspect this will be more of a problem, not
    less of one, in the future, and despite
    the pitched competition in the search engine
    industry, it'd be nice to see them work together
    to improve the quality of the net as a whole.
    After all, it's not a zero sum game!
  • I believe a good solution for this problem is the "cached" version Google keeps in its own database... I always try to access the real link first.... but the backup is on many occasions the only solution....

    My $0.02

  • So maybe this version of Altavista isn't "true" open source since it's not the actual version used online but instead a different, downloaded program. Doesn't the fact that they're even taking the initiative to make any version of it open-source say a lot? I could be wrong but I don't see any *other* search engines doing this yet...
    Besides it's just neat that more and more big names are doing things like this. Sure there's a lot of open source software, and a lot of companies and organizations making more programs that way every day. But this is impressive in that it's an every-day name. I don't know anyone who is tech-oriented enough to turn the computer *on* who hasn't heard of Altavista. So the fact it's becoming more 'mainstream' is definitely something...
  • by jbrw ( 520 ) on Tuesday February 01, 2000 @06:09AM (#1315424) Homepage
    ...from what I can tell, at least.

    I think the journalist in the above article has got it all wrong. I can't see anything on the Altavista site regarding the source being opened up.

    What they are doing, however, is running an affiliate program that pays web site owners a commission (3c per click through, in this case) for each user that is referred to one of the various AltaVista search facilities from another web page (that has applied, and been approved, for this program).

    This is not anything particularly new - as it happens, GoTo.com has been running a very similar scheme for quite a while. GoTo.com's program, as well as AltaVista's, is managed by befree.net.

    So, you sign up, put the search boxes on your site, typically pointing to a unique url so they can track your referrals, and start collecting money. You don't host their search engine - merely point to it.

    If, on the other hand, I've missed something, I would appreciate any pointers to the actual AltaVista source code.

    ...j
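To make the "unique url" idea above concrete, here is a minimal sketch of how an affiliate search box might tag its outgoing link so the referral can be counted. Everything in it is an assumption for illustration: the endpoint `http://search.example.com/query` and the `q` and `aff` parameter names are hypothetical, not AltaVista's or befree.net's actual scheme.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical search endpoint; the real affiliate URLs are not given in the thread.
SEARCH_ENDPOINT = "http://search.example.com/query"


def affiliate_search_url(affiliate_id, query):
    """Build a search link that carries the affiliate's identifier."""
    params = {"q": query, "aff": affiliate_id}  # parameter names are assumptions
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def affiliate_from_url(url):
    """Recover the affiliate identifier on the receiving side, or None."""
    return parse_qs(urlparse(url).query).get("aff", [None])[0]


link = affiliate_search_url("site42", "open source")
print(link)
print(affiliate_from_url(link))
```

The point of the sketch is only that the affiliate hosts a link, not an engine: the search itself still runs on the portal's servers, which read the identifier and credit the click.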
  • One fact which all the search engines must
    realize, as well as cache companies like
    Inktomi and Akamai, is that the Internet is
    becoming increasingly dynamically-generated,
    personalized, and transactional -- exactly the
    kind of content least suited for static
    spider-driven search engines and static cache
    technology.

    Perhaps this will be the first Internet
    subcategory to fall from vastly overinflated
    stock valuations due to technical change.
  • by Hal Roberts ( 5525 ) on Tuesday February 01, 2000 @06:10AM (#1315426) Homepage
    I think you misread the part about valid web sites. That part refers to the Affiliate Network program. In other words, you have to have a valid web site to join the Affiliate Network program.

    The article actually doesn't expand on the source code freedom beyond the mention in the title, which is more than a little frustrating.

  • This is interesting, but really all they are doing is starting yet another affiliate program, albeit with a source code giveaway. As people have already mentioned, the source is not open in the GPL/BSD/artistic license sense, rather they're letting people who enter their affiliate program use it. This is almost identical to the model used by Infoseek and Excite, except that people get to compile the source, and they pay by being an advertising shill instead of just buying a license to use the search engine.

    Yours truly,
    Mr. X
  • The article doesn't mention that Altavista is going to open source their source code. It says that the source code will be given to applicants who can present a "real" web site they are running.
    It would be great news if the source code would truly be GPL-licensed or whatever OS license model Altavista would choose, but I doubt they will do that. Also remember that the search engine that you can obtain from Alta Vista is not the same as the one that's running their web site. It used to be downloadable before, and my information is that it does not scale as well as the AltaVista.com web page search engine does.

    Another little piece of info regarding the use of a search engine. Basically you need a large amount of disk space (on the order of terabytes) to actually get a search engine up and running. You need that database or else when I want to look at ancient Zulu fingernail clippings I will not find them in your search engine. This really will not empower many people to do anything special at all.
  • Right now I think it's generally about 3 months or so before Altavista rechecks a link. All it would take to try to lower the number of 404's you get is to lessen that time. But they're still growing, so they devote more of their bandwidth to finding new sites rather than continuously rechecking already indexed sites. How will open-sourcing their software change that?
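The trade-off described above (a shorter recheck interval means fewer stale links but more bandwidth spent on known pages) can be pictured with a small scheduling sketch. It is only an illustration under stated assumptions: the function name, the example URLs and the dates are invented, and nothing here is known about how AltaVista actually schedules its crawler.

```python
from datetime import datetime, timedelta


def links_due_for_recheck(last_checked, now, interval):
    """Return URLs whose last check is at least `interval` old, oldest first."""
    due = [(ts, url) for url, ts in last_checked.items() if now - ts >= interval]
    return [url for ts, url in sorted(due)]


# Hypothetical index state: URL -> time it was last verified.
now = datetime(2000, 2, 1)
index = {
    "http://example.com/a": datetime(1999, 10, 1),
    "http://example.com/b": datetime(2000, 1, 15),
    "http://example.com/c": datetime(1999, 12, 1),
}

# With a roughly three-month interval only the oldest link is rechecked;
# shortening the interval to one month pulls in more links per pass.
print(links_due_for_recheck(index, now, timedelta(days=90)))
print(links_due_for_recheck(index, now, timedelta(days=30)))
```

Shrinking `interval` lengthens the list returned on every pass, which is exactly the extra bandwidth the comment says is being spent on finding new sites instead.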
  • Altavista has been offering its search engine (executables) for download for quite a while here [altavista.com]. It is not the same engine they use on their web site.
    Here [altavista.com] is the press release from Altavista.
  • The Altavista Affiliate Program Agreement [altavista.com] states:

    Affiliate may not alter the HTML code within the Program boxes. If changes to boxes are detected, all accrued payments to Affiliate will be canceled. This includes changes that affect functionality, performance or tracking capabilities of Affiliate Links.

    (Italics added for emphasis.)

    I don't really see how Altavista giving people some HTML source - no matter how "proprietary" - counts as them opening their source code to their search engine, which seems to be what the article is trying to imply. Many other sites - Lycos, for one - have had similar programs in the past, though the $0.03 per clickthrough sounds like a different twist.

    Chalk it up as effective marketing - they put the words "open" and "source" in the same sentence, and managed to generate the expected amount of talk about what is essentially a non-event. 'Course, I may have missed the "Download Altavista search engine source here!" link on their site, but I don't think so :-)

  • Actually Altavista hosts one of the larger e-commerce affiliate programs, shopping.com. My company signed up for it a month or so ago, and the datafeed requirement to them is horrible.
    Altavista is definitely one of my favorite search engines, but if anybody messes with my google, I'll have to lay the smack down
    Gentlemen, you can't fight in here, this is the war room..
  • Depends on how much you want to index. There were millions of URLs back in 1995, so of course AV needed that much RAM. If you try to index the entire web, you'll probably need the same amount. However, if you decide to start indexing subsets of the web, you'll be able to make do with much less.
  • by jbrw ( 520 ) on Tuesday February 01, 2000 @06:16AM (#1315437) Homepage
    Be an Internet Search Partner [altavista.com] (from the home page). Right down the bottom of that page, you'll see a link to the AltaVista Affiliate Network [altavista.com], which is what the article is talking about.

    T&C's, FAQs, etc., can be found at the second URL.

    ...j
  • ...and here is why: "To join the 'network,' a site must demonstrate that it is a valid, working Web site that is updated on an ongoing basis."

    but... "AltaVista plans to target the owners of personal home pages in the future." so maybe there is hope that they will start releasing the code to the rest of us soon.

  • Do we get the super-powerful keyword-to-marketing engine that AV runs with our favorite doubleclick? Or is it a plug-in? Can we plug in the slash ad code (when it gets released? ;)

    Seriously, tho, I'll believe this has happened when I have code in hand running on my intranet, without co-branding or marketing.
  • I've been waiting for someone to opensource the Altavista search engine. Now I can finally put the old rusting Cray to use that's in the backyard.
  • Can't you read? From the article referenced:

    "The portal will begin giving away the source code for its search engine..."

    The above part says: they will give away source code.

    Now for the AND part (that means they will be doing both things).

    " ...and will start paying sites that successfully refer people to the portal."

    This part, with references to other parts of the article, is the part about who gets to be PAID. Nowhere in the article does it say you have to be "worthy" in some way to get the source code.

    And this tripe gets a score of 3?

  • Sounds like an idea for an open source project -- a distributed search engine. Each node indexes a small part of the net and shares the results with every node that requests it... then all we need is a few high volume portals to direct requests to the various member nodes (taking into account their relative speed and balancing the load) and we've got one potentially big, reliable, and fast search engine.

    Any takers?

    -- WhiskeyJack
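As a rough picture of the proposal above, the following sketch splits a toy index across nodes by hashing each URL and has a portal fan a query out to every node and merge the answers. It is a sketch under stated assumptions, not a design: the `Node` and `Portal` classes are hypothetical, everything runs in one process, and the speed-aware load balancing the comment mentions is left out.

```python
import zlib
from collections import defaultdict


class Node:
    """One member node holding an inverted index for its share of the pages."""

    def __init__(self):
        self.index = defaultdict(set)  # word -> set of URLs

    def add(self, url, text):
        for word in text.lower().split():
            self.index[word].add(url)

    def search(self, word):
        return set(self.index.get(word.lower(), set()))


class Portal:
    """Routes each page to exactly one node and merges query results."""

    def __init__(self, node_count):
        self.nodes = [Node() for _ in range(node_count)]

    def node_for(self, url):
        # A stable hash of the URL decides which node indexes the page.
        return self.nodes[zlib.crc32(url.encode("utf-8")) % len(self.nodes)]

    def add(self, url, text):
        self.node_for(url).add(url, text)

    def search(self, word):
        hits = set()
        for node in self.nodes:  # fan out to every node, then merge
            hits |= node.search(word)
        return sorted(hits)


portal = Portal(3)
portal.add("http://a.example/", "open source search engine")
portal.add("http://b.example/", "closed source portal")
portal.add("http://c.example/", "distributed search")
print(portal.search("search"))
```

Note that every query touches every node here, so in a real deployment the slowest node and the network between nodes would set the response time.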

  • When you don't preview you mistype a href.

    Sigh... Pecavi

  • by kuro5hin ( 8501 ) on Tuesday February 01, 2000 @06:26AM (#1315445) Homepage
    By kuro5hin, InaccurateNet News
    UPDATED February 1, 2000

    In a shameless attempt to gain attention from popular news sites that will post any story that includes the phrase "Open Source," internet portal site AltaVista announced that it will begin giving away the source code to its search engine while actually doing no such thing.

    Today Altavista rolled out an affiliate program which "allows" web sites to include html that links to the altavista search engine. Altavista did not address the question of why this is interesting, when people have been including search engine textboxes on their pages since 1994. Instead they prominently featured the phrase "Open Source" in the press release title, and went on to not mention even once how "allowing users to include html" could be interpreted as "releasing the source code to its search engine."

    You may still download a crippled trial version of Altavista's intranet search tools, which you may uncripple for a registration fee. But the bold maneuver of issuing a press release that uses the words "open source" is taking the internet by storm.

    "We see this press release as an unprecedented opportunity to leverage traffic from weblogs that don't do even the most rudimentary fact-checking," said an Altavista spokesman. "And we know for a fact that there are some very high traffic sites which auto-post any press release that uses the words 'open source,' without a human editor even being involved."

    The perl scripts which post content at the popular computer news site Slashdot declined to comment on the allegations that no human is involved in story posting anymore, saying only, "It looks like a hole in the GNU GPL [may allow] people to practically turn GNU-free software into proprietary software..."

    ----------------

    Note: This is intended mostly to be a flame at altavista, and to mildly poke fun at slashdot. Please take it for the humor it is. Thanks.
    --The Mgmt.

    ----------------

    Wish you could moderate the submission queue?

  • I agree... However, I can tell you it's not English-specific, it's the same in French...
  • The vast majority of people who use search engines like altavista don't want to index the whole web. They just want to index either their web sites or some database content. For instance, my company builds intranet libraries for financial services companies. Part of the work involves receiving large feeds of stories from various sources, and our customers want to do free text searches on those stories. We use oracle for our main database, but oracle is very bad at solving this particular problem, so we need to use another database (such as the alta vista search engine) to do the text indexing.

    For people like me, who are the vast majority of people who want to use the alta vista search engine, the open sourcing of the product (if they will be open sourcing it) is terrific news.

  • I'm amused how in English, "a fraction of" implies "a fraction less than zero", when of course "a fraction of" could be, for example, "eight thirds".

    I don't think that's right. A fraction of (since I do speak basically only English and it is my native tongue) refers to a part of. I have never heard that it means less than zero anywhere. Plus anything less than zero is negative and there are not many ways you can have negative quantities in terms of something like a dictionary unless the dictionary had a method of erasing memory engrams.
  • With the open-sourcing of a search engine (yes, I know there are others too..) does anyone think much about say a distributed computing based search engine? A sort of spider on every computer, using unused network resources plus cpu cycles? This seems pretty theoretical right now plus I could see privacy concerns a mile away, but a "people's" search engine could at the least be an interesting marketing gimmick...

    chimchim.
  • The vast majority of people who use search engines like altavista don't want to index the whole web. They just want to index either their web sites or some database content. For instance, my company builds intranet libraries for financial services companies. Part of the work involves receiving large feeds of stories from various sources, and our customers want to do free text searches on those stories. We use oracle for our main database, but oracle is very bad at solving this particular problem, so we need to use another database (such as the alta vista search engine) to do the text indexing.

    So you're saying that essentially the main function that this particular thing was created for will not be used? Also that most of the things released under the GPL and various other licenses are usually associated with business applications? I am sorry but a great deal of the "practical" applications seem frightfully dull. Is there a way that a truly interesting use of this search engine can be utilized for something a little bit more relaxed such as say analyzing content on the web and creating a better series of topologigraphic maps of the internet? Now *that* would be cool. Transferring the data to an ascii environment would be even cooler.
  • by Markee ( 72201 ) on Tuesday February 01, 2000 @06:34AM (#1315451)
    It seems you have missed the point of Open Source entirely. OSS is not about "Anyone can get the source code", it's about "Anyone can get the source code, modify it, publish the results and do what the heck he wants with it (well, almost)".
    With respect to being "worthy": I read the press release as stating that you will have to become an "Affiliate" before you get the source code; and for becoming an affiliate you have to present a web site you are running.
    The press release is ambiguous about this, so maybe I am wrong. (But if I am wrong: where is the download page for the source code?)
  • One fact which all the search engines must realize, as well as cache companies like Inktomi and Akamai, is that the Internet is becoming increasingly dynamically-generated, personalized, and transactional -- exactly the kind of content least suited for static spider-driven search engines and static cache technology.

    That's where I think it would be a great idea to embed the spiders within people's browsers for this distributed search engine project. Of course you'd need to be able to set up a system to selectively not spider sites / pages (account info, etc.) but the idea is as you're accessing the info in the web database the HTML that pours out gets indexed and then sent to your upstream. A lot of database information stays put or changes very little, but it's hidden behind a search or an index of some kind (see my knowledgebase [mixdown.org] for an example). If you embedded the spider within the browser, you'd get the content without hammering sites and all is good.
  • To my untrained eye, that seems to merely say that if you do decide to alter the code, Altavista isn't going to send you a check.

    That clause doesn't prevent you from modifying the code (barring the existence of a licensing agreement that does do that), it simply says if you choose to alter it, Altavista isn't going to pay you.

    Frankly, I think that's a very fair "restriction," because it doesn't limit your freedom with the code (again, assuming nothing else limits it). You can fiddle and twiddle to your heart's content. Just don't expect to get paid for click-throughs.

  • I believe a good solution for this problem is the "cached" version Google keeps in its own database... I always try to access the real link first.... but the backup is on many occasions the only solution....

    Hear, hear! Now that is what I call an innovation. I can't tell you how many times this has saved me.
  • Babelfish is really the product of a company called Systrans (or similar). I think they're based in France.
    So I doubt AV could release that code, even if they wanted, they just licensed it.

  • I know, that is why I stopped previewing (and behold what that just caused...)

    My little mistake seems to be a little /. bug too. I entered the url as href='http://www.google.com' but that came out as href="'http://www.google.com'"

    That is: slashdot added the double quotes making me look really stupid. Next the script will automagically add "F1R57 P057" and Natalie Portman to the first five posts on any subject :-)

  • that link is just Altavista's news syndication services republishing the ZD story which started the brouhaha -- they have not yet issued a press release as far as I can tell.

    bumppo
  • When the porn-industry can get to the sourcecode, they'll find even more ways to get the first thousand hits no matter what your keywords are... AltaVista became useless years ago...
  • It still doesn't make HTML into source code for the engine, though. The Zdnet article is incredibly misleading.
  • And let them know why! There are no plans by them to release their source. This is a cynical scam to boost revenue via banner ads, and to boost interest in their affiliate program.

    If enough people refuse to use the AltaVista service, =and= let them know why, they may either be pushed into apologising or releasing =some= source, which is better than nothing at all.

  • This has been suggested several times before on slashdot, and is as unlikely now as ever. Parallelizing a search engine would probably make it slower, as the latency between machines on the internet (not to mention the bandwidth) is terrible. Imagine if I search for United States and get 5 million hits on each from different machines. You then have to transfer the hitlists to one of the machines and do the comparison. You just got killed on your fast clause.

    What would be more useful is a cluster of machines, each having the whole database. You would have to update all machines every night, but you might gain something from this approach.
    --
    Mike Mangino Consultant, Analysts International
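The objection above comes down to where the hit lists live. A minimal sketch, assuming each machine holds a sorted list of document ids for one search term (the function names are hypothetical), shows that answering a two-word AND query means shipping at least one whole list to the other machine before the comparison can start:

```python
def intersect_sorted(a, b):
    """Merge-style intersection of two sorted lists of document ids."""
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return common


def and_query(hits_on_machine_a, hits_on_machine_b):
    """Return (matching ids, number of ids that had to cross the network).

    Assumes the shorter list is the one shipped to the other machine.
    """
    shipped = min(len(hits_on_machine_a), len(hits_on_machine_b))
    return intersect_sorted(hits_on_machine_a, hits_on_machine_b), shipped


matches, shipped = and_query([1, 3, 5, 7, 9], [3, 4, 5, 10])
print(matches, shipped)
```

With millions of hits per term, `shipped` is the quantity that hurts over Internet links, whereas inside a local cluster the same transfer is cheap, which is the case the comment argues for.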
  • I'm amused how in English, "a fraction of" implies "a fraction less than zero", when of course "a fraction of" could be, for example, "eight thirds".
    I don't think that's right. A fraction of (since I do speak basically only English and it is my native tongue) refers to a part of. I have never heard that it means less than zero anywhere

    I think the author you are quoting meant less than one, rather than less than zero.

    When people say "I got it for a fraction of the retail cost!" they are implying that they got it for less than that normal cost, for example 1/2 price, when the literal translation of "fraction" into most (?) languages would not necessarily imply this, but would mean any fraction - ie a number expressed as a/b, like 4/3 or 8/3. There are many numbers most accurately expressed as fractions.

    I appologise for my spelling - Thad
  • Now maybe someone can make a version of AltaVista like it used to be without all this portal crap.
  • Are you a native speaker of English? Because in standard English, the word fraction has (to my understanding) always meant a portion less than the whole, and often significantly less than the whole. In mathematical English, the technical term "fraction" is essentially a synonym for "rational number", which is a number which can be represented as the quotient of two integers. (Exercise for the reader: prove that the rational numbers have a one-to-one correspondence with the integers.)

    The technical meaning of mathematical terms often has little to do with the standard English meaning of those words. Another good example of this is "imaginary" (as in "imaginary number"). Many people (collegiate mathematics students among them) still believe that imaginary numbers are somehow less real (to use the standard English meaning) than real numbers, when in fact all complex numbers (pure imaginaries and reals included) are equally abstract.
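For readers curious about the exercise above, one way to see the correspondence (an illustration, not a proof) is to list the positive rationals in a single sequence, so that the n-th term pairs with the integer n. The sketch below uses the Calkin-Wilf recurrence, a technique chosen here for brevity rather than anything the comment itself proposes; extending the list to zero and the negative rationals by interleaving signs is left out.

```python
from fractions import Fraction
from itertools import islice
from math import floor


def positive_rationals():
    """Yield every positive rational exactly once (Calkin-Wilf order)."""
    q = Fraction(1)
    while True:
        yield q
        # Calkin-Wilf successor: next = 1 / (2*floor(q) - q + 1)
        q = 1 / (2 * floor(q) - q + 1)


first = list(islice(positive_rationals(), 8))
print(", ".join(str(q) for q in first))  # 1, 1/2, 2, 1/3, 3/2, 2/3, 3, 1/4
```

The sequence includes values such as 3/2 and 3, which fits the comment's point that a mathematical "fraction" need not be less than the whole.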
  • Open Source is very open (forgive the pun) in terms of what it applies to. I can Open Source my new Widget2000 code and only sell it to people who buy Widget2000 and require them to not reveal the source code to anyone else. Or I can Open Source it and post it on a public FTP site.

    Every application may have different terms of agreement for how the source is handled. If anything, this is incentive to READ before you buy or mess with someone's source code. You should be reading the licensing anyways, but I know MS likes to put it inside shrink wrap before selling things. ;)

    Bad Mojo
  • by bumppo ( 15745 ) on Tuesday February 01, 2000 @07:02AM (#1315468) Homepage
    CMGI has finally seen fit to issue a press release. Surprise! They really are cheeky enough to suggest that a snippet of HTML constitutes open source.

    http://biz.yahoo.com/prnews/000201/ca_altavis_1.html [yahoo.com]

    Excerpt:

    "The AltaVista Affiliate Network is leading the expansion of our
    distinctive services throughout the Web, at a global scale," said
    Rod Schrock, president and CEO of AltaVista Company. "This
    program will effectively open source AltaVista Search and
    translation services thereby extending our brand to the Internet
    community."

    Smug bastards.

    bumppo
  • by Merk ( 25521 ) on Tuesday February 01, 2000 @07:03AM (#1315469) Homepage

    "Can't you read? From the article refrenced: ". Yeah, because we all know that ZDNet never makes mistakes or says untrue things, right?

    Maybe you should go read the AltaVista press release. They don't say anything like "Here is our source code, and here is the license". They talk a lot about business solutions and how you can obtain a modified version of their search engine. In a 5 minute look around their site I wasn't able to see anything about what license they were planning to use, or even verify that the search engine they were allowing you to download was not in binary form.

    I hope that ZDNet got their story right, but the way they said things I was expecting to see a press release from AltaVista saying "AltaVista open-sources search engine technology!". Not seeing that bothers me.

    C'mon, you should know better than to accept at face value what you read in something at ZDNet without checking the source of the story.

  • Heh. Whatever floats your boat.

    I was just pointing out that the release of the alta vista search engine would be Really Cool, b/c there are lots of uses for it that don't involve buying millions of dollars of machinery and setting up Yet Another Web Index. Your 'topologigraphic' map project just furthers my point.

  • MMmmmm has anyone of us got either the hardware, bandwidth or the inclination to run altavista?

    How will it help open source development to look at a monolithic pile of code when there is no part of it that a competent programmer couldn't write? Many /.'ers could write an AltaVista clone, but without the aforementioned hardware and bandwidth, what's the point?

    Now if AltaVista started hosting open projects' home pages and CVS, that would be news, wouldn't it? Or am I just predicting the newest portal scam? Sorry andover.net, couldn't resist ;-) Stay free.

    Sparkes


    *** www.linuxuk.co.uk relaunches 1 Mar 2000 ***
  • Many entities "give away" their source to certain individuals and entities. That in no way, shape, or form means that they are opening up the source in general, as in an Open Source project.
  • Is this really good? "Of course", you yell. Anything open source must be good, people can work on it, and improve it and blah blah blah. Look at Mozilla, how much has it improved? How many people actually put work in it? If altavista had been open sourced a long time ago, would google have existed? Would the google guys have thought about coming up with new ideas, or would they have just blindly worked on simply improving the original source? ...

  • AltaVista is jointly owned by CMGI and Compaq, they get advertising revenue from DoubleClick, and one look at their press page [altavista.com] shows that they're making deals left and right, among other things.

    So why this click-thru service anyway? Isn't this just the last resort for porn webmasters and script kiddies? What exactly does this prove...that at the mention of the phrase "open source," people come running? This just doesn't make any sense to me.

    Thoughts?
  • Well, someone is in a world of trouble at ZDNet.
    The article just got yanked.

    If anyone read through the affiliate program materials, there is no mention of any source code. Just the program itself. You put code to reference the tools/search on AltaVista's site, and get paid $3 per clickthrough! Not all that lucrative unless you have a lot of traffic.

  • Actually it's $.03 per clickthrough.
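
    To put the three-cent figure in perspective, here is a quick back-of-the-envelope sketch. The rate comes from AltaVista's letter; the function names and the example targets are invented for illustration.

```python
from decimal import Decimal
import math

# Three cents per click-through, as stated in AltaVista's letter.
RATE = Decimal("0.03")


def payout(clicks):
    """Dollars earned for a given number of click-throughs."""
    return RATE * clicks


def clicks_needed(target_dollars):
    """Smallest whole number of click-throughs that earns at least the target."""
    return math.ceil(Decimal(target_dollars) / RATE)


print(payout(1000))        # earnings for a thousand click-throughs
print(clicks_needed(100))  # click-throughs needed to reach $100
```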

  • I made no point about "OpenSource" at all, so how do you know if I missed it? I repeated what the article said, that AltaVista is "giving away" the source code. Yes, I know there is a difference, that is why I did not say "open source". Maybe you can e-mail Hemos about his misuse?

    BTW, ZDNET has apparently pulled the story already (no, not slashdotted, "page not found" and headline removed from ZDNET). I do not know where anybody saw the AltaVista press release saying you must be an affiliate to get the source, but it is not showing up here live.altavista.com [altavista.com], so I am just going from the articles that I have working links to.

    You guys MAY be right, but I have yet to see ANYTHING saying that AltaVista requires affiliation acceptance to get a copy of the source.

  • The part about targeting "personal home pages" doesn't seem quite right to me. Maybe I'm missing something, but:

    1) First, you have to have a personal home page which is regularly updated. I haven't updated my personal home page in quite a while- it was really just a very short exercise in HTML- but I still access it almost daily. I have links to Alta Vista and Deja.com which use the text-only interfaces. I use them all the time. It's here [usit.net] just in case anyone's interested. Does incrementing some web counter count as "regularly updated"?

    2) For my personal home page, I was only allowed access to two scripts provided in my ISP's cgi-bin. If you wanted your own cgi-bin, you needed to buy a commercial account. How many other personal home pages have similar restrictions?

    Even though I don't have anything worth indexing, I have to wonder just what Alta Vista's thinking with this.

  • Great!! Now where can we submit patches ??


  • Looks like zdnet took the story down, good call I say
  • Anyone want to write some plug-ins for Netscape, IE or Mozilla?
  • Read the open source definition.

    http://www.opensource.org/ [opensource.org]
  • Try LOOKING at the Altavista site? This article [altavista.com] is apparently accurate enough for AltaVista to post themselves. (probably written for ZD by AV anyway)

  • As most of us know already, major search engines use a hush-hush set of algorithms to reduce the number of spammed entries (text set in the same color as the page background, really small text, keyword stuffing, etc.). By releasing the source to their engine, isn't AltaVista basically giving the thieves the keys to the treasure chest?



    ----
  • I was at Lotusphere (Lotus Notes conference) a few weeks ago, and one of the developers there made a reference to the fact that Altavista runs their entire database out of RAM. That's a LOT of RAM!!!!

    Micro$oft(R) Windoze NT(TM)
    (C) Copyright 1985-1996 Micro$oft Corp.
    C:\>uptime

  • But I sure don't see the source code anywhere on their sites. I did see a free version of their search engine which I downloaded and took a look at. That didn't have the source code for sure.

    So where is it? I'm getting the feeling that somewhere along the line something didn't get translated.
  • What would be more useful is a cluster of machines, each having the whole database.

    That was _exactly_ what I was proposing.

    Each node indexes a subset of the web. It then passes on its local index to the other nodes that make up the cluster, so that each individual node can accumulate a copy of the master index. Search requests are then routed to the individual nodes based on how many requests each node is currently processing, how quickly they've been responding, etc, so that none of the disparate machines that make up the search cluster get overloaded with requests. And if one of the nodes is slow to respond, the portal could just resend the request to another available node.

    Those that didn't want to host requests but still wanted to help with the effort could run spiders that index a small portion of the web and make that index available to the central cluster, thereby distributing the workload further (and allowing sites to index their local networks, for instance, where they typically have higher speed connections, then dump the resulting index off to the cluster during offpeak hours). This allows local admins to index whatever portions of their site they want as often as they want just by setting up their webserver as an index-only node in the cluster. Get enough sites doing that, and you're going to get pretty up to date results.

    The indexes would only get passed on to the nodes that request them...with a little effort, it wouldn't be hard to set up request routes to allow indexes to flow from node to node via the fastest network connections possible, minimizing crosstalk between nodes (you just pass any downstream indexes upstream and vice-versa, adding in your own locally acquired index along the way) -- have the indexes propagate like news articles.

    The portal machine would only need to maintain a list of IP addresses weighted according to how big a load each site is willing and able to handle, so its processor load will be minimal. Put a moderate-sized machine on the end of a big network pipe, and a few thousand nodes scattered all over the net, and you might have one nice search engine.

    Sorry if my previous post confused you.

    -- WhiskeyJack
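
    The routing part of the proposal above could look roughly like the following. This is only a sketch under stated assumptions: `Node`, `Portal`, `capacity` and `in_flight` are hypothetical names, a node failure is simulated with a flag rather than a real network timeout, and response-time tracking is left out. It shows the portal picking the least-loaded node relative to its declared capacity and resending the request to the next node when one does not answer.

```python
class Node:
    """A hypothetical search node holding a full copy of the index."""

    def __init__(self, address, capacity):
        self.address = address    # made-up node address
        self.capacity = capacity  # load the node's owner is willing to handle
        self.in_flight = 0        # requests currently being processed
        self.alive = True         # stands in for "responds in time"

    def search(self, query):
        if not self.alive:
            raise ConnectionError(self.address)
        return f"results for {query!r} from {self.address}"


class Portal:
    """Keeps only a weighted list of nodes and forwards queries to them."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def _ranked(self):
        # Least loaded relative to declared capacity comes first.
        return sorted(self.nodes, key=lambda n: n.in_flight / n.capacity)

    def search(self, query):
        for node in self._ranked():
            node.in_flight += 1
            try:
                return node.search(query)
            except ConnectionError:
                continue  # resend the request to the next available node
            finally:
                node.in_flight -= 1
        raise RuntimeError("no node answered")


busy = Node("node-a.example", capacity=10)
idle = Node("node-b.example", capacity=10)
busy.in_flight = 5
portal = Portal([busy, idle])
print(portal.search("open source"))
```

    The portal itself does almost no work here, which matches the point that a moderate-sized machine would be enough for it.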

  • by Anonymous Coward
    Hey, it looks like their editors read /. or else someone pointed out the egregious error of the AltaVista story they had posted, since it isn't up on their site anymore (~12:45pm) and the links to the story no longer work. Go team /. !!!!
  • AltaVista has a copy of the ZDNet Story [altavista.com] up.

    http://live.altavista.com/scripts/editorial.dll?categoryid=&only=y&bfromind=980&eetype=article&render=y&eeid=1461716&avr=1

  • Well what would be really cool is if the engine could somehow detect a bad link as soon as a visitor fails to access it.. then the server double checks it the next day, and if it's still down, then POOF the link gets munched.

    The point is to get some sort of real time checking and self monitoring. I've worked on numerous search engines, and most of them just have a batch verify command to parse the entire link database... there are better ways out there.
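
    A minimal sketch of that idea, with made-up names (`LinkIndex`, `report_failure`, `recheck`) and a `probe` callable standing in for whatever HTTP check a real engine would use: a visitor's failed access only marks the link as suspect, and the link is dropped only if the server's own check a day later also fails.

```python
from datetime import datetime, timedelta

RECHECK_DELAY = timedelta(days=1)  # "double checks it the next day"


class LinkIndex:
    def __init__(self, urls, probe):
        self.urls = set(urls)
        self.probe = probe   # callable: url -> True if the page is reachable
        self.suspects = {}   # url -> time of the first reported failure

    def report_failure(self, url, now):
        """Called when a visitor fails to reach a link from the results."""
        if url in self.urls:
            self.suspects.setdefault(url, now)

    def recheck(self, now):
        """Re-probe suspects that have waited long enough; drop dead ones."""
        removed = []
        for url, reported in list(self.suspects.items()):
            if now - reported < RECHECK_DELAY:
                continue  # too early, check again on a later pass
            if not self.probe(url):
                self.urls.discard(url)  # POOF, the link gets munched
                removed.append(url)
            del self.suspects[url]
        return removed
```

    Because only reported links are probed, the work scales with the number of failures rather than with the size of the whole link database.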

  • The problem is that the majority of geeks who would use it still have this pre-pubescent BASIC program running in their heads.

    10 for minutes=1 to ((day-sleep)*60*60) step 1
    20 if minutes%2=0 goto 50
    30 if minutes%9=0 goto 60
    40 next minutes
    45 end
    50 reload(slashdot); next minutes
    60 post(troll, 100, "Natalie Portman", "Grits", "Pants")
    70 next minutes

    You'd only ever see /., and perhaps some of the sites /. links to! On the upside, you'd have a complete historical record of 'Great SlashDot Trolls of the Late 20th Century'...
  • Well... that was strange...

    • PAGE BACK from a preview after you're finished checking it. Simple.
    • Use "double quotes" on HREF's, not 'single quotes'. You aren't editing fricking Java Script here.
  • Without portal crap try this...

    http://www.altavista.com/cgi-bin/query?opt=on&enc=iso88591&text=on&pg=aq

    Text only search - no ads or other BS...


    Later
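
    If you want to generate that link from a script rather than bookmark it, something like this would do. The four parameters are the ones in the URL above; the `q` parameter for passing search terms is an assumption and is not confirmed anywhere in this thread, and the function name is invented.

```python
from urllib.parse import urlencode

BASE = "http://www.altavista.com/cgi-bin/query"

# Parameters taken from the text-only URL posted above.
TEXT_ONLY = {"opt": "on", "enc": "iso88591", "text": "on", "pg": "aq"}


def text_only_url(terms=None, param="q"):
    """Build the text-only search URL, optionally with search terms."""
    params = dict(TEXT_ONLY)
    if terms is not None:
        params[param] = terms  # "q" is an assumed parameter name
    return BASE + "?" + urlencode(params)


print(text_only_url())
print(text_only_url("open source"))
```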
  • I think the author you are quoting meant less than one, rather than less than zero.
    Yes, you're right. I meant less than one.

    Yes, fraction means "any rational number". For that matter, it may be 1/1.

  • I've actually used (written code for) the AV search engine... it is in fact very easy to use. I don't think they care about "open source" in the development of the search engine. I suspect that they are more interested in 1) getting developers to create and fix interfaces in languages other than C and VB (the only officially supported languages) and 2) getting a "search by AltaVista" logo out on as many sites as possible.

    Regards, Barrie

  • If they're really into open source, then let them release their database so that others can copy it too.
  • Seeing as I use AltaVista advanced search many times a day for work-related searches, I signed myself up for the Affiliate program.

    Free money.

  • Yes, I want the AltaVista software, but more important is to have access to the database and let my agents fly around inside.


    retrocool
  • The press release is ambiguous about this, so maybe I am wrong. (But if I am wrong: where is the download page for the source code?)

    First, where is the link to the "press release"? I posted a link to AltaVista's news area, which has everything I was quoting.

    It also said Altavista TO RELEASE. Ahem, that indicates that it is NOT released YET (that is what "to release" means, as in "going to release" as in sometime in the future).

  • The word 'fraction' does not only refer to the mathematical definition. It also means 'A portion or fragment' (according to Webster's). It is certainly in this sense that the phrase is used.

    . o O ( In school they teach that plants have roots. How ridiculous! How can you take the root of something that's not a number? )

  • How would the server verify this? Rather than presenting links to the actual sites, would it present a list of CGI's that a browser could then click, causing the server to verify the page prior to passing it back to the client? That would be a major CPU killer.

    And what happens if a site is slashdotted or subjected to a DoS attack? In the first instance, they'd risk removal because they were too popular. In the second instance they'd risk removal because of an enemy or script kiddie...

    I think Alta Vista's fine... The entire nature of the internet is based on there being no central authority... sites come and go. Pages change. And there isn't really a reasonable way of dealing with it, in my eyes.
  • That's why I said to look at AltaVista's press releases. Check out the source of the information.


  • Hey, it looks like their editors read /. or else someone pointed out the egregious error of the AltaVista story they had posted, since it isn't up on their site anymore (~12:45pm) and the links to the story no longer work.

    As a long time ZDuh watcher, I can tell you that it is quite common for them to simply disappear anything that is remotely embarrassing, and they NEVER acknowledge an error. Remember the 'Jesux' hoax? They don't.

    ======
    "Rex unto my cleeb, and thou shalt have everlasting blort." - Zorp 3:16

  • by Wah ( 30840 )
    "Welcome to your first day of Writing Press Releases for the Internet class. Now, just to make it clear. You do not need to know what buzzwords mean, just how to use them in a sentence. Everyone got that? Class dismissed, see ya next year."
  • I ask this because the distributed concept needs to be applied to web indexing.

    I used to use altavista for all of my searching, but now that the search engines are lagging so far behind in indexing the content, we're forced to try multiple search engines to find what we're looking for. Yes, there are meta-search engines, but that's not what I'm looking for.

    If they're letting us at the crawling/indexing code, maybe we can build a distributed indexing system along the lines of seti@home, mprime or one of the distributed.net projects.

    Between that and an indexing system along the lines of the Library of Congress indexing code systems, we could just tame this beast yet.
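
    One simple way to split the crawl among volunteers, sketched under assumptions: each URL is assigned to a worker by hashing its hostname, so a given site always lands with the same volunteer. Nothing here comes from AltaVista's code; `assign_worker` and `partition` are hypothetical names, and SHA-1 is just a convenient stable hash.

```python
import hashlib
from urllib.parse import urlsplit


def assign_worker(url, n_workers):
    """Map a URL to a worker number based on a stable hash of its hostname."""
    host = urlsplit(url).hostname or ""
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_workers


def partition(urls, n_workers):
    """Group URLs into one crawl list per worker."""
    buckets = [[] for _ in range(n_workers)]
    for url in urls:
        buckets[assign_worker(url, n_workers)].append(url)
    return buckets


urls = [
    "http://slashdot.org/",
    "http://slashdot.org/articles/",
    "http://www.altavista.com/",
    "http://www.opensource.org/",
]
for worker, work in enumerate(partition(urls, 4)):
    print(worker, work)
```

    Hashing by hostname rather than by full URL keeps all of one site's pages on one crawler, which makes it easier to be polite about request rates.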
  • i thought i might sign up for their affiliate program, just to get the occasional $.03 check from altavista, but when i saw the application form [altavista.com], i was appalled.

    not only does it ask for all your contact information (several times) but it asks for your social security number. all on an insecure form!

  • I didn't see any discussion of open sourcing their technology. It appears that they are merely providing a mechanism (in HTML) so your web site can "front" a query that is fielded and handled by Altavista, with a registration and payback mechanism.

    Such a scheme (without the registration or payback mechanism) has been in place for quite a while at Google:

    http://services.google.com/cobrand/free_trial [google.com]

    Obviously, I would prefer to get money for the clickthroughs I generate, but I also want my clients to get great search results as well. At any rate, if I understand correctly, this does not appear to be the "open source" surprise represented in the article.
  • ... or boycott Amazon/MPAA, Microsoft sucks, don't forget BSD, beowulf clusters, complaints about spelling (Aaron?), IANAL, etc, etc, etc

    OK Good doctor. Can we end this thread now?

  • Interesting. I used Autonomy long ago and hadn't come across it since. I was using it on a very small dataset, so I fear it didn't really show the best of its abilities, but it was interesting. Bit expensive though...
