Science

Platform Independent, Searchable Info On CDROM?

Knuckles asks: "A friend of mine, who is an ethnologist, needs to author a CD-ROM with ethnographic source material (4,500 printed pages) on the indigenous population of Mexico. The CD-ROM should provide a platform-independent way to retrieve the information with a simple interface and be fully searchable. He is computer literate but doesn't know anything about programming. What solution would you propose? I thought of HTML, but am lost on the question of searchability. Macromedia Director or similar? He would prefer free software, but would use proprietary software if it were a better fit for his goals."
  • Sherlock and Sherlock 2, the Mac's built-in search feature from OS 8.5 onward, can index the content of many types of files, flat text among them, and then let you search that content on your hard drive. FYI.
  • That's exactly right. In fact, Acrobat has indexing features that make searching docs quick and easy. The full Acrobat costs around $250 U.S., and Acrobat Reader is free to use and distribute.

    Take care

    JL
  • BeOS has Perl support; search BeBits [bebits.com].

    Best Regards, Ben Abbitt
    "I don't think it's not open source, but it does a good job."

    Er... let's try that again.

    "I don't think it's open source, but it does a good job."

    There. Sorry for any confusion.

  • JavaScript and HTML look like an ideal solution to me. HTML gives you all the formatting power you need, and JavaScript can be used to knock together indexing and searching.

    If I remember correctly, the Intel Developer Insight CD set has a few searchable documents; the entire CD set (a mirror of http://developer.intel.com [intel.com]) is in HTML, and the search scripts are in JavaScript. I think.
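The indexing and searching such a script would do can be sketched as an inverted index that maps each word to the set of pages containing it, queried with AND semantics. The sketch below is written in Python for illustration only; on the CD the same logic would run as JavaScript in the browser, with the index built ahead of time. The function names, the tokenizing rule, and the AND-only queries are assumptions, not a description of how the Intel CD works.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case `text` and split it into words (Unicode word characters,
    so accented letters stay inside words)."""
    return re.findall(r"\w+", text.lower())


def build_index(pages):
    """Build an inverted index: word -> set of page names containing it.

    `pages` maps a page name (e.g. a file name on the CD) to its plain text.
    """
    index = defaultdict(set)
    for name, text in pages.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return, sorted, the pages that contain every word in `query` (AND search)."""
    words = tokenize(query)
    if not words:
        return []
    # Intersect the page sets of all query words; an unknown word gives an empty set.
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)
```

For example, `search(build_index(pages), "weaving nayarit")` returns only the pages that contain both words. Matching on Unicode word characters rather than plain ASCII letters keeps accented Spanish words intact.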

  • If you want this CDROM to be accessible to all kinds of users (Linux, Windows, BeOS, *NIX, Mac...), then running Perl from the CDROM for searching would not be a good idea. It may not run properly on Windows (for those who don't have ActivePerl installed) or on BeOS.

    However, you can use a Perl script to generate a massive listing and index of the keywords and write it out as HTML files. HTML files are accessible to most computer users. It's also good to include ASCII text files along with PDF files.

    One drawback of using only ASCII text files is that if the document has a lot of pictures or diagrams and you expect users to print them, you don't really want them to have to open each image in GIMP or Photoshop to print it.

    Also, you can try writing CGI scripts and putting them on the CDROM along with the HTML files, so that a CGI script does the text searching for you (though CGI scripts need a web server to run).

    Alternatively, you may want to try Perlfect Search 3.08 [perlfect.com].
    Their web page description: "Perlfect Search is a sophisticated, powerful, versatile, customizable and effective site indexing/searching suite available under an open source licence. It comes as a pair of distinct scripts. The indexer, that automatically scans and indexes a web site, and the search engine, a cgi script that serves search queries for keywords over the index, and displays results pages in html, in a standard format including title, description and relevance ranking for each matching document."

    Another possibility is PDF format. You can also try PostScript files. But I'm not sure whether any script can search text in PDF or PS format.

    To make PDF files, you can buy Adobe Acrobat or try HTMLDOC [easysw.com] from Easy Software Products [easysw.com]. It can convert HTML files into PDF and PS files, and it's available for Linux, Windows, UNIX, IRIX, NT and Solaris.

    Both HTMLDOC 1.8.8 and Perlfect Search 3.08 are under the GPL.

    Hope this helps.
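A minimal sketch of the first suggestion in this comment, generating a static HTML keyword index at authoring time, might look like the following. It is written in Python rather than Perl; the four-character minimum word length (a crude way to skip short words such as "de" or "the"), the single-page layout, and the function names are all assumptions.

```python
import html
import re
from collections import defaultdict
from urllib.parse import quote


def keyword_index(pages, min_len=4):
    """Map each word of at least `min_len` characters to the pages that contain it.

    `pages` maps a file name on the CD to that page's plain text.
    """
    index = defaultdict(set)
    for name, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            if len(word) >= min_len:  # crude filter for short words like "de", "the"
                index[word].add(name)
    return index


def render_index_html(index, title="Keyword index"):
    """Render the index as one static HTML page, keywords in sorted order."""
    lines = [
        "<!DOCTYPE html>",
        f'<html><head><meta charset="utf-8"><title>{html.escape(title)}</title></head><body>',
        f"<h1>{html.escape(title)}</h1>",
        "<dl>",
    ]
    for word in sorted(index):
        # Link to every page containing the word; quote() escapes spaces in file names.
        links = ", ".join(
            f'<a href="{quote(page)}">{html.escape(page)}</a>'
            for page in sorted(index[word])
        )
        lines.append(f"<dt>{html.escape(word)}</dt><dd>{links}</dd>")
    lines.append("</dl></body></html>")
    return "\n".join(lines)
```

Writing the result with `open("index.html", "w", encoding="utf-8")` and placing it in the CD's root would give readers a clickable index that needs no scripting at all.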

  • The CDs called Foo in a Nutshell, Deluxe Edition (Foo = {Webmaster, Java}) used JHLsearch, available from a fellow named (I think) John Leach who lives in Italy. I can't find a URL right now.

    The subsequent CDs used ASTAware [astaware.com]'s NetResults. I wasn't really happy with their engine or their grasp of Web standards. Just before I left, we were starting to look into JObjects [jobjects.com], which I'd had good recommendations for, but I don't know what became of that.

    In short, though, if you provide HTML content, users will be able to search the CD with the technology of their choice. Most users, however, will expect you to provide the tools, and that means either platform-proprietary tools or something based on Java. Even with Java, you'll probably need to provide a VM for the most common platforms, just in case.

  • ...and the associated costs.

    Please refer to the FAQ [trond.com].

  • I know it sounds ugly, but I believe you can use JavaScript to set up some sort of searching in pages.

    Maybe even with a flat file of keywords to index the search...
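One way to read the flat-file idea: store one keyword per line, followed by a tab and a comma-separated list of pages, keep the entries sorted, and answer prefix queries with a binary search. The file format and function names below are hypothetical, and the sketch is in Python; a browser-side version would do the same thing in JavaScript.

```python
import bisect


def load_keywords(lines):
    """Parse hypothetical 'keyword<TAB>page1,page2' lines into sorted (keyword, pages) pairs.

    Blank lines and lines starting with '#' are skipped.
    """
    entries = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        keyword, _, pages = line.partition("\t")
        entries.append((keyword.lower(), pages.split(",") if pages else []))
    entries.sort()  # prefix_search relies on keyword order
    return entries


def prefix_search(entries, prefix):
    """Return the sorted pages of every keyword that starts with `prefix`."""
    prefix = prefix.lower()
    keywords = [keyword for keyword, _ in entries]
    # Binary search for the first keyword >= prefix; all matches follow it contiguously.
    start = bisect.bisect_left(keywords, prefix)
    pages = set()
    for keyword, keyword_pages in entries[start:]:
        if not keyword.startswith(prefix):
            break
        pages.update(keyword_pages)
    return sorted(pages)
```

Prefix matching lets a reader type the first few letters of a name whose spelling varies between sources, which plain whole-word lookup cannot do.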
  • The tech CDs O'Reilly puts out use a Java-based search engine. I don't think it's not open source, but it does a good job.

    You're kidding, right? I haven't been able to get the search facility to work on any of the platforms I've tried (Netscape on Linux and Solaris, and IE on Windows). Java is enabled for all of them, but the browser either does nothing, or hangs when you select the search option :-(

  • Unfortunately, the search plugin isn't present on all OSes (like Linux, last I checked).
  • When I had to do this, I needed some dynamic functionality beyond simple search, so I wound up using Apache+PHP and simply pointing the browser to http://localhost:4711/. Surprisingly, we never had a tech support question related to the actual technology (although we got a few newbie "can I run this cdrom if all I have is a DVD drive?" type queries).

    Of course, if it's just documents, Acrobat has a variety of really slick solutions that are *very* platform independent.

    --
    Evan
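The setup in this comment used Apache+PHP. If only static pages (plus a client-side search) need to be served, Python's standard `http.server` module can stand in for the same idea of a local web server that the browser is pointed at. This is a swapped-in technique, not what the commenter shipped; it assumes a Python interpreter is available on the reader's machine or on the CD, and the helper names are hypothetical. Port 4711 is taken from the comment.

```python
import functools
import http.server
import threading


def make_cd_server(root, port=4711):
    """Create a server for the files under `root`, listening on localhost only."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=root)
    # Binding to 127.0.0.1 keeps the server unreachable from other machines.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


def serve_in_background(server):
    """Start `server` in a daemon thread and return the thread."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread
```

A launcher would call `serve_in_background(make_cd_server(cd_root))`, where `cd_root` stands for the mounted CD's directory, and then open `http://127.0.0.1:4711/` with `webbrowser.open`. Unlike Apache+PHP, this serves files only; any dynamic behaviour would have to come from JavaScript in the pages.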

  • Acrobat is also pretty slick. I am pretty sure that if you buy the full version of Acrobat Creator from Adobe, it has all sorts of slick search functionality and indexing built in. And although it's proprietary, free (beer) viewers exist for nearly every platform.

    A wealthy eccentric who marches to the beat of a different drum. But you may call me "Noodle Noggin."
  • ColdFusion used to have a Java applet that would allow you to index and search the HTML manuals. You could write him a little applet like that.
    HTML is the way to go.

    /*
    *Not a Sermon, Just a Thought
    */
  • The tech CDs O'Reilly puts out use a Java-based search engine. I don't think it's not open source, but it does a good job.

    For network/Internet access, this isn't the answer, but for sending out CDs, it's almost ideal. There is some setup involved, so it's not a no-brainer, but it's close.

  • HTML would probably be the ultimate solution, because the WYSIWYG editors on the market today make it very easy for people who aren't computer literate to build a simple interface to their material. However, if you had more skills, I'd suggest Tcl or Perl with the Tk toolkit to do a better job of it.... If he wants to dish out a small amount of cash, I'd be more than happy to write up some scripts :-)
  • It really depends how much you value platform independence versus features.

    HTML is standard (at least if you test it with more than one type of browser). You can try to use HTML for everything, but that will make your search facility more like the index of a book.

    If you want dynamic content, you might want to ship the CD with a web server like Apache (as binaries for the most popular platforms, maybe source too).
    Perl as a language is available for all platforms, but you will have to provide binaries for the popular platforms on the CD. (Of course, other languages, such as Python, Java or PHP with Apache, could do the job as well.)

    Alternatively, you could use Java or Python without a browser to write platform-independent applications. That might look better, but it is more work, and a Java virtual machine is not installed on every platform.

    For the data, you can use either a tabular format (columns separated by tabs, with a header row naming the columns at the top) or an SQL database dump. Database dumps sometimes contain extra information or idioms that are hard to read into other databases, but you can avoid that with a little caution. Tabular format can be imported into a lot of databases too, with moderate effort or none.
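Reading the tab-separated format described above takes only the standard `csv` module. In this sketch the column names are hypothetical, and `quoting=csv.QUOTE_NONE` assumes fields never contain tabs or newlines, so quote characters are treated as ordinary text.

```python
import csv
import io


def load_table(text):
    """Parse tab-separated text whose first row names the columns into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def find_rows(rows, column, term):
    """Return the rows whose `column` value contains `term`, ignoring case."""
    term = term.casefold()
    # row.get() returns None for a missing column, so fall back to "".
    return [row for row in rows if term in (row.get(column) or "").casefold()]
```

To read from a file on the CD, open it with `newline=""` and an explicit encoding (for example `encoding="utf-8"`) and pass its contents to `load_table`.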

  • by InitZero (14837) on Sunday July 30, 2000 @11:02AM (#893747)

    From your description, it sounds as though the content is plain text. Thus, I would keep it in generic ASCII.

    As for the search interface, I'd use whatever the operating system provides. For Windows, that would be Start->Find-Files, which allows text-based searching. On *nix, you could use grep. I'm not sure what the Mac choice would be.

    The less dense and more format-neutral the information is, the more likely it is to be usable in the future.

    Keep it simple.

    InitZero
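For readers without grep, a minimal grep-like search over the plain-text files can be sketched in Python. It assumes the texts are stored as `.txt` files and that a Python interpreter is available, which is exactly the portability cost other comments warn about; the function name is hypothetical.

```python
import os


def grep_tree(root, phrase, suffixes=(".txt",)):
    """Yield (path, line_number, line) for lines under `root` containing `phrase`.

    Matching is case-insensitive; files are read as Latin-1 so any byte decodes
    (plain ASCII text reads identically).
    """
    phrase = phrase.lower()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            if not name.lower().endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="latin-1") as f:
                for number, line in enumerate(f, start=1):
                    if phrase in line.lower():
                        yield path, number, line.rstrip("\n")
```

For example, `for path, n, line in grep_tree("/mnt/cdrom", "nayarit"): print(f"{path}:{n}: {line}")` prints grep-style output, where `/mnt/cdrom` stands in for wherever the CD is mounted.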
