Platform Independent, Searchable Info On CDROM?
Knuckles asks: "A friend of mine, who is an ethnologist, needs to author a CD-ROM with ethnographic source material (4,500 printed pages) on the indigenous population of Mexico. The CD-ROM should provide a platform independent way to retrieve the information with a simple interface and be fully searchable.
He is computer literate but doesn't know anything about programming.
What solution would you propose? I thought of HTML, but am lost on the question of searchability. Macromedia Director or similar?
He would prefer free software, but would use proprietary if better fit for his goals."
Re:ASCII is Your Friend (Score:1)
Re:PDF (Score:1)
Take care
JL
Re:HTML + Perl (Score:1)
Best Regards, Ben Abbitt
Re:O'Reilly (Score:1)
Er...let's try that again.
There. Sorry for any confusion.
Javascript and HTML (Score:1)
If I remember correctly the Intel Developer Insight CD set has a few searchable documents; the entire CD set (a mirror of http://developer.intel.com [intel.com]) is in HTML and the search scripts are in Javascript. I think.
HTML + Perl (Score:1)
However, you can use a Perl script to generate a full listing and index of the keywords and write it out as HTML files. HTML files are accessible to most computer users. It's also good to include ASCII text files along with PDF files.
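The keyword-index idea can be sketched roughly as follows. This is only an illustration, written in Python rather than the Perl suggested above; the function names (`build_index`, `render_index_html`, `index_directory`) and the output file name `keywords.html` are made up for the example, and the tag stripping is deliberately naive.

```python
import html
import re
from collections import defaultdict
from pathlib import Path

WORD_RE = re.compile(r"\w+")      # also matches accented letters
TAG_RE = re.compile(r"<[^>]+>")   # naive tag stripper, fine for simple pages


def build_index(pages):
    """Map each lowercase word to the sorted list of pages containing it.

    `pages` is a dict of {file name: HTML or plain-text content}.
    """
    index = defaultdict(set)
    for name, content in pages.items():
        text = html.unescape(TAG_RE.sub(" ", content)).lower()
        for word in WORD_RE.findall(text):
            index[word].add(name)
    return {word: sorted(names) for word, names in index.items()}


def render_index_html(index):
    """Render the index as one static HTML page: keyword, then links."""
    rows = []
    for word in sorted(index):
        links = ", ".join(
            '<a href="{0}">{0}</a>'.format(html.escape(name, quote=True))
            for name in index[word]
        )
        rows.append("<li><b>{}</b>: {}</li>".format(html.escape(word), links))
    return ("<html><body><h1>Keyword index</h1><ul>\n"
            + "\n".join(rows) + "\n</ul></body></html>")


def index_directory(root, out_name="keywords.html"):
    """Index every *.html file directly under `root` and write the index page."""
    root = Path(root)
    pages = {
        p.name: p.read_text(encoding="utf-8", errors="replace")
        for p in sorted(root.glob("*.html"))
        if p.name != out_name
    }
    (root / out_name).write_text(render_index_html(build_index(pages)),
                                 encoding="utf-8")
```

You would run `index_directory()` once on the staging directory before burning the CD. The result is a static page, so nothing has to execute on the reader's machine.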
One drawback of using only ASCII text files is that if you have a lot of pictures or diagrams in this document and expect the users to print them, you don't really want them to open each image in GIMP or Photoshop and do the printing.
Also, you can try writing CGI scripts and putting them on the CDROM along with the HTML files. A CGI script can do the text searching for you.
As an aside, you may want to try Perlfect Search 3.08 [perlfect.com].
Their web page description: "Perlfect Search is a sophisticated, powerful, versatile, customizable and effective site indexing/searching suite available under an open source licence. It comes as a pair of distinct scripts. The indexer, that automatically, scans and indexes a web site, and the search engine, a cgi script that serves search queries for keywords over the index, and displays results pages in html, in a standard format including title, description and relevance ranking for each matching document."
Another possibility is PDF format. You can also try PostScript files. But I'm not sure whether you can get a script to search text in PDF or PS format.
For making PDF files, you can buy Adobe Acrobat, or try HTMLDOC [easysw.com] from Easy Software Products [easysw.com]. It can convert HTML files into PDF and PS files, and it's available for Linux, Windows, UNIX, IRIX, NT and Solaris.
Both HTMLDOC 1.8.8 and Perlfect Search 3.08 are licensed under the GPL.
Hope this helps.
Re:O'Reilly (Score:1)
The CDs called Foo in a Nutshell, Deluxe Edition (Foo = {Webmaster, Java}) used JHLsearch, available from a fellow named (I think) John Leach who lives in Italy. I can't find a URL right now.
The subsequent CDs used ASTAware [astaware.com]'s NetResults. I wasn't really happy with their engine or their grasp of Web standards. Just before I left, we were starting to look into JObjects [jobjects.com], which I'd had good recommendations for, but I don't know what became of that.
In short though, if you provide HTML content, the users will be able to use technology of their choice to search the CD. Most users will expect you to provide the tools, but that either means platform-proprietary tools or something based in Java. And even with Java, you'll probably need to provide a VM for the most common platforms, just in case.
You don't seem to understand Information Retrieval (Score:1)
Please refer to FAQ [trond.com]
HTML + javascript (Score:2)
Maybe even having a flatfile with keywords to index the search...
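A flat keyword file could look something like the sketch below. It is in Python purely to show the data layout and the lookup logic; a script running in the browser would have to do the same thing in its own language. The one-line-per-keyword layout (`keyword<TAB>file1,file2`) and the function names are assumptions made for the example, and the format breaks if a file name contains a comma or a tab.

```python
def dump_flatfile(index):
    """Serialize {keyword: [files]} as 'keyword<TAB>file1,file2' lines."""
    lines = ["{}\t{}".format(k, ",".join(index[k])) for k in sorted(index)]
    return "\n".join(lines) + "\n"


def load_flatfile(text):
    """Parse the flat file back into {keyword: [files]}."""
    index = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        keyword, _, files = line.partition("\t")
        index[keyword] = files.split(",") if files else []
    return index


def search(index, query):
    """Return the files that contain every word of the query (AND search)."""
    words = query.lower().split()
    if not words:
        return []
    result = None
    for word in words:
        files = set(index.get(word, []))
        result = files if result is None else result & files
    return sorted(result)
```

Because the file is sorted by keyword, a reader can also just open it in a text editor and scan it like the index of a book.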
Re:O'Reilly (Score:2)
You're kidding, right? I haven't been able to get the search facility to work on any of the platforms I've tried (Netscape on Linux and Solaris, and IE on Windows). Java is enabled for all of them, but the browser either does nothing, or hangs when you select the search option :-(
Re:PDF (Score:2)
Acrobat or PHP+Apache (Score:2)
When I had to do this, I needed some dynamic functionality beyond simple search - I wound up using Apache+PHP, and simply pointing the browser to http://localhost:4711/. Surprisingly, we never had a tech support question related to the actual technology (although we got a few newbie "can I run this cdrom if all I have is a DVD drive?" type queries).
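For anyone wondering what the local-server approach looks like, here is a minimal sketch. It is not the Apache+PHP setup described above: it swaps in Python's standard `http.server` module, and it assumes a Python interpreter is available on the reader's machine. The `/search?q=` endpoint, the `CDHandler` class and `make_server` are names invented for the example; only the port number comes from the comment. The search is a plain substring match over the HTML files in the served directory.

```python
import html
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler
from pathlib import Path
from urllib.parse import parse_qs, urlparse


class CDHandler(SimpleHTTPRequestHandler):
    """Serve static files, plus a naive /search?q=term substring search."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/search":
            return super().do_GET()  # ordinary static file request
        term = parse_qs(url.query).get("q", [""])[0].lower()
        hits = []
        if term:
            for page in sorted(Path(self.directory).glob("*.html")):
                text = page.read_text(encoding="utf-8", errors="replace")
                if term in text.lower():
                    hits.append(page.name)
        items = "".join(
            '<li><a href="/{0}">{0}</a></li>'.format(html.escape(name))
            for name in hits
        )
        body = "<html><body><ul>{}</ul></body></html>".format(items).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(directory, port=4711):
    """Build a server bound to localhost only, serving `directory`."""
    handler = partial(CDHandler, directory=str(directory))
    return HTTPServer(("127.0.0.1", port), handler)
```

Calling `make_server("/path/to/cd").serve_forever()` and pointing the browser at http://localhost:4711/ gives the same user experience: the browser is the whole interface.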
Of course, if it's just documents, Acrobat has a variety of really slick solutions that are *very* platform independent.
--
Evan
PDF (Score:2)
A wealthy eccentric who marches to the beat of a different drum. But you may call me "Noodle Noggin."
HTML... (Score:2)
HTML is the way to go.
*Not a Sermon, Just a Thought
*/
O'Reilly (Score:2)
For network/Internet access, this isn't the answer, but for sending out CDs, it's almost ideal. There is some setup involved, so it's not a no-brainer but it is close.
HTML or even tcl or perl. (Score:2)
html + scripting + tabular format (or sql) (Score:2)
HTML is standard (at least if you test it with more than one type of browser). You can try to use HTML for everything, but that will make your search facility more like the index of a book.
If you want dynamic content, you might want to ship the CD with a web server such as Apache (as binaries for the most popular platforms, maybe source too).
Perl as a language is available for all platforms, but you will have to provide binaries for the popular ones on the CD. (Of course, other languages such as Python, Java, or Apache+PHP could do the job as well.)
Alternatively, you could use Java or Python without a browser to write platform-independent applications. That might look better, but it is more work, and a Java virtual machine is not installed on every platform.
For the data, you can use either a tabular format (columns separated by tabs, with a header row naming the columns at the top) or an SQL database dump. Database dumps sometimes contain extra information or idioms that are hard to read into other databases, but you can avoid that with a little caution. Tabular format can also be imported into a lot of databases, with moderate effort or none.
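To show how little work the tabular route takes, here is a rough sketch that loads tab-separated text with a header row into SQLite and searches one column. SQLite is just one convenient target; the table and column names in the usage note, and the helpers `load_tsv` and `find_rows`, are hypothetical.

```python
import csv
import io
import sqlite3


def _quote(identifier):
    """Quote a table or column name for use in an SQL statement."""
    return '"{}"'.format(identifier.replace('"', '""'))


def load_tsv(conn, table, tsv_text):
    """Create `table` from tab-separated text whose first row names the columns."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    columns = ", ".join(_quote(name) for name in header)
    conn.execute("CREATE TABLE {} ({})".format(_quote(table), columns))
    marks = ", ".join("?" for _ in header)
    conn.executemany(
        "INSERT INTO {} VALUES ({})".format(_quote(table), marks), reader
    )
    conn.commit()


def find_rows(conn, table, column, term):
    """Return all rows whose `column` contains `term` (LIKE substring match)."""
    sql = "SELECT * FROM {} WHERE {} LIKE ?".format(_quote(table), _quote(column))
    return conn.execute(sql, ("%" + term + "%",)).fetchall()
```

For example, after `load_tsv(conn, "sources", text)` on a file with `page`, `title` and `text` columns, `find_rows(conn, "sources", "text", "maya")` returns the matching rows. The same file still opens in any spreadsheet or text editor.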
ASCII is Your Friend (Score:3)
From your description, it sounds as though the content is plain text. Thus, I would keep it in generic ASCII.
As for the search interface, I'd use whatever the operating system provides. For Windows, that would be Start->Find-Files which will allow for text-based searching. On *nix, you could use grep. I'm not sure what the Mac choice would be.
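If you wanted one search tool that behaves the same everywhere, a grep-like script is only a few lines. The sketch below is in Python and assumes an interpreter is available on the reader's machine; `grep_tree` and the `*.txt` naming convention are assumptions for the example, not anything the operating systems above provide.

```python
from pathlib import Path


def grep_tree(root, term, pattern="*.txt"):
    """Yield (relative path, line number, line) for each line containing `term`.

    The match is a case-insensitive substring test, searched recursively
    under `root` in files matching `pattern`.
    """
    needle = term.lower()
    root = Path(root)
    for path in sorted(root.rglob(pattern)):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                if needle in line.lower():
                    rel = path.relative_to(root).as_posix()
                    yield (rel, lineno, line.rstrip("\n"))
```

Something like `for hit in grep_tree("/mnt/cdrom", "maya"): print(*hit, sep=":")` prints results in roughly the same file:line:text shape as grep.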
The less dense and more format neutral information is, the more likely it is to be useable in the future.
Keep it simple.
InitZero