Slashdot
Open Source Library Card-Catalog Apps?
Posted by
timothy
on Tue Aug 22, 2000 03:39 PM
from the and-make-half-price-books-use-it,-too! dept.
dmd writes: "Does there exist Open Source software for maintaining a small to medium sized library card-catalog? It seems all the tools are available: a perl module for working with MARC records, several for working with Z39.50 and XML, and even a web site apparently devoted to nearly this exact topic. An actual, working, catalog, however, seems to be missing. Is this something that would be valuable? I, for one, have nearly 5k volumes in my collection, and they're begging for some discipline." I'm sure cash-strapped public libraries and schools would like to be able to use free / Free tools for this, since paper books aren't going away anytime soon. Not to mention for CDs, videos, charts, museum holdings ... any ideas out there? Turnkey solutions?
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Great opportunity for a project (Score:3)
Those schools that do have money move to software like Eloquent [eloquent-systems.com] -- systems that are way more complex than a school library typically needs. Most schools don't need that much power/customisation, and can't afford it anyway. What seems to be needed is a basic system that offers searching on author/title/subject/keyword, and possibly uses MARC records (though for a school library this is not essential).
It would have to be easy to set up, and low maintenance (i.e. a basic Linux box shoved under a desk somewhere with a UPS and a tape backup). You need to keep in mind that libraries -- and school libraries in particular -- are likely to have a multitude of machines running different OSes, so something like a web interface would be perfect.
Considering the fact that most schools are getting networked these days, it's feasible to have a Linux box sitting under a desk somewhere running a database, some library software, and Apache, with a bunch of Mac/PC clients running MacOS and Windows and talking to it through a web browser. The checkout could be the same idea. This could be extended to have non-web clients running on various platforms and talking to the server via CORBA.
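To make the "web interface in front of a catalog" idea concrete, here is a minimal sketch in Python. It is only an illustration: the post suggests Apache and a real database, while this uses a plain WSGI callable and an in-memory list. `CATALOG`, `search`, `app`, and the `q` query parameter are all made-up names, not part of any existing library system.

```python
from html import escape
from urllib.parse import parse_qs

# Hypothetical in-memory catalog; a real system would query a database.
CATALOG = [
    {"author": "Austen, Jane", "title": "Pride and Prejudice", "subject": "Courtship"},
    {"author": "Melville, Herman", "title": "Moby-Dick", "subject": "Whaling"},
]


def search(term):
    """Case-insensitive keyword match against author, title, and subject."""
    term = term.lower()
    return [book for book in CATALOG
            if any(term in value.lower() for value in book.values())]


def app(environ, start_response):
    """WSGI application: /?q=keyword returns an HTML list of matching books."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    term = query.get("q", [""])[0]
    hits = search(term) if term else []
    items = "".join(
        f"<li>{escape(b['title'])} / {escape(b['author'])}</li>" for b in hits
    )
    payload = f"<ul>{items}</ul>".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]

# To try it locally, the standard library can serve the app:
#   from wsgiref.simple_server import make_server
#   make_server("", 8000, app).serve_forever()
```

Because the clients only need a browser, the same server works for the MacOS and Windows machines mentioned above without installing anything on them.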
In talking with librarians, I've found that you can't just say "dump MacOS/Windows and put Linux on all your machines" because they don't just use them for searching. They use them to run all sorts of stuff -- CD-ROM based educational software, etc. In other words, it's important to remember that for software like this, you can't just get a bunch of developers together and make decisions and write code. There are a ton of assumptions you just can't make when you're dealing with libraries and schools. There's a bunch of research into what people really want that's required. That makes it a little trickier a project than, say, a mahjongg game -- no offense to mahjongg hackers...
Anyway, this is a fantastic opportunity for development, and one that I have been very interested in for a while now. It's also been on the GNU project's list of stuff to do for years now. Contributing a GPLed library system would be great not only for Free Software, but also for schools everywhere who can't afford decent software in their libraries.
MySQL couldn't do it right (Score:3)
And I don't mean this to sound like a slam against MySQL. No SQL database could do it in a way that a librarian would be completely happy with, primarily because of the wonderful MARC format.
The MARC format is the standard format used to store bibliographic information. It was originally created in the 1960s, with the idea that the primary means of transmission would be on tape. It supports well over 300 different major fields, ranging from simple ones that anyone would understand (author, title, publisher) to arcana that only a trained librarian could love (is the item a festschrift, unusual pagination comments, magazine run dates, and on and on). Most of the major fields have "sub-fields", where the data is broken into different elements (e.g. an author field will have a name sub-field, a dates sub-field, a title sub-field, and possibly others).
Fields in the MARC format have a theoretical maximum length of 10,000 characters. Many of the fields can be repeated any number of times (co-authors, variant titles, subject headings). I've seen several attempts to model the MARC format in a relational model, and while it can be done, it's a royal pain in the ass and it inevitably winds up with trade-offs.
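A rough Python sketch of why this is awkward: a record is an ordered list of fields, each field an ordered list of sub-fields, and the same tag can repeat. The class names `Field` and `Record` and the `to_rows` helper are invented for illustration; this is not a MARC parser, and the sample data is made up (the tags 100, 245, and 650 are the usual MARC 21 tags for main author, title, and topical subject).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Field:
    tag: str  # three-character tag, e.g. "650" for a topical subject heading
    # (code, value) pairs; order matters and codes may repeat
    subfields: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class Record:
    fields: List[Field] = field(default_factory=list)

    def get(self, tag):
        """Return every field with this tag (tags such as 650 are repeatable)."""
        return [f for f in self.fields if f.tag == tag]


def to_rows(record_id, record):
    """Flatten a record into one generic row per sub-field.

    This is the usual relational workaround: a single tall table keyed by
    (record, field position, sub-field position) instead of 300+ columns.
    """
    rows = []
    for field_seq, f in enumerate(record.fields):
        for sub_seq, (code, value) in enumerate(f.subfields):
            rows.append((record_id, field_seq, f.tag, sub_seq, code, value))
    return rows


record = Record([
    Field("100", [("a", "Austen, Jane,"), ("d", "1775-1817.")]),
    Field("245", [("a", "Pride and prejudice")]),
    Field("650", [("a", "Courtship"), ("v", "Fiction.")]),
    Field("650", [("a", "Sisters"), ("v", "Fiction.")]),
])
rows = to_rows(1, record)
```

The tall table keeps everything, but even "find the author" now means filtering on tag and sub-field code and reassembling the pieces in order, which is the kind of trade-off described above.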
For a simple catalog, where you aren't worried about working with the MARC format, a relational database (including MySQL) will be perfectly adequate. But librarians love the MARC format, and it is such a basic element of modern librarianship that any system that couldn't import and export it would be considered unacceptable - like a car with a crank starter.
And I should know. I worked as a librarian for several years; I even have the MLIS to prove it.
Input from a library geek. (Score:4)
Commercially available library software that is actually used by libraries is much more than just a cataloging/look-up system to replace those old 3*5 cards.
You need an acquisitions module that has the ability to do electronic ordering and approval plan processing.
The search and report capabilities on the staff interface for these things are amazing. I can collect a list of all item records belonging to location X and created within [ range of dates ] that are attached to bibliographic records for [ material type ] within a [ call number range ], sort the records according to my criteria, then output selected fields from either the bib. or the item, or both, in the order I choose to the device of my choice (including print to e-mail or fax), and I haven't even begun to make the system sweat. Yes, selecting records based on data spread across multiple related/linked records is a fairly straightforward thing to do in SQL, but you also need a front end that the end user can comprehend.
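For the SQL half of that, here is a hedged sketch using Python's built-in sqlite3 module. The two-table schema (`bib`, `item`), the column names, and the sample rows are all assumptions made up for the example; real systems store far more, and real call-number ranges need proper normalization rather than the plain string comparison used here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bib (
    bib_id INTEGER PRIMARY KEY,
    title TEXT, material_type TEXT, call_number TEXT
);
CREATE TABLE item (
    item_id INTEGER PRIMARY KEY,
    bib_id INTEGER REFERENCES bib(bib_id),
    location TEXT, created TEXT  -- ISO dates sort correctly as text
);
""")
conn.executemany("INSERT INTO bib VALUES (?, ?, ?, ?)", [
    (1, "Intro to Cataloging", "book", "Z693 .A15"),
    (2, "Library Videos", "video", "Z699 .B20"),
    (3, "Perl for Librarians", "book", "QA76 .P47"),
])
conn.executemany("INSERT INTO item VALUES (?, ?, ?, ?)", [
    (10, 1, "main", "2000-03-01"),
    (11, 1, "branch", "2000-03-02"),
    (12, 2, "main", "2000-04-10"),
    (13, 3, "main", "2000-05-05"),
    (14, 1, "main", "1999-12-31"),
])


def items_report(conn, location, start, end, material_type, call_lo, call_hi):
    """Items at a location, created in a date range, whose bib record has the
    given material type and a call number in [call_lo, call_hi]."""
    sql = """
        SELECT item.item_id, bib.title, bib.call_number, item.created
        FROM item JOIN bib ON bib.bib_id = item.bib_id
        WHERE item.location = ?
          AND item.created BETWEEN ? AND ?
          AND bib.material_type = ?
          AND bib.call_number BETWEEN ? AND ?
        ORDER BY bib.call_number, item.created
    """
    params = (location, start, end, material_type, call_lo, call_hi)
    return conn.execute(sql, params).fetchall()


report = items_report(conn, "main", "2000-01-01", "2000-12-31", "book", "Z", "ZZ")
```

The query itself is short; the hard part the post points at is giving staff a front end that lets them build it without ever seeing SQL.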
If you're going to code it, it will need to be able to interact with all of the prevailing vendors... Ebsco, Baker & Taylor, Basil Blackwell, Swets & Zeitlinger, Matthews, etc... You will want tech contacts from each of these vendors to fine tune the ordering/receiving/approval interfaces.
Finally, the amount of fiscal reporting done in libraries can boggle your mind. You would never suspect that something so seemingly simple could be so complicated. If you don't have the ability to generate financial reports you might as well go back to index cards and hand written ledgers.
You're right (Score:3)
You can't just code a database -- that's almost entirely useless; there's also the matter of controlling circulation, tracking books out/returned/requested/held/sent to bindery, etc.
Plus import & export from vendors, billing, accepting bill payments, cross-referencing, all kinds of freaky subject indexing, mondo-bizarro file formats from a zillion years ago (MARC), etc. etc. etc.
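As a rough illustration of the circulation-tracking part, here is a small state-machine sketch in Python. The status names and the table of allowed moves are assumptions for the example, not rules taken from any real library system.

```python
from enum import Enum


class Status(Enum):
    ON_SHELF = "on shelf"
    CHECKED_OUT = "checked out"
    ON_HOLD = "held for patron"
    AT_BINDERY = "at bindery"


# Hypothetical policy: which status changes the circulation desk may make.
ALLOWED = {
    Status.ON_SHELF: {Status.CHECKED_OUT, Status.ON_HOLD, Status.AT_BINDERY},
    Status.CHECKED_OUT: {Status.ON_SHELF, Status.ON_HOLD},
    Status.ON_HOLD: {Status.CHECKED_OUT, Status.ON_SHELF},
    Status.AT_BINDERY: {Status.ON_SHELF},
}


class Item:
    """One physical copy, identified by barcode, with a status history."""

    def __init__(self, barcode):
        self.barcode = barcode
        self.status = Status.ON_SHELF
        self.history = [Status.ON_SHELF]

    def move_to(self, new_status):
        """Change status, refusing moves the policy table does not allow."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"{self.barcode}: cannot go from "
                f"{self.status.value} to {new_status.value}"
            )
        self.status = new_status
        self.history.append(new_status)
```

For example, an item sent to the bindery cannot be checked out until it has been returned to the shelf; `move_to` raises `ValueError` for that move.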
There's a reason library systems tend to be proprietary -- it's because nobody else in their right mind wants to get involved with things like MARC and Z39.50.
. . . but then again I could be wrong.
simpler and more complex than you'd think (Score:4)
Second is that half of the pieces that go into a big library management system (including the catalog part) are really generic business systems: EDI, invoicing, accounting, etc., but they haven't been abstracted out of the realm of our systems vendors. As a result, few standards are followed there, and those modules generally don't interoperate with our trading partners (i.e. internal payment systems and external suppliers). That means lots of redundant keying and more crappy systems to maintain, all of which are typically deeply and proprietarily tied into the catalog data.
All that said -- and to our vendors' credit they are tending to get better these days -- we've been sharing catalog data like hackers are sharing code for over 100 years. We've been doing it online for about 35 years, but the way we do it now is pretty much the same way we've been doing it for those 35 years. i.e. largely dependent on one of two .orgs/vendors to be a clearinghouse for sharing catalog data. But those folks disappear if they can't sell the data back to us after we create it for them. So nobody running a library wants them to disappear. Especially because we've got to handle one-of-a-kind rare items in big research libraries as well as unusual local items in public libraries and so on.
Imho the solution is to first outsource all the standard business stuff to vendors+free software that can do the same job with existing standards-based tools. Then abstract away as much as possible of the catalog data into free references sources shared and maintained by the library community (think: you could run your own amazon.com recommendations site, etc.). This is what we're trying to do (shameless plug alert) with the jake project [yale.edu] for journals. Same thing applies for books, although there are probably >=100M records to normalize.
If we can get that done, then anybody could hack up a gtk+ front end to the free, shared catalog, and pick and choose the items you have yourselves. It would work sorta like dict.org or jake. Just imagine how much easier it will be to search for ebooks in gnutella once this is done... :)
Index Data (Score:3)
- Zebra information server. Eats MARC (USMARC, other local variants) as well as XML, mails, newsgroups, etc. You can add more input filters. Talks Z39.50
- Yaz Z39.50 toolkit for client and server side
- Zap web gateway and a PHP module for building easy search gateways to anything that understands Z39.50, for example our own Zebra
- and more. Even more to come later...
I am of course biased, but these tools are designed for library applications. All open source, at Index Data. [indexdata.dk]
I was looking into this once... (Score:3)
The major problem I ran into with writing something of the sort is that there's lots of information that you really want to have that isn't on the web. Cataloging rules, the full description of the MARC fields, some of the lists (organization, I think, is one example). I could get some of those from a library, but strangely enough although I'm sure most libraries have them, they aren't necessarily on the stacks, but in people's offices. Even then, I'd have to keep them checked out for long enough that I'd rather buy a copy.
But, if anyone wants to work on it I'd be glad to help. My ideal app would have to
MARC tape issues - giving away your tax dollars (Score:4)
Unfortunately, some years back the firm that encodes these records in the MARC format legally got control not only of their formatted tapes, but of any use of the information after it has been extracted from these tapes. In other words, they not only own the format, but also the government-funded information contained in the format.
This is critical because these MARC tapes are the primary source of library cataloging information for most libraries. There are some other independent networks, primarily of educational institutions in the western US, but most libraries depend on the Library of Congress OCLC tapes.
The whole thing stinks, and is ridiculous. As a former librarian, who also holds a BSCS, I was outraged at this theft of public assets. The worst part was dealing with my moronic former colleagues who screamed that of course this company should own this information - it was "intellectual property." Thousands of librarians wrote letters supporting this company's "intellectual property rights" to work created at taxpayer expense.
This happened because most librarians think that putting information into a data format is some mystical arcana mastered only by brilliant wizards. They do not realize that the far more difficult part of the operation was the original cataloging done by the awesome catalogers at the Library of Congress.
So, libraries pay through the nose for software: first the fee that the vendor has to pay for using the MARC tapes, then the royalties for the actual use of the data contained on the tapes, and then the cost of the library software itself. BTW, most library software is so atrocious, buggy, and difficult to use that its writers would receive a failing grade if it had been turned in as a senior project at any halfway reputable college.
Yes, there is: Koha. (Score:5)
Public libraries, unfortunately, are too often dependent on fiercely proprietary-minded vendors for their daily operations.
Incidentally, the "go get MySQL, you dumbass" posters are missing an important point: libraries use the MARC [loc.gov] data standard for catalog records, and SQL doesn't cope well with the kind of tricks MARC can do.