
Advice for Building a Multi-Platform Lyrics Database?

AntonOnymous,Cowherd asks: "I am in the process of designing an application for general public use. The application will allow end users to search and display a large collection of songs (both lyrics and tunes) with annotations, all in text format. The intent is for this application to run cross-platform (Linux, Windows, Mac, and whatever else), so I want to avoid platform-specific binaries as much as possible. I also believe that the program should be Open Source. The end users will not necessarily be computer experts, so I want to avoid as much additional setup on their computers as possible. The application (data and program) will all be stored on a CD or DVD, and it should be able to be run locally. The most important part of this application is the data, not the program, so the guts of it should be fairly simple with a decent user interface. Does anyone have any suggestions as to general approach to setting this up, or have any pointers to existing open source programs which already perform a similar function?"
"One way to implement this would be to set up each song (with lyrics, tune, and annotations) as a single record in a database. I would like to avoid the inherent security issues and overhead of setting up and running a database on a user's computer.

Another possibility, which is fairly appealing, is to use a Web Browser to provide the user interface, and to use Open Source text indexing/searching programs (such as Lucene or Egothor) as the engine. It is probably safe to assume that most users have a Browser. However, most users probably would not have a web-server (even a local one) on their computer, and going by the principle of as little messing around with the user's computer as possible, I would like to avoid having to set one up, even a local one."
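Engines such as Lucene and Egothor are built around an inverted index: a map from each word to the songs that contain it. Below is a minimal in-memory sketch of that data structure in Python, assuming each song is a hypothetical (id, title, lyrics) tuple. It has none of Lucene's stemming, ranking or on-disk storage. It only shows the idea.

```python
import re
from collections import defaultdict

# Hypothetical sample records: (id, title, lyrics). Real data would come from the disc.
SONGS = [
    (1, "Sample Song One", "The river runs wide and deep"),
    (2, "Sample Song Two", "The road runs long and narrow"),
]


def tokenize(text):
    """Lowercase and split on runs of non-word characters (Unicode-aware in Python 3)."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def build_index(songs):
    """Map each token to the set of song ids whose title or lyrics contain it."""
    index = defaultdict(set)
    for song_id, title, lyrics in songs:
        for token in tokenize(title + " " + lyrics):
            index[token].add(song_id)
    return index


def search(index, query):
    """Return the sorted ids of songs containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


if __name__ == "__main__":
    idx = build_index(SONGS)
    print(search(idx, "runs"))        # [1, 2]
    print(search(idx, "river runs"))  # [1]
```

Because the contents of a CD never change, an index like this could be built once when the disc is mastered, not on the user's machine.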
  • by CRCulver ( 715279 ) on Friday April 28, 2006 @06:32PM (#15224441) Homepage

    Whatever you do, please store everything in UTF-8 encoding, since most of the lyrics of the world's music are not in English. I was outraged the day I discovered that the old CDDB system required everything to be in ISO-8859-1. What is someone to do with music in foreign scripts? ISO-8859-1 doesn't even have the necessary characters from standard Latin transliterations (such as the characters with carons for Cyrillic transliteration).

    If you don't have any experience with Unicode issues, a problem shared by a regrettable number of developers, try Gillam's Unicode Demystified.
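A minimal Python sketch of this advice: always read and write lyric files with an explicit UTF-8 encoding instead of the platform default, which differs between Windows, Linux and Mac OS. The sketch also checks whether a string would even survive ISO-8859-1. The sample text and the helper names are made up for illustration.

```python
import os
import tempfile


def fits_latin1(text):
    """Return True if text can be encoded as ISO-8859-1 without loss."""
    try:
        text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True


def save_lyrics(path, text):
    # Explicit encoding: never rely on the platform's default locale encoding.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(text)


def load_lyrics(path):
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    sample = "Čaj, šum, žal"  # made-up text with caron letters
    print(fits_latin1(sample))  # False: Č, š and ž are outside ISO-8859-1
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "song.txt")
        save_lyrics(path, sample)
        print(load_lyrics(path) == sample)  # True
```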

    • You're more right than you know. Really, every programmer should know the basics of Internationalization these days. The problem is that there are a lot of obsolete concepts floating around, as symbolized by the fact that most people still think that "ASCII" and "text" are the same thing. That's not been true for a long time, even if you're writing software that doesn't need to be localized.

      The inventors of Java had the right idea: store all your characters using Unicode, and translate them to the local

      • Unfortunately, a lot of early Java class libraries did not implement it right.

        In fact, Java --- and Windows --- got it so catastrophically wrong (using 16-bit values for characters, instead of 32-bit values) that it was found easier to change the Unicode specification to prohibit most characters that wouldn't fit in a 16-bit value!

        There is a standard in place for encoding such things using 16-bit values; it's UTF-16, and given that it's a variable-length-character encoding like UTF-8, it rather defeats t
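Here is the UTF-16 point in concrete form. A code point above U+FFFF does not fit in one 16-bit unit, so UTF-16 encodes it as a surrogate pair. The sketch below computes the pair by hand and checks the result against Python's own UTF-16 encoder. The example character is U+1D11E MUSICAL SYMBOL G CLEF. The helper names are hypothetical.

```python
def utf16_units(code_point):
    """Return the UTF-16 code units for one Unicode scalar value."""
    if not (0 <= code_point <= 0x10FFFF) or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point < 0x10000:
        return [code_point]              # fits in a single 16-bit unit
    v = code_point - 0x10000             # 20 significant bits remain
    high = 0xD800 + (v >> 10)            # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)           # low 10 bits -> low (trail) surrogate
    return [high, low]


def units_from_codec(text):
    """Turn Python's big-endian UTF-16 output into 16-bit integers for comparison."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


if __name__ == "__main__":
    clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF
    print([hex(u) for u in utf16_units(ord(clef))])  # ['0xd834', '0xdd1e']
    print(utf16_units(ord(clef)) == units_from_codec(clef))  # True
```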

        • In fact, Java --- and Windows --- got it so catastrophically wrong (using 16-bit values for characters, instead of 32-bit value) that it was found easier to change the Unicode specification to prohibit most characters that wouldn't fit in a 16-bit value!

          Please. Both Java and Windows simply implemented Unicode. The decision to try to do every character set on the planet in 16 bits was the Unicode committee's decision, not Sun's or Microsoft's. And it's a mistake they've been able to work around.

          Your best bet

          • ...while UTF-8 is great even though it doesn't degrade gracefully for 8-bit characters? That's absurd. If all you care about is the 7-bit characters that UTF-8 supports, why not just use ASCII?

            Because when you're parsing text, you're usually only interested in a few special characters --- control codes, spaces, etc. These are all in the ASCII range. This means that all the UTF-8 extended characters will just pass straight through, unchanged, correctly. You don't need to worry about them. Because UTF-8 co
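The property this argument relies on can be shown directly. In UTF-8, every byte of a multi-byte sequence is in the range 0x80 to 0xFF. An ASCII delimiter therefore can never appear inside a non-ASCII character, and splitting raw bytes on it is safe. UTF-16 does not have this property. U+0900, for example, is encoded as the bytes 0x09 0x00, and the first of those is a TAB byte. `split_fields` is a hypothetical helper.

```python
def split_fields(data, sep=b"\t"):
    """Split raw UTF-8 bytes on a single ASCII separator byte, then decode each field.

    Safe for UTF-8 because bytes of multi-byte sequences are all >= 0x80,
    so the ASCII separator can never occur inside a non-ASCII character.
    """
    if len(sep) != 1 or sep[0] >= 0x80:
        raise ValueError("separator must be a single ASCII byte")
    return [field.decode("utf-8") for field in data.split(sep)]


if __name__ == "__main__":
    record = "Žal\t日本語\tcafé".encode("utf-8")
    print(split_fields(record))  # ['Žal', '日本語', 'café']
    # The same trick would break on UTF-16: U+0900 encodes to b'\x09\x00'.
    print(b"\t" in "\u0900".encode("utf-16-be"))  # True
```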

            • Oh, you're talking about parsing. I thought we were talking about reading. But you're still wrong. If you're parsing anything, you should be using well-tested parsing libraries, not rolling your own crap using libc functions. The problem with re-inventing the wheel is that homemade wheels are not very solid.

              It's absurd to call that kind of garbling "degrading gracefully" just because it's sort of readable. And the UTF-16 version will be perfectly readable, if the sender remembers to add the correct chara

  • web service (Score:2, Interesting)

    by fuct000 ( 950445 )
    Store all of the data on a server and write either a .NET or Java EE program to share the information as a web service. Then just have a desktop client people download, which contacts the web service to request the information.
  • Wiki on a stick (Score:4, Interesting)

    by Glonoinha ( 587375 ) on Friday April 28, 2006 @06:33PM (#15224445) Journal
    Sounds like a perfect application of Wiki on a stick. I set one up in a few hours (most of which I wasn't even sober for), and it installs with zero footprint (it's designed to run from a thumb drive).
    I have a little more write-up in my Journal, along with links.
  • Copyrights (Score:5, Insightful)

    by AuMatar ( 183847 ) on Friday April 28, 2006 @06:38PM (#15224473)
    Music lyrics, unfortunately, are copyrighted. Every db on the web that's gained real size has been shut down by the RIAA. Whatever you do needs to be hosted out of a country that doesn't do copyrights, or you're dead in the water.
    • Did you not read the original post?

      The application (data and program) will all be stored on a CD or DVD, and it should be able to be run locally.
      • Uh.. did YOU read what he wrote? Distributing copyrighted material will get this guy a lawsuit.
        • Yes, but that's only if you get caught. The data isn't being served up through a central, obvious web server. If the app is open-sourced and distributed along with the data freely around the internet, it will be near-impossible for anyone to shut all copies of it down.

        • Re:Copyrights (Score:3, Insightful)

          by Kelson ( 129150 ) *
          You're assuming he's going to be distributing copyrighted material -- and that he's going to be distributing it without permission. (You can distribute copyrighted material all you want, if you've gotten permission to do so. Otherwise, the publishing industry would be vastly different, and GPL software wouldn't exist.)

          The story doesn't tell us anything about which songs he's going to be including. For all we know, it could be a collection of folk songs or hymns that are already in the public domain.

    • So, a friend of mine wrote one of the first online lyrics servers.

      Here's his story.

      • Wow. Reading that I suddenly remembered my own experience providing a lyrics service on the web.

        Back in 1995, I put together a website that cross-referenced the lyrics to Les Misérables in English, French and German (all typed in by hand from the CD liner notes). At first it was hosted on webspace at AOL, but I later moved it to some space I had at college. From 1996-2000 I added songs in more and more languages, each time carefully cross-referencing and linking so that you could jump from each song
    • Actually, it is the NMPA (National Music Publishers' Association) and the publishing companies, who shut down the lyrics databases. The RIAA has no jurisdiction over the use of lyrics.

      If you want to start such a database, my advice would be to lay the groundwork for it, software-wise. Then, contact the publishers to get lyric reprint licenses. It would be nice to say that the publishers would be happy to provide you with such licenses, but chances are they would be difficult to obtain, since you are not act
    • Umm... Not all music lyrics are copyrighted. And, as I said elsewhere, the bulk of the songs I am going to be using are no longer (if they ever were) under copyright. Or I will be going through the appropriate hoops for licensing. Also, this is not going to be a web service, so hosting doesn't enter into it at all. Any insights into how best to set up the software rather than nitpicking on the data? Thanks.
  • Heh... (Score:4, Interesting)

    by djsmiley ( 752149 ) on Friday April 28, 2006 @06:39PM (#15224483) Homepage Journal

    Lots of websites already do this, so why bog yourself down with something that has already been done? Unless it's for some kind of research project for university/college, of course.

    Open source solutions which do the same? Amarok has a "lyrics" tab which brings up the lyrics to the playing song; I think they are pulled from Wikipedia, but I'm not sure.

    MusicBrainz also has a huge database of music, which is why they are seemingly linked in Amarok.

    So basically you're not onto a winner with this unless you're going to offer something all the hundreds of others fail to offer.

    Amarok, Wikipedia and MusicBrainz are all open source.

    I'm not sure, however, how all of these cope with non-English alphabets, which is something lots of people tend to bring up.
    • Umm... You did read my post, didn't you? This is not going to be a web service - it should be run locally on a user's computer from a CD or DVD. As for your concern about whether or not I am "onto a winner", I appreciate the thought, but it doesn't really matter, and I would prefer some suggestions on the software implementation to commiserations.
  • I like how you plan to use open source software so that you can then violate someone else's copyright. You do realize that you won't have the rights to distribute the music and lyrics to these songs, don't you? That is unless, of course, you plan to only distribute songs that are in the public domain. In which case, you'll have a fairly small market (yes, I realize there are some instances where this wouldn't be true--church hymns for instance).
    • Sorry to spoil your day, but I'm not planning on violating anyone's copyright here. I won't bother thanking you for automatically assuming that that was my plan. The songs that I will be including in my application are going to be either public domain, or else (hopefully) properly searched out and licensed. Part of the idea of using OSS is to help keep costs down and provide more time / opportunity / funds for making sure that everything is done properly. If you have any suggestions as to the best way t
  • You will be sued the minute you launch the service. Lyrics are copyrighted, and fiercely protected by the copyright owners.
    • He seems to be saying that everything will be on the person's computer. He doesn't say where the lyrics are coming from. Perhaps the user is to cut and paste them from online. I don't see how this will be much of an improvement on either Googling for lyrics, or using local search on text files on your hard drive.

      However, [] is one database of lyrics that isn't getting sued, like most things anime.
      • I can't recall the name of it (PearLyrics, perhaps?) but there was an excellent program for OS X that integrated with iTunes and would query several sources and download lyrics to songs you were listening to. The author took it down under threat of being sued by the RIAA, if I recall correctly, and it didn't violate copyright in any way imaginable.

        Even if the guy would have won in court, there's likely no way he could have afforded the legal costs, unfortunately, and his programming time was wasted :(.
      • You're right. I didn't say anything about this being a web service. Contrary to the popular opinion that seems to be being voiced here, the songs in the database will not be current pop songs, and I will be trying my best to make sure that no copyrights are being violated. The lyrics and rest of the song information will be pre-entered into the database, and there is no question of the user entering anything in themselves.
  • You think setting up a local database is a security risk, but setting up a local web server isn't? Why? You are aware that databases don't have to be servers listening on public ports? You could use something like SQLite.

    The important thing is not the implementation itself. It's the data format and/or API. Make the data available, and plenty of people will be willing to write web interfaces, Qt interfaces, GTK interfaces, etc. Expose the API as plain C, and make the data easily importabl
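Here is one illustration of the embedded approach. SQLite is a library linked into the application, not a server, and Python ships a binding for it in its standard library. The sketch below assumes a single hypothetical `song` table with illustrative column names. It opens a catalog on the disc read-only through an SQLite URI. With no path, it builds an in-memory catalog for testing. The `LIKE` search is a deliberately simple stand-in for a real full-text index.

```python
import sqlite3
from pathlib import Path

# Hypothetical schema: one row per song, as the question suggests.
SCHEMA = """
CREATE TABLE IF NOT EXISTS song (
    id          INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    lyrics      TEXT NOT NULL,
    tune        TEXT,
    annotations TEXT
);
"""


def open_catalog(path=None):
    """Open the catalog read-only from disc, or an in-memory one when path is None."""
    if path is None:
        conn = sqlite3.connect(":memory:")
        conn.executescript(SCHEMA)
        return conn
    # mode=ro: the CD/DVD is read-only anyway, and nothing is ever written at run time.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def add_song(conn, title, lyrics, tune=None, annotations=None):
    """Insert one song (used when mastering the disc, not on the user's machine)."""
    cur = conn.execute(
        "INSERT INTO song (title, lyrics, tune, annotations) VALUES (?, ?, ?, ?)",
        (title, lyrics, tune, annotations),
    )
    conn.commit()
    return cur.lastrowid


def find_songs(conn, phrase):
    """Substring search over titles and lyrics, ordered by title.

    SQLite's built-in LIKE ignores case only for ASCII letters.
    """
    pattern = f"%{phrase}%"
    rows = conn.execute(
        "SELECT id, title FROM song WHERE title LIKE ? OR lyrics LIKE ? ORDER BY title",
        (pattern, pattern),
    )
    return rows.fetchall()
```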

    • Thanks for the tips on databases. I'll look into it. Sorry for my confusion on the database vs. web server security issue. I realize that a local web server is probably just as risky. The idea of using a web browser as frontend is tempting (cuts out a lot of effort), but if it means having to set up a local web server, then it's not as nice looking... As I said, I want to have an application that makes as little impact on a user's computer as possible, and if I can avoid setting up any new services, th
  • by Matt Perry ( 793115 ) on Friday April 28, 2006 @07:00PM (#15224599)
    You can start with the MusicBrainz codebase. The schema already supports albums, tracks, and annotations. You could extend it for your purpose to add lyrics. A daily dump of the database is available, as is the source code to the server application.
  • The Mozilla Platform (Score:3, Interesting)

    by Feneric ( 765069 ) on Friday April 28, 2006 @07:08PM (#15224645) Homepage

    Copyright issues aside (I'm assuming that you're talking about lyrics that you have the legal right to use) I'd say that there's a pretty simple answer to your problem. You're thinking through the pros and cons of using a back-end database versus a browser front-end, and you're not keen on running any flavor of server.

    You can get both the database and browser advantages without having to set up a separate server by building your app on the Mozilla platform. You can utilize its built-in RDF capabilities to store your data in a clean, extensible way, and fairly quickly put together a user interface using XUL and CSS that can work with Firefox, Seamonkey, Flock, etc., or even just the XUL app runner for a more stand-alone user experience.

    Because all of your data (and even interfaces) will be XML-compliant, you'll even be making it easier for third party apps to work with your stuff.
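This is a rough illustration of the third-party point, with two substitutions: it uses plain XML instead of Mozilla's RDF, and it uses Python's standard `xml.etree.ElementTree` instead of anything in the Mozilla platform. The element and attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog layout; element and attribute names are illustrative only.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<songs>
  <song id="s1" lang="en">
    <title>Sample Song One</title>
    <tune>Sample Tune</tune>
    <lyrics>The river runs wide and deep</lyrics>
    <annotation>Sample note</annotation>
  </song>
  <song id="s2" lang="cs">
    <title>Čaj</title>
    <lyrics>Ukázkový text</lyrics>
  </song>
</songs>
"""


def load_songs(xml_bytes):
    """Parse the catalog into a list of plain dictionaries."""
    root = ET.fromstring(xml_bytes)
    songs = []
    for el in root.findall("song"):
        songs.append({
            "id": el.get("id"),
            "lang": el.get("lang"),
            "title": el.findtext("title", default=""),
            "tune": el.findtext("tune", default=""),
            "lyrics": el.findtext("lyrics", default=""),
            "annotations": [a.text or "" for a in el.findall("annotation")],
        })
    return songs


if __name__ == "__main__":
    for song in load_songs(SAMPLE.encode("utf-8")):
        print(song["id"], song["lang"], song["title"])
```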

    • I second this: Mozilla-as-a-platform is ideal for this scenario. While you, or any other potential programmer, may not know Mozilla-the-platform (XUL, XBL, RDF and heavy CSS and JS), you or any potential programmer kinda sorta know it all, right now. And you, and your pool of programmers, will still know it 10 or 15 years from now. You can't say that about wxWidgets, Qt, or GTK. Note also that XULRunner will support SQLite, Real Soon Now.
    • Thanks, both for the useful information and for the assumption that I am not intending to break copyright. Your described setup sounds like what I am looking for - something that has the database and browser advantages without having to set up local servers. Just a few questions. How well would this work with non-Mozilla-based browsers (such as Opera or Internet Explorer)? Unfortunately, not everyone has Mozilla or Firefox or..., and requiring someone to change browsers could be a roadblock to them. Wh
      • Thanks...

        You're welcome.

        How well would this work with non-Mozilla-based browsers (such as Opera or Internet Explorer)?

        If you know your users are going to be using a bunch of different browsers, it'd probably make sense to build your system around XULRunner. That way it'd be pretty much like a stand-alone app, but you (as the developer) would still get the advantages of having a built-in system to handle HTML, XML, CSS, RDF, etc. and the user would be none the wiser (although it'd be pretty almost

  • Now you've reached the point of actually needing a clue to accomplish it.

    Just pay someone, you obviously don't have a clue.
    • Why is that insightful? That's a textbook troll.

      What's with the "you obviously don't have a clue" part? He obviously DOES. He knows what he's doing. He has some ideas of how to do it. He is just asking for guidance from people who may know more than him. It's called learning. You think people are just born knowing how to write a complex application like this? They have to learn about it.

      Those people he should pay, how did they get their clue?

      If there is some obvious reason why he shouldn't continue, why

      • I disagree. Ask Slashdot used to be specific questions about a technology or how to go about something. Lately, however, it's been one question after another that goes:

        "I'm working on this project that will be able to do X? How do I do it?"

        There's a big difference between learning how to do something and asking somebody else to figure it out for you.

        • Thanks to MBCook (the parent post to your reply) for backing me up a bit here. I would like to think that I have a clue, being somewhat technically proficient, but not being completely up-to-date on all technologies. I'm sorry for offending your sensibilities with my question to Ask Slashdot, but I fail to see how it particularly differs from being a question about "how to go about something". I'm not asking for someone to do this for me (although it would be easier on me if they did :-). As I said, I a
  • ...but it seems to me writing binaries would be a mistake, and that the best route would be browser-based. If you're doing this as a "public consumption" application, requiring novice PC users to learn a special application to do something like this will make part of your audience reluctant to try your application. If you tell them "Open your web browser and go to this private website at http://www.whatever/" it won't seem as imposing--so many musicians have MySpace pages, seeing it as a web-page to visit a
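The original question wants to avoid running any web server. One way to still get the "just open your browser" experience from a CD is to generate plain HTML pages when the disc is mastered and have users open `index.html` directly. Below is a minimal sketch that assumes hypothetical (slug, title, lyrics) tuples. It provides browsing only. Search would need something extra, such as client-side script.

```python
import html
import os
import tempfile

PAGE = """<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>{title}</title></head>
<body><h1>{title}</h1>
<pre>{lyrics}</pre>
<p><a href="index.html">Back to index</a></p>
</body></html>
"""

INDEX = """<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>Songs</title></head>
<body><h1>Songs</h1>
<ul>
{items}
</ul>
</body></html>
"""


def write_static_site(songs, out_dir):
    """Write one page per song plus index.html, using only relative links.

    songs: iterable of (slug, title, lyrics); each slug must be a safe file name.
    Relative links keep working when the pages are opened straight from disk (file://).
    """
    os.makedirs(out_dir, exist_ok=True)
    items = []
    for slug, title, lyrics in sorted(songs, key=lambda s: s[1].casefold()):
        page = PAGE.format(title=html.escape(title), lyrics=html.escape(lyrics))
        with open(os.path.join(out_dir, slug + ".html"), "w", encoding="utf-8") as f:
            f.write(page)
        items.append(f'<li><a href="{html.escape(slug)}.html">{html.escape(title)}</a></li>')
    with open(os.path.join(out_dir, "index.html"), "w", encoding="utf-8") as f:
        f.write(INDEX.format(items="\n".join(items)))


if __name__ == "__main__":
    out = tempfile.mkdtemp()
    write_static_site([("one", "Sample Song One", "The river runs wide")], out)
    print(os.path.join(out, "index.html"))
```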
  • by MBCook ( 132727 )
    I would say use Java. That way you don't have to recompile the application for every architecture. You want it to run on Mac OS? Is that PPC or Intel? For Linux, is that x86, PPC, SPARC, what? With Java it doesn't matter. Plus it would also run on Solaris and a few others.

    As for the database handling since this will be static (if you want it to run off a CD, it's static) here is what I can think of. You can embed an SQL server (I know there is one, can't remember the name) and do it that way. I don't know i
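Because the data on a CD is static, a sorted list with binary search can handle simple lookups, such as "titles starting with...", without any database engine. Below is a minimal sketch using Python's `bisect` module. `TitleIndex` is a hypothetical name.

```python
import bisect


class TitleIndex:
    """Read-only prefix search over song titles, built once from static data."""

    def __init__(self, titles):
        # (case-folded key, original title) pairs, sorted by key for binary search.
        self._entries = sorted((t.casefold(), t) for t in titles)
        self._keys = [key for key, _ in self._entries]

    def starting_with(self, prefix):
        """Return titles whose case-folded form starts with prefix, in sorted order."""
        p = prefix.casefold()
        # All keys sharing the prefix are contiguous, starting at bisect_left.
        start = bisect.bisect_left(self._keys, p)
        results = []
        for key, title in self._entries[start:]:
            if not key.startswith(p):
                break
            results.append(title)
        return results


if __name__ == "__main__":
    idx = TitleIndex(["Water Is Wide", "Wild Mountain Thyme", "Amazing Grace", "wayfaring stranger"])
    print(idx.starting_with("wa"))  # ['Water Is Wide', 'wayfaring stranger']
```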

    • I would say use Java. That way you don't have to recompile the application for every architecture.

      Yes, saving a few milliseconds of CPU time, once, is important.

      Seriously, if Java has any major benefits, that you don't have to compile it isn't one of them. Or at least it wouldn't be one if all computers came with a C compiler.

      Also, Java isn't the only programming language with this property. Python, Ruby, Perl, Tcl ... the only popular languages which normally need a compiler are C and C++.

  • But what's the point? Lyrics are copyrighted, and how useful is it to have "annotations" attached to a song? Is it that hard to just listen to the lyrics? Isn't that why we have music in the first place?
    • Gee... I dunno... Some of us have a hard time hearing?

      It seems to be a commonly expected thing that you know the words to various songs. It's extraordinarily hard to do that by ear with a hearing loss.
  • Don't get sued by the RIAA, many a lyrics website has been taken down for copyright infringement.
  • Advice for Building a Multi-Platform Lyrics Database?

    Try not to get sued.
  • Focus on the data first, then the program logic: the data format, the license, and the means of creation and distribution.

    You mention open source. What about the lyrics themselves? If you are the single provider of that CD or DVD, I don't care if the programs are open source or not. All I care about is that the data is in an open format so I can code against it myself. Closed-format content is useless to me.
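To show what "an open format I can code against" might look like, here is a hypothetical plain-text record format, documented in a few comment lines, with a short parser. The format is invented for this example. Any documented format, such as plain text, XML, or an SQLite file with a published schema, would serve the same purpose.

```python
# Hypothetical open, plain-text record format (one file can hold many songs):
#
#   Title: Sample Song One
#   Language: en
#
#   The river runs wide and deep
#   %%
#
# Header lines are "Key: value"; a blank line after the headers starts the
# lyrics; a line containing only "%%" ends the record.


def parse_records(text):
    """Parse the format above into a list of dicts, each with a 'lyrics' key."""
    records, headers, body, in_body = [], {}, [], False

    def flush():
        # Reads the current headers/body bindings at call time.
        if headers or body:
            rec = dict(headers)
            rec["lyrics"] = "\n".join(body).strip("\n")
            records.append(rec)

    for line in text.splitlines():
        if line.strip() == "%%":
            flush()
            headers, body, in_body = {}, [], False
        elif in_body:
            body.append(line)
        elif line.strip() == "":
            if headers:  # a blank line after the headers starts the lyrics
                in_body = True
        else:
            key, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"malformed header line: {line!r}")
            headers[key.strip()] = value.strip()
    flush()
    return records


if __name__ == "__main__":
    sample = "Title: One\nLanguage: en\n\nline a\nline b\n%%\nTitle: Čaj\n\nřádek\n"
    for rec in parse_records(sample):
        print(rec)
```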
