Is There A Standard for Software Metadata?

tagish asks: "I'm sitting here preparing some Java source to release under the GPL and wondering how best to tell people about what I'm doing. It seems to me that there's a compelling need for a simple, extensible standard for software metadata -- an agreed way of describing any piece of code, what it does, who made it, what license(s) it's available under, what platforms it supports, what it is compatible with and so on. The first question then is: does such a standard exist? And if it does why is it not more popular?"

"It's one thing to make this stuff available, but if people can't find it I'm wasting my time. Of course there are places I can go to publicise what I've done (Freshmeat, Jars, Gamelan, and Servletcentral in this case) and those services perform a valuable function, but in practice it is still quite hard for someone to find some code in language X that performs function Y in a way that complies with constraint Z. There's no search engine that finds reusable code based on variable criteria and, given the number of incompatible ways source code can be packaged, described and distributed, little prospect of anyone building one.

Right now, when I release this code, if I want people to find it I have to:

  • write a description of it
  • set up a home page for it
  • register that page with numerous search engines possibly using the description I wrote
  • visit the appropriate repository and announcement sites making submissions at each
  • find out whether there's an appropriate usenet group and post to it

Assuming that such a standard doesn't exist does anyone want to get together with me and devise one. I'm thinking of something (human) language independent, simple, capable of encompassing all types of code, amenable to automatic processing. What about it?"

As much as I agree that something like that should exist, I believe that if you feel strongly about your code, then a home page is a must for your project (as well as writing descriptions about your project and registering it with search engines). A metadata standard would be a big help in this respect, but it's not going to be a replacement for going out there and spreading the word yourself as best you can.

With that said, what current data formats could be extended to serve as such a metadata standard, and if none of them are completely sufficient to handle this type of application, what would such a format need to be robust and flexible enough to serve this purpose.

This discussion has been archived. No new comments can be posted.

  • ...that the humble readme was the universal standard.

  • Sounds like a good idea

    XML would be the current standard to start with; you'd just need to develop a schema that contains the data you want to share. It'd definitely help code repositories.

  • Freshmeat.net is good enough for me.

    --
  • I would have to agree with the XML idea -- it is definitely the way to go in the 90's . . . uh, I mean the 00's.

    If you really want to put together a standard schema for describing code, you need to get in touch with the World Wide Web Consortium [w3c.org]. They are currently working on standard schemas for generic databases, generic word processor formats, and just about everything else you could imagine.

    XML definitely fits the bill for this idea. If you design your schema correctly, it could easily be converted into a web page and it could easily be parsed by search engines and web crawlers. I'm done now.


    -----------------

  • This same story was on kuro5hin with exactly the same submission text. [kuro5hin.org]
    Methinks someone's trying to be funny.
  • None exists as far as I know (hmm - most package managers support at least some of what you want, but package management is not the way to go), but try writing up an XML spec for one, and see how it goes. Better yet, screw XML, and just make a colon-separated format:

    $PROGRAM_NAME:$LICENSE_NAME:(Commercial|Shareware|Freeware|Semi-free|Free|Public-Domain):$AUTHOR:$WEBSITE:$EMAIL

    Then hack something together with cpio to include it with your program.

    See? Problem solved.
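
    A minimal sketch of reading one such record, assuming the six fields appear in exactly the order shown above. The names `Record` and `parse_record` are hypothetical. One wrinkle shows up quickly: a website URL contains colons of its own, so the sketch splits the first four fields from the left and the e-mail address from the right.

```python
from typing import NamedTuple

# Category values taken from the format line above.
CATEGORIES = {"Commercial", "Shareware", "Freeware",
              "Semi-free", "Free", "Public-Domain"}


class Record(NamedTuple):
    """One colon-separated metadata record (hypothetical type)."""
    program_name: str
    license_name: str
    category: str
    author: str
    website: str
    email: str


def parse_record(line: str) -> Record:
    # Split off the first four fields; the remainder is "website:email".
    parts = line.strip().split(":", 4)
    if len(parts) != 5:
        raise ValueError("expected six colon-separated fields")
    name, license_name, category, author, rest = parts
    # The URL may contain ':' (as in "http://"), so take the e-mail
    # address from the right-hand end instead.
    tail = rest.rsplit(":", 1)
    if len(tail) != 2:
        raise ValueError("missing website or e-mail field")
    website, email = tail
    if category not in CATEGORIES:
        raise ValueError("unknown category: " + category)
    return Record(name, license_name, category, author, website, email)


if __name__ == "__main__":
    print(parse_record(
        "frob:GPL:Free:A. Hacker:http://example.org/frob:hacker@example.org"))
```

    The sketch still breaks if any of the first four fields contains a colon, which is the usual argument for a format with explicit escaping or tagging.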

  • The .lsm files have been standard (find them on SunSite, or whatever it's called this week), and I suppose Freshmeat's format is pretty standard on the web, too...
    ---
    pb Reply or e-mail; don't vaguely moderate [ncsu.edu].
  • by Q*bert ( 2134 ) on Monday September 18, 2000 @02:56PM (#770540)
    It is a very ambitious project. The goal is to make a single format not only for project metadata but also for package metadata, abstracting over RPMs, debs, ports, and the like. The leaning is toward making it XML-based.

    The leader of the project [cfcl.com], SF Perl Mongers' [pm.org] own Rich Morin [mailto], is being very circumspect about it, trying to gather lots of information from experts in different OSs and distributions, and of course working on it in his free time, so the product is not there now--but if you're interested in contributing to such an effort, this would be the place to help out.

    Vovida, OS VoIP
    Beer recipe: free! #Source
    Cold pints: $2 #Product

  • by Scorcher ( 86738 ) on Monday September 18, 2000 @02:58PM (#770541)
    "The LSM is a directory of information about each of the software packages available via FTP for the Linux operating system. It is meant to be a public information resource. All entries have been entered by volunteers all over the world via email using the template below..."

    ftp://ftp.execpc.com/pub/lsm/LSM.README
  • by Wdomburg ( 141264 ) on Monday September 18, 2000 @02:59PM (#770542)
    I'm not aware of one that is cross-platform, though there is one for Linux called the "Linux Software Map."

    The format includes the following fields:

    Title
    Version
    Entered-date
    Description
    Keywords
    Author
    Maintained-by
    Primary-site
    Alternate-site
    Original-site
    Platforms
    Copying-policy

    Given that there is a platform field, despite it being referred to as the *Linux* Software Map, this does qualify on most of the criteria that you mentioned.
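
    To show how little machinery such a field list needs, here is a rough sketch of a reader for "Field: value" records. The exact on-disk syntax is an assumption (one field per line, indented lines continuing the previous field) and is not taken from the LSM documentation; only the field names come from the list above. `parse_fields` and `SAMPLE` are hypothetical names.

```python
from typing import Dict, Optional

SAMPLE = """\
Title: frobnitz
Version: 1.0
Description: Frobs frobnitzes from the command line,
    one at a time.
Primary-site: ftp://ftp.example.org/pub/frobnitz
Platforms: Linux, Solaris
Copying-policy: GPL
"""


def parse_fields(text: str) -> Dict[str, str]:
    """Parse 'Field: value' lines; indented lines continue the last field."""
    fields: Dict[str, str] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if raw[0] in " \t" and current is not None:
            # Continuation of the previous field's value.
            fields[current] += " " + raw.strip()
            continue
        # Split on the first colon only, so URLs in values survive.
        name, sep, value = raw.partition(":")
        if not sep:
            raise ValueError("not a 'Field: value' line: " + raw)
        current = name.strip()
        fields[current] = value.strip()
    return fields


if __name__ == "__main__":
    for key, value in parse_fields(SAMPLE).items():
        print(key, "=", value)
```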

    Freshmeat, though not a format, is also a fairly comprehensive database of software which provides much the same information as you mentioned, including:

    Title
    Description
    Author
    Licence
    Category
    Download
    Packages
    Homepage
    Changelog

    Freshmeat, aside from providing updates on their site, also provides them via text files, which are suitable for simple automated parsing.

    Though neither solution is entirely perfect, both are definitely close to what you're looking for.

    What I would like to see is an SQL backend, with a simplified query engine on top of it that returns an XML-formatted document. This would take care of the extensibility portion of it, as fields could be added to the backend and XML format without breaking compatibility with the client.

    Likewise, I would like the database to be available as a download so that mirrors could be created and/or alternative front-ends.

    (E.g. The search functions of Freshmeat aren't always flexible enough for me to easily pinpoint what I am looking for. I would definitely prefer being able to download a snapshot of the database and run custom SQL queries locally.)
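
    A small sketch of that idea, using an in-memory SQLite database as a stand-in for the backend. The table layout, the sample rows, and the function names `build_db` and `query_as_xml` are all assumptions made for illustration. The XML elements are generated from whatever columns the query returns, so adding a column later adds an element without changing the client-facing code.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_db() -> sqlite3.Connection:
    """Create a toy package database (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE packages "
        "(title TEXT, language TEXT, license TEXT, description TEXT)")
    conn.executemany(
        "INSERT INTO packages VALUES (?, ?, ?, ?)",
        [("frobnitz", "Java", "GPL", "Frobs frobnitzes"),
         ("widgetd", "C", "BSD", "Widget daemon"),
         ("servlet-kit", "Java", "LGPL", "Servlet helpers")])
    return conn


def query_as_xml(conn: sqlite3.Connection, where: str, params=()) -> str:
    """Run a query and return the rows as an XML document.

    `where` is trusted SQL text in this sketch; values go in `params`.
    """
    cur = conn.execute(
        "SELECT * FROM packages WHERE " + where + " ORDER BY title", params)
    columns = [d[0] for d in cur.description]
    root = ET.Element("packages")
    for row in cur:
        pkg = ET.SubElement(root, "package")
        # One child element per column, whatever the columns happen to be.
        for col, val in zip(columns, row):
            ET.SubElement(pkg, col).text = "" if val is None else str(val)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    db = build_db()
    print(query_as_xml(db, "language = ? AND license = ?", ("Java", "GPL")))
```

    A client that ignores elements it does not recognise keeps working when the backend grows new fields, which is the compatibility property described above.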

    In any case, freshmeat and lsm are likely your best choices for the time being.
  • by Kanagawa ( 191142 ) on Monday September 18, 2000 @03:00PM (#770543) Homepage
    There are several such initiatives under way in the on-line library community. Librarians collect so much cruft, and since they tend never to throw things out, they feel an even stronger need than you for good metadata. The Dewey Decimal System is one such (very simple) metadata standard (sort of).

    Anyway, SunSITE.UNC.edu -- now iBiblio.org -- has required Linux developers uploading software to /incoming to include a Linux Software Map (lsm) file for quite a long time now. The .lsm file is a basic metadata file in a fairly simple format. So, you might look at that: http://www.ibiblio.org/pub/Linux/LSM-TEMPLATE

    The Dublin Core initiative is a more generalized attempt to answer the question, "How do we standardize on a metadata format?" Dublin Core is using XML and XML DTDs as the basis for their work. It applies not only to software but also to other online resources. So, as one might guess, it's arcane and difficult to understand at best and completely impenetrable most of the time. You can find more about Dublin Core at http://www.purl.org/dc

    Sadly, most search engine companies focus on searching a specific kind of document type -- like HTML -- for arbitrary content. Interestingly, searching metadata is both an easier computational problem to solve and more productive for the user. Unfortunately, it's also a far more difficult social problem. Getting everyone to write common metadata is very, very difficult. Going back and writing metadata for any sizeable archive (say, iBiblio, for example) is a Herculean task. I think most of the coders who write search engines are more interested in the actual mathematics behind searching than they are in actual Document Retrieval.

    You might also check out http://www.cnidr.org, who were the authors of Isearch and some other good searching tools.
  • Someone else mentioned using XML as the format, and I would second that. It gives you both a human and machine readable format.


    I would also like to point out RDF [w3.org] as an existing use of XML for something very similar to what you're asking for. It can be extended with more tags if you need them. Right now it's targeted more towards web content, but I think it will give you some good ideas.

  • by pen ( 7191 )
    PAD [asp-shareware.org]

    --

  • W3C Resource Description Framework [w3.org] is the nearest thing to what you want; see also RDF and Metadata [xml.com] by Tim Bray.

    The most notable places where RDF is presently used for real things (as opposed to "we'd like it to be used here vaporware") include:

    The latter is exceedingly relevant, as it represents an encoding of metadata about Linux software packages in RDF form.

  • This may shock you, but both sites get stories from user submissions. So all this means is that the same person posted both stories to Slashdot and kuro5hin, big freaking deal.


  • OSD [microsoft.com] (Open Software Description), implemented by Microsoft in XML. Of course, it is the work of Satan.

  • Eric Raymond and several other bright folks have had a project related to this going for about two years now, called Trove [tuxedo.org].

    Trove is a next-generation Internet software archiving facility, intended to supplant the classical FTP-tree-with-decorations model.


    There's even real code available, in Python, which I confess I haven't looked at, so I'm vague on what it does or doesn't do yet. I suspect there's that which is worth a look.

  • I'd check out the Linux Software Maps, hosted at metalab (now ibiblio). More interesting is probably the Dublin Core Project. Although it's not set up specifically for software, it's extensible. You could create an XML DTD (which many have suggested) using the Dublin Core standard for such a purpose.

    links:
    iBiblio Linux archives [ibiblio.org]
    Dublin Core homepage [purl.org]
  • But I thought that the humble readme was the universal standard.

    That was my thought too. Problem is that's really simple and obvious. We need something complicated. It should probably use XML and go through a standards committee to get it right.
  • I think XML is definitely the way to encode the data. However, you may run into problems when you need something that is '(human) language independent'. How is software categorized now? Often by keyword - what (human) language shall these keywords be stored in when it's put into the XML document? Can we simply send it to the fish if someone queries on keywords in a different language?

    Describing databases and word processor formats is pretty easy - the 'language' for these is finite (and pretty small at that). We can't assume that we can simply iterate over all algorithms in all languages using all data types to devise the schema. (Human) language is the only method for supporting such a large problem space. So the question now becomes 'How do we take human language and make it human language independent?'

  • by Anonymous Coward
    Yes, of course it should be in XML. But saying "XML" is about as useful as saying "ASCII". You still need to define a good DTD on top of that, and decide what is reasonable info to keep.
  • While there are no set-in-stone standards for describing electronic media, including code, there are some evolving standards in this area. One place to start would be to look at the Dublin Core [purl.org] website - this is a descriptive schema that provides a 'core' or basis for metadata schemes. Also look at the work on RDF (links can be found from the Dublin Core site) for info on structuring metadata.
  • Doesn't seem to have an element to store the license the software is distributed under. Or a homepage. Or a download link. Or a file size.

    I guess XML is a good idea, but how about trying to mirror everything one can enter for a freshmeat.net entry? And consider what .deb, .rpm etc. have to offer.
  • www.freshmeat.net

    Duh.
  • Moderate this UP. This actually isn't as flaming as it comes off to be. Everyone is saying "use XML" but in reality, XML doesn't define anything. XML just makes it standard. In the real world, DTDs need to be written, and this is a HARD task. Fortunately, there are already metadata standards for XML, most notably RDF and Dublin Core. Check out purl.org for Dublin Core [purl.org] info.
  • There is already an XML-based format. It's called PAD (Portable Application Description), and is available here [asp-shareware.org]. Here [asp-shareware.org] is an example. Granted, it was developed by ASP (the Association of Shareware Professionals), but I doubt that they have anything against the standard being used for something besides shareware. There are even tools to help you generate PAD files.

    --

  • my god, you are a fucking moron. please re-read my post and you will find the second paragraph considers and discounts your 'explanation'.

    Abashed the Devil stood,
    And felt how awful goodness is
  • Part of the .NET platform is a standard for metadata - "like IDL and type libraries on steroids..." This will allow cross-language, cross-platform code integration. VB can call C++ or Java directly, using the metadata information. At least in theory - that's what COM was supposed to be for :) Info at http://msdn.microsoft.com and MSJ, et al.
  • by fm6 ( 162816 ) on Monday September 18, 2000 @03:24PM (#770562) Homepage Journal
    I find it kind of ironic that this question is asked in connection with Java software. The specifications for both Java and Java 2 include conventions for software metadata: Javadoc comments. These do not support all the information Tagish wants to record, but they do support a lot of it. You can argue that Javadoc is for APIs, not for programs -- but in the Java world, a program is just a class that's meant to be called from a command line launcher.

    Perhaps people find the Javadoc Conventions [sun.com] to be just a little confusing?

    (Anybody who knows me knows I have a personal bias on things Javadoc. Probably not worth discussing on Slashdot. I mention it just to keep myself honest.)

  • Some parts of such a metadata standard are easy: language, compiler, platform, architecture, etc. But once you start trying to document the actual functionality of your code, you get into some sticky territory that is still the domain of researchers [cmu.edu] at a number of universities [mit.edu]. The problem first is to devise a language powerful enough to facilitate formal methods [ox.ac.uk]. The next problem is actually convincing people that it's worth all the effort to formalize their specs (I think it is, but there are many who disagree). The last problem is coming up with a search algorithm that is able to match specs. For this part, you can't just use a string match or unification algorithm... there's some deeper semantic and structural analysis that needs to be done to determine that a certain fragment of code meets the constraints you want. To make the whole problem even worse, we don't even know if such an algorithm is computable! So, a full-blown metadata standard seems a bit out of the question now, but if you're willing to lower your standards a bit, I bet you can whip up a more practical implementation (with some natural language thrown in).
  • The problem in using XML here is the same in any other application of the technology that by nature requires it be free from language bias: in what language do you create your standard? English? Indian? Cantonese? True w/ XSLT translators you can switch from one language DTD to another, but this is far from a 'universal' standard.

    Not that I have a solution to this problem ... I just can't help but think XML isn't the answer to _everything_ ... especially if English isn't your first language, though it is becoming the de facto standard for meta-data representation.

    - jc
    ---------------------------------------------- -----------------
    James C. Diggans
    jdiggans@excelsior-web.com
  • ... for hosting your open-source project or source code 'snippets'. They offer a wide range of services for free if you host your project there, including bug tracking, file/version archiving, etc.

    You can find them here [sourceforge.net].

    I haven't seen project documentation [templates / standards / requirements] on their site, but perhaps you can be the one to create them.
  • A few years ago I was involved in a project (non-CS) concerning metadata attributes for a particular type of data. Much of this past experience is pertinent to this discussion. At the base level some things are obvious:

    Name, Title, Organization, Address, Phone #, Fax #, URL, Date, etc... These metadata get assigned to the higher level of metadata such as: Originator, Copyright holder, Maintainer, Mirror Sites, etc...

    It gets more complicated at the next description level. For instance, a set of metadata for software would be something like: Programming language, Operating System, Compiler, Library requirements (dependencies), Hardware requirements, License, Distribution restrictions, Lines of code, etc... Along with this would be tags like version number.

    Then comes the software description: Application type (e.g., graphic converter, audio playback), User interface, Data input, Data output, Data formats, Batch/Interactive, Algorithms, Previous versions, Code stolen from, etc...

    Metadata should be flexible enough to take in new types. Metadata sometimes points to more metadata which points to more metadata. Not all the metadata attributes need to be filled in. One should strongly attempt to standardize some of the key words. Metadata are a bitch to come up with.:-)
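
    The layering described above can be sketched with a couple of record types. This is only one possible shape, and every class and field name below is an assumption chosen for illustration. Base-level contact records are attached to roles, most attributes are optional, one record can point at another, and an open-ended `extra` mapping leaves room for new types.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Contact:
    """Base-level metadata: who or what something is."""
    name: str
    organization: Optional[str] = None
    url: Optional[str] = None


@dataclass
class SoftwareMetadata:
    title: str
    version: Optional[str] = None
    # Higher-level roles (Originator, Maintainer, ...) point at contacts.
    roles: Dict[str, Contact] = field(default_factory=dict)
    language: Optional[str] = None
    license: Optional[str] = None
    dependencies: List[str] = field(default_factory=list)
    application_type: Optional[str] = None
    # Metadata pointing at more metadata.
    previous_version: Optional["SoftwareMetadata"] = None
    # Room for attribute types nobody has thought of yet.
    extra: Dict[str, str] = field(default_factory=dict)


def filled_fields(meta: SoftwareMetadata) -> List[str]:
    """List the attributes that were actually filled in."""
    return [name for name, value in vars(meta).items()
            if value is not None and value != {} and value != []]


if __name__ == "__main__":
    author = Contact(name="A. Hacker")
    old = SoftwareMetadata(title="frobnitz", version="0.9")
    new = SoftwareMetadata(title="frobnitz", version="1.0",
                           roles={"Originator": author, "Maintainer": author},
                           previous_version=old)
    print(filled_fields(new))
```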

  • Ooooh, did para_droid dare to point out your stupidity? Bad man para_droid! Bad man! Bitchslap para_droid!

    Abashed the Devil stood,
    And felt how awful goodness is
  • ..and comment your code in javadoc or QT style.. Javadoc style is better IMO..
  • The FILE_ID.DIZ will get the job done.
  • If you create a system like this, it would be useful for finding EVERYTHING.
  • If post #39 had contained that information it *wouldn't* have been so moronic and I wouldn't have had to reply in the way I did.

    You have suggested to me one possible explanation: Slashdot is full of losers who find this story interesting. Kuro5hin is read entirely by Slashdotters. Those same Slashdotters voted the story in.

    This makes me wonder about something else. Why does no-one apart from me have the guts to post abuse logged-in? Are you afraid of losing your precious 'karma'?

    Abashed the Devil stood,
    And felt how awful goodness is
  • by Jason W ( 65940 )
    I'm surprised no one has mentioned CVS yet. You can see every change made to every file, in numerous forms. I'm no CVS expert, but it seems like it does (or can be easily modified to) contain information about every characteristic of a file. And since you download the CVS directories when you download a package (most packages that use CVS don't bother to remove the CVS directories), the information is carried over to the local machine for access there.

    Of course, this is most relevant to Open Source projects that make extensive use of CVS, but in a few years there will be no conflict to worry about.

    ----

  • What part of XML don't you understand? Microsoft plans to use it for just about everything... If you want to be left behind, don't use XML.
    --
    Peace,
    Lord Omlette
    ICQ# 77863057
  • by DeadSea ( 69598 ) on Monday September 18, 2000 @04:03PM (#770574) Homepage Journal
    • write a description of it
    • set up a home page for it
    • register that page with numerous search engines possibly using the description I wrote
    • visit the appropriate repository and announcement sites making submissions at each
    • find out whether there's an appropriate usenet group and post to it
    How about
    • Put it in your slashdot sig.
  • Freshmeat.net is an excellent information source. Combined with sourceforge, or your own homepage, who needs more?
  • by eyeball ( 17206 ) on Monday September 18, 2000 @04:14PM (#770576) Journal
    I believe this is what you want: Open Software Description Format (OSD) [w3.org] from w3.org.

    Abstract: This document provides an initial proposal for the Open Software Description (OSD) format. OSD, an application of the eXtensible Markup Language (XML), is a vocabulary used for describing software packages and their dependencies for heterogeneous clients. We expect OSD to be useful in automated software distribution environments.
  • by Anonymous Coward
    I've looked at Project Meta and its ilk, and they do sound like a technologically wonderful proposal, but think for a moment about some of the political ramifications:

    If it can be described, then it can be classified

    If it can be classified, then it can be pigeonholed

    If it can be pigeonholed, then it can be demonized

    If it can be demonized, then it can be suppressed

    By registering your source code, you're giving cartels like the MPAA and the RIAA a heads-up about your code and all the sorts of "undesirable" uses it can be put to. Why do their work for them? Give your code a chance to live and breathe on its own, and maybe it'll flourish into something big before they catch wind of it and move in on you with their lawyers.

    For proponents of free software, metadata are antithetical to freedom-through-obscurity.

  • I don't mean this to be flamebait or a troll, please don't read it this way.

    I'm not sure about most people and can't speak for any of them, but I've personally never visited either the Linux map nor the Meta project.
    For me (and millions of others?) the tried-and-true methods of software distribution (tarball, website, and the all-important README file) have been sufficient all these years.

    Of course it's not efficient, it doesn't encourage searchability, etc... But it's what everybody uses and is used to.

    I personally haven't seen enough discipline in the Open Source community with regards to a metadata project. The RM and Meta projects are great, but people need to use them.
  • SourceForge has adopted the Trove taxonomy [sourceforge.net], so e.g. you could easily find out if there are any mature graphical Java apps under the X license that frob frobnitzes.
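
    A rough sketch of that kind of lookup, assuming each project carries a list of category strings written as " :: "-separated paths. The path format, the category names, and the catalog entries are all assumptions for illustration and are not taken from the Trove documents; `matches` and `search` are hypothetical helpers.

```python
from typing import Dict, Iterable, List

# Hypothetical catalog: project name -> category paths.
CATALOG: Dict[str, List[str]] = {
    "frobber": ["Development Status :: Mature",
                "Programming Language :: Java",
                "License :: X",
                "Environment :: Graphical"],
    "frobd": ["Development Status :: Alpha",
              "Programming Language :: C",
              "License :: GPL",
              "Environment :: Console"],
    "jfrob": ["Development Status :: Mature",
              "Programming Language :: Java",
              "License :: GPL",
              "Environment :: Graphical"],
}


def matches(categories: Iterable[str], wanted: Iterable[str]) -> bool:
    """True if every wanted path equals, or is a parent of, some category."""
    cats = list(categories)

    def has(prefix: str) -> bool:
        return any(c == prefix or c.startswith(prefix + " :: ") for c in cats)

    return all(has(w) for w in wanted)


def search(catalog: Dict[str, List[str]], wanted: Iterable[str]) -> List[str]:
    wanted = list(wanted)
    return sorted(name for name, cats in catalog.items()
                  if matches(cats, wanted))


if __name__ == "__main__":
    print(search(CATALOG, ["Development Status :: Mature",
                           "Programming Language :: Java",
                           "License :: X"]))
```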
  • When you write a program to pull dependency information out of any README in such a fashion that you can build it into an autoconfiguration tool, please pick up your phd. I bet you think vi is for people too fancy for ed, don't you?
  • Isn't that what open source is all about? RTFC!(Read The Fucking Code)

    Reading source code requires moving source code somehow to your local host. Bandwidth costs money. What the OP is looking for is metadata, or something small that describes the code in a well-defined, searchable form.


    <O
    ( \
    XGNOME vs. KDE: the game! [8m.com]
  • How different is this from "X was designed on Windows! It sucks!"?

    If you look deeper into this issue, it sure makes you think. Doesn't it?
  • Microsoft COM, Microsoft .NET, what's next?

    Microsoft.org: Microsoft [microsoft.com] will blow big bux0rz on .NET (which requires an always-on, high-speed connection, making it inaccessible to a large number of Windows customers), stop making profits, and will have to become a nonprofit.


    <O
    ( \
    XGNOME vs. KDE: the game! [8m.com]
  • I'm thinking it would be nice if certain metadata could be embedded in the binary after it's been compiled. Say, for instance, you try to open up a binary file using your web browser: instead of it not knowing what to do, you could have the first X number of bytes be a document in XML or HTML, with various info about the binary, maybe including things like what other files are part of this program and where they're located (this would have to be dynamic of course, created at compile time for instance?), the author, web sites, dependencies, etc.

    Just a thought.
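
    One way to picture the "first X bytes" idea is a container with a marker, a length, an XML document, and then the original bytes. The layout, the `META` marker, and the function names below are all hypothetical. A real executable format would not load with bytes prepended like this, so the sketch treats the file purely as a container.

```python
import struct
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

MAGIC = b"META"  # hypothetical 4-byte marker


def wrap(payload: bytes, info: Dict[str, str]) -> bytes:
    """Prefix payload with MAGIC, a 4-byte big-endian length, and XML."""
    root = ET.Element("program")
    for key, value in info.items():
        ET.SubElement(root, key).text = value
    doc = ET.tostring(root, encoding="utf-8")
    return MAGIC + struct.pack(">I", len(doc)) + doc + payload


def unwrap(blob: bytes) -> Tuple[Dict[str, str], bytes]:
    """Return (metadata, original payload) from a wrapped blob."""
    if blob[:4] != MAGIC:
        raise ValueError("no metadata header found")
    (length,) = struct.unpack(">I", blob[4:8])
    doc = blob[8:8 + length]
    root = ET.fromstring(doc)
    info = {child.tag: child.text or "" for child in root}
    return info, blob[8 + length:]


if __name__ == "__main__":
    blob = wrap(b"\x7fELF...", {"author": "A. Hacker",
                                "website": "http://example.org/frob"})
    print(unwrap(blob)[0])
```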

  • that would mean that you have to move to some sort of symbolic token language. The fact is, the problem is even greater with programming languages. If you program in C/C++, Pascal, COBOL etc, you're really programming in the English version of that language. That leads to very bizarre and sometimes funny code in languages other than English, where keywords are English but identifiers and comments are in some other language.

    I think what's important to realize here is that we're not trying to find the meaning of life. XML is simply a standardized way of tagging information and it's not perfect. But the quest for perfection can sometimes prevent us from arriving at a solution at all. The whole struggle to standardize on various industry-specific markup languages is difficult enough and has led to enough feuds and confusion. Let's not make it even more difficult by obfuscating the whole issue with another order of complexity. Once XML has done its job well, we can worry about the finer points.
  • If you're going to use an existing standard (and really, you should) then Dublin Core is the standard to use. Despite what the previous poster said, it is not particularly confusing, and it has fields that are appropriate to software (e.g. Creator, Contributor, Version, Rights).
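
    For a sense of what a Dublin Core description of a piece of software could look like, here is a minimal sketch that writes a handful of elements from the Dublin Core element set as XML. The wrapper element `metadata`, the sample values, and the function name `dublin_core_record` are assumptions; only the element names and the namespace URI come from Dublin Core itself.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: Dict[str, List[str]]) -> str:
    """Serialize {element name: [values]} as dc:* elements.

    Elements are repeatable, so each name maps to a list of values.
    """
    root = ET.Element("metadata")  # wrapper name is an assumption
    for name, values in fields.items():
        for value in values:
            ET.SubElement(root, "{%s}%s" % (DC_NS, name)).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(dublin_core_record({
        "title": ["frobnitz"],
        "creator": ["A. Hacker"],
        "contributor": ["B. Coder", "C. Tester"],
        "description": ["Frobs frobnitzes from the command line."],
        "rights": ["GNU General Public License"],
    }))
```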

    ibiblio is also working on something they call the Opensource Metadata Framework which seems to be based on or even a subset of Dublin Core. I don't know why they didn't just use Dublin Core. See http://www.ibiblio.org/osrt/ldpcore/ldp_elements for the spec.

    Frankly, you're getting into the area of librarianship, so you should ask a librarian. (IANAL.) You might do this at lisnews.com or at oss4lib.com; particularly the latter. By the way, it's my experience that programmers are bad librarians - even digital library programmers like myself - who think they are good librarians (we invented search engines, didn't we?) so take everything you read in this forum with a grain of salt.
  • I'm sitting here preparing some Java source to release under the GPL
    Java.. GPL.. problems talks problems, questions, concerns.. ehhhhh.. geez man.. just code !!
    Are we coders or politicians ?
    Just code !!!
  • Of course, if you're going to refer to X, you shouldn't refer to it as Windows:

    The X Consortium requests that the following names be used when referring to this software: X X Window System X Version 11 X Window System, Version 11 X11
  • The only thing that makes this a little curious is that both posts were made within minutes of each other. Both sites are invested in by VA. It just looks too strange.....
  • Kuro5hin's [kuro5hin.org] coverage of this is quite extensive.
  • .nfo files were great until win2k decided to use that extension for system iNFOrmation files.
  • You wouldn't think anybody would be dumb enough to set plain .html files with a php mime type would you?

    Thanks for pointing it out. I don't host my own site, so I wasn't the one that set up apache like that, but I was able to add .htaccess files to each directory to reset the mime type for html correctly.

  • I've always thought it would be cute for linkers to be aware of software licenses, and of license incompatibilities. License identification would be embedded directly within object files and libraries, presumably put there by license-aware compilers. If you tried to link GPL-incompatible application code with a GPL-covered library, the linker would report an error.
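
    A toy sketch of the license-aware link step, assuming each object file and library already carries a license tag. The compatibility table is purely illustrative and is not a statement about what any real license permits; `ALLOWED`, `LicenseError`, and `link` are hypothetical names.

```python
from typing import Dict, List, Set

# Hypothetical table: library license -> object-file licenses it may be
# linked with. Illustrative values only, not legal guidance.
ALLOWED: Dict[str, Set[str]] = {
    "GPL": {"GPL", "LGPL", "X11", "Public-Domain"},
    "LGPL": {"GPL", "LGPL", "X11", "Public-Domain", "Proprietary"},
}


class LicenseError(Exception):
    """Raised when a link would combine incompatible licenses."""


def link(objects: Dict[str, str], libraries: Dict[str, str]) -> List[str]:
    """Check license tags, then return the list of inputs to link.

    Both arguments map a file name to its embedded license tag.
    """
    for lib, lib_license in libraries.items():
        allowed = ALLOWED.get(lib_license)
        if allowed is None:
            continue  # no rule recorded for this library's license
        for obj, obj_license in objects.items():
            if obj_license not in allowed:
                raise LicenseError(
                    "%s (%s) cannot be linked with %s (%s)"
                    % (obj, obj_license, lib, lib_license))
    return sorted(objects) + sorted(libraries)


if __name__ == "__main__":
    print(link({"app.o": "GPL"}, {"libfoo.a": "GPL"}))
```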

    Yes kids, that's why I'm at Berkeley. And if you patent my idea and make millions off of it, I'll sue you silly. :-)

  • I believe this question could be extended to a more general one: How to assign description fields to any digitized data?

    I see no reason to restrict yourself to program code only. It could be extremely useful to be able to structure any information you want: consumer, scientific, entertainment, whatever.

    Also, it is not mentioned directly, but the spirit of such a standard would imply that you also have the ability to process this information, that is, to be able to search it. In the parlance of our times this means "search from the web".

    Is there any open source solution for web accessible searchable database with user updateable and flexible structure?


  • I already have a system that does most of what is described in the story for in-house projects I do. The system uses XML in the back end and adds the file into the package. I also have an automated program that creates a readme.txt and readme.html from the data.

    The XML holds a short changelog, date/times of all builds, long descriptions of changes for each build, misc project information, a major and a minor general description field, and some other minor information. The generator appends the file size of both the final package and all files contained. This system is designed for in-house only, and is in no way currently useable as a solution for public use (part of a requirement I've made to co-ordinate development between our 4 coders). Also, many of you may be disappointed to know it's win32 only, as that is all the work we do internally. But if anyone wants to know some of the specifics of such a system, let me know.
  • by JustBen ( 216031 )
    Take a look at UML, the Unified Modeling Language. It's great for documenting software.
  • Just to clarify, the OMF is heavily based on Dublin Core. We actually generate a subset of DC metadata, formatted in XML using a DTD that we developed in house. The theory behind dublin is that you can extend it or create a subset to fit your needs, which is what we did. Thanks for the heads up on OMF, though
  • ...fileid.diz :-)

    Aaaaah, for the glory days of the local bbs scene...

    gfunk007
  • by Ent ( 88363 )
    Depending on what your software does you might want to take a look at UPnP.org [upnp.org] and see if there is a DSP template for your software yet. This is mostly for devices but, again, if your software can be shown as a device (might consider a streaming media server a device) this would be where you would want to go. Thanks, Kyle
  • Point taken. Back to the drawing board. I guess I should've done a bit more research... ^_^

    --

  • Well, freshmeat has its appindex as text here: freshmeat.net/backend/ [freshmeat.net]

    That should do the trick :)
  • by samantha ( 68231 )
    I've been thinking along the same lines for a while now. Open Source can promote a great deal more reuse and expanded component libraries. But this assumes that potential users and clients can find the existing resources and know what they have.

  • Of course you should be using UML to DESIGN the software, not just to document it!
  • Unfortunately the code is doing almost nothing and there has been no progress on it for the last 2 years. So this is not a solution.
  • Microsoft is part of the MDC (Metadata Coalition), which has a standard called the Open Information Model (OIM) for storing metadata about various things. Each OIM sub-model has an XML DTD used for importing and exporting metadata from a repository, and the standard mandates using SQL to query the repository. You could create an OIM class model and DTD. You could also look at the OMG equivalent. XML on its own is not enough - it's a data encoding standard, not a storage mechanism for a repository!
  • by Alesha ( 4187 ) on Tuesday September 19, 2000 @02:05AM (#770606)
    Sorry for self-quotation (from the TERENA [terena.nl] Technical Report FTP Mirror Tracker [terena.nl]):
    Unfortunately, there is still no coherent architecture for mirroring and for mirror sites to register their collections with the sites which they mirror. In fact, we lack even a common (de facto) standard for recording this replication information in a machine readable format. Some progress was made on this a few years ago by the Internet Engineering Task Force's [1] working group on Internet Anonymous FTP Archives, with the creation of the so-called IAFA Templates [2]. These provided a simple machine readable format for recording per-resource or collection metadata, which could easily be created by hand or programmatically. Although support for IAFA templates was integrated into some software packages, e.g. the ALIWEB search engine [3] and the ROADS resource discovery system [4], this approach never became successful on a large scale. The World Wide Web Consortium's Resource Description Format (RDF) [5] and the Dublin Core metadata effort [6] may eventually provide a viable machine readable interchange format.

    Currently, the database underlying the freshmeat.net weblog [7] is perhaps the closest thing we have to a genuine mirror registry - though it focuses almost exclusively on software packages and operating system distributions, and only offers limited mirror information. RDF is also being used in this capacity as part of rpmfind.net [8], although mirror information is very limited in this case too. The Internet Engineering Task Force's Uniform Resource Names effort [9] is also relevant here, since it would be very useful if there were persistent and location independent names for these collections of replicated resources.

    [1] http://www.ietf.org/ [ietf.org] Internet Engineering Task Force website
    [2] http://info.webcrawler.com/mak/projects/iafa/ [webcrawler.com] IAFA Working Group & IAFA Templates homepage
    [3] http://aliweb.emnet.co.uk/ [emnet.co.uk] ALIWEB website
    [4] http://roads.opensource.ac.uk/ [opensource.ac.uk] ROADS website
    [5] http://www.w3.org/RDF/ [w3.org] World Wide Web Consortium Resource Description Format (RDF) homepage
    [6] http://purl.org/dc/ [purl.org] Dublin Core website
    [7] http://freshmeat.net/ [freshmeat.net] freshmeat.net website P. Lenz & Andover Advanced Technologies, Inc.
    [8] http://rpmfind.net/ [rpmfind.net] rpmfind.net website
    [9] RFC 1737, Functional Requirements for Uniform Resource Names K. Sollins & L. Masinter December 1994

    Another attempt to create a framework for such metadata was the "Open-Software-Index" [maruhn.com] that Oliver Maruhn and I tried to create almost 2 years ago. After this document some discussion started (code name "Russian Freshmeat"), which shifted mostly to localisation of such metadata. Unfortunately no working code was produced.

    And finally, something somewhat less relevant to the topic.

    This kind of metadata should be extremely valuable for the implementation of URIs, and particularly for the I2C(s) (URI to URC). Quote from RFC 2483:

    "Uniform Resource Characteristics are descriptions of resources. This request allows the client to obtain a description of the resource identified by a URI, as opposed to the resource itself or simply the resource's URLs. The description might be a bibliographic citation, a digital signature, or a revision history. This memo does not specify the content of any response to a URC request. That content is expected to vary from one server to another."
    Fortunately, we already have a mechanism for the I2L(s) (FTP Mirror Tracker [squid.itep.ru]).

  • I'm part of a project working on an XML format called CSDF, the Common Software Description Format, which in many ways is similar to RDF. It's still in its infancy, but coming along nicely. The goal is to have a repository where people can register their CSDF file and users of the repository can define which categories or software packages they're interested in. The server will then send emails when new releases appear and update the central database. Automatic downloading of updated and/or new packages is also supported. Different distributions and platforms are implemented too. Finally, the entire thing will define a query format so that users can easily search and filter a repository. It's inspired by the Debian package format, so dependencies etc. will be supported too.

    Right now I'm working on a proof-of-concept kinda thing that will test the current implementation of the xml-format and the tools. Another guy is working on a Windows-implementation, using the same standard.

    If all goes well I'll have something to post by the end of this week. Keep an eye out on Freshmeat.
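
    To make the search-and-filter idea concrete, here is a rough Python sketch of querying a repository by several criteria at once (the "language X, function Y, constraint Z" problem from the story). It is not based on the CSDF format itself; the record fields, the search function, and the sample entries are all made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PackageRecord:
    # Field names are illustrative assumptions, not taken from CSDF.
    name: str
    version: str
    language: str
    platforms: set[str]
    categories: set[str]
    depends: set[str] = field(default_factory=set)


def search(repo, language=None, platform=None, category=None):
    """Return the names of records matching every criterion that was given."""
    hits = []
    for rec in repo:
        if language is not None and rec.language != language:
            continue
        if platform is not None and platform not in rec.platforms:
            continue
        if category is not None and category not in rec.categories:
            continue
        hits.append(rec.name)
    return hits


# Made-up sample repository.
REPO = [
    PackageRecord("servlet-utils", "1.0", "Java", {"any"}, {"web"}),
    PackageRecord("netscan", "0.3", "C", {"linux"}, {"network"}, {"libpcap"}),
    PackageRecord("webforms", "2.1", "Java", {"win32", "linux"}, {"web", "gui"}),
]

if __name__ == "__main__":
    print(search(REPO, language="Java", category="web"))
```

    Criteria left as None are simply ignored, so the same function covers both narrow and broad queries.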

  • There is a book in which David Korn et al. describe a lot of the things they built and use at AT&T Research. I can't recall the name, and my copy of the book got "lost" at a previous job. However, it described a software repository that allowed fuzzy searching on characteristics, so it may have some useful information in it. Even if that isn't exactly what you want, the book is a valuable source of information and ideas.
  • The National Institute of Standards and Technology [nist.gov] has a division in the Info.Tech. Lab that has metadata as one of their projects [nist.gov]. Looks like they're thinking XML.
  • I tend to find that the quality of comments / descriptions / keywords on freshmeat varies widely. While freshmeat is an excellent resource, a more formal, rigorous standard for describing code would, I think, improve things even further.

    Also, freshmeat deals mainly (almost entirely?) with *nix code, particularly linux. Some of us have to code for windows too, and a cross-platform search method would be *really* handy...
  • >Well, freshmeat has it's appindex as text here:
    >freshmeat.net/backend/

    I somehow didn't notice that :) It doesn't have the more lengthy descriptions, but I think I can live with that.

    Thanks.
  • by Anonymous Coward
    ...on freshmeat at http://freshmeat.net/search/?q=verinfo. It supports all the data items the original poster mentioned, and more.
  • While XML, PAD, and readme files do have their uses, what you really want to look at is how something like this could be automated to produce a machine readable format that might be rendered into any number of languages (including other programming languages).

    Think about what a compiler does. It translates silly little 'human readable' files into machine readable instructions. Yes, compilers have lots more features than that, but the key feature is that a compiler turns my poorly thought out ideas about a problem into a set of instructions that my processor can deal with. So what you want to do is create some sort of meta-file which isn't machine code, per se, but a representation of precisely what the code is doing. You'd want some way of including comments from the code in a description block for the main program and for each function (maybe with some sort of encoding scheme so that it could be language independent).

    The renderer would be able to take that and generate english text OR mandarin OR turkish OR C++ OR JAVA OR COBOL OR (god help us all) PL/I.

    So those of you who might have once written a compiler (I'm sorry) take a look at what that process is and think about what you might be able to do. I'd imagine something like:
    PROGRAM BOBSHOE
    some user defined text
    pseudocode
    USES OBJECT BLAH
    USES OBJECT ANOTHERBLAH
    OBJECT BLAH
    etc. So, what do you think?

  • Do any of the package managers do this? Should they? Or is this all part of one big problem of which package managers and all this other stuff are only pieces...
  • I dispute your last clause. Even historically, most things that are demonized are never successfully suppressed. Just look at Satanism: the churches invent a new, evil religion so that they can accuse their enemies of practising it. What happens? People say "that sounds fun" and actually start doing it.

    Look at all the publicity around Napster and DeCSS. They may be illegal, but if the MPAA and RIAA had never tried to suppress them, they'd be little niche causes, supported by a few geeks. Now they're turning into (small) mass popular movements.
  • XMI is a metadata standard recently developed by the OMG. It's encoded in XML and is used by some tools: the ArgoUML (http://www.argouml.org) CASE tool, Rational Rose, and a tool from IBM, the XMI Toolkit (http://alphaworks.ibm.com/tech/xmitoolkit), which automatically extracts metadata from a Java program. The standard is rather complex. To be productive with it, you have to understand things like metadata architecture and the MOF meta-meta-metadata standard (also by the OMG). The alternative, XIF, used by Microsoft in its repository tool, is not much simpler, and there's very little information available for writing a XIF-compliant tool.
  • XIF is used by the Microsoft Repository
  • You left out the solution SourceForge.net [slashdot.org] is working on, called Trove, or simply the Software Map. It contains fields for Development Status, Environment, Intended Audience, License, Operating System, Programming Language, Topic, and Description, and is centrally served from http://sourceforge.net/softwaremap/trove_list.php [sourceforge.net].

    Several of the categories are even hierarchical, which helps validate the values used. Another benefit is that if the license is open 'enough', you can host your web page and downloads at SourceForge, at which point it will help you track versions and release notes.

    Wow -- that sounds like I'm a SourceForge PR person. Please understand that I'm not necessarily advocating them as the best solution -- I think freshmeat and lsm are extremely valuable. I just wanted to make sure the SourceForge solution was mentioned.


    --Chouser
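
    As a rough illustration of how hierarchical categories help validate values, here is a small Python sketch. The tree below is a tiny made-up slice, not the real Trove list, and the " :: " path separator and the is_valid helper are assumptions for this example.

```python
# A tiny, invented slice of a category tree; the real list is far larger.
TREE = {
    "License": {"OSI Approved": {"GNU General Public License (GPL)": {}}},
    "Programming Language": {"Java": {}, "C": {}},
    "Topic": {"Internet": {"WWW/HTTP": {}}},
}


def is_valid(path: str, tree=TREE, sep=" :: ") -> bool:
    """Check that each step of `path` exists under the previous one.

    In this sketch any prefix of a known path also counts as valid.
    """
    node = tree
    for part in path.split(sep):
        if part not in node:
            return False
        node = node[part]
    return True


if __name__ == "__main__":
    print(is_valid("Topic :: Internet :: WWW/HTTP"))
    print(is_valid("Topic :: Java"))
```

    A misfiled or misspelled value fails at the first step that is not in the tree, which is something a free-text keyword field cannot catch.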
  • XSA [garshol.priv.no] seems to be focused on offering a mechanism for auto-updating installed software packages. I don't think it does a very good job of describing dependencies; personally, I'd like to see something that duplicates the functionality of the RPM .spec, but in XML. I think the W3C's OSD [w3.org] is pretty close, although I've only scanned the spec.

    Whatever format is used, it should

    1. Provide URLs for locating software and facilitating auto-updates
    2. Provide for digital signatures
    3. Provide obvious stuff, such as version, name, authors, and descriptions
    4. Provide some mechanism for describing dependencies. This is a tough one; even RPM doesn't do a very good job of resolving dependencies, which I believe is due in part to its format for describing them.
    5. Be XML based for machine readability, human-editable, and platform independent.
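
    A descriptor meeting the five points above might look something like the following Python sketch. The element and attribute names (package, download, signature, requires, min-version) are invented here and do not come from XSA, OSD, or RPM, and the example.org URLs are placeholders. The sketch only reads the dependency entries; it does not fetch anything or verify the signature.

```python
import xml.etree.ElementTree as ET

# Hypothetical descriptor; all names and URLs are assumptions for illustration.
DESCRIPTOR = """\
<package name="example-lib" version="1.2.0">
  <description>Example library.</description>
  <author>A. Developer</author>
  <download url="http://example.org/example-lib-1.2.0.tar.gz"/>
  <signature type="detached" url="http://example.org/example-lib-1.2.0.tar.gz.sig"/>
  <requires name="libfoo" min-version="2.0"/>
  <requires name="libbar"/>
</package>
"""


def parse_version(text):
    """Turn a dotted numeric version such as '1.2.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def unmet_dependencies(xml_text, installed):
    """List requirements not satisfied by `installed` (name -> version string)."""
    root = ET.fromstring(xml_text)
    missing = []
    for req in root.findall("requires"):
        name = req.get("name")
        minimum = req.get("min-version")
        have = installed.get(name)
        if have is None:
            missing.append(name)
        elif minimum and parse_version(have) < parse_version(minimum):
            missing.append(f"{name} >= {minimum}")
    return missing


if __name__ == "__main__":
    print(unmet_dependencies(DESCRIPTOR, {"libfoo": "1.9"}))
```

    Even this toy version shows why point 4 is the hard one: real version strings are rarely purely numeric, and a minimum version is only one of many constraints a package may need to express.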
  • XMI is part of the OMG's Unified Modeling Language Specification 1.3, and it stands for XML Metadata Interchange. It is intended as a mechanism to reliably transport UML models between tools.

    The OMG metadata standard is the Meta Object Facility (MOF) version 1.3. It allows the specification of the meta-meta data and provides IDL interfaces for accessing a repository based on the MOF definition.
  • The title is right, but the content is wrong. XML is not a meta-data standard. XML just provides the syntax, not the semantics. And Metadata needs semantics. What I said is that XMI is a meta-data standard. It's not a generic metadata standard for describing web resources like Dublin Core or IAFA. XMI is more suited for coding in XML the metadata of OO development tools. In the ISO metadata architecture, metadata standards are grouped in 4 levels, according to what they can describe. As much as UML (level 3) is a metadata standard for describing OO models, MOF (level 4) is a metadata standard for describing metamodels like UML. But MOF is not limited to UML. It can be used to describe entity-relationship models, data warehouse models, component models and even generic metadata. XMI is a mechanism for mapping ANY meta-model which can be described by MOF in XML. One might object that it's not a metadata standard because it cannot by itself describe anything without a meta-meta-model like MOF behind the scenes. I think this objection is useless. There are metadata standards for description, like MOF, and metadata standards for encoding, like XMI.
  • Apparently Microsoft's new .NET platform incorporates some kind of metadata for "assemblies" (à la packages in Java, I think). It might be worthwhile looking at what they've done. I believe it's XML based and includes documentation, authorship, licensing and other required assemblies, amongst other things.
  • please re-read my post and you will find the second paragraph considers and discounts your 'explanation'.

    Idiot Alert! Learn something about both sites before you try to figure them out. Slashdot has editors that post stories. This was a question, so it was posted by an editor to "Ask Slashdot" a section for questions. Kuro5hin, OTOH, works by having the users vote on the stories to determine whether they should be posted. The users apparently decided the story should be posted.

    So, to repeat what I just said so that your small pea-sized brain can comprehend it, there was one editor, and a bunch of users.

    g'd day!
  • Ha ha ha, very funny.

