Is There A Standard for Software Metadata?
"It's one thing to make this stuff available, but if people can't find it I'm wasting my time. Of course there are places I can go to publicise what I've done (Freshmeat, Jars, Gamelan, and Servletcentral in this case) and those services perform a valuable function, but in practice it is still quite hard for someone to find some code in language X that performs function Y in a way that complies with constraint Z. There's no search engine that finds reusable code based on variable criteria and, given the number of incompatible ways source code can be packaged, described and distributed, little prospect of anyone building one.
Right now, when I release this code, if I want people to find it I have to:
- write a description of it
- set up a home page for it
- register that page with numerous search engines, possibly using the description I wrote
- visit the appropriate repository and announcement sites making submissions at each
- find out whether there's an appropriate Usenet group and post to it
Assuming that such a standard doesn't exist, does anyone want to get together with me and devise one? I'm thinking of something human-language independent, simple, capable of encompassing all types of code, and amenable to automatic processing. What about it?"
As much as I agree that something like that should exist, I believe that if you feel strongly about your code, then a home page is a must for your project (as well as writing descriptions about your project and registering it with search engines). A metadata standard would be a big help in this respect, but it's not going to be a replacement for going out there and spreading the word yourself as best you can.
With that said, what current data formats could be extended to serve as such a metadata standard? And if none of them is sufficient for this type of application, what would such a format need in order to be robust and flexible enough to serve this purpose?
But I thought... (Score:2)
XML all the way (Score:2)
Sounds like a good idea
XML would be the current standard to start with; you'd just need to develop a schema that contains the data you want to share. It'd definitely help code repositories.
Freshmeat.net (Score:1)
Re:XML all the way (Score:1)
If you really want to put together a standard schema for describing code, you need to get in touch with the World Wide Web Consortium [w3c.org]. They are currently working on standard schemas for generic databases, generic word processor formats, and just about everything else you could imagine.
XML definitely fits the bill for this idea. If you design your schema correctly, it could easily be converted into a web page and it could easily be parsed by search engines and web crawlers. I'm done now.
Crosspost. (Score:1)
Methinks someone's trying to be funny.
Try some XML (Score:2)
$PROGRAM_NAME:$LICENSE_NAME:(Commercial|Shareware|Freeware|Semi-free|Free|Public-Domain):$AUTHOR:$WEBSITE:$EMAIL
Then hack something together with cpio to include it with your program.
See? Problem solved.
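Taken at face value, the one-line format above is easy to parse, with one catch worth showing: $WEBSITE normally contains a colon ("http://..."), so a naive split on ":" misreads it. The Python sketch below is hypothetical (the function name and dictionary keys are illustrative, not part of any standard). It splits the first four fields from the left and the email from the right, and assumes that no field other than the website contains a colon.

```python
# Allowed values for the third field, taken from the format line above.
CATEGORIES = {"Commercial", "Shareware", "Freeware",
              "Semi-free", "Free", "Public-Domain"}


def parse_entry(line):
    """Split NAME:LICENSE:CATEGORY:AUTHOR:WEBSITE:EMAIL into a dict.

    Assumes only WEBSITE may contain colons (e.g. "http://...").
    """
    # Take the first four fields from the left; the remainder is WEBSITE:EMAIL.
    name, license_name, category, author, rest = line.strip().split(":", 4)
    # Take EMAIL from the right so the colon in the URL survives.
    website, email = rest.rsplit(":", 1)
    if category not in CATEGORIES:
        raise ValueError("unknown category: " + category)
    return {"program": name, "license": license_name, "category": category,
            "author": author, "website": website, "email": email}


entry = parse_entry(
    "frobnicate:GPL:Free:Jane Doe:http://example.org/frob:jane@example.org")
print(entry["website"])  # http://example.org/frob
```

Needing this much care for a six-field line is a fair argument for the structured formats suggested elsewhere in the thread.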
There are... (Score:1)
---
pb Reply or e-mail; don't vaguely moderate [ncsu.edu].
Check out Project Meta. (Score:5)
The leader of the project [cfcl.com], SF Perl Mongers' [pm.org] own Rich Morin [mailto], is being very circumspect about it, trying to gather lots of information from experts in different OSs and distributions, and of course working on it in his free time, so the product is not there now--but if you're interested in contributing to such an effort, this would be the place to help out.
Vovida, OS VoIP
Beer recipe: free! #Source
Cold pints: $2 #Product
The Linux Software map (Score:4)
ftp://ftp.execpc.com/pub/lsm/LSM.README
Well... (Score:5)
The format includes the following fields:
Title
Version
Entered-date
Description
Keywords
Author
Maintained-by
Primary-site
Alternate-site
Original-site
Platforms
Copying-policy
Given that there is a Platforms field, despite its being referred to as the *Linux* Software Map, it does meet most of the criteria you mentioned.
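Because each field is a "Name: value" line, an LSM entry is easy to process automatically. The Python sketch below assumes the "Begin3" ... "End" framing and indented continuation lines used by version-3 LSM entries. LSM.README is the authority on the exact rules, and the sample entry is invented for illustration.

```python
def parse_lsm(text):
    """Parse one LSM-style entry into a {field: value} dict.

    Assumptions: the entry is framed by 'Begin3' and 'End' lines, each
    field starts with 'Name:' at column 0, and indented lines continue
    the previous field.
    """
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip() or line.strip() in ("Begin3", "End"):
            continue  # skip blank lines and the framing lines
        if line[0].isspace() and current is not None:
            # Continuation line: append to the field being read.
            fields[current] += " " + line.strip()
        else:
            name, _, value = line.partition(":")
            current = name.strip()
            fields[current] = value.strip()
    return fields


SAMPLE = """Begin3
Title:          frobnicate
Version:        0.9
Description:    Frobnicates widgets from the command line,
                with optional XML output.
Keywords:       widget, xml
Platforms:      Linux, FreeBSD
Copying-policy: GPL
End
"""

entry = parse_lsm(SAMPLE)
print(entry["Description"])
# Frobnicates widgets from the command line, with optional XML output.
```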
Freshmeat, though not a format, is also a fairly comprehensive database of software which provides much the same information as you mentioned, including:
Title
Description
Author
Licence
Category
Download
Packages
Homepage
Changelog
Freshmeat, aside from providing updates on its site, also provides them as text files, which are suitable for simple automated parsing.
Though neither solution is entirely perfect, both are definitely close to what you're looking for.
What I would like to see is an SQL backend with a simplified query engine on top of it that returns an XML-formatted document. This would take care of extensibility, as fields could be added to the backend and the XML format without breaking compatibility with the client.
Likewise, I would like the database to be available as a download so that mirrors and/or alternative front-ends could be created.
(E.g., the search functions of Freshmeat aren't always flexible enough for me to easily pinpoint what I am looking for. I would definitely prefer to download a snapshot of the database and run custom SQL queries locally.)
In any case, freshmeat and lsm are likely your best choices for the time being.
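The SQL-backend-plus-XML idea can be sketched in a few lines of Python with the standard sqlite3 and xml.etree.ElementTree modules. The table layout and element names are hypothetical. Each row is serialized with one child element per column, so a column added to the backend simply shows up as a new element that older clients can ignore, which is the compatibility property described above.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical table; an in-memory database stands in for the real backend.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE packages (title TEXT, version TEXT, license TEXT)")
db.executemany("INSERT INTO packages VALUES (?, ?, ?)",
               [("frobnicate", "0.9", "GPL"), ("widgetd", "1.2", "BSD")])


def query_as_xml(conn, license_name):
    """Run a simple query and return the matching rows as an XML string."""
    cur = conn.execute(
        "SELECT * FROM packages WHERE license = ? ORDER BY title",
        (license_name,))
    columns = [d[0] for d in cur.description]  # column names, in order
    root = ET.Element("results")
    for row in cur:
        pkg = ET.SubElement(root, "package")
        for col, value in zip(columns, row):
            if value is not None:  # omit NULLs rather than emit "None"
                ET.SubElement(pkg, col).text = str(value)
    return ET.tostring(root, encoding="unicode")


print(query_as_xml(db, "GPL"))
# <results><package><title>frobnicate</title><version>0.9</version><license>GPL</license></package></results>
```

Because the query uses SELECT *, a column added later (say, with ALTER TABLE packages ADD COLUMN homepage TEXT) appears as a new <homepage> element for rows that fill it in, while the existing elements stay as they were.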
iBiblio's LSM and Dublin Core (Score:4)
Resource Description Framework (Score:1)
I would also like to point out RDF [w3.org] as an existing use of XML for something very similar to what you're asking for. It can be extended with more tags if you need them. Right now it's targeted more towards web content, but I think it will give you some good ideas.
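To make the RDF suggestion concrete, here is a hedged Python sketch that emits an RDF/XML description of a package using Dublin Core element names (dc:title, dc:creator, and so on). The namespace URIs are the standard RDF and Dublin Core ones. The package URL and values are invented, and software-specific fields such as platform or dependencies would need an extra, custom namespace.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF)  # serialize with the usual prefixes
ET.register_namespace("dc", DC)


def describe(about, **dc_fields):
    """Return RDF/XML with one rdf:Description whose properties are
    Dublin Core elements named by the keyword arguments."""
    root = ET.Element("{%s}RDF" % RDF)
    desc = ET.SubElement(root, "{%s}Description" % RDF,
                         {"{%s}about" % RDF: about})
    for name, value in dc_fields.items():
        ET.SubElement(desc, "{%s}%s" % (DC, name)).text = value
    return ET.tostring(root, encoding="unicode")


doc = describe("http://example.org/frobnicate-0.9.tar.gz",
               title="frobnicate", creator="Jane Doe",
               description="Frobnicates widgets", rights="GPL")
print(doc)
```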
PAD (Score:2)
RDF - Resource Description Framework (Score:3)
The most notable places where RDF is presently used for real things (as opposed to "we'd like it to be used here vaporware") include:
The latter is exceedingly relevant, as it represents an encoding of metadata about Linux software packages in RDF form.
Re:Slashdot stealing stories [Offtopic] (Score:1)
Try this... (Score:2)
Trove (Score:2)
There's even real code available, in Python, which I confess I haven't looked at, so I'm vague on what it does or doesn't do yet. I suspect there's something there worth a look.
2 suggestions - LSM and Dublin Core (Score:1)
links:
iBiblio Linux archives [ibiblio.org]
Dublin Core homepage [purl.org]
Re:But I thought... (Score:2)
That was my thought too. Problem is that's really simple and obvious. We need something complicated. It should probably use XML and go through a standards committee to get it right.
Re:XML all the way (Score:1)
Describing databases and word processor formats is pretty easy - the 'language' for these is finite (and pretty small at that). We can't assume that we can simply iterate over all algorithms in all languages using all data types to devise the schema. (Human) language is the only method for supporting such a large problem space. So the question now becomes 'How do we take human language and make it human language independent?'
Re:XML all the way (Score:2)
Some Standards are out there (Score:1)
License?! (Score:2)
I guess XML is a good idea, but how about trying to mirror everything one can enter for a freshmeat.net entry? And consider what
Need I say it? (Score:1)
Duh.
Re:XML all the way (Score:1)
Re:XML all the way (Score:2)
Re:Slashdot stealing stories [Offtopic] (Score:1)
Abashed the Devil stood,
And felt how awful goodness is
Comment removed (Score:3)
Microsoft.NET has a metadata standard (Score:1)
Poor javadoc (Score:3)
Perhaps people find the Javadoc Conventions [sun.com] to be just a little confusing?
(Anybody who knows me knows I have a personal bias on things Javadoc. Probably not worth discussing on Slashdot. I mention it just to keep myself honest.)
Formal Specs (Score:2)
Re:XML all the way (Score:1)
Not that I have a solution to this problem.
- jc
---------------------------------------------
James C. Diggans
jdiggans@excelsior-web.com
Check out Source Forge.... (Score:1)
You can find them here [sourceforge.net].
I haven't seen project documentation [templates / standards / requirements] on their site, but perhaps you can be the one to create them.
metadata primer(?) (Score:2)
Name, Title, Organization, Address, Phone #, Fax #, URL, Date, etc. These metadata get assigned to higher-level metadata such as Originator, Copyright holder, Maintainer, Mirror Sites, etc.
It gets more complicated at the next description level. For instance, a set of metadata for software would be something like: Programming language, Operating System, Compiler, Library requirements (dependencies), Hardware requirements, License, Distribution restrictions, Lines of code, etc... Along with this would be tags like version number.
Then comes the software description: Application type (e.g., graphic converter, audio playback), User interface, Data input, Data output, Data formats, Batch/Interactive, Algorithms, Previous versions, Code stolen from, etc...
Metadata should be flexible enough to take in new types. Metadata sometimes points to more metadata which points to more metadata. Not all the metadata attributes need to be filled in. One should strongly attempt to standardize some of the key words. Metadata are a bitch to come up with. :-)
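The layering described here (contact records reused under roles such as originator and maintainer, a software layer on top, every attribute optional, room for new types) maps naturally onto nested records. A minimal Python sketch follows; the class and field names are illustrative only, drawn from the lists above rather than from any published schema.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class Contact:
    """Lowest level: basic contact metadata, all optional."""
    name: Optional[str] = None
    organization: Optional[str] = None
    url: Optional[str] = None


@dataclass
class SoftwareRecord:
    """Higher level: roles point at Contact records, plus software facts."""
    title: str
    version: Optional[str] = None
    originator: Optional[Contact] = None
    maintainer: Optional[Contact] = None
    mirror_sites: List[str] = field(default_factory=list)
    language: Optional[str] = None
    license: Optional[str] = None
    dependencies: List[str] = field(default_factory=list)
    # Open-ended slot for metadata types nobody anticipated yet.
    extra: Dict[str, str] = field(default_factory=dict)


def filled_fields(record):
    """Return the record as nested dicts, keeping only filled-in attributes."""
    def prune(value):
        if isinstance(value, dict):
            return {k: prune(v) for k, v in value.items()
                    if v not in (None, [], {})}
        return value
    return prune(asdict(record))


jane = Contact(name="Jane Doe", url="http://example.org/~jane")
rec = SoftwareRecord(title="frobnicate", version="0.9", originator=jane,
                     maintainer=jane, language="C",
                     extra={"data-output": "PNG"})
print(filled_fields(rec))
```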
Use doxygen.. (Score:1)
Look to BBS years! (Score:1)
It isn't just software (Score:1)
Re:Slashdot stealing stories [Offtopic] (Score:1)
You have suggested to me one possible explanation: Slashdot is full of losers who find this story interesting. Kuro5hin is read entirely by Slashdotters. Those same Slashdotters voted the story in.
This makes me wonder about something else. Why does no-one apart from me have the guts to post abuse logged-in? Are you afraid of losing your precious 'karma'?
CVS! (Score:2)
Of course, this is most relevant to Open Source projects that make extensive use of CVS, but in a few years there will be no conflict to worry about.
XML (Score:1)
--
Peace,
Lord Omlette
ICQ# 77863057
Thats all you do? (Score:3)
Re:Freshmeat.net (Score:1)
OSD from w3.org (Score:4)
Abstract: This document provides an initial proposal for the Open Software Description (OSD) format. OSD, an application of the eXtensible Markup Language (XML), is a vocabulary used for describing software packages and their dependencies for heterogeneous clients. We expect OSD to be useful in automated software distribution environments.
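As a rough illustration of "describing software packages and their dependencies for heterogeneous clients", the Python sketch below picks the right implementation for a client from an OSD-style document. The element names (SOFTPKG, IMPLEMENTATION, OS, PROCESSOR, CODEBASE, DEPENDENCY) are modeled on the example in the OSD note as I recall it and should be checked against the note itself; the package, URLs, and matching rule are assumptions.

```python
import xml.etree.ElementTree as ET

# An OSD-style description (element names assumed from the OSD note).
OSD = """
<SOFTPKG NAME="org.example.frobnicate" VERSION="0,9,0,0">
  <TITLE>frobnicate</TITLE>
  <IMPLEMENTATION>
    <OS VALUE="WinNT"/>
    <PROCESSOR VALUE="x86"/>
    <CODEBASE HREF="http://example.org/frobnicate.cab"/>
  </IMPLEMENTATION>
  <IMPLEMENTATION>
    <CODEBASE HREF="http://example.org/frobnicate.jar"/>
    <DEPENDENCY>
      <CODEBASE HREF="http://example.org/widgets.osd"/>
    </DEPENDENCY>
  </IMPLEMENTATION>
</SOFTPKG>
"""


def pick_implementation(osd_text, client_os, client_cpu):
    """Return (codebase, dependency codebases) for the first matching
    IMPLEMENTATION. One with no OS/PROCESSOR entries is treated as
    portable and matches any client (an assumption of this sketch)."""
    pkg = ET.fromstring(osd_text.strip())
    for impl in pkg.findall("IMPLEMENTATION"):
        oses = {e.get("VALUE") for e in impl.findall("OS")}
        cpus = {e.get("VALUE") for e in impl.findall("PROCESSOR")}
        if (not oses or client_os in oses) and (not cpus or client_cpu in cpus):
            codebase = impl.find("CODEBASE").get("HREF")  # direct child only
            deps = [d.get("HREF") for d in impl.findall("DEPENDENCY/CODEBASE")]
            return codebase, deps
    return None


print(pick_implementation(OSD, "WinNT", "x86"))
print(pick_implementation(OSD, "Linux", "alpha"))
```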
Re:Check out Project Meta. (Score:2)
If it can be described, then it can be classified
If it can be classified, then it can be pigeonholed
If it can be pigeonholed, then it can be demonized
If it can be demonized, then it can be suppressed
By registering your source code, you're giving cartels like the MPAA and the RIAA a heads-up about your code and all the sorts of "undesirable" uses it can be put to. Why do their work for them? Give your code a chance to live and breathe on its own, and maybe it'll flourish into something big before they catch wind of it and move in on you with their lawyers.
For proponents of free software, metadata are antithetical to freedom-through-obscurity.
The traditional methods (Score:2)
I'm not sure about most people and can't speak for any of them, but I've personally never visited either the Linux Software Map or the Meta project.
For me (and millions of others?), the tried-and-true method of software distribution (tarball, website, and the all-important README file) has been sufficient all these years.
Of course it's not efficient, it doesn't encourage searchability, etc... But it's what everybody uses and is used to.
I personally haven't seen enough discipline in the Open Source community with regards to a metadata project. The RM and Meta projects are great, but people need to use them.
Re:Trove (Score:1)
Re:But I thought... (Score:2)
Reading The Fscking Code co$t$ money (Score:1)
Isn't that what open source is all about? RTFC! (Read The Fucking Code)
Reading source code requires moving source code somehow to your local host. Bandwidth costs money. What the OP is looking for is metadata, or something small that describes the code in a well-defined, searchable form.
<O
( \
XGNOME vs. KDE: the game! [8m.com]
Re:Well... (Score:1)
If you look deeper into this issue, it sure makes you think. Doesn't it?
Microsoft COM, Microsoft .NET, what's next? (Score:1)
Microsoft COM, Microsoft .NET, what's next?
Microsoft.org: Microsoft [microsoft.com] will blow big bux0rz on .NET (which requires an always-on, high-speed connection, making it inaccessible to a large number of Windows customers), stop making profits, and will have to become a nonprofit.
It would be nice... (Score:1)
Just a thought.
True, but... (Score:2)
I think what's important to realize here is that we're not trying to find the meaning of life. XML is simply a standardized way of tagging information and it's not perfect. But the quest for perfection can sometimes prevent us from arriving at a solution at all. The whole struggle to standardize on various industry-specific markup languages is difficult enough and has led to enough feuds and confusion. Let's not make it even more difficult by obfuscating the whole issue with another order of complexity. Once XML has done its job well, we can worry about the finer points.
Dublin Core / Opensource Metadata Framework /etc (Score:1)
ibiblio is also working on something they call the Opensource Metadata Framework which seems to be based on or even a subset of Dublin Core. I don't know why they didn't just use Dublin Core. See http://www.ibiblio.org/osrt/ldpcore/ldp_elements for the spec.
Frankly, you're getting into the area of librarianship, so you should ask a librarian. (IANAL.) You might do this at lisnews.com or at oss4lib.com, particularly the latter. By the way, in my experience programmers, even digital library programmers like myself, are bad librarians who think they are good librarians (we invented search engines, didn't we?), so take everything you read in this forum with a grain of salt.
Get busy ! (Score:2)
Java.. GPL.. problems talks problems, questions, concerns.. ehhhhh.. geez man.. just code !!
Are we coders or politicians ?
Just code !!!
Re:Well... (Score:1)
kuro5hin vs Slashdot (Score:1)
IMO (Score:1)
Re:But I thought... (Score:1)
Re:Thats all you do? (Score:1)
Thanks for pointing it out. I don't host my own site, so I wasn't the one who set up Apache like that, but I was able to add .htaccess files to each directory to reset the MIME type for HTML correctly.
license-aware linkers (Score:1)
I've always thought it would be cute for linkers to be aware of software licenses, and of license incompatibilities. License identification would be embedded directly within object files and libraries, presumably put there by license-aware compilers. If you tried to link GPL-incompatible application code with a GPL-covered library, the linker would report an error.
Yes kids, that's why I'm at Berkeley. And if you patent my idea and make millions off of it, I'll sue you silly. :-)
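The license-aware linker can be prototyped as a check that runs before linking. In the hypothetical Python sketch below, each object file carries a license tag (how a compiler would embed it is left open), and a small, hand-written compatibility table decides whether a combination may be linked. The table is a simplified illustration rather than an authoritative compatibility matrix; real answers depend on license versions and on how the code is combined.

```python
from dataclasses import dataclass

# Simplified, illustrative table: licenses whose code may be combined
# with GPL-covered code in one program.
GPL_COMPATIBLE = {"GPL", "LGPL", "MIT", "BSD-3-Clause", "Public-Domain"}


@dataclass
class ObjectFile:
    name: str
    license: str  # hypothetical tag a license-aware compiler would embed


class LicenseError(Exception):
    pass


def check_link(objects):
    """Refuse to link if GPL code is mixed with GPL-incompatible code."""
    if any(o.license == "GPL" for o in objects):
        bad = [o.name for o in objects if o.license not in GPL_COMPATIBLE]
        if bad:
            raise LicenseError("GPL-incompatible objects: " + ", ".join(bad))
    return [o.name for o in objects]  # stand-in for actually linking


check_link([ObjectFile("main.o", "MIT"), ObjectFile("libfoo.a", "GPL")])
try:
    check_link([ObjectFile("main.o", "Proprietary"),
                ObjectFile("libfoo.a", "GPL")])
except LicenseError as err:
    print(err)  # GPL-incompatible objects: main.o
```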
more general question (Score:1)
I believe this question could be extended to a more general one: how do we assign description fields to any digitized data?
I see no reason to restrict yourself to program code only. It could be extremely useful to be able to structure any information you want: consumer, scientific, entertainment, whatever.
Also, it is not mentioned directly, but the spirit of such a standard implies that you can also process this information, that is, search it. In the parlance of our times, this means "search from the web".
Is there any open source solution for a web-accessible, searchable database with a user-updatable, flexible structure?
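One conventional answer to "flexible structure" is an entity-attribute-value layout, where users add new attributes as rows instead of altering the schema. Below is a small sketch using Python's built-in sqlite3. The table and attribute names are made up, and a web front end would sit on top of queries like find().

```python
import sqlite3

db = sqlite3.connect(":memory:")
# One row per (item, attribute, value): new attributes need no schema change.
db.execute("CREATE TABLE meta (item TEXT, attr TEXT, value TEXT)")


def tag_item(item, **attrs):
    """Attach arbitrary attribute/value pairs to an item."""
    db.executemany("INSERT INTO meta VALUES (?, ?, ?)",
                   [(item, k, v) for k, v in attrs.items()])


def find(**criteria):
    """Return the items matching every attr=value pair, sorted by name."""
    matches = None
    for attr, value in criteria.items():
        rows = db.execute("SELECT item FROM meta WHERE attr = ? AND value = ?",
                          (attr, value))
        items = {r[0] for r in rows}
        matches = items if matches is None else matches & items
    return sorted(matches or [])


tag_item("frobnicate", kind="software", language="C", license="GPL")
tag_item("lecture-07.ogg", kind="audio", language="English")
tag_item("widgetd", kind="software", language="C", license="BSD")
print(find(kind="software", language="C"))  # ['frobnicate', 'widgetd']
```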
Something I already have implemented (Score:2)
The XML holds a short changelog, the dates/times of all builds, long descriptions of changes for each build, miscellaneous project information, a major and a minor general description field, and some other minor information. The generator appends the file sizes of both the final package and all the files it contains. This system is designed for in-house use only and is in no way currently usable as a solution for public use (it was part of a requirement I made to coordinate development among our 4 coders). Also, many of you may be disappointed to know it's Win32-only, as that is all the work we do internally. But if anyone wants to know some of the specifics of such a system, let me know.
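For readers curious about the mechanics, a generator along these lines might look like the Python sketch below. The post does not publish its format, so every element name here is a guess. The sketch records a description and a change note for a build and appends the size of the final package and of each file it contains, assuming the package is a zip archive.

```python
import os
import tempfile
import zipfile
import xml.etree.ElementTree as ET


def build_manifest(package_path, project, description, changes):
    """Return an XML manifest for a zip package (element names hypothetical)."""
    root = ET.Element("build", project=project)
    ET.SubElement(root, "description").text = description
    ET.SubElement(root, "changes").text = changes
    pkg = ET.SubElement(root, "package",
                        size=str(os.path.getsize(package_path)))
    with zipfile.ZipFile(package_path) as zf:
        for info in zf.infolist():
            # file_size is the uncompressed size of each member
            ET.SubElement(pkg, "file", name=info.filename,
                          size=str(info.file_size))
    return ET.tostring(root, encoding="unicode")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frobnicate-0.9.zip")
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("README", "hello\n")
        zf.writestr("frob.exe", b"\x00" * 1024)
    manifest = build_manifest(path, "frobnicate", "Widget frobnicator",
                              "Fixed crash on empty input")

print(manifest)
```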
UML (Score:1)
Re:Dublin Core / Opensource Metadata Framework /et (Score:1)
Of course.... (Score:1)
Aaaaah, for the glory days of the local BBS scene...
gfunk007
UPnP (Score:1)
Re:XML all the way (Score:2)
Re:Well... (Score:2)
That should do the trick
YAY (Score:1)
Re:UML (Score:1)
Re:Trove (Score:1)
Re:Microsoft.NET has a metadata standard (Score:1)
Metadata, URI, mirrors etc..... (Score:3)
And at the end somewhat less relevant to the topic.
This kind of metadata should be extremely valuable for the implementation of URIs, and particularly for the I2C(s) (URI to URC). Quote from RFC 2483:
Hopefully we already have a mechanism for the I2L(s) (FTP Mirror Tracker [squid.itep.ru]).
CSDF (Score:2)
Right now I'm working on a proof-of-concept kinda thing that will test the current implementation of the XML format and the tools. Another guy is working on a Windows implementation using the same standard.
If all goes well I'll have something to post by the end of this week. Keep an eye out on Freshmeat.
Have a look at Practical, Reusable UNIX Software (Score:1)
U.S. national standard (Score:2)
Re:Freshmeat.net (Score:1)
Also, freshmeat deals mainly (almost entirely?) with *nix code, particularly linux. Some of us have to code for windows too, and a cross-platform search method would be *really* handy...
Re:Well... (Score:2)
>freshmeat.net/backend/
I somehow didn't notice that.
Thanks.
Check the verInfo project... (Score:1)
Machine Readable with multi-lingual rendering... (Score:1)
While XML, PAD, and README files do have their uses, what you really want to look at is how something like this could be automated to produce a machine-readable format that could be rendered into any number of languages (including other programming languages).
Think about what a compiler does. It translates silly little 'human-readable' files into machine-readable instructions. Yes, compilers have lots more features than that, but the key feature is that a compiler turns my poorly thought-out ideas about a problem into a set of instructions that my processor can deal with. So what you want to do is create some sort of meta-file which isn't machine code, per se, but a representation of precisely what the code is doing. You'd want some way of pulling comments from the code into a description block for the main program and for each function (maybe with some sort of encoding scheme so that it could be language independent).
The renderer would be able to take that and generate English text OR Mandarin OR Turkish OR C++ OR Java OR COBOL OR (god help us all) PL/I.
So, those of you who might have once written a compiler (I'm sorry), take a look at that process and think about what you might be able to do. I'd imagine something like:
PROGRAM BOBSHOE
some user defined text
pseudocode
USES OBJECT BLAH
USES OBJECT ANOTHERBLAH
OBJECT BLAH
etc. So, what do you think?
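The meta-file idea can be tried out in miniature. In the Python toy below, the keywords PROGRAM, USES OBJECT and OBJECT come from the example above, while the description line, the English template, and the C++ skeleton are illustrative choices. A real system would need far richer semantics than a list of used objects.

```python
META = """
PROGRAM BOBSHOE
Ties shoes for Bob.
USES OBJECT BLAH
USES OBJECT ANOTHERBLAH
OBJECT BLAH
OBJECT ANOTHERBLAH
"""


def parse_meta(text):
    """Keyword lines give structure; any other line is description text."""
    prog = {"name": None, "text": [], "uses": [], "objects": []}
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue  # skip blank lines
        if words[0] == "PROGRAM":
            prog["name"] = words[1]
        elif words[:2] == ["USES", "OBJECT"]:
            prog["uses"].append(words[2])
        elif words[0] == "OBJECT":
            prog["objects"].append(words[1])
        else:
            prog["text"].append(line.strip())
    return prog


def render_english(prog):
    uses = " and ".join(prog["uses"]) or "nothing"
    return "%s: %s It uses %s." % (prog["name"], " ".join(prog["text"]), uses)


def render_cpp(prog):
    """C++ skeleton: description as comments, one empty class per OBJECT,
    and a main() that instantiates each object the program USES."""
    out = ["// " + line for line in prog["text"]]
    out += ["class %s {};" % obj for obj in prog["objects"]]
    out.append("int main() {")
    out += ["    %s %s;" % (obj, obj.lower()) for obj in prog["uses"]]
    out += ["    return 0;", "}"]
    return "\n".join(out)


prog = parse_meta(META)
print(render_english(prog))  # BOBSHOE: Ties shoes for Bob. It uses BLAH and ANOTHERBLAH.
print(render_cpp(prog))
```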
hmmmm... (Score:1)
Re:Check out Project Meta. (Score:2)
Look at all the publicity around Napster and DeCSS. They may be illegal, but if the MPAA and RIAA had never tried to suppress them, they'd be little niche causes, supported by a few geeks. Now they're turning into (small) mass popular movements.
XMI is the way to go (Score:2)
Correction (Score:1)
SourceForge software map (Score:2)
Several of the categories are even hierarchical, which helps validate the values used. Another benefit is that if the license is open 'enough', you can host your web page and downloads at SourceForge, at which point it will help you track versions and release notes.
Wow -- that sounds like I'm a SourceForge PR person. Please understand that I'm not necessarily advocating them as the best solution -- I think freshmeat and lsm are extremely valuable. I just wanted to make sure the SourceForge solution was mentioned.
--Chouser
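Hierarchical categories help precisely because a submitted value can be checked against the tree. The Python sketch below validates category paths written with the " :: " separator used by PyPI's Trove-derived classifiers; the tree itself is a tiny invented excerpt, not SourceForge's real category list.

```python
# Tiny invented excerpt of a category tree; nested dicts, leaves are {}.
TREE = {
    "License": {"OSI Approved": {"GNU General Public License": {},
                                 "BSD License": {}}},
    "Topic": {"Software Development": {"Libraries": {}, "Build Tools": {}}},
}


def validate(path, tree=TREE):
    """Return True if every level of 'A :: B :: C' exists in the tree."""
    node = tree
    for part in path.split(" :: "):
        if part not in node:
            return False
        node = node[part]
    return True


for cat in ["Topic :: Software Development :: Libraries",
            "Topic :: Software Developement :: Libraries",  # misspelled level
            "License :: OSI Approved :: BSD License"]:
    print(validate(cat), cat)
```

The misspelled middle level is rejected, which is exactly the kind of drift that free-text category fields let through.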
XSA is an XML version of the lsm (Score:2)
Whatever format is used, it should
Correction: XML Is NOT a meta-data standard. (Score:1)
The OMG metadata standard is the Meta Object Facility (MOF) version 1.3. It allows the specification of meta-metadata and provides IDL interfaces for accessing a repository based on the MOF definition.
Correction: XMI is a meta-data standard (Score:1)
Microsoft's .NET incorporates something like that (Score:1)
Re:Slashdot stealing stories [Offtopic] (Score:2)
Idiot Alert! Learn something about both sites before you try to figure them out. Slashdot has editors that post stories. This was a question, so it was posted by an editor to "Ask Slashdot", a section for questions. Kuro5hin, OTOH, works by having the users vote on the stories to determine whether they should be posted. The users apparently decided the story should be posted.
So, to repeat what I just said so that your small pea-sized brain can comprehend it, there was one editor, and a bunch of users.
g'd day!
Re:Well... (Score:1)