Kahn Overhauling the Internet
Whanana sent us an article about information objects as visualized by Robert Kahn.
The article is written from a fairly childish place (it explains DNS for crying out loud, and the bulk of it is a history lesson obviously designed for a mainstream paper) but Kahn's Digital Object Identifier concept is interesting. If anyone has links to RFCs and the like, please post them in the comments.
CNRI's handle system website (Score:2)
Try http://www.handle.net/ [handle.net]. Stumbled across it some time ago with some Python doco, IIRC. I had no idea it had acquired any kind of acceptance.
Will it hang together? (Score:1)
It sounds like a good idea to abstract the identifier for a document (or whatever) from its location, but will this mean an incompatibility with current Web names? Are we going to have two different standards of access to information resources, or can they peacefully coexist?
Personally, I love the Franklin quote:
'We must all hang together, or assuredly we shall all hang separately.'
Just as long as you're well-hung.
Re:DOIs are cool and scary (Score:3)
First of all, what's scary about the DoD putting its library on-line?
Second of all, only people who create nothing think that the creative work of others should be free. If copyright holders want to be able to track their work and make sure that their work is only available to people who have acquired a licence, I don't see a problem. In fact, it will be a HUGE help to individual authors/musicians/artists/whatevers, since they can take care of managing distribution all by themselves without needing a big company to handle it. Of course, promotion is still an issue, but that's another debate...
If you want to be a thief, you'll hate this. If you want to actually use the net to find stuff and be reimbursed for the things you create, you'll love it.
-jon
Internet becoming what was originally envisioned? (Score:3)
Of course, it'll have to compete with
Steven
Re:ICANN part 2 (Score:2)
(1) Who would do the human-readable -> OID translation?
(2) Using a centralized database to find things would make censorship really easy. I've seen a lot of people here asking "who would own the centralized database." This question is totally irrelevant, as any government would strongly regulate the database owners. The real question is "what country would be able to pass laws about it?" i.e. whose version of censorship are we going to force on the world.
First, there may be a solution to (1), but it's not totally clear how to implement it. Specifically, you need a "philosophical" cross between search engines and alternative DNS servers. I do not see how to do this, but it seems like you want to have the "authoritative" qualities of DNS, but allow people to switch as easily as going to a different search engine.
Second, the only real solution to (2) is to eliminate the centralized database. Actually, you really should just junk all this guy's ideas and use Freenet. Now, information on Freenet is not permanent, but there are solutions to that too. Specifically, get people to permanently rehost things they think are important.
Anyway, issue (1) is central to Freenet too, so there is really no point in even considering this guy's proposals. Freenet beats these proposals in every way.
Re:What about Xanadu? (Score:1)
Had no idea that some people hated it so much (see first reply to "What about Xanadu"). Is it really that crappy? Damn, and I thought that it would rock and it would change a bunch of stuff or something.
Philosophical (Score:2)
I think it would be great to be able to access the closest copy of an article (or music, or the drawings of a historical organ, or the latest Linux kernel) without worrying whose computer it is on, or whether it has been moved to a different location.
As far as I can see, this scheme does nothing towards solving the (admittedly real) problems of intellectual property. If I can fire up nslookup or its relative and translate the ID to a URL and then to an IP address and a filename, then at most it can obscure the path to direct access. And we all know how badly "security by obscurity" has performed...
This brings up the whole philosophical discussion of what information is, and how it can be or should be owned or controlled. Not all information wants to be free - at least my credit card number doesn't.
No matter how the legal and philosophical discussions go, this scheme may provide a valuable tool for identifying information, and I see that as something positive. But will it take off? Only time will tell.
Re:You've got to be kidding! (Score:1)
-
Re:DOIs are cool and scary (Score:1)
Re:DOIs are cool and scary (Score:2)
I'm guessing that real radicals like to eat and have shelter. If they are going to give away what they produce for the Greater Good, they have to live off of the money/goods/etc. from other people, and they can get it either by force or through the good graces of other people. The second option is wonderful in theory, but the first is more common in practice.
The sole reason the Open Source movement exists is students and professors who have been living off of parents and/or government grants.
Granted, there are companies which are trying to make money from Open Source projects, but they are trying to profit from obscurity; their products are so hard to use that people are willing to pay for support. I don't see that happening with books or music any time soon, and when people start putting easy-to-use interfaces on these Open Source products, these companies are sunk.
I've written code in my spare time which I've given away, but if I (or my company) were unable to make money from the code I write for them, I wouldn't be writing code, and I probably wouldn't have the spare time to write code that I give away. As much as I love to code, I love to take care of my family even more.
-jon
Why make it central? (Score:2)
Ouch (Score:1)
Re:You've got to be kidding! (Score:1)
Where does the censorship come in? It sounds to me like the owner of the domain (in the example given, MSNBC) would be responsible for maintaining the object ID table. DNS would resolve the domain name to an IP address, then the handle would be resolved by the object ID table that resides on that domain's server. Different than how it works today, but I don't see any new censorship opportunities.
Can you please explain where the possibility for censorship lies? I think I'm missing something here.
-
Re:DOIs are cool and scary (Score:1)
Many Open Source developers are students and professors, it is true. But there are others: Linus Torvalds has a day job and still finds time to direct kernel development, the KDE team is largely made up of people who work for TrollTech, and there are many, many sysadmins who open-source tools they have created to help themselves in their jobs. Furthermore, we can assume that there are *some* developers who have made themselves independently wealthy through their own hard work and can therefore afford to code for free. If I am lucky enough to find myself in such a position, that is what I hope to do.
You are mistaken when you imply that those who don't believe in the ownership of ideas are themselves incapable of making a valuable creative contribution. If you believe in "Intellectual Property", that's your business, but it doesn't give you the right to denigrate the work of those who believe differently than you. There is not *yet* a law that states that all intellectual activity must be undertaken in service of the profit motive.
Re:IHS (Information Handle Server) (Score:2)
DNS has a record type called "HINFO" for Host Information; however, due to security concerns, not many people use it now. The record is just a pair of text strings that can contain almost anything describing the machine, including hardware information, physical location, etc.
We could use this record for the IHS information without any changes to the current DNS system.
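If you want to play with this, here's a minimal sketch of looking up an HINFO record, assuming the third-party dnspython package (as noted above, most zones won't return anything):

    import dns.resolver  # third-party "dnspython" package

    # Look up the HINFO (host information) record for a name. Most
    # zones no longer publish HINFO for security reasons, so expect
    # an empty answer far more often than not.
    try:
        for rdata in dns.resolver.resolve("example.com", "HINFO"):
            print(rdata)  # prints the CPU and OS strings
    except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
        print("No HINFO record published for this host.")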
Comments?
Re:ICANN part 2 (Score:1)
Or even worse... ICANN somehow gets to be the Root owner for this too....
Kierthos
Re:Objects (Score:1)
Strong data typing is for those with weak minds.
What about Xanadu? (Score:1)
Speaking of which, anyone heard any news on that darned Xanadu's progress?
rfc's (Score:1)
All the best (Score:1)
Re:The One True Transmission Path (Score:1)
DOIs are cool and scary (Score:1)
I have gigantic mixed feelings on DOIs, handle systems, and other metadata schemes. I come at this as an anarchist, a librarian, and as a person who has actually purchased a DOI prefix for my employer. I've even been to the DOI workshop that was held several months back at CNRI in Reston, VA.
First, the positive side of DOIs. As most of you know, there is a lot of information on the Internet, and it isn't organized logically or in a way that a library would organize it. Librarians have been trying to instill some order on the Internet for years, mostly via various metadata schemes. A metadata-based system, like the DOI handle system, would get us away from identifying content based on location (URLs) and get back to identifying content based on classification (i.e. like Dewey or LC call numbers in your local library). So, if you've installed the proper DOI plugin into your browser and you click on a DOI-enabled link, you'll be given a choice of where you want to get the item. The article by Professor X on nanotechnology is identified by a number, not a URL. You can choose to get it from a variety of sources, some of which will give you free access, say if you are a student at a particular university.
In other words, DOIs would greatly help people find information on the Internet.
Now for the flip side. If you read the MSNBC article carefully, you notice a few scary things mentioned, like "[it] is using it to build the Defense Virtual Library" and "another problem is with copyrights and other protections of intellectual property." If you care about the free flow of information on the Internet, which tech like Napster has enabled, DOI and handle schemes should throw up lots of red flags. The music industry is salivating over the DOI project. They are involved in it, though the extent of their involvement is unknown to the public. I suspect that the DOI system will be sold as a cool way to find Internet content and that its use to police the Internet for intellectual property owners will be downplayed. If Microsoft and the AAP are involved, you can bet that they don't have the interests of Internet freedom in mind. They simply want to protect the profits they make from the intellectual work of other people.
This is another example of why technology is never neutral. There are always socio-political ramifications from every new tech. Is this new system, which lets you find content more easily, worth the tradeoff of making intellectual property fascism easier too?
Re:You've got to be kidding! (Score:1)
Naming authority? (Score:2)
Under the handle system, my last column might have an identifier like: "10.12345/nov0700-zaret". "10.12345" is MSNBC's naming authority, and "nov0700-zaret" is the name of the object. MSNBC would then keep a record in its handle registry that told the computer what server the object is on, what file it's stored in, as well as the copyright information and anything else it may want in that record.
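To make the structure concrete, a handle splits at the first slash into the naming authority and the local object name. A minimal sketch, using the example handle from the quote above:

    # Split a handle like "10.12345/nov0700-zaret" into its two parts:
    # the naming authority (before the slash) and the object's name.
    def parse_handle(handle):
        authority, _, name = handle.partition("/")
        return authority, name

    authority, name = parse_handle("10.12345/nov0700-zaret")
    print(authority)  # "10.12345"      -> MSNBC's naming authority
    print(name)       # "nov0700-zaret" -> the name of the object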
Scary stuff given the recently introduced $2000 price of the
EMUSE.NET [emuse.net]
Newspapers (Score:1)
--
Re:DOIs are cool and scary (Score:2)
And that's simply not true. If you remove the cost of creation, then distribution and mass production costs predominate. It would be trivial for a large company to steal a novel, song, movie, whatever from a person who works alone and produces something. It's virtually certain that the creator will be screwed and the large company will profit. This is what people refuse to understand: copyright and patent are intended to protect the little guy against the big guy and the tyranny of the masses, not the other way around.
others: Linus Torvalds has a day job and still finds time to direct kernel development, the KDE team is largely made up of people who work for TrollTech, and there are many many sysadmins who Open Source tools they have created to help themselves in their jobs.
Linus started Linux when he was at school. His work for Transmeta is owned by Transmeta, not him. It pays for him to spend time working on Linux. If he didn't get paid by Transmeta, he probably wouldn't be working on Linux.
I'm not sure about TrollTech's funding, but how do they make money? VC funding? How profitable is the company? Companies based on open source are probably long-term doomed.
Sys admins who are contributing stuff done during work hours are technically stealing from their company; it's almost certain that they signed a work contract which stated that anything they create during work hours is company property, and anything RELATED to company work created at any time is owned by the company, too. Just wait until the first lawsuits which try to remove that sort of code from an Open Source project...
If you believe in "Intellectual Property", that's your business, but it doesn't give you the right to denigrate the work of those who believe differently than you. There is not *yet* a law that states that all intellectual activity must be undertaken in service of the profit motive.
If you subsidize giving away your salable creative work by doing non-creative work (even non-creative work you don't enjoy), solely because you think it's morally wrong to profit from creative work, you're either a saint or a moron. I can't decide which. If you're doing some creative work for money and some creative work for free, then you're a hypocrite.
-jon
Re:You've got to be kidding! (Score:1)
The point of this exercise seems to be making navigation revolve around the 'what' of the information instead of the 'where'. So, if I want to go looking for yahoo.com/nazi_auctions, I no longer simply ask the DNS server for the IP of yahoo.com and then ask that server for nazi_auctions. Instead, I ask a distributed DOI database for the whole thing, if I understood it correctly. Conceptually, this would mean that intermediaries will know 'what' I'm looking for and not just 'where' I'm looking. That's what I mean.
--
Been done, didn't work, but fragments are in use (Score:2)
jim
Re:What about Xanadu? (Score:1)
Re:What about Xanadu? (Score:2)
BTW, if you are looking for the current incarnation of Xanadu, look for zigzag [xanadu.net].
jim
And we get where? (Score:1)
Of course, an OID system still doesn't solve the problem it purports to address. So you have a registry and an object handle in that registry - what happens when the object is removed, or moved to a different place?
If you change service providers, will you still be using your old OIDs? I doubt it ... use of the registries is hardly going to be free. So you're back to the 404 problem ... only this time you have to remember what looks like a phone number with a name on the end, instead of a nice simple URL.
Oh, and while we're at it ... let's throw DNS into a new crisis by negating the value of everyone's domain names ... whoops!
Re:heh (Score:1)
Shooting a camel out of a cannon (Score:2)
Re:Internet becoming what was originally envisione (Score:1)
Re:Will it hang together? (Score:2)
Re:Why make it central? (Score:2)
Re:Yawn. (Score:2)
> control stuff. Very interesting but users
> will reject it. Sorry!
You presume that you will have a choice. Bad mistake.
I refer you to the volume of deCSS discussion @
http://www.kuro5hin.org/?op=displaystory;sid=20
May I also remind you that there is nothing stopping either nationalisation or "registration" of ISPs and POPs.
After all, a modem might be considered a burglary tool.
heh (Score:3)
Yawn. (Score:2)
The One True Transmission Path (Score:1)
-----------------
RFC's (Score:2)
How about (Score:1)
Re:heh (Score:1)
-----------------
Re:Objects (Score:1)
Why ICANN? (Score:2)
Re:heh (Score:1)
J
freenet seems to be similar (Score:2)
The Net Object ID That Wasn't (Score:3)
A system serial number with bits reversed, and packed against the top of the 64 bit word.
An object creation counter for that system serial number -- under localized control/increment.
I had to continually fight off people who wanted to subdivide the 64 bits into fields, the way IP was. The primary discipline I wanted people to follow was to keep routing information out of the object identifier so that object locations could be changed dynamically. It was amazing how many times I had to explain this to people who should have known better.
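A rough sketch of the packing described above; the 32/32 split between serial number and counter is my assumption for illustration, not the original layout:

    # Build a 64-bit object ID: the system serial number, bit-reversed
    # and packed against the top of the word, with a per-system object
    # creation counter in the low bits. Note that nothing here encodes
    # routing information, so an object's location can change freely.
    def reverse_bits(value, width=32):
        result = 0
        for _ in range(width):
            result = (result << 1) | (value & 1)
            value >>= 1
        return result

    def make_object_id(serial, counter):
        return (reverse_bits(serial) << 32) | (counter & 0xFFFFFFFF)

    print(hex(make_object_id(serial=1042, counter=7)))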
Unfortunately, I didn't explain it to the right people at DARPA, although I did have a couple of meetings with David P. Reed about it when he was still at MIT's LCS.
I touch on some of this history in a couple of documents, one written recently [geocities.com] and one written at the time [geocities.com].
Until I read the article about Kahn, I didn't realize that DARPA chose the IP nonsense at almost exactly the time that the AT&T/Knight-Ridder project that was funding me made a bad choice of vendors, which resulted in my resignation from that particular high-profile effort and my striking out on my own to turn 8MHz PCs into multiuser network servers (which I actually succeeded in doing after a lot of bloodletting, but that's another story).
Elitist Bastard (Score:1)
Copyright protection?? (Score:1)
So the question is: is this object thing they are talking about a way to make things easier for these types of people (and others who wish to maintain copyright compliance because they happen to be operating publicly and will get sued if they don't, an online radio station for example), or is this supposed to be part of a trusted-client (the perpetual motion machine of the information age) scheme like the failed Divx DVD players?
Re:Some thoughts... (Score:1)
This is similar to Akamai's setup. Akamai (for those who don't know) is a cache that a lot of bigger sites use (Yahoo, etc). The way it basically works is as follows:
ISPs are part of Akamai's network. They maintain a certain amount of cache for the websites involved and, in turn, provide fast content to their users (dial-up, etc) and others.
Something similar to this would work. Multiple distributed, _independent_ machines would be responsible for maintaining some segment of the library. They would provide this content to their users (and everybody else) and everybody should be happy.
The objects would be fast for the local users, and the objects would be permanent (because it's distributed).
Of course it would need to be more complicated than this... but the idea is there
-andy
Piper (Score:1)
http://theopenlab.org/piper [theopenlab.org]
--
This sort of thing has cropped up before. And it has always been due to human error.
Re:heh (Score:1)
DOI's and alternatives to them (Score:3)
Date: Thu, 20 May 1999 16:46:26 -0400 (EDT)
From: "Arthur P. Smith"
To: discuss-doi@doi.org
Subject: Re: [Discuss-DOI] DOI: Current Status and Outlook
On Wed, 19 May 1999, Norman Paskin wrote:
> A paper which provides a summary of the current thinking on DOI has
> just been published in D- Lib magazine at
> http://www.dlib.org/dlib/may99/05paskin.html
This does answer a lot of questions we had, mostly in what seems
to be the right direction. The relationship with INDECS on metadata
issues looks like a particularly good resolution ("functional granularity"
is essentially what I was looking for in one of my earlier
questions). It looks like a specific metadata "Genre" needs to be
worked out in detail for journal articles (re reference linking) - and
it's not clear who has responsibility for this (the IDF or someone else?)
but at least at the level specified in this article it looks workable.
But to some extent the paper shows the DOI is a solution in search
of a "killer application" (mentioned several times in the article).
There's a chicken-and-egg problem here: the potential applications seem
to require widespread adoption before they become useful.
As one of the final bullets says: "Internet solutions are unlikely to
succeed unless they are globally applicable and show convincing power
over alternatives" - does the DOI as described show convincing power
over the alternatives?
It's sometimes hard to know what counts as an alternative, but the
following systems (some listed in the article) could be
alternatives for at least some of the things the DOI does:
1. the handle system itself
2. uniform resource names
3. IETF's DNS-based Naming Authority Pointer
4. Persistent URL's (PURL's)
5. rule-based reference linking (link managers, Urania, S-Link-S)
6. a global LDAP/directory service
Alternatives 1-4 provide a variety of routes for creating a unique
digital identifier for something - we really don't NEED the DOI just
to have digital identifiers, though DOI does provide a handy rallying
point for those of us providing intellectual property in digital form.
Alternative 2 is the highest level of digital identifier, but perhaps
that is all we really need? There is room for many "naming authorities" -
perhaps even each publisher could be their own naming authority. That
would depend on widespread adoption of (3) which may or may not happen,
and resolution of general registration processes too.
As the article mentions, general implementation of URN's is quite
limited even after almost a decade of work. Is there a reason why
nobody has found it particularly useful yet?
Alternative 1 is, to some extent, a non-issue (a DOI is, after all,
just a handle) and is also, to some extent, the same issue. Any
publisher could, with or without DOI, register as a handle naming
authority and create handles for its digital objects. Is some of
the DOI work duplicating what has already been done (or should have
been done) for the handle system itself? As the handle system web
pages mention (http://www.handle.net/) it is at least receiving some
use as a digital identifier of intellectual property by NCSTRL,
the Library of Congress, DTIC, NLM, etc. Does the DOI provide
convincing power over using the handle system directly?
Alternative 4 (PURL's) is critiqued at length in the article,
particularly on the issue of resolution (section 3). Perhaps I
don't understand properly, but I don't quite agree with some of
the arguments against PURLs. Any digital identifier can be used to
offer great flexibility in resolution - a local proxy can redirect to a local
cache or resource, for example, for ANY of the unique identifiers
under question. Once resolved, the "document" resolved to can
itself contain multiple alternative resolutions. And a handle is only
going to have multiple resolutions if the publisher puts it there
(who else has the authority to insert the data?). So I think the
single vs. multiple redirection issue is a red herring. I do agree it's
nice to have a more direct protocol (though from looking at the details
of the way handles are supposed to resolve there is a lot of
back-and-forth there too). As far as being a URN or not, there's
no reason why PURLs couldn't be treated as legitimate digital identifiers,
even if they are simply URL's at the moment. On "scalability" - the
current handle implementation doesn't seem particularly scalable
either. Only 4 million handles per server? Only 4 global servers
(with 4 backups that seem to point to the very same machines on
different ports)? And those servers seem to all be in the D.C. area...
Not that I think PURLs are wonderful, but does the DOI provide
convincing power over using PURLs, as far as identification and
resolution goes?
Which is presumably why we've been told DOI's have to do
more than just identification and resolution. Hence metadata, to
provide standard information to allow "look-up", multiple-resolution,
and digital commerce applications. This actually makes a lot of
sense. And the other id/resolution alternatives do not
seem to meet the INDECS criteria as well as the DOI can.
But what does this have to do with reference linking, the
first "killer application" mentioned? The look-ups required there
are almost certainly going to be more easily performed with
specialized databases (A&I services) or direct rule-based
linking (alternative 5) and in fact this is already
being done, generally without the use of DOI's. The DOI does not seem to
make the linking process easier, so there's no "convincing power"
here it would seem.
I added alternative 6 (global directory service) as a wild-card -
this seems to be a major focus of "network operating system" vendors -
Novell's NDS, Oracle's OID, Microsoft's Active Directory - these seem
to be systems intended to hold information on hundreds of
millions of "objects" available on a network - an example being the
personal information of a subscriber to an internet service provider.
But another potential application of these is to identify and provide
data on objects available on the net - intellectual property or other
things available for commerce. Is this something the DOI could
fit into, or is it something that could sweep URN's, handles, DOI and
all the rest away? I really don't know, but it seems like
something to watch closely over the next year or so.
Re:DOIs are cool and scary (Score:1)
It's virtually certain that the creator will be screwed and the large company will profit.
Telling me that performing an operation on a patient will kill the patient would not dissuade me from performing it if the patient were already dead. That is to say, strong copyright and patent protection have not prevented the large companies from screwing the little guy, so it's not a valid reason to maintain the status quo. It is a well known fact that the only people who can count on making a profit from most publishing are the publishers themselves. Maybe big wheels like Stephen King can afford to walk away from publishers who refuse to allow them to retain copyright, but most authors that fit the description "little guy" wind up having to give all rights to the publisher. How many recording artists and authors have been screwed by their publishers? It is clear that copyright is not protecting *them*.
In the software world, distribution and mass production costs have now been brought to virtually nothing. The same could be said of e-publishing of copyright material. The large companies to which you refer would probably not touch anything in the public domain anyway, because they would know that they couldn't prevent other large companies from competing with them. GPL'd materials will be avoided by most large companies for the same reason and also because of the redistribution clauses in the GPL. Even where these factors do not prevent the large companies from trying to crush the little guy, many customers would make the decision to buy from more ethically clean sources, because Barnes and Noble and the GNU web page are just about equally accessible from the web.
Besides, according to those who wrote the Copyright Laws, the reason for those laws is *not* to protect anyone, but to benefit *society* by encouraging people to add to the public domain by giving them exclusive rights to publish *for a limited time*. Disney and other major copyright holders are being quite successful at removing the time limits; it is widely acknowledged that nothing published by a major outlet today will *ever* enter the public domain. Add to that that the technological protections being put in (DVD-CSS, DOI, etc.), together with the DMCA, have made it illegal to access copyright material without the publisher's permission, even after (or if) it has passed into the public domain. In this setting, copyright holders are no longer obligated to compensate society for the exclusive right to publish, and so are basically getting a free ride.
As for the issues of paying the bills while producing public domain and GPL work, maybe it is true that those who believe that intellectual property *should* be free have to make compromises to the current legal environment in which it is not. Releasing as much of your work into the public domain as you can afford to is for many a way of giving back to society. The same motivation leads many ISPs whose businesses are largely based on Open Source software to encourage their sysadmins to GPL the tools they develop.
In closing, I'm not calling you or people who believe as you do morons or hypocrites, or corporate shills, or any of the other things I might like to in a fit of emotion, but instead attempting to reason with you. If you do not understand why I believe the way I do, I will try to explain, but please refrain in any reply from this unseemly name-calling.
Re:DOI's and alternatives to them (Score:2)
Date: Mon, 24 May 1999 13:21:35 -0400 (EDT)
From: "Arthur P. Smith"
To: discuss-doi@doi.org
Subject: Re: [Discuss-DOI] DOI: Current Status and Outlook
On Sun, 23 May 1999, Larry Lannom wrote:
> [...]
> Stu's comments on policy development being key. In talks about the
> handle system I usually describe DOI and other handle uses as policy
> laid on top of infrastructure.
I found myself agreeing with Stu's comments on this too. But policies
and practices won't be adopted unless they are either evolutionary,
based on existing well-tested standards, or truly revolutionary,
allowing some wonderful new thing to be accomplished that can't
be done any other way. As I was trying to convey earlier, we have a lot of
choices for both the technology and the content of unique identifiers,
including long-lived ones, and it doesn't look like DOI's or even handles
meet the revolutionary criteria. There are also more application-specific
alternatives to the DOI (such as SICI) that I didn't include earlier, many
of which have also not received much use despite their ease of creation.
If we're talking about identification for the purposes of intellectual
property, shouldn't the Copyright Clearance Center and the other
Reproduction Rights Organizations be at the center of
determining such standards? Don't they already have unique identifiers
that they use (there is some CCC number at the foot of every page
we publish now)?
> [...] there
> are hard technical issues around ease of use, both from an end user as
> well as an administrative point of view. Especially from an
> administrative side, there is a 'good intentions' factor that I believe
> has been here since we all started talking about this stuff almost ten
> years ago now. The net makes it easy to distribute information in an ad
> hoc fashion. It also makes it easy to lose things.
Things get "lost" either through neglect, deliberate removal, or
relocation (though I would call that "misplaced" rather than "lost").
DOI is unlikely to help either of the first two situations.
If there is no economic incentive for anybody to
support the preservation of some piece of digital information, there
will certainly be no incentive to keep the DOI pointer up to date
for it. And if the owner of a piece of information wants to remove
it, how could a DOI stop them?
Where the DOI would help is if a piece of information is relocated -
but so would any other unique identifier coupled with a location
system (PURL in general, and S-Link-S, Urania, PubMed, etc. specifically
for scholarly articles already exist - A&I services are also doing a lot
in this area). The more such systems pop up
and gain "market share" in different applications, the stronger the
incentive for the publisher never to change the location of anything
ever again because of the work required to keep them all up to date.
Administrative ease is basically a factor of how much work is required
to register each new published item, plus how much work is required
to change all the location information when things are relocated.
One can even write an equation for this:
Burden/year = B * New items/year + R * (total items) * relocations/year
where B is the "burden" associated with inserting a new item,
and R is the "burden" associated with updating an existing item.
Even if much of this is handled with automated systems that make
the initial per-item burdens tiny, there is still a need for quality
control, assurance of the interoperability of systems (for example,
what is the standard for representation of author names containing
special characters? mathematics in titles? etc) and programming
work whose complexity is at least proportional to the per-item
information and translations required. DOI without metadata
had the advantage that the per-item information required
was minimal. With metadata it's not clear which would have the lowest burden, though the unfamiliarity and lack of applications
for the handle system could be a disadvantage to DOI here (increasing
the required programming effort).
Except that this formula does not apply to S-Link-S, and in
some cases PURLs. S-Link-S uses rules to locate ALL the articles
for a particular scholarly journal, not on an article by article basis.
PURLs can handle relocation of a large number of URL's with a single
change - but the "suffix" URL's must be unchanged for this to work,
which is not true of many publisher relocations. In those cases
where it is true, and especially for S-Link-S, the burden becomes:
Rule-based Burden/year = B' * New journals/year +
R' * (total journals) * relocations/year
where B' and R' are probably larger than B and R, but comparable
at least for smaller publishers that don't have enough items
to justify a lot of programming work. Once a journal has 10 or so
items to publish, rule-based locating is the easiest approach, and
for larger publishers the zero per-item burden would always be
an advantage.
Now rule-based locating systems are not global unique digital identifiers -
but they keep the administrative burden very low, and so are by
far the most likely candidates to solve the "lost" information problem
as far as it can be solved.
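To see how far apart the two burden formulas above can land, here is a toy calculation; every constant is invented purely for illustration:

    # Toy comparison of the two administrative-burden formulas above.
    # All numbers are invented for illustration only.
    def per_item_burden(B, new_items, R, total_items, relocations):
        return B * new_items + R * total_items * relocations

    def rule_based_burden(Bp, new_journals, Rp, total_journals, relocations):
        return Bp * new_journals + Rp * total_journals * relocations

    # A mid-sized publisher: 5,000 new articles/year in 40 journals,
    # 100,000 existing items, one site reorganization per year.
    print(per_item_burden(B=1.0, new_items=5000, R=0.5,
                          total_items=100000, relocations=1))   # 55000.0
    print(rule_based_burden(Bp=20.0, new_journals=2, Rp=10.0,
                            total_journals=40, relocations=1))  # 440.0

Even with B' and R' twenty times larger than B and R, the rule-based burden stays orders of magnitude smaller once the per-item terms dominate.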
> [...]
> Re. Arthur Smith's wondering about handle system scalability and the
> number of current servers: the global system currently consists of four
> servers - two on each of the US coasts. The primary use of the global
> service currently is to point to other services, e.g., the DOI service,
> for clients who don't know where to start. Most handle clients, e.g.,
> the http proxy, do know where to start most of the time since they cache
> this information, so in fact the global service is not much stressed and
> four servers are plenty at the moment.
Thanks for the clarification - however if we're proposing to put direct
HDL or DOI clients in every web browser, that burden is going to
go way up, unless we get cracking on installing local handle
resolvers in the same way we have local DNS resolvers all
over the place. And then who's going to administer them and ensure
that every client is configured to point to the local servers rather
than the global ones? We at least have an established system for DNS,
that when new machines are configured with an IP address they are
also assigned a local DNS resolver, with several backups. Are we
proposing to add another "local HDL resolver" to the setup
procedure of every machine on the net?
The http proxy of course is even less scalable, since it's a single
machine somewhere (admittedly http servers can be scaled pretty
large, but this really doesn't solve the problem).
And as far as I could tell, the handle system doesn't seem to have
the same redundancy built in that DNS has. Perhaps I misunderstood,
but the four global handle servers seem not to contain duplicate
information - rather they each are responsible for a different group
of handles based on the MD5 hash. The redundancy is really just
a single secondary server, which also as far as I could tell right
now resides on the same physical machine (at least the same IP
address) for all four existing global servers.
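For what it's worth, hash-based partitioning of that sort is easy to picture; a sketch of the idea (not the handle system's actual algorithm, and the server names are made up):

    import hashlib

    # Hypothetical names standing in for the four global handle servers.
    GLOBAL_SERVERS = ["hs0.example.net", "hs1.example.net",
                      "hs2.example.net", "hs3.example.net"]

    # Each server is responsible for a slice of the handle space,
    # chosen by hashing the handle -- partitioning, not replication,
    # which is exactly the redundancy concern raised above.
    def responsible_server(handle):
        digest = hashlib.md5(handle.encode("utf-8")).digest()
        return GLOBAL_SERVERS[int.from_bytes(digest[:4], "big")
                              % len(GLOBAL_SERVERS)]

    print(responsible_server("10.12345/nov0700-zaret"))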
And remember the DOI/HDL system needs to be able to handle
hundreds of millions or billions of digital objects - that is
one or two orders of magnitude beyond what DNS has to deal with now.
> [...]
> The four million handles per server is a specific implementation
> limit that will go away later this year, to be replaced by some
> extremely large number that escapes me at the moment.
Well that's good. I'm guessing a 2GB or 4GB file size limit was
the problem? The DOI has several hundred thousand items with
handles - how many do the global handle servers contain right now
for DOI and other uses?
Arthur (apsmith@aps.org)
Re:DOI's and alternatives to them (Score:2)
Date: Tue, 1 Jun 1999 14:02:29 -0400 (EDT)
From: "Arthur P. Smith"
To: discuss-doi@doi.org
Subject: December '98 JEP article?
See: http://www.press.umich.edu/jep/04-02/davidson.htm
An article by L. Davidson and K. Douglas in the December 1998 issue of
the Journal of Electronic Publishing raised in a different sense many
of the issues I recently expressed some concern on with the DOI, as
well as other issues I haven't seen discussed here at all. Was there
ever a discussion here of the points in the Davidson and Douglas paper?
The authors indicate a feeling of encouragement that these problems
will be resolved, but has much changed in the six months
since their paper appeared? I'm enclosing their "summary of selected
concerns" below. Point 2 was the one that I particularly was concerned
with in the most recent exchange.
Arthur Smith (apsmith@aps.org)
----------------------------
Summary of Selected Concerns
The importance of the work being done on the design of the DOI
System, and its consequences with respect to digital identifiers in
general, would be difficult to overrate. Solving the problems of
identifying specific objects on the Internet is extremely important,
and the work being done on the DOI System will help with that
solution. Still, there are a number of current issues concerning this
system that have no easy solutions and particularly concern us:
1. At present, only established commercial and society
publishers are purchasing publisher prefixes and so are
allowed to issue DOIs. This means that most individual or
non-traditional publishers are not participating directly in
the DOI System, but are merely acting as end users. Since
the biggest problems with URL stability and the lack of persistence of Internet objects lie outside the products provided through large publishers, it is unclear how the DOI System is going to have any generally beneficial effect on the solution of the Internet's problems.
2. Those who participate in the DOI System will need to
include in their operating costs the overhead of detailed
housekeeping of the DOIs and each item's associated
metadata, upon which many of the DOI's more advanced
functions will depend. In addition, there are the fees that the
Foundation will need to levy to support the maintenance of
the resolver-databases server for the continued tracking of
traded, retired, erased, or simply forgotten and abandoned
identifiers. Even with computerized aids, the cost to
publishers of maintaining the robust and persistent matrix of
numbers and descriptive text that a handle-based system
requires will be considerable. Under the current model, the
annual fees exacted by the Foundation from its participating
publishers must cover operating expenses. Since no one yet
knows how high these fees might be, we are concerned that
costs for smaller publishers and not-for-profit participants
might be so prohibitive that they will be largely excluded.
3. At up to 128 characters, DOIs are simply too long to be
practical outside of the digital universe. The Publisher Item
Identifier (PII), for example, at seventeen characters, is a
much more reasonable length and probably is still long
enough to identify every item we will ever need to identify.
Indeed, Norman Paskin estimates that only 10^11 digital
objects will ever require identification.[33] Since we will sometimes need to copy DOIs manually from print into electronic format, and since both their length
and limited affordance (mnemonic content) will make it
difficult to transfer them accurately by any manual means,
this could turn out to be a nuisance factor that will hinder
their widespread acceptance. Long identifiers are also
harder to code into watermarks, especially in text objects
that lack background noise in which to hide such data.
4. DOIs will probably not lead to more open access to online
materials, at least to those commercially published. In fact,
most DOI queries from most users, except for those that can
demonstrate access rights, will probably lead to invoice
forms of one sort or another rather than directly to the
primarily requested object. This aspect of the DOI System
could make the Internet even more frustrating for the
majority of the users than it is now.
Re:DOIs are cool and scary (Score:1)
Does this mean that I will not seek patent protection for my ideas? No, because the fact is that, without such protection, some unscrupulous person may deprive me of my rights to use the thoughts in my own head. However, my reluctant willingness to pay protection to the patent lawyers does not change the fact of my non-belief in "intellectual property", nor does it change the fact that I would be extremely reluctant to prosecute someone for infringing on my IP.
While I think it is nice to be compensated for my work, I can't just proceed from there on purely economic and non-philosophical grounds to a belief in something which doesn't exist. I think there are many in the Open Source community who feel this way; in fact, I think this belief is a large part of the philosophical foundation of the Open Source community.
All that having been said, I think it is probably true that many of the people exclaiming "Information wants to be free!" simply want to get movies, music, etc. for free for less lofty reasons. However, just because selfish and immature cynics exist doesn't mean that real radicals don't.
Re:Yawn. (Score:1)
Objects (Score:1)
seems blatantly obvious (Score:1)
This is silly (Score:1)
"(The) architecture can not write the law, but it provides a technical design that matches the legal structure that is expected to emerge," the Library of Congress says on its Web site.
Heh.
alternative to current use of DNS (Score:1)
On the other hand, what if firms (e.g. MS, AOL) started to support that? Who could resist?
Or maybe we finally get a second Internet...
Good idea but scrap the IP protection crap.. (Score:2)
The idea of content objects with unique IDs isn't at all new, but it is a good one. I always liked the idea of using encryption signatures as the keys: give it a sig for itself and one for its owner, build a simple search engine mechanism into the Net itself, and you have a nice lil system. An important note might be that such a system does not need to, and possibly should not, replace TCP/IP or even rely on TCP/IP as its only supported carrier. It should be as agnostic about transports as possible for the most flexibility.
Jabber might be a good start for this layer since it is a very flexible system for transporting XML-ized content and contact-type information. I really expect something like this to assimilate the web in a couple years. Maybe Jabber merged w/ FreeNet.
Someone who doesn't know the resource they want could search for it by known facts just as they do now at Yahoo, Google, etc. Once they find it, they could store the object's unique ID, and then every time they needed that object again they could ask the net for it and the closest copy found would be returned.
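A bare-bones sketch of the identifier half of that idea: derive the object's key from a digest of its content, so every identical copy anywhere on the net answers to the same ID (the owner-signature half would need a real public-key library and is omitted here):

    import hashlib

    # Derive a location-independent object ID from the content itself.
    # Anyone holding the bytes can recompute and verify the ID; where
    # the copy happens to live is irrelevant.
    def content_id(data):
        return hashlib.sha256(data).hexdigest()

    copy_a = b"the same article, fetched from server A"
    copy_b = b"the same article, fetched from server A"  # identical bytes
    print(content_id(copy_a) == content_id(copy_b))  # True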
Reinvention of URNs (Score:1)
Quoting from RFC1737: Functional Requirements for Uniform Resource Names
URNs are actually specified in RFC2141: URN Syntax, which gives identifiers in the form "urn:NAMESPACE:NAME", where NAMESPACE could be something like "dns" and NAME could be "slashdot.org".
The actual method used to retrieve the object that a URN refers to is left as an exercise to the reader.
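A simplified parser for that syntax (the real ABNF in RFC 2141 is stricter about the allowed characters):

    import re

    # Simplified RFC 2141 URN syntax: "urn:" NID ":" NSS, where the
    # namespace identifier is 1-32 characters starting with a letter
    # or digit. The real grammar is stricter than this.
    URN_RE = re.compile(r"^urn:([A-Za-z0-9][A-Za-z0-9-]{0,31}):(.+)$",
                        re.IGNORECASE)

    def parse_urn(urn):
        match = URN_RE.match(urn)
        if match is None:
            raise ValueError("not a valid URN: %r" % urn)
        return match.groups()  # (namespace, name)

    print(parse_urn("urn:dns:slashdot.org"))  # ('dns', 'slashdot.org')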
---
The Hotmail address is my decoy account. I read it approximately once per year.
Some thoughts... (Score:1)
What is needed is something much like freenet, but with a different twist. Let's call this system infonet. Now, the number one priority on infonet would be that information should never disappear. To make this work, there should always be at least three hosts at different locations storing each file. This should be ensured by the infonet software. If one host goes down, another should take over. Thus, there will always be redundancy. Furthermore, we need a central organization governing infonet, let's call it infonet-adm. It would have two tasks: (1) managing namespace (usenet is a fine model for this), and (2) managing content and ensuring enough machine resources are available. The last task is the most difficult. I see several business models for infonet-adm
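The three-copies rule is simple to state in code; a sketch of what one pass of the infonet software's repair loop might look like (all names hypothetical):

    import random

    REPLICATION_FACTOR = 3  # never fewer than three live copies per file

    # One repair pass for the proposed "infonet": any file that has
    # dropped below three live replicas gets copied to fresh hosts.
    def repair(files_to_hosts, live_hosts):
        for file_id, hosts in files_to_hosts.items():
            replicas = [h for h in hosts if h in live_hosts]
            while len(replicas) < REPLICATION_FACTOR:
                spares = [h for h in live_hosts if h not in replicas]
                if not spares:
                    break  # not enough distinct hosts left
                replicas.append(random.choice(spares))
            files_to_hosts[file_id] = replicas
        return files_to_hosts

    state = {"doc-42": ["hostA", "hostB", "hostC"]}  # hostB has died
    print(repair(state, live_hosts={"hostA", "hostC", "hostD", "hostE"}))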
Who invented the Internet? (Score:1)
I thought Al Gore invented the Internet. Somebody is lying! ;^)
If you can't say something nice . .
And how does this make sure that content is kept? (Score:1)
The problem will be that someone needs to maintain this mapping... and for how long do you keep it? (1 year, 2 years, 5 years, forever?)
RFCs? Who are you fooling? (Score:2)
ICANN part 2 (Score:3)
IHS (Information Handle Server) (Score:1)
Where DNS gives you the IP of a machine, IHS gives you the exact location on the Internet of some subject/information handle. So the handle is something that is generic and lasts a lifetime. If you query the IHS server with the handle, it returns the exact URL, which can be dynamic and change every week.
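In other words, the handle is the stable name and the URL behind it is disposable; a toy resolver showing the idea (table contents invented):

    # Toy IHS resolver: the handle is permanent, the URL it maps to
    # is not. Table contents are invented for illustration.
    HANDLE_TABLE = {
        "linux/kernel/latest": "http://ftp.example.org/pub/linux-2.4.0.tar.gz",
    }

    def resolve(handle):
        if handle not in HANDLE_TABLE:
            raise LookupError("unknown handle: %s" % handle)
        return HANDLE_TABLE[handle]

    # Next week the admin repoints the handle; clients never notice.
    HANDLE_TABLE["linux/kernel/latest"] = \
        "http://mirror.example.net/kernel/linux-2.4.1.tar.gz"
    print(resolve("linux/kernel/latest"))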
You've got to be kidding! (Score:1)
Oh, and think of the new and exciting ways things can be censored. Filtering information by its nature sounds like a real possible evolution of this.
Can't we just have IPv6 and take it from there? I'll take my class A and give an IP address to everything I want people to go straight to.
--