For the last several weeks I've been thinking about the word "distribution," because the meaning of that word, and the way we interpret it, is going to be at the heart of one of the most important debates for the free software movement over the next several years. The problem is that a loose definition of the word opens up many loopholes in the GNU General Public License, both for corporations and for average Internet users.
The word is crucial because it lies at the core of one of the most distinctive requirements of one of the most distinctive open source licenses. If you modify software protected by the GNU General Public License and then distribute the new version, you must also distribute the source code.
On the face of it, this requirement sounds pretty easy to satisfy. Every time you give someone a copy of the binary version of the code, you give them a copy of the source code. The two should travel together or at least in close proximity.
The main reason the clause is part of the GNU GPL is because Richard Stallman, the license's principal author, believes that anyone who drinks from the commonweal should give back. If you benefit from the sharing of others, then you should make sure to give something back.
But, if you dig a bit deeper, the notion of distribution becomes a bit convoluted. Some of the ways that people share software with each other may not really count as distributions.
The distribution requirement is one of the concessions that Stallman made to the users and their sanity. You don't need to share all of the changes you make to the code -- only the code you distribute. Sharing everything might unleash a painful process, one that would push too much untested code on the world. Do we really want everyone sharing their files every time they save a copy to run a new compile? The compromise, which sounds reasonable, only requires you to share the source code when you share the software.
The problem is that the act of giving someone a copy is getting a bit harder to define, because of the new features embedded in operating systems, the near omnipresence of the Internet, and the new interest by corporations in the phenomenon of open source software. Many of these new wrinkles are confusing because they push the boundaries of both technology and license.
The biggest problem is that new features are making it easier and easier for two programs to work in synchrony without being formally linked together. You might use one piece of GPL-protected software to edit files and one proprietary program to process them. The GPL embraces these sorts of bright lines between programs; mixing closed source programs with open source ones is not forbidden.
However, it's easy now for people to write scripts that link seemingly disparate programs -- thousands of them, even -- and then execute them in concert. I know one company that uses Adobe Photoshop to process images created by a proprietary, in-house tool. Is the software linked together? The process is entirely automated and works with no human intervention once initiated.
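To make the question concrete, here is a minimal sketch of the kind of glue I have in mind. Both command names are hypothetical stand-ins: one for a GPL-protected converter, one for a closed, in-house tool. No linker ever touches the pair, yet neither does the job alone.

```python
# A sketch of the glue script described above. "gplconvert" and
# "acmeretouch" are hypothetical stand-ins for a GPL-protected tool
# and a proprietary in-house tool; neither is a real program.
import subprocess
import sys

def process(image: str) -> None:
    # Step 1: the GPL-protected tool normalizes the image.
    subprocess.run(["gplconvert", image, "normalized.png"], check=True)
    # Step 2: the proprietary tool picks up exactly where it left off.
    subprocess.run(["acmeretouch", "normalized.png", "final-" + image], check=True)

if __name__ == "__main__":
    for name in sys.argv[1:]:
        process(name)
```

Multiply this by a few thousand invocations and the pipeline behaves, for all practical purposes, like one program.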
The line blurs elsewhere, too. On some cool multiprocessor machines, two supposedly separate and independent programs can execute on different processors and send messages back and forth. Where do we draw the line?
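Here is a minimal sketch of that message-passing arrangement, under the assumption that the two halves talk over an ordinary local socket. This is the listening half; the other half, possibly proprietary and possibly on another processor, connects and sends a request. The port number is an arbitrary choice for the example.

```python
# One half of a message-passing pair: listen for a request and send
# back an answer. The other program is never linked to this one in
# the traditional sense, yet neither side is useful alone.
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.bind(("127.0.0.1", 9090))   # hypothetical port for the example
    server.listen(1)
    conn, _addr = server.accept()
    with conn:
        request = conn.recv(1024)            # a message arrives from "elsewhere"
        conn.sendall(b"result: " + request)  # and an answer goes back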
The process gets even more confusing when the Web is involved. Imagine one programmer who creates a tight weather prediction package for the Web that stores the forecast in a GPL-protected database. The programmer links all of the proprietary code together with the database. The result is a new package that extends the database and thus must be shared completely with the world according to the GPL. This is certainly fair. If anything, the GPL-protected database code is doing the bulk of the work. The programmer succeeded by standing on the shoulders of giants.
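For concreteness, here is roughly what that first design looks like. I'll use Python's dbm.gnu module, a wrapper around GNU gdbm, a database library that really is distributed under the GPL (the module is only built on systems where gdbm is installed); the forecasting logic itself is a placeholder.

```python
# The first design: proprietary forecast logic calling a GPL-covered
# database library inside a single process. dbm.gnu wraps GNU gdbm,
# which is GPL-licensed; the forecast itself is a placeholder.
import dbm.gnu

def publish_forecast(city: str, forecast: str) -> None:
    db = dbm.gnu.open("weather.db", "c")  # GPL-covered code, our address space
    try:
        db[city] = forecast               # proprietary logic driving GPL code
    finally:
        db.close()
```

Everything happens in one address space, which is the classic picture of linking.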
Now, consider a different programmer who, for the sake of example, stores the database of forecasting information on one Web server in California. The main Web site, which dispenses the data to the world, sits in NYC at the other end of a fast backbone. The main Web site uses the proprietary code to look up data in California before publishing it on the Web.
Should this programmer be forced to share the NYC code with the world? Let's say the programmer starts selling the package as a $10,000 proprietary package for adding cool weather graphics to a Web site. Anyone who buys it must install the GPL-protected database and make sure that it's always running. But are the two programs technically linked together? The California server might be GPL-protected, but does this extend to NYC? The NYC site certainly can't operate without the California server, right? What would happen if the server was right next door? What if it was running on the same machine under a different user's login?
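Here is the second design in miniature. The host name, port, and one-line protocol are all hypothetical; the point is that the proprietary NYC code never loads the GPL-covered database into its own process. It only sends a query across the wire and reads back bytes of data.

```python
# The second design: the NYC front end asks the California server for
# data over the network. Host, port, and protocol are hypothetical.
import socket

DB_HOST = "weather-db.example.com"  # the California machine
DB_PORT = 7777

def lookup_forecast(city: str) -> str:
    with socket.create_connection((DB_HOST, DB_PORT)) as conn:
        conn.sendall(city.encode() + b"\n")  # the query crosses the backbone
        return conn.recv(4096).decode()      # data comes back, never code
```

Not a byte of GPL-covered code ever reaches NYC; only data makes the trip. That is precisely what makes the question so hard to answer.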
Stallman anticipated this problem and offered a Lesser version of the GPL, which lets people link their programs with GPL-protected libraries without releasing the source code of the larger program. But that lesser version, known by the acronym LGPL, is a bit rare.
I don't envy the people who try to make sensible decisions about what counts as a distribution and what doesn't. Stallman has done as good a job as he possibly can. He sensibly recognized that GPL-protected code was going to have to live in close proximity to non-GPL code. He realized that these programs might be linked together by shell scripts and other tools. But where do you draw the line?
The problem is being stretched as the world of corporate computing discovers open source software. In the past, the definition of "distribution" was easy because everyone was just an individual hacker. If you gave a program to your buddy, you distributed it.
But imagine that MegaSoft decides that it really needs an internal editing system for filling out proprietary MegaSoft paperwork. The programmers love Emacs, so they take GNU Emacs and add a few tweaks for providing the user with forms. Some of it is written in Emacs Lisp and some of it requires a few neat extensions to the basic Emacs module. Everyone loves the software, and they start shipping it to all of the PCs in the corporation.
Is this a distribution? Some might argue that it isn't. A corporation is just a legal fiction for a single person. It's not much different than Bob the hacker writing the code for his own use. Bob doesn't need to share the source code until Bob starts giving it to Alice, the other hacker. By this argument, MegaSoft doesn't need to share the source unless MegaSoft ships the software to another company or non-employee. Even if there are 100,000 employees in MegaSoft, there hasn't been a distribution.
There are millions of problems with thinking of a corporation as a single hacker. What if the corporation splits in three like AT&T? Do all three get the code? Should only one? What if the corporation is acquired by SuperMegaSoft? Is this a distribution? What if the form-enabled Emacs was the only reason that MegaSoft was worth anything because the rest of MegaSoft wasted the rest of their VC money on a plan to sell clothing advice to fashion victims? (www.DrBoo.com)
But there are other problems with forcing corporations to share all of the code all of the time. Are corporate teams that much different than free software teams? Shouldn't they have the freedom to work for several months without distributing the changes? It saves us from buggy pre-alpha code and it saves them from repetitive bug reports. ("It crashes when I start it.") Where do we draw the line? Why can't they enjoy the same freedom as individual hackers to make a few crufty changes to the source code and leave it at that?
There are deeper problems in corporations. By many measures, Tivo is a good example of the power of free software. The digital video recorder for television signals runs on top of the Linux kernel. The only reason anyone knows this is because Tivo gives Linux credit and it shares copies of the changes it made to the Linux kernel. In many ways, it's a model of a great corporate citizen in the world of free software.
But Tivo didn't share the source code to their television recording front end. It's a separate program running on the machine. No one's gotten in trouble for running proprietary code on top of a GPL-protected kernel.
The Tivo, though, is different. It starts up the proprietary code when it boots and the user has no way to communicate directly with the kernel. The user can't use any of the standard UNIX commands to control the machine. The user can't do anything that the proprietary front end doesn't allow. This will probably save millions of users the grief of reformatting their hard drive.
But is this really fair? If the user can't pry apart the Tivo front end from the Linux kernel, are the programs intertwined enough to become the same program? If so, shouldn't Tivo be releasing the source code to the front end as well?
There are deeper problems on the horizon. Some companies are now "loaning" or "renting" software. In some cases, you don't even keep copies on your local machine. You just download it from the server and use it for a bit.
Is this a distribution? On one hand, the user doesn't get to keep anything. On the other hand, who do they think they're fooling? The whole system is just ephemeral clouds of bits flying around. To think that anyone "owns" something as abstract as software is like saying that someone "owns" a cat.
In fact, we can take this one step further. What is the real difference between using the software on their server and downloading it? Is there much difference between using the Hotmail web-based email system and running Eudora on your desktop? There isn't much difference to the user, even though there are big legal differences. In one case, Hotmail still owns the software and it's all proprietary. In the other, Eudora sold you a copy. Well, maybe they sold you a license. Well, who really knows?
I won't try to find answers for any of the questions about distributions. This is, in some respects, a chicken's response. There are millions of ways to find things wrong with the world. A real leader would find solutions. But it's also important for these answers to come from the community at large. There should be a long debate that focuses on the needs of the users and the creators of free software.
The biggest problem is that the answers are more political than technical. It's easy to define what a piece of software should do if it, say, encounters a request to divide by zero. It's much harder to handle definitions of what is and is not a distribution.
The community needs to weigh two different features of free software: On one hand, there's the fun of taking apart the source code and fixing it. On the other, there's the responsibility for contributing back to the common code base. Stallman chose to tie these two together by requiring programmers who benefited from GPL-protected software to share their source code when they "distributed" the new version.
Distribution was a simple notion that worked well when the typical coder was just an individual hacker spinning code and sharing it with his buds. Now the game is bigger, and much more complicated. We need to find a new mechanism that balances the freedom to hack with the responsibility to give back. We need to find a better, more clearly defined line to draw.
For the record, here are my proposals:
- Corporations (and everyone else) should be required to release the modifications to their source code every six months to a year, if the modified versions are shared with more than, say, three people.
- Two piles of code are considered linked if one will crash or cease to provide more than 90% of its functions without the other. Note that this doesn't mean that any piece of software running on a GNU/Linux machine is considered linked to the GPL-protected kernel. If the software can be moved to a different OS, then it doesn't depend on the kernel. (One way the test might be mechanized is sketched below.)
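For what it's worth, here is one hypothetical way the 90% test could be mechanized: run a program's own test suite with and without the companion code reachable and compare the results. The test driver, its output format, and the install paths are all assumptions for the sketch.

```python
# A sketch of the 90% linkage test: run the program's test suite with
# and without the companion code on the PATH and compare pass rates.
# "./run_tests.sh" and its "passed/total" output are assumptions.
import os
import subprocess

def pass_rate(companion_on_path: bool) -> float:
    env = dict(os.environ)
    if companion_on_path:
        env["PATH"] = "/opt/companion/bin:" + env["PATH"]  # hypothetical path
    result = subprocess.run(["./run_tests.sh"], env=env,
                            capture_output=True, text=True)
    passed, total = map(int, result.stdout.strip().split("/"))  # e.g. "87/100"
    return passed / total

baseline = pass_rate(True)
alone = pass_rate(False)
# Linked if, on its own, the program loses more than 90% of the
# functions it provided with the companion present.
print("linked" if alone < 0.1 * baseline else "independent")
```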
Tune to http://www.wayner.org/books/ffa/ for information on Wayner's book on Free Software. It launches in July 2000.