
## Estimating the Size/Cost of Linux

2bits writes "Wow... A Billion Dollars Worth Of Software On My System For Free! Check This Guy Out, He Came Up With A Counting / Pricing Method For Quite A Few Types of Source Code. Here is the Program. The results on the site are sorta dated, based on RH 7.1, but the app is pretty cool!... Hey, I can finally find out how much all my side projects are worth / costing me..."


• #### Billion dollars? (Score:1)

Where did he get the billion dollar estimate from? I see no direct correspondence between lines of code and monetary value.
• #### Re:Billion dollars? (Score:2, Informative)

Where did he get the billion dollar estimate from? I see no direct correspondence between lines of code and monetary value.

He specifically talks about cost, not value. But you are right that the correlation between SLOC and cost is a non-trivial one. That is one reason why cost estimation is hard, but it is far easier than guessing the cost of a project before one has the source.

--
virve
• #### Re: Billion dollars? (Score:2)

> Where did he get the billion dollar estimate from? I see no direct correspondence between lines of code and monetary value.

Using his numbers, I calculate that my part time effort on a hobby project over the last 9 months has resulted in a quarter of a million dollars worth of code.

Any takers?

• #### lets see here..... (Score:4, Funny)

by Anonymous Coward on Friday July 05, 2002 @10:14AM (#3827225)
[cmdrtaco@localhost]\$ est slashcode
Analyzing slashcode.....
Result: \$6.66

[cmdrtaco@localhost]\$
• #### Re:lets see here..... (Score:1, Insightful)

by Anonymous Coward
who the fuck is modding that offtopic? did you not read the article? the article deals with cost estimation of source code. In this post, we see a satirical representation of what CmdrTaco might experience if he were to run the cost estimator tool (the topic of the article) against slashcode, the code that runs slashdot. The little value returned is the running gag that slashcode is a pos (similar to the one gag of slashdot always being infected with the latest IIS security hole)

The resulting value of 666 is also a common joke among geeks.

sigh -- Maybe this is why some people have .sigs saying offtopic means the moderator missed the joke.
• #### funny, but actually closer to \$1,000,000 (Score:4, Funny)

on Friday July 05, 2002 @12:15PM (#3827906) Homepage Journal
Sloccount run on Slashcode 2.25 gives us this:

Total Estimated Cost to Develop = \$ 996,916

I would have posted the entire output of the program, but unfortunately, their million-dollar lameness filter wouldn't let me!
• #### WTF? (Score:1, Flamebait)

Okay, so now Slashdot is posting this story that is over a year old?

From the header of the paper:

More Than a Gigabuck: Estimating GNU/Linux's Size
David A. Wheeler (dwheeler@dwheeler.com)
June 30, 2001 (updated November 8, 2001)
Version 1.06

• #### Re:WTF? (Score:2, Funny)

The funny thing is that this story was posted on Slashdot a year ago!
• #### Slow news day, Taco? (Score:5, Interesting)

on Friday July 05, 2002 @10:16AM (#3827239)
Good god, people. This app has been out there for years. It's been mentioned in previous /. stories. Most people already know about it. This isn't news.

I know I'll get modded down for saying this, but Taco, as an "editor", couldn't you at least have fixed This Guy's Moronic Capitalization Scheme?

• #### Re:Slow news day, Taco? (Score:3, Funny)

...couldn't you at least have fixed This Guy's Moronic Capitalization Scheme?

That's not a scheme. The entire post is a very long title for a very short book he's writing...
• #### Re:Slow news day, Taco? (Score:2)

Forget it. This is Slashdot. You can find articles with,
- Typing errors (25 hours per day)
- Incorrect information
- Seen "n" time stories
every day. We also have "trolls", "flamebaits" here. Once we also had "first posters". But I think they are gone (at least after I set minimum rating to +2).

Get used to it!
• #### Re:Slow news day, Taco? (Score:1)

I'm aware of all this ("The Who Towers" :-). But this just seems worse than usual.
• #### Yeah.... (Score:3, Funny)

on Friday July 05, 2002 @10:16AM (#3827240)

Yeah, that's what happens when you use P2P _WAY_ too much
• #### Interesting. (Score:2)

Although I rember this article in the Past a fiew months ago. But I am to lazy to look it up. But it is instering how the Open Source movement just by a lot of people just doing a lot of little things (and some not so little) has created a product that would take a lot resources for a large company to complete. Open Source Software in my opinion is the only way the Little Guy to play with the Big Guns.
• #### Re:Interesting. (Score:2)

Open Source Software in my opinion is the only way the Little Guy to play with the Big Guns

Not the only way. A bunch of coders could put together a software company and develop great products and recruit top talent. The company would grow and might eventually displace Microsoft.

Microsoft was once a couple of college-age kids who stayed up all night writing code who happened to get the DOS contract.

Companies have an advantage over OSS developers in that when the company is poised for success, people want to invest money in the company in order to reap larger returns later. This gives the company the advantage of more money to recruit top full time talent, etc. Most people regrettably have bills to pay, and the poorly funded nature of most OSS projects will always limit the amount of some people's time that the projects can obtain.

• #### Re:Interesting. (Score:2, Interesting)

Microsoft was once a couple of college-age kids who stayed up all night writing code who happened to get the DOS contract.

The chances of that happening again are fairly slim. This was clearly a case of being in the right place at the right time. A couple of years later and they would have found themselves trying to supplant the standard desktop OS. The combination of the right hardware platform, a 'new' OS and a viable business app all had to click at the same time. Had the PC revolution started years earlier and those same two college kids tried to unseat that alternate universe's Microsoft juggernaut it wouldn't happen, no matter how good a marketeer Bill is.

Companies have an advantage over OSS developers in that when the company is poised for success, people want to invest money in the company in order to reap larger returns later.

Precisely. Given the dominance of Microsoft in the market, those savvy people aren't likely to gamble with funds they want a return on. That's why OSS really is a viable way significant inroads can be made in the market. You now have several companies helping to fund that development. Entire countries are looking to OSS to free them from the Microsoft treadmill of costly upgrades and zany licensing fees. The momentum is building and Microsoft sees it. They don't have a problem with Apple because they see them as a niche player, but I don't think they'd be writing licenses with anti-GPL language in it if they didn't genuinely see it as a threat to marketshare. As much as some of us like to bash Microsoft the executives are not stupid and are quite capable of interpreting the GPL and understanding that their 'take' on the license just isn't supported by the GPL's language.
• #### Re:Interesting. (Score:1)

Although I rember this article in the Past a fiew months ago. But I am to lazy to look it up. But it is instering how the Open Source movement just by a lot of people just doing a lot of little things (and some not so little) has created a product that would take a lot resources for a large company to complete. Open Source Software in my opinion is the only way the Little Guy to play with the Big Guns.

--
If My spelling bugs you. Then my work is done.

In that case, you can go home now.
• #### bad news for Linux? (Score:5, Funny)

on Friday July 05, 2002 @10:19AM (#3827262) Homepage Journal
This looks like a serious problem for Linux distributors like Red Hat, Mandrake, and Debian. They sell their products (which consist of software and support and manuals) for \$40-\$100, usually. Now we see that what they put into their product (i.e., the cost) is orders of magnitude beyond that. Even if Red Hat sold every single copy it packaged (it doesn't even come close), and even if nobody downloaded it for free or copied the CDs for a friend (again, an incredibly optimistic assumption), it would still be looking at huge losses.

This might have worked a few years ago, but with accounting practices coming under scrutiny across the board, I fear that these companies are headed for trouble.
• #### Re:bad news for Linux? (Score:2, Flamebait)

This looks like a serious problem for Linux
distributors like Red Hat, Mandrake, and Debian.
They sell their products ... for \$40-\$100,
usually.

Wrong. Debian doesn't sell anything.

Now we see that what they put into their product
(i.e., the cost) is orders of magnitude beyond
that.

Wrong again. Red Hat's costs are what they actually spend, not what the stuff they distribute would have cost if it had not been given to them.

• #### Re:bad news for Linux? (Score:2, Insightful)

A serious problem for them?

The IRS is going to love me come audit day...
• #### Re:bad news for Linux? (Score:3, Funny)

To: ceo@redhat.com
From: Congress

Dear Sir,

We figured out recently that you are selling software which is worth 1 billion dollars at a suspiciously low price (~\$30-\$200).

I hereby invite you and your accountants to come to congress to answer some of our questions.

Best Rgds,

P.S. Do not attempt to destroy any accounting records, we are watching you.
• #### It's even more interesting from an accounting view (Score:1, Redundant)

If a corporation buys a Linux seat (or heck, downloads an ISO) then it has acquired an asset. Admittedly a digital one, but an asset nonetheless.

Now, if GE can revalue its pension assets upwards, when their value has gone down, then surely the corporation can revalue it to a 'market' rate of (say) \$10,000 a seat.

Rolling it out to all the people in your organisation then, gosh!, your company is suddenly as profitable as Enron or WorldCom were.

Best of all, so long as you never run out of blank CDs, your company can continue to make massive profits.
• #### Hmmm... sloccount, you say? (Score:1)

woody:~# apt-cache search sloccount
sloccount - Programs for counting physical source lines of code (SLOC)

...so it appears there's a *.deb of it already (or is this an old story...) Hmmm... you be the judge.
• #### Re:Hmmm... sloccount, you say? (Score:1)

EVAL: it appears theres a *.deb of it already (or is this an old story...)

RESULT: TRUE.
• #### value? (Score:3, Insightful)

on Friday July 05, 2002 @10:25AM (#3827292) Homepage
It's fun to see someone do something like this. However the fact that most people don't use Linux means that the value of using Linux is less than the cost of using linux. Therefore, since the source code is free there must be other costs that are preventing most people from using Linux.

Instead of wasting time figuring out fictitious pricing based on the way that corporate America prices software, why not figure out a way to remove the aforementioned hidden costs from Linux so that the masses can begin to see what many of us on /. have known for a while: That GNU Linux and Open Source Software represent a great choice.

• #### value / payback Linux-centric? (Score:2)

the fact that most people don't use Linux means that the value of using Linux is less than the cost of using linux.

The cost analysis was done based on linux, however most of the code analysed in fact is for things that run on other platforms, and much of which was in development for years before linux 0.9 hit the 'Net.

So the measure of value based on who uses Linux includes everyone who uses linux-hosted apache servers. The more general case includes everyone who accesses servers that depend on (Perl, BIND, sendmail, mysql .... etc) or were/are developed using (X11, CVS, bitkeeper, emacs, gcc .... etc)

The economic value isn't small. That much I'm pretty certain of, just how big, well it works for me, I'll leave the analysis to the economists.

• #### Re:value? (Score:2, Informative)

cost of using linux.

For many Windows "sysadmins", the cost is the cost of actually learning the basics of how TCP/IP works, some basics about how their computer works, and basics about how some application level protocols work.

The hidden cost of Linux is the time you have to spend learning things you should already know, for many Windows admins.
• #### Hmmm (Score:1)

It may well contain "over 30 million physical source lines of code (SLOC)", but what about the lines of source code? Eh?

Didn't think about that, did you?
• #### Nonsense (Score:2, Interesting)

I don't think the length of the code, or the time that was or might have been taken to produce it, is in any way related to the value of using the software produced.

The same people that argue in these categories also try to legitimize open source software by its better "quality" in terms of fewer errors. The result of this argument is that MS software would be great to use if it contained fewer errors. But that's not the main point. As can be seen when MS does such horrible things as allowing themselves to destroy your software (DRM EULA change), the problem is not the result but the way they produce their software. I'd argue that because their development model is bad the resulting software is bad, too, but that's only a minor problem in comparison to the harm they do to the software culture in general.
• #### No more functions for me... (Score:3, Funny)

on Friday July 05, 2002 @10:31AM (#3827336) Journal
I'll never use macros, functions, classes, or the stl again!

"Look, I wrote a program which does the exact same thing as another program, but mine is worth much, much more!"

• #### Re:No more functions for me... (Score:3, Insightful)

That's precisely the point. Not using STL or standard functions increases the time taken to code, the amount of programming required and decreases the maintainability of the code -- in short your code would _cost_ _more_ to develop if you were a company paying for it.

cost != value in general
• #### Re:No more functions for me... (Score:2)

Not using STL or standard functions increases the time taken to code
If I was to essentially rewrite the STL myself, sure...

I was just implying that I would cut-and-paste every relevant piece of STL code into my program, rather than '#include'ing it.
• #### Re:No more functions for me... (Score:2)

Good one!
I'll never use macros, functions, classes, or the stl again!
"Look, I wrote a program which does the exact same thing as another program, but mine is worth much, much more!"

Costs much, much more. Almost certainly.
Worth much, much more. Maybe.

With the cheaper way, you are at the mercy of the subroutines (of whatever binding) that you are using. The price is some variant of DLL hell.
With the more expensive way, everything is or can be optimized for exactly what you are doing. You don't need to solve problems you don't have. The price is a vastly larger scope of responsibility.

Which is better depends of course on the context.
Good example of the difficulties of defining any rational metric on software.
• #### Yeah, whatever... (Score:1)

Just try explaining it to your insurance company after your house gets robbed, or some idiot airport security inspector accidentally trashes your laptop.

Heck, given that theory, one fire should net me more than enough to retire on.
• #### Slashdot costs industry \$1billion/year (Score:5, Interesting)

on Friday July 05, 2002 @10:34AM (#3827359)

I love these kind of stats.

Slashdot has, say, 100,000 US readers per day.

Each spends an hour reading slashdot when they should be working.

Let's say an average Slashdot reader is worth say, \$40 an hour, and they read Slashdot on 300 days during the year.

That means Slashdot costs the USA \$1,200,000,000 dollars a year! Crikey! Don't tell Bush!
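For anyone who wants to check the multiplication, here is a minimal sketch of it. The function name `yearly_cost` is made up for illustration; the inputs are just the guesses from the post above, not measured data.

```python
def yearly_cost(readers, hours_per_day, dollars_per_hour, days_per_year):
    """Multiply out the estimate: readers x hours x hourly rate x days."""
    return readers * hours_per_day * dollars_per_hour * days_per_year

# Figures taken straight from the post: 100,000 readers, 1 hour a day,
# $40 an hour, 300 days a year.
slashdot_cost = yearly_cost(100_000, 1, 40, 300)
print(f"${slashdot_cost:,} per year")
```

Every input is a round guess, so the result is only as good as the weakest one of them.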

• #### Re:Slashdot costs industry \$1billion/year (Score:1)

You're assuming that if I'm not goofing off reading /., then I'm not goofing off some other way...

You underestimate me, sir.

Garg
• #### Re:Slashdot costs industry \$1billion/year (Score:1)

While your post is still funny, I don't think the average Slashdot reader earns \$83,200 a year. If they did we would have a hell of a lot more buying power and could change the landscape of the software industry overnight with the right coordination.
• #### Re:Slashdot costs industry \$1billion/year (Score:2)

He's not saying that you're paid \$40/h. That's what a typical Slashdot reader costs to his employer (salary, rent for building, phone line, equipment, etc.). It's the amount of money the employer must pay in a year for employing an employee divided by the hours worked by the employee. It's probably used with regular time only, else it'll end up lower (more hours without the related rise in costs).
• #### But.. (Score:1)

A shorter program that did the same thing as a longer program, but was more efficient than a longer program might have taken much more time/effort to code.. I don't think it could possibly take this into consideration.
Personally, I'd feel bad if I wrote a program which was just a bunch of spaghetti.
• #### Now we know why... (Score:2)

Microsoft puts so much code bloat into their programs...
• #### His Paper Is Bunk (Score:5, Insightful)

on Friday July 05, 2002 @10:42AM (#3827412) Homepage
To put it mildly...

In his paper, he uses the basic COCOMO model for estimating the cost. This model, quite frankly, sucks. Boehm's book even states, more or less, that the COCOMO model is only accurate to a factor of 10.

Since I no longer have the Boehm book, this quote from a google-found web page will have to do. This is a quote of a quote from Boehm's book, Software Engineering Economics:

"Basic COCOMO is good for rough order of magnitude estimates of software costs, but its accuracy is necessarily limited because of its lack of factors to account for differences in hardware constraints, personnel quality and experience, use of modern tools and techniques, and other project attributes known to have a significant influence on costs."

Basically, this means that the estimate could be anywhere from \$100M->10B in true cost.

At the very least, this kid should have stated which of the model variants he was using.

Better yet, he should have subdivided the source code into multiple categories: kernel+drivers, tools, productivity software, etc. etc., and then applied the various models to them.

Just my 2 bits.

BTW, here [nasa.gov] is the google-found page which has the quote I stole. Plus, it gives a nice, albeit brief, overview of COCOMO.

-d
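For readers who have not seen it, Basic COCOMO is small enough to sketch in a few lines. The coefficients below are the standard published ones for the three Basic COCOMO modes; the function names, the \$56,000 salary default, and the choice of mode are assumptions for illustration only, and the 2.4 overhead default is the wrap rate quoted from the paper elsewhere in this thread. This is not the sloccount source.

```python
# Basic COCOMO (Boehm, 1981):
#   effort   = a * KSLOC**b   (person-months)
#   schedule = 2.5 * effort**c  (calendar months)
BASIC_COCOMO = {
    "organic": (2.4, 1.05, 0.38),
    "semi-detached": (3.0, 1.12, 0.35),
    "embedded": (3.6, 1.20, 0.32),
}

def basic_cocomo(sloc, mode="organic"):
    """Return (effort in person-months, schedule in months) for a SLOC count."""
    a, b, c = BASIC_COCOMO[mode]
    effort_pm = a * (sloc / 1000) ** b
    schedule_months = 2.5 * effort_pm ** c
    return effort_pm, schedule_months

def cost_estimate(sloc, mode="organic", salary=56_000, overhead=2.4):
    """Turn effort into dollars: person-years x salary x overhead.

    salary is a hypothetical illustrative figure; overhead is the wrap rate.
    """
    effort_pm, _ = basic_cocomo(sloc, mode)
    return effort_pm / 12 * salary * overhead

effort, schedule = basic_cocomo(100_000, "organic")
print(f"{effort:.0f} person-months over {schedule:.1f} months")
```

Note how much the answer moves with the mode alone: the same SLOC count costs roughly three times as much in embedded mode as in organic mode at 100 KSLOC, which is why which variant is applied to which component matters.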
• #### Re:His Paper Is Bunk (Score:2)

If it's off by a factor of 10, how could it range between 100M and 10B? Wouldn't that be 2 factors of 10? And that's a whole hell of a lot of linux!
• #### No, he's right (Score:2)

Because we don't know if it's off to the low or to the high. If his estimate was 10 times too low, it was really 10B; if it was 10 times too high, it was really 100M.
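Spelled out as a sketch, taking the \$1B figure from the story as the point estimate (the helper name `factor_bounds` is made up for illustration):

```python
def factor_bounds(estimate, factor=10):
    """Range for an estimate that may be off by `factor` in either direction."""
    return estimate / factor, estimate * factor

# $1B point estimate, accurate only to a factor of 10.
low, high = factor_bounds(1_000_000_000)
print(f"${low:,.0f} to ${high:,.0f}")
```

So the two ends of the range are a factor of 100 apart, even though each end is only a factor of 10 from the estimate.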
• #### Re:No, he's right (Score:2)

Oh, okay. Makes sense. I didn't see it said that the "estimate" in question was \$1B. Had that been there it would have been much simpler.
• #### Re:No, he's right (Score:1)

If only someone had included that estimate someplace obvious where you couldn't possibly miss it, like in the story itself...
Wow... A Billion Dollars Worth Of Software On My System For Free!
• #### Re:His Paper Is Bunk (Score:1)

factor of 10 difference from 1B [either greater or lesser]. The original poster is correct.
• #### Re:His Paper Is Bunk (Score:2)

Yes, thank you. Bear in mind that he did not say that the "estimate" was \$1B, which was a key assumption that I did not make. And isn't it possible to tell whether you're off by a factor of 10 too high or too low? I mean, that's something a human should pick up on pretty easily, so one of the options could be dropped, in all likelihood.
• #### Re:His Paper Is Bunk. You're right! (Score:5, Interesting)

on Friday July 05, 2002 @11:14AM (#3827600)
A proof point from AbiWord. I just ran the program over our abi-unstable directory. About 300,000 LOC; estimated cost to produce, about \$10,000,000.

I also ran the program over the abiword plugins directory. Estimated cost to produce, \$1,200,000.

Now I know from direct experience that building the main code base of the AbiWord Word Processor took about 100 times more effort than the plugins.

Cheers

Martin Sevior
AbiWord Developer
• #### For a single project yes. (Score:1)

The idea is that the inaccuracies go both ways, and for a whole lot of projects they even out. If you get enough data then the low precision(* won't matter if the accuracy(* is good. (Or was it the other way around?)

*) Yes, I'm using the math definitions of these words, not the dictionary ones. Because the dict. ones suck.

• #### Re: His Paper Is Bunk (Score:1)

> Basically, this means that the estimate could be anywhere from \$100M->10B in true cost.

So if you're buying, argue for \$100M, but if you're selling then politely suggest that \$10B is more accurate.

• #### Re:His Paper Is Bunk (Score:2)

Another quote by Boehm, as quoted in Software Engineering A Practitioner's Approach, 3rd edition, by Roger S. Pressman:

Today, a software cost estimation model is doing well if it can estimate software development costs within 20% of actual costs, 70% of the time, and on its own turf (that is, within the class of projects to which it has been calibrated)...This is not as precise as we might like, but it is accurate enough to provide a good deal of help in software engineering economic analysis and decision making.

I type this in from the dusty book sitting on my desk, which was the textbook for my last CS class in college, back in '93. Software engineering. Most useful class I ever took in college.

This is hardly an endorsement of COCOMO. (COnstructive COst MOdel) Not to slam the author of the paper, it was an interesting idea. Just don't go around thinking that his findings are entirely accurate.

• #### Re:His Paper Is Valuable (Score:3, Insightful)

His paper is valuable, priceless even, in that it is throwing a spotlight on a part of the Open Source phenomenon that has not yet come into public discussion.

While I don't know COCOMO, I accept that his numbers are highly suspect. But you have provided a range of accuracy that corrects for this. I am very confident that any reasonable assessment of the Linux development effort is going to be greater than \$100 million and less than \$10 billion.

So it is indisputable that Linux is a resource whose development effort exceeds \$100 million.

And no reasonable person can question that this resource is now available at very low cost to anyone or any institution, on a global level.

It is difficult to see how anyone could not recognize that the use of this resource increases global wealth. Linux does make the world pie bigger.

I think that is the real story here. Linux is a tool, a lever, that has required at least \$100 million of effort to develop, but which anyone can put to work for extremely low cost. I think this kind of phrasing needs to be brought to the attention of those who are being FUDded by groups that feel threatened by Open Source.

• #### Re:His Paper Is Bunk (Score:1, Flamebait)

Talking about accuracy: his program estimates 11.71 person-years to build one of the applications I have developed. Actually, I have been working on it for three years in my spare time ... maybe I have unknowingly figured out how to warp time?
• #### Re:His Paper Is Bunk. You're Right. (Score:1)

In his paper, he uses the basic COCOMO model for estimating the cost. This model, quite frankly, sucks. Boehm's book even states, more or less, that the COCOMO model is only accurate to a factor of 10.

I have the COCOMO II book, and I have used the COCOMO model for certain projects. I agree that it is not appropriate here. COCOMO was designed with a narrow focus in mind, and applied best to repeatable projects in a structured work environment. It requires you to estimate parameters for factors such as "Programmer Unfamiliarity", "Precedentedness" "Development Flexibility", "Team Cohesion", "Process Maturity", "Multisite Development", etc. Each of these fudge-factors makes it extremely difficult to correctly apply the model to someone else's work.

Also, each of these factors is likely to be different for each major component.

"I was unable to find a publicly-backed average value for overhead, also called the 'wrap rate.' This value is necessary to estimate the costs of office space, equipment, overhead staff, and so on. I talked to two cost analysts, who suggested that 2.4 would be a reasonable overhead (wrap) rate."(from here [dwheeler.com])

He is using an average overhead rate for a large corporation. He forgot to take into account the fact that Open-Source developers (generally) don't get office space or health insurance or secretaries. They use their own equipment in their own homes. So a more reasonable overhead rate for this project would be close to 0.1.

So taking all of this into account, he's probably off by a factor of more than 100. (If you want to know how accurate he was, compare his estimate to the actual cost of developing a Linux distro... ;) While it might have made interesting headlines, I see little value in the actual number.

• #### Don't be confused (Score:2, Interesting)

Well, when I saw the tidbit on /., I thought, wow, a billion dollars worth of software in a Linux distro? That is not what this article says. It simply says that RedHat (would have) had to pay the developers a billion dollars to complete that much work. To find out how much it should probably cost, add some money for profit, and divide that by how many probable users there are. This would only make sense for Linux as a whole, and not just RedHat.
• #### isn't SLOC junk? (Score:3, Interesting)

on Friday July 05, 2002 @10:47AM (#3827433)

if analyzing SLOC says nothing about developer contributions, efficiency, or effectiveness - then isn't estimating value based off SLOC fundamentally flawed?

i mean, you can't have it both ways. Either SLOC shows how productive programmers are, or it doesn't.

if it does - then get over the SLOC analysis in your job reviews.
if it doesn't - then you cannot even remotely accurately gauge monetary worth through SLOC.

good luck to the people trying to estimate worth of OSS. good luck to the people trying to estimate the worth of programmers.

i just don't know why people don't count 'Customer Problems Solved Over Time' as the end-all, be-all.

(and time and energy fixing software bugs doesn't count. that's not the customers problem. it's the developers)

who cares how many SLOC are in a product. how many needs of the end user does it fulfill, and how long did it take to get done from the word 'go'?

yeah, you'd need to define customer needs much more carefully than most shops do... but isn't that part of the eXtreme Programming retinue /. loves to trumpet?
• #### No, SLOC isn't Junk, and You Missed the Point (Score:3, Insightful)

if analyzing SLOC says nothing about developer contributions, efficiency, or effectiveness - then isn't estimating value based off SLOC fundamentally flawed?

1) SLOC says nearly *EVERYTHING* about developer contributions. After all, the SLOC is what the developer contributes.

2) Efficiency is a measurable metric, and can be quite as simple as (SLOC/MM)-(NumBugs/MM), where MM=Man-Month.
While there is a variance in the efficiency of programmers, for any given company a median efficiency can be determined. From this, a decent cost-estimate for SLOC may be determined.
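As a sketch only, the metric in point 2 works out like this. The project figures are hypothetical, chosen just to show the arithmetic, and the function name is made up.

```python
def efficiency(sloc, num_bugs, man_months):
    """(SLOC/MM) - (NumBugs/MM), as written in the post above."""
    return sloc / man_months - num_bugs / man_months

# Hypothetical project: 12,000 SLOC and 60 bugs over 24 man-months.
score = efficiency(12_000, 60, 24)
print(score)  # 500 SLOC/MM minus 2.5 bugs/MM
```

The formula subtracts bugs from lines one for one, so with numbers like these the bug term barely registers; a real shop would presumably weight it.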

i just don't know why people don't count 'Customer Problems Solved Over Time' as the end-all, be-all.

That collected metric would have almost no utility, unless you could atomize the concept of a 'customer problem'.

"Well, it took us 6MM to create that web-based
accounting system, so it should take us about
the same to develop these kernel drivers"

Something like the above doesn't help anyone. It doesn't help the programmers who take part in recording the data; it doesn't help the managers plan and predict the product lifecycle; it doesn't help the customer in letting him know when to expect to see the next product release.

What you failed to do was drill down further in your analysis of the problem.
Let's say you just finished putting out product "X", which solved some customer problem. Now the customer wants product "Y" to solve some other problem. How do you estimate "Y" based upon "X"?
Answer: Break it down. "X" required the following capabilities: A,B,C, and D. You recorded and tracked the amount of time it took to accomplish each capability.

Now, you break down the customer problem, "Y", and determine what it would take to solve it.
If you did a good job at atomizing the customer problem on project "X", then you should have been able to come up with an average amount of time/AtomicProblem. Apply this metric and Voilà!, you should have a good idea about the scope of "Y".
Many people like to take the AtomicProblem and equate it to a SLOC estimate.

What SLOC counting does is try to establish a commonality among various projects so that future projects of various natures may be estimated using previous metrics. This is not perfect, but it should be used as an aid in determining overall project scope and costs.

i mean, you can't have it both ways. Either SLOC shows how productive programmers are, or it doesn't.

SLOC shouldn't be used to estimate programmer productivity. It should be used to estimate project productivity.

-D
• #### Re:isn't SLOC junk? (Score:2)

SLOC is not a good measure of how "good" software is; merely of how complex it is, and how long it takes to develop. Studies have shown that SLOC is better at this than most other metrics:
...lines of code has commonly been found to outperform many of the more complex composite measures of software development.

- Powell, 1998 [nec.com]
(Citeseer says it was published in 1996, but it's actually from 1998 [york.ac.uk].)
• #### Inflated prices? (Score:2)

I kind of hope that nobody uses this to price software that they're selling to a company, lest they lose their credibility. There is no assurance that this guy did not lean toward making this software seem more valuable than it really is, thus making open source software more attractive (because you're getting something for nothing). I'd be careful using this in any other capacity than your home computer for the purpose of having fun.

On a similar note, do the prices seem accurate, for those of you who have used this thing?
• #### handle (Score:1)

"2bits writes"
With such a handle, how many pennies is his opinion worth?
• #### An interesting thumbsuck (Score:5, Interesting)

<twylite@cMONETrypt.co.za minus painter> on Friday July 05, 2002 @11:04AM (#3827535) Homepage

Run the same SLOC figures against the statistics from the Function Points methodology and you get a different picture. You are looking at 2500 person years of effort, with a cost optimum development time of 6.5 years. However, to deal with the complexity involved you will need approximately 3000 average and 1500 above average developers (at average development rate you could expect a 13 year delivery!). Total price tag: around \$2 billion (that's 2e9, in case your definition of billion is different).

Of course, this is still a very skewed figure. There is no accounting for the quality of code (at the end of such a complex development cycle, you could expect as many as 7 million defects!), and both FP and COCOMO estimate development effort inclusive of design work and documentation, which in OpenSource typically don't match those in mature commercial development environments (from which the FP and COCOMO statistics are derived).

There is also a huge, and invalid, assumption made by the author, regarding the application of COCOMO (and my FP calculations suffer the same problem). The complexity of a system is MORE than the sum of its parts. This is because developer productivity declines as system complexity increases.

At 10,000 FP, a developer is often only 60% as productive as at 1,000 FP. The situation is obviously far worse at 300,000 FP (the entire distribution), yet the kernel itself only weighs in at around 20,000 FP. And even then, clear modularisation reduces complexity for individual developers. So it is grossly unfair to base calculations on the system as a whole.

The kernel (around 2.5 MLOC) as a single system would be a task for 300 skilled developers over around 3 years, while the Gimp (around 500 KLOC, still near the top of the list in size) would be looking at 50 developers over 18 months. More complex projects need relatively more time and more developers. Doing all these projects in parallel (assuming it were possible - which it isn't because of dependencies, and that's another factor) would take less than the most complex task (kernel = 3 years) and relatively fewer developers than estimated based on the complexity of that task (30 MLOC / 2.5 MLOC * 300 developers = max 3600 for entire distribution). Max cost: 3600 * 3 * \$55k = \$594 million.

And you're STILL not accounting for the fact that employing someone costs a lot more than just paying a salary. Which puts all estimates (mine and the author's) up.
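The parallel-development arithmetic above can be reproduced in a few lines. This is only a sketch of that back-of-the-envelope calculation: every figure is the one quoted in the comment, and the function and constant names are made up for illustration.

```python
# Figures quoted in the comment above; nothing here is measured.
KERNEL_MSLOC = 2.5        # kernel size, millions of lines
KERNEL_DEVELOPERS = 300   # skilled developers for the kernel alone
KERNEL_YEARS = 3          # duration of the most complex task
DISTRO_MSLOC = 30         # whole distribution, millions of lines
SALARY_USD = 55_000       # annual salary per developer, no overhead


def parallel_upper_bound(distro_msloc, kernel_msloc, kernel_devs, years, salary):
    """Scale the kernel's head count linearly to the whole distribution."""
    developers = distro_msloc / kernel_msloc * kernel_devs
    cost = developers * years * salary
    return developers, cost


developers, cost = parallel_upper_bound(
    DISTRO_MSLOC, KERNEL_MSLOC, KERNEL_DEVELOPERS, KERNEL_YEARS, SALARY_USD
)
print(f"{developers:.0f} developers, ${cost / 1e6:.0f} million")
```

Multiplying the salary by an overhead factor would push the total up, which is the point of the last paragraph.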

• #### Re:An interesting thumbsuck (Score:2)

Total price tag: around \$2 billion (that's 2e9, in case your definition of billion is different).

Man, what a bargain! Over two thousand man years of effort for only \$1,024!

Of course, the poster meant 10e9, not 2e9. Or 2e30, I guess, but I'm assuming 10e9.
• #### Re:An interesting thumbsuck (Score:1)

Err...
2e9 is a short form of 2 * 10^9, ya know...
So, yes, he was right...
• #### Re:An interesting thumbsuck (Score:2)

2e9 is a short form of 2 * 10^9, ya know...

Really? No, I didn't know. Ooooops....
• #### Re:An interesting thumbsuck (Score:1)

He meant 2e9 which expands as 2 * (10 ^ 9)
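For anyone still unsure about the notation, it is easy to check in most programming languages. A minimal Python sketch, with the powers of two shown only for contrast with the earlier misreading:

```python
# In Python, as in C or Java, "e" in a numeric literal means
# "times ten to the power of".
two_billion = 2e9
assert two_billion == 2 * 10**9
print(f"{two_billion:,.0f}")

# Reading the "e" as "two to the power of" gives very different numbers.
print(2**9)
print(2**10)
print(2**30)
```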
• #### as everybody knows : (Score:2, Insightful)

<flamebait>
Linux is free (as in beer) if your time is worthless.
</flamebait>
• #### Linux's true cost: (Score:4, Funny)

by Anonymous Coward on Friday July 05, 2002 @11:18AM (#3827621)
Priceless
• #### Lies, damned lies, and statistics (Score:3, Funny)

on Friday July 05, 2002 @11:32AM (#3827681)
Obligatory Simpsons quote:

"Oh, people can come up with statistics to prove anything, Kent. 14% of people know that."
• #### Worrying? (Score:1)

Disclaimer: I know very little of what goes on at kernel level on a unix system.

Does anybody else find it worrying that the kernel is by far the largest component of RHL? I kinda expected it to be one of the smaller of the large projects; way smaller than the likes of KDE / GNOME / Gimp / etc..

• #### Debian 10k packages (Score:1)

I just wonder how much Debian GNU/Linux would have cost based on the same calculation, knowing that it now includes more than 10K packages [debian.org]
• #### value? (Score:2)

I thought the value of a program (or any other noun) is related only to the amount of money that someone will pay for it ... If you can convince someone to pay \$1,000,000 for linux, then it's worth \$1,000,000. that's it.

a nifty little formula which analyzes the actual FUNCTION of a program to figure out how much it's worth is all well and good, but it doesn't mean anything. I bet the functional worth of Internet Explorer is quite a lot, but no one's willing to pay for it, so it's, in reality, worth nothing.

• #### Re:value? (Score:2, Informative)

>If you can convince someone to pay \$1,000,000 for linux, then it's worth \$1,000,000. that's it

I bet if it was an exclusive licence, M\$ would shell out :)
• #### What would M\$ pay (Score:1)

What would Microsoft pay to buy up an exclusive right to use all of the Linux distributions? Maybe \$1B is on the low end?
• #### Fun but meaningless... (Score:2)

These stats, of course, are fun but entirely meaningless.

If you are going to take the entire design cost into one copy, ok, so let's also add the cost of the CD (probably five billion or so in development cost) and the cost of the Microprocessor used to beta-test: around 50 billion I am guessing. Quite an expensive copy of RedHat.

The serious point is: to be at all meaningful, "cost" needs to be divided by number of users over the lifetime of the product. I would love to see those stats (and compare them to MS).

I venture Linux would still outvalue MS on that basis (if only because there are fewer users).

Michael
• #### visible man, with invisible shirt (Score:4, Funny)

on Friday July 05, 2002 @12:12PM (#3827894) Homepage

Note to Mr. Wheeler: when your shirt is the same color as the background of your web site, you might want to put a thin border around the picture with your favorite free image editing software.. though I'm wondering why exactly your picture is there at all..

• #### Of course it's a billion dollars. (Score:2)

Of course it cost a billion dollars to write the software everyone has on their machine. But Microsoft has \$40 billion in the bank and collects \$7-15 billion a year in revenues.

You do the math.

--Blair
• #### The software industry is losing BILLIONS! (Score:1)

Just as I thought! Every copy of linux is costing the software industry over a billion dollars!
• #### Wow! (Score:2)

One thing I got was that the number of lines of code in Mozilla was about the same as everything else (minus the kernel) put together...
• #### damages incurred.. (Score:2)

so does this mean that all the people who had their place raided and their linux box taken that they now incurred \$1billion in damages???
• #### Software is not worth "SLOC" (Score:2)

Software value should not be calculated by the amount vendor spends, but by the amount "user gains".

Linux saves software cost. Also linux saves you from NIMDA. But linux means more expenses in tech team.

So value of linux is =
Value of Windows
+ Value that would be lost due to NIMDA, etc
- Cost of tech department difference

Which I guess is "much" more than \$1G in total.
• #### cost to develop != market value (Score:2)

I mean, come on, sure, some of this stuff was written by the finest minds in the industry, who could easily have fetched premium rates for their work, but chose not to for "the good of humanity" (or some other variation of the rationalization). Then there's the contributions from people who might not be able to hold down a job bussing tables at Denny's. Those are two extremes. You could easily compute an average cost from hours spent there.

But what could you sell the software for?

Nothing. Its market value is zero - because its market is a Linux box, and we all know that nobody will pay for software on Linux, right? ;)

• #### Unfair! (Score:1)

This method severely underestimates Perl programmers' efforts! :-P
• #### Maybe we should call it Mozilla/Linux :-) (Score:2)

* The largest components (in order) were the Linux kernel (including device drivers), Mozilla (Netscape's open source web system including a web browser, email client, and HTML editor), the X window system (the infrastructure for the graphical user interface), gcc (a compilation system), gdb (for debugging), basic binary tools, emacs (a text editor and far more), LAPACK (a large Fortran library for numerical linear algebra), the Gimp (a bitmapped graphics editor), and MySQL (a relational database system).

Since the second largest part of the system is now Mozilla and not gcc, maybe we should stop calling it GNU/Linux and start calling it Mozilla/Linux. :-)

Vanguard
• #### Utterly ridiculous (Score:2)

This method of software cost estimation is patently ridiculous. I can't even imagine how anyone could take him even remotely seriously.

Counting MySQL, PHP, etc. lines of code as part of the OS is misleading -- did he count MS SQL, Access, etc. and other pieces of software which could be bundled with a particular flavor of Windows? Consumer Windows OS distribution contains a lot more application code (e.g. Office bundled, vendor-supplied drivers/goodies/etc.) than the 'stock' Windows code numbers listed in his comparisons. Further, Windows does not contain individual drivers for every single piece of hardware out there; it has some generic drivers and then relies upon your vendor to supply the drivers for the rest, which is typically free. How many vendor-supplied drivers vs. homebrew are in Linux?

Further, he bases his cost as if Red Hat 7.x was a complete rebuild -- as if every single line of code was re-written from the previous version, so claiming that so-much-ever-million-man-minutes went into making it is wrong. Someone invented the wheel many (tens of?) thousands of years ago. I bet a lot of man hours have been spent refining the wheel. Do auto manufacturers include that into the cost of cars? Do they make you pay for 10,000 years of refinement from the rock-with-a-hole-in-it to wagon wheels to the run-flat tires of today? No, they include the cost of the materials that went into making it and certainly *some* R&D time, but that cost calculation is determined from various sources, not 'how many molecules of rubber are in my tire'.

His LOC calculation is misleading as well.
if( something )
{
stuff
}
else
{
stuff
}

Contain 4 superfluous lines of code. According to his calculations I did 2x more work than if I wrote it like this:
if( something )
stuff
else
stuff

If you're frisky you can write it in a single line:
if( something ) { stuff } else { stuff }
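To put numbers on the three layouts above, here is a rough sketch of a physical-line counter. It assumes "physical SLOC" simply means lines with at least one non-whitespace character; it does not strip comments the way a real counter would, and `physical_sloc` is a name invented for this example.

```python
def physical_sloc(source: str) -> int:
    """Count lines that contain at least one non-whitespace character."""
    return sum(1 for line in source.splitlines() if line.strip())


braces_on_own_lines = """\
if( something )
{
stuff
}
else
{
stuff
}
"""

no_braces = """\
if( something )
stuff
else
stuff
"""

one_line = "if( something ) { stuff } else { stuff }\n"

for name, text in [
    ("braces on own lines", braces_on_own_lines),
    ("no braces", no_braces),
    ("one line", one_line),
]:
    print(f"{name}: {physical_sloc(text)}")
```

The same logic comes out as 8, 4 and 1 physical lines, which is the formatting sensitivity being complained about.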

Why this article was even mentioned here is beyond me. If I could moderate it I'd put it at (-1: Stupid).
• #### Yeah, right. (Score:5, Insightful)

on Friday July 05, 2002 @01:41PM (#3828461)
According to this program, a little calculator program I've occasionally worked on in my spare time over the last couple years would have cost \$ 85,659 to develop. (At the money that I was making as a co-op, roughly 3 years, full-time.) Another project, which my two roommates and I have been working on for most of the last year, again in our spare time, is reported to be \$ 1,877,009.

So either I'm doing enough work to be worth several hundred thousand dollars a year, or this thing is complete nonsense.

• #### Wow! (Score:2)

I must be a genius...I ran this on the free-time project [burnap.net] I started last December and it tells me that it would take a man-year to reproduce!

Wow! Apparently I can do the work of four normal programmers... time to talk to my boss about a raise!

• #### Responses from the author!! (Score:5, Informative)

on Friday July 05, 2002 @03:02PM (#3829037) Homepage Journal
Since I'm the author of this paper (More than a Gigabuck: Estimating GNU/Linux's Size), I suppose I should respond to some of the comments made here:
1. How did I arrive at the estimate of \$1 billion? The short answer is "see the paper". I wrote a tool to compute the number of physical source lines of code (SLOC), used Boehm's well-respected COCOMO model to determine the effort (in person-years) from the SLOC, and then converted that effort into an estimated development cost using programmer salary averages and wrap rates. See the paper for the details.
2. It's true that there's no necessary relationship between cost and value. I don't see how that contradicts the paper; the paper never claims that there is one. Clearly, you can spend \$1 million to develop a program that is worthless; it happens all too often. Proprietary vendors make money by making more money from sales than it cost to develop the software, so proprietary software vendors are very aware of the difference between value and cost. Look carefully at the phrasing. All the paper says is that "Had this Linux distribution been developed by conventional proprietary means, it would have cost over \$1.08 billion (1,000 million) to develop in the U.S. (in year 2000 dollars)." The paper does not claim that Red Hat actually spent \$1 billion, or that their distributions' sale value is related to this development cost figure. Indeed, what the paper shows is that by using OSS/FS approaches, it's possible to build large systems that would cost over \$1 billion to develop using conventional proprietary means.
3. Several have complained about the use of COCOMO for estimating effort from lines of code. COCOMO is certainly not perfect, but it's a well-tested, widely accepted, and widely used model. It's also very clearly documented, so there are no "hidden assumptions". In particular, the model and constants used in COCOMO are based on a wide variety of real projects. It's ridiculous to believe that its results are accurate to the nearest hour; as noted throughout the paper, this is only an estimate. A few people have noted that their software took less time to develop, but there are many factors at work. One is that highly experienced people can develop code more quickly; however, not everyone is equally skilled, so with large systems and many developers this effect should even out. Another is that COCOMO includes design time, documentation time, and testing time. Also, this includes not only an average U.S. programmer's salary, but also the wrap rate for overhead (building costs, insurance, and so on) - which programmers don't see in their paychecks, but are certainly paid for by traditional businesses. Don't like COCOMO? That's fine - use your own model, preferably one that's been widely tested in the industry. This paper shows you exactly how to do this sort of analysis.
4. I do not claim that every line of code is a "complete rebuild". I'm simply trying to estimate how much it would take to build the system if it was rebuilt.
5. The problem with physical SLOC's sensitivity to formatting is well-documented, and I note that in the paper. It's not as bad as you'd think when analyzing larger systems, due to averaging. But if you would rather use logical SLOC, feel free to write code to do that and contribute it to sloccount. In short, instead of complaining, contribute.
6. As documented in the paper, I only used Basic COCOMO. I don't have enough information about each project to really use the more detailed COCOMO models effectively. However, the paper has all you need if you want to do more detailed analysis using other effort and cost estimation models, including the versions of COCOMO that require more input (e.g., Intermediate COCOMO).
7. SLOC isn't a very good measure of productivity, but it's generally a very good way to estimate effort. This distinction is important. If programmer A can do something in 100 SLOC, and programmer B needs 10,000 SLOC to do the same thing, it's crazy to think that programmer B is more productive. But it is reasonable to believe that it will take more effort for programmer B to do the same thing (and thus more money). It's possible to game this (e.g., creating separate print commands for each letter to be output as a string), but the resulting code is pretty ugly and programmers generally only intentionally game things if they believe having higher SLOC values will improve their salaries (an unlikely claim for the software in the Red Hat Linux distribution). The paper only measures effort to develop Red Hat Linux 7.1. You'll have to determine if that's a comparable level of functionality to other systems.
8. This doesn't count "the operating system". It counts "Red Hat Linux 7.1". Thus, it includes the word processors, spread sheets, and so on. It's not as easy to determine what to leave out; you could compute just the minimal "base", but few people would want to use such a system. Again, I think that's extremely clearly stated in the paper.
9. Others have been inspired by my paper to do an analysis of the Debian GNU/Linux distribution, using my tool sloccount [dwheeler.com]. You can see their very interesting paper Counting Potatoes: The size of Debian 2.2 at http://people.debian.org/~jgb/debian-counting [debian.org]. They found that Debian 2.2 includes more than 55 million physical SLOC, and would have cost nearly \$1.9 billion USD using over 14,000 person-years to develop using traditional proprietary techniques.
10. Yeah, I need a better picture. I just haven't gotten around to it.
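For readers who want to see the shape of the SLOC-to-effort-to-cost chain described in points 1, 3 and 6, here is a rough sketch of Basic COCOMO. The coefficients are the commonly cited "organic mode" values, and the salary and overhead figures are what I recall as sloccount's defaults; treat all of them as assumptions and check them against the paper. The function name is invented for this example.

```python
# Basic COCOMO, organic mode. All constants below are assumptions taken
# from memory of Boehm's model and sloccount's defaults, not from this thread.
EFFORT_A, EFFORT_B = 2.4, 1.05        # person-months = A * KSLOC ** B
SCHEDULE_C, SCHEDULE_D = 2.5, 0.38    # months = C * person_months ** D
SALARY_USD_PER_YEAR = 56_286          # assumed average programmer salary
OVERHEAD = 2.4                        # assumed wrap rate (overhead multiplier)


def basic_cocomo(sloc: int) -> dict:
    """Estimate effort, schedule and cost from physical SLOC."""
    ksloc = sloc / 1000
    person_months = EFFORT_A * ksloc ** EFFORT_B
    schedule_months = SCHEDULE_C * person_months ** SCHEDULE_D
    cost_usd = person_months / 12 * SALARY_USD_PER_YEAR * OVERHEAD
    return {
        "person_months": person_months,
        "schedule_months": schedule_months,
        "cost_usd": cost_usd,
    }


estimate = basic_cocomo(10_000)
print(f"effort:   {estimate['person_months']:.1f} person-months")
print(f"schedule: {estimate['schedule_months']:.1f} months")
print(f"cost:     ${estimate['cost_usd']:,.0f}")
```

Because the effort exponent is greater than 1, applying the formula once to a whole distribution gives a larger figure than applying it to each component and adding up the results, so the choice of granularity matters as much as the constants.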
• #### I don't see what all the fuss is about... (Score:1)

"Estimating the Size/Cost of Linux"

Let's see now: the size is five letters (thank god I don't have to use my other hand!) and the cost is of course "Free" (look ma... no hands!)

• #### What. A. Load. Of. Bollocks. (Score:4, Interesting)

on Friday July 05, 2002 @03:12PM (#3829097) Homepage
We've run some metrics here at work.

We worked out that it took 8 MAN YEARS to write some code.

That's all well and good, but it's been mostly me writing it on 37.5-hour weeks for the past 10 months.

This is a big "duh" in my book.

