
Distributed.net Suspends OGR project

st.n. writes "According to this statement, distributed.net is suspending its new OGR-24 project, which was started just a week ago, because of a missing ntohl() call in the buffering code. They were 24% done already and have to start over again now. "

  • By "So far, we have completed approximately 24% of the total stubs, mostly smaller ones." doesn't that mean that they're not 24% done in terms of time? I think the statement "They were 24% done already and have to start over again now." is a little misleading...
  • With encryption cracking efforts, it is easy to know if the solution was correct. But for problems like OGR, who verifies the results of such computations? I mean, I think there must be at least two other separate efforts that arrive at the same solution for me to be convinced that 'yeah, this is the answer'. Does anyone know how this is currently being done? Can they prove it mathematically?
  • Well, as far as I'm concerned, if this produces just one shorter ladder, it's a 'success'...as far as producing *THE*SHORTEST*ONE*, well, I wouldn't say that this is a guarantee it is produced. But, so what?
  • If Intel x86 chips, where the bytes are ordered the wrong way round, had not been so successful due to their most unholy alliance with software where much more is the wrong way round, we would never have had this problem.

    If you can figure out this sentence, then you are probably too smart to think of a reason why one would write a number backwards (in memory). ;-)
  • One of my boxes is a sad old DX2/66 and had only just managed to do one plan. Moving to 5-stubs instead of 4-stubs will stop me thinking that one had frozen.
  • Aren't there some more useful problems to work on? Optimal Golomb rulers aren't very useful and they're not very interesting mathematically. It's just a problem that's NP-hard. It isn't even really NP-hard, in that you can make a near-optimal Golomb ruler with much less work. (Many problems, from linear programming to the travelling salesman problem are like that - the absolutely optimal solution is NP-hard, but the almost-optimal solution is cheap to compute.)

    How about a big distributed 3D rendering job instead? At least there'd be pretty pictures.

  • by marcus ( 1916 ) on Wednesday February 23, 2000 @08:06AM (#1251326) Journal
    Is this childlike reasoning ever going to stop?

    You left computers on last week. Were they going to be on anyway? If so, there was no waste.

    Is it cold where those computers are? Would the heater be running anyway? If so, there was no waste.

    If the only reason that the computers were left on was so that you could gain ground in the stats race, then guess what? YOU wasted resources. No one else did.

    So, pay your electric bill and live with yourself as you are and shut up about it, or learn from your mistake and don't do it again. Either way, we really don't need to hear whining about the resources that YOU wasted.
  • Umm, I think those "idiots" are doing this in their spare time and for free.

    Anyone is welcome to spend all their free time working on a project. But please let us know though when you get started, so we can call you an idiot and slow when it doesn't live up to my expectations on how it should be running.

  • I noticed this morning and switched my client back onto RC5. I fully understand how this kind of thing happens - there is no way the people at the center of a project like Distributed.net can test all possible combinations of hardware and software.

    This may be a silly question, but I'm going to ask it anyway: Did (some) of the OGR blocks take a huge amount of time for others, or was it just me? I'm running the client on a Celeron 300A (not a power machine, but a lot faster than the 386 I started running RC5 on) and some of the OGR blocks took over 14 hours. I didn't know we'd done anything like 25% of the 'keyspace', but it looked to me like this project was going to go on for ever, given the speed of my computer.

  • It's a pity, but I agree with you. RC5-64 is incredibly boring and pointless; it proves nothing that wasn't shown by the previous RC5-56 contest. OGR interested me because it was new and was potentially, slightly useful to some people. I only run the d.net client on one box now (plus any others I've forgotten about at work), and the rest of them are in the "APM Contest", a global network of computers attempting to conserve whatever small bits of electricity they can in their spare time. A more worthwhile cause IMHO, and it's open-source too.
  • "1 little, 2 little, 3 little endians...." :)
  • So far as I know the intent is to find the shortest *known* ruler, not the absolute shortest. I'll go out on a limb and say that the difference is important :) The shortest known may not be the shortest... and it would be pretty easy to confirm if the new ruler is shorter than the existing shortest known... no in this context confirmation wouldn't be hard at all... Although this is unrelated... I fear the day when it is announced that 20% of the RC5-64 keyspace must be rechecked... Urgru
  • Given the past screwups of distributed.net, this latest snafu isn't surprising. In fact, I had a bad feeling about the whole thing when I switched my clients over to OGR from RC5 - just call it a gut feeling that somewhere during the contest, they would announce they had a bug and their results were bogus. I guess this is it. Can anyone tell me about other distributed projects they are aware of? I know of dcypher.net and that's about it. Maybe it's time we start a "new" distributed.net - heck, throw in a "new" slashdot too while we're at it....
  • It's nothing to worry about, some stubs are fast, some are way slower. It's not like RC5 where it takes the same amount of time for _any_ key.
  • My second stub took almost 2 days to complete, the next two were finished within a few hours, then I got another that ran a bit more than 24 hours.
  • My P2-266 has been taking just under 24 hours to complete a stub.
  • OGR stub 24/3-5-18-21 (396,509,188,869 nodes)
    took my computer 4.19:18:59.28 at 897,815.86
    nodes/sec. So I don't think it's your computer
    alone :)
    But the FAQ states that OGR takes long
  • by marcus ( 1916 )
    In case you can't figure it out, that's "Huh?" spelled backwards.

    If you can comprehend this equation, then you are probably too smart to think of a reason why owt would write a word the wrong way around(at all).
  • I had some blocks take 2.5 days on this 350 PII. Then again, I had a few go through in about 6 hours. Depends on which stubs you were given to check.....

  • Um.. the reason you would store the number that way is easy. And.. it's not "storing a number backwards" it's little endian byte order, and it does make perfect sense.

    The reason they are stored that way is because the current processors are based on the original 8-bit processors plus the 16-bit addons. Now you get a 32-bit processor that needs to be backwards compatible.

    They are stored in memory in reverse byte order, and processed in the same order.
    Say you have a 32 bit unsigned int variable at memory location 0x100 (for simplicity) that means it's taking 4 bytes, 0x100->0x103. So, if you store a number like 45 (0x0000002D) in it, it would go into memory as 2D 00 00 00. If you want to copy that to a byte, you copy memory location 0x100, and you've got it. In big endian byte order, it'd be stored as 00 00 00 2D (natural form) but if you want to store it in a byte, you have to grab memory location 0x100 + sizeof(unsigned long int) - sizeof(byte) so you'd get location 0x103, which is where that single byte is stored.
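
    A minimal sketch of the layout described above, using Python's struct module to emulate both byte orders on any host. The variable names are illustrative only; the value 45 and the offsets come from the comment.

```python
import struct

value = 45  # 0x0000002D, the example value from the comment

# Pack the same 32-bit unsigned value in both byte orders.
little = struct.pack("<I", value)  # little-endian layout
big = struct.pack(">I", value)     # big-endian layout

print(little.hex(" "))  # 2d 00 00 00
print(big.hex(" "))     # 00 00 00 2d

# Narrowing to a single byte:
# little-endian reads offset 0, the same address as the full word;
# big-endian must read offset sizeof(uint32) - sizeof(byte) = 3.
low_byte_le = little[0]
low_byte_be = big[struct.calcsize("<I") - 1]
print(low_byte_le, low_byte_be)  # 45 45
```

    This only models the memory layout; it says nothing about how any particular processor performs the narrowing read.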

    Does all that make sense? You may think something like "oh, storing a number backwards is evil and makes no sense", but if you are programming in assembler, it does make a lot of sense, and you're very glad that the processor does it for you.

    Just my 2 cents.

  • I still participate in distributed.net efforts using many boxes. I have them up all the time anyways. I consider it increased productivity to let someone do something marginally useful with my idle clocks.

    That said, if they would release the source for their clients they would find these problems sooner (I suspect) and there would be less wasted time and resources...

    GPL the client!
  • I programmed assembler on a SPARC, and never missed the reverse byte order. Little Endian is counterintuitive. Binary backwards compatibility has always been an issue of binary only programs. If you have the source, just recompile... IMHO little Endian is an ugly hack and looks really ridiculous nowadays.
  • However, the question is whether their software is mathematically correct when it comes to the brute force approach. Are there any bugs in there which would produce a non-optimal solution? If so we'll end up with a series of near-optimal Golomb rulers, which is better than a poke in the eye but not what the project was about.
  • by Pascal Q. Porcupine ( 4467 ) on Wednesday February 23, 2000 @08:32AM (#1251346) Homepage
    Well, the "justification" for little-endian machines was, back in the Good Old Days, it was very useful to be able to have a free downward typecast on pointers (with big-endian you have to either add some value to your index, or AND with a bitmask after the read, whereas with little-endian you just take the same pointer and use a smaller-sized read).

    I never said it was a good justification. :) After all, in situations like that, you usually aren't using pointers anyway...

    Unfortunately, because of x86's influence, a lot of other vendors have bastardized their architectures. For example, newer Alphas have both big- and little-endian modes, and apparently AlphaLinux runs in the little-endian mode simply for easy compatibility with x86. IMO, they should do it in big-endian so that fun bugs show up causing them to need to properly ntohl() and htonl() all their data. It'd make for much more consistency with the porting efforts to REAL platforms, such as PPC and Sparc (that isn't to say that Alpha isn't a real platform, of course, but it can hardly be treated with respect when it's got a little-endian mode simply to pander to x86 apologists).

    At least IA-64 is switchable endian, though (except in IA-32 mode, obviously), so at least there's some validation on that front. Hopefully the IA-64 Linux porting effort is doing the Right Thing and using the big-endian mode.
    "'Is not a quine' is not a quine" is a quine [nmsu.edu].

  • AAAALLLLLLRIIIIGHT! My first down-moderation! I LOVE IT!

    BRING IT ON! I've got 100+ Karma to burn and it STARTS TODAY!

    Let the word go out to both moderators and trolls alike, TODAY DONKPUNCH IS OFFICIALLY ON THE DARK SIDE! I have become a moderator's worst freakin' nightmare -- an over-caffeinated offtopic troll with a default 2!

    Why did this have to happen? Where did things go wrong? Was I forced into it? Did the down-moderation destroy my self-esteem? Am I just a burnout? Is my unique humor and insight unappreciated by my peers in my time? Will I be remembered as a misunderstood genius when I'm gone?

    I predict a new article: "Ask Slashdot: DonkPunch -- when good posters go bad. How can we keep this from happening again?"

    E! News and VH-1 will feature a special "Behind The Dot" episode: "The Rise And Fall of DonkPunch's Karma" They'll show scenes of me posting pro-Linux suckup posts to desperately get my Karma back up to 50 or so. All of my posts will be at least 200 lines long, requiring a "Read the Rest of This Comment" link.

    Ye Gods, Moderators, don't you see what you've done? You've created a monster! You've banished me to the land of the trolls AND I LIKE IT HERE! Seems to me the trolls have a heck of a lot more fun on slashduh anyway.

    Now you will pay the price for your lack of vision!
  • by Phil Gregory ( 1042 ) <phil_g+slashdot@pobox.com> on Wednesday February 23, 2000 @08:37AM (#1251348) Homepage

    You might be surprised at the varied applications of many "pure" mathematical problems.

    The only application I am certain of for OGRs is radio telescope arrangement. When surveying space, the bigger the telescope (although these tend to look more like satellite dishes), the better. However, you can have two smallish dishes a certain distance apart function in tandem just like a single dish with a diameter equal to the separation of the dishes.

    With an array of smaller dishes, an ideal arrangement will maximize the number of different distances between dishes (maximizing the frequencies which can be observed). Sound familiar? OGR solutions can be mapped onto radio telescope placements.

    I'm sure that there are other applications where the number of differences between a certain number of points needs to be maximized, but I don't know of any off the top of my head.

    --Phil (I remember first being introduced to Golomb Rulers via a link from the (now defunct) Geek Site of the Day.)
  • www.mersenne.org [mersenne.org] List of distributed projects that make sense (at least some of them) in contrast to most distributed.net projects. Once the distributed RC5 cracking made sense, but now it is pretty useless as the computing time is just an extrapolation of the time needed to crack the other messages...
  • It's quite easy to verify, actually. The ruler will only have 24 marks on it, after all. Anyone with a lot of time and a calculator with a subtract button could do it. There's...let's see... 23+22+21+...+2+1 pairs to check. That works out to 276(?) subtractions you would have to do to verify it manually.
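
    The manual check described above can be sketched in a few lines of Python. The function names are made up for illustration. Note that this only confirms that a candidate is a valid Golomb ruler (all pairwise differences distinct); it does not show that no shorter ruler exists, which is the part that needs the exhaustive search.

```python
from itertools import combinations


def pair_count(n_marks):
    """Number of mark pairs: (n-1) + (n-2) + ... + 1 = n*(n-1)/2."""
    return n_marks * (n_marks - 1) // 2


def is_golomb_ruler(marks):
    """Return True if every pair of marks is a distinct distance apart."""
    differences = [b - a for a, b in combinations(sorted(marks), 2)]
    return len(differences) == len(set(differences))


print(pair_count(24))                 # 276
print(is_golomb_ruler([0, 1, 4, 6]))  # True: differences 1, 2, 3, 4, 5, 6
print(is_golomb_ruler([0, 1, 2, 4]))  # False: the difference 1 occurs twice
```

    So the 276 figure is right: a 24-mark ruler has 24*23/2 = 276 pairs.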
  • OGR solutions can be mapped onto radio telescope placements.

    Of course, radio telescopes don't have to be arranged in a one-dimensional fashion.

  • [snip] REAL platforms, such as PPC and Sparc (that isn't to say that Alpha isn't a real platform, of course, but it can hardly be treated with respect when it's got a little-endian mode simply to pander to x86 apologists).

    Er, doesn't PPC also support both endians?

  • Go smoke a duck. *quack*

    The only alpha system that uses big endian is Cray. Yes, even the PowerPC is switchable. In fact, most RISC "based" chips are switchable. I think TaligentOS used big endian as well (IIRC).

    "Hopefully the IA-64 Linux porting effort is doing the Right Thing and using the big-endian mode."

    I'd rather have an Elbrus 2k any day. ;-)
    Or how about a snap-on mood ring for Celeron CPU's? Alas.. I digress..

  • They were 24% done already and have to start over
    again now." is a little misleading...

    It's very misleading - They don't have to start from scratch, as they could just re-issue the not-intel (little-endian?) stubs.

    (Correct me if I'm wrong :-)

  • by Pike ( 52876 ) on Wednesday February 23, 2000 @09:07AM (#1251358) Homepage Journal
    Let me get this straight...you switched back to RC5 because you thought OGR was going to take forever?

    Isn't RC5 the contest that has been dragging on for more than 2 years?? And they still haven't finished even a quarter of the keyspace!

    With that kind of delay, D.net won't be proving anything about the vulnerability of RC5-64 when/if they find the solution. They may get a $10,000 check, but they won't score any usefulness or political points.

    Who cares if it takes 3 days for a single box to complete an OGR node if we still finish the project in only a month or two? I welcome useful, fast, record-breaking projects like this to break up the glacial RC5 stuff.

  • I don't mean to nitpick here, but wouldn't 0x0000002D be 00 2D 00 00 in little endian, not 2D 00 00 00? The savings aren't as great as you think..Actually I heard the reason why intel chose little endian was because of legal problems with motorola, although I could be way off...
  • RC5 may be taking forever, but when they'll finish it, they'll be going really fast. Breaking another code would take less than 2 years, maybe even less than a year so all is not lost.

    Just my 2 cents,
    Benoit Potvin
  • Stop placing the blame on the sites that provide software for you to run in the background for wasting our resources. You were the one that chose to run the software... It's generally known that nothing in life is perfect...

    If you're that concerned, just shut your computer down at night...

    This isn't necessarily directed at you... what you posted may have been a joke, but i got quite irked a while back when there were problems with seti@home and people started guestimating how many resources were wasted as a result.
  • by scheme ( 19778 ) on Wednesday February 23, 2000 @09:40AM (#1251364)

    OGRs have application to data communications, cryptography and lithography. I would say that it has a lot of use since it may lead to faster/better encryption/data transfers as well as better/cheaper chip fabs and indirectly cheaper cpu and microprocessors. A lot more useful than pretty pictures.

  • No, 0x2dL is 2d 00 00 00 in little-endian. 00 2d 00 00 is middle-endian (aka "PDP-endian" or "fucked-endian").
  • There is nothing wrong with little-endian. I remember a long ways back on lkml when some people were trying to convince Linus that the IA64 port should be big-endian. They came up with all sorts of reasoning, such as that it would make it easier to look at hex dumps. The point is that it doesn't matter (BTW Linus I believe made up his mind to use little-endian in order to get better IA32 compatibility). Little-endian makes no less sense than big-endian. Arabic numerals make no more sense than some sort of numeral system that uses little-endian (is there a numeral system that uses little-endian?). The only thing wrong with either system is the people who think that one is better than the other.
  • Once again, much to the detriment of the project, we have proved that closed source, closed beta doesn't work.

    Why not open it up before you lose everyone who's working on the project?

    Hey Rob, Thanks for that tarball!
  • OK, so how do you think when you think of memory? "Backwards" is only a state of mind, i.e. in what order do you read bits and bytes?

    Let's store the number 0x12345678 in little endian.

    Remember that one writes a bitmask always like this:
    that would make a 32 bit number written as:

    corresponding with a byteorder of
    0x03,0x02,0x01,0x00 in memory.
    | 12 | 34 | 56 | 78 |

    Wrong way? No. Logical way. If you want byte X of this word, get [BASE+X]

    Oh, and I usually see memory in an incremented form, starting at zero, going to xxx Megs.
    0x00,0x01,0x02,0x03 in memory.
    | 78 | 56 | 34 | 12 |

    Little endian now suddenly seems the right way!

    For Japanese and Chinese people, please rotate your monitor 180 degrees and reverse all statements I made in this article and you might be a fan of big endian, who knows.

    Also, but I don't remember exactly, there seems to be an advantage for little endian if you want to do fast hardware-addition and an advantage for big endian if you want to do fast hardware-multiplication.

  • Actually I think it is 00 00 2d 00.

    PDP's stored 16-bit quantities low-byte first. They had no hardware concept of 32-bit quantities. When some joker added the 32-bit support they put the 16-bit pieces in big-endian order resulting in "fucked-endian".

    I think this format was used for 32-bit floating point on a lot of machines, even if their 32-bit integers were consistent.
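
    A small sketch of the mixed layout described above, assuming the rule is exactly as stated: the two 16-bit halves are stored high word first, and each half is stored low byte first. The helper name is hypothetical.

```python
import struct


def pdp_endian_bytes(value):
    """Lay out a 32-bit value as two 16-bit words, high word first,
    with each word stored low byte first."""
    high_word = (value >> 16) & 0xFFFF
    low_word = value & 0xFFFF
    # "<HH" packs two unsigned 16-bit values, each little-endian.
    return struct.pack("<HH", high_word, low_word)


print(pdp_endian_bytes(0x0000002D).hex(" "))  # 00 00 2d 00
print(pdp_endian_bytes(0x12345678).hex(" "))  # 34 12 78 56
```

    Under that rule 0x2D does come out as 00 00 2d 00, which agrees with the correction at the top of this comment.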

  • Look at http://www.distributed.net/source/ [distributed.net]. If you had visited the distributed.net site, you probably would have noticed the "Source" option in the lefthand menu bar. But, hey, it's easier to whine, yes?


  • they should do it in big-endian so that fun bugs show up causing them to need to properly ntohl() and htonl() all their data

    Uh, no. If you bothered to type "man ntohl" you will see that these calls are no-ops on big-endian machines, and do something on little-endian machines. Developing on a big-endian machine would be even worse for finding these bugs. The only way to find them is to use *both* types of machines.
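
    The no-op behaviour is easy to observe from Python, whose socket module exposes ntohl() and htonl(). This is only a sketch; the test value is arbitrary.

```python
import socket
import sys

value = 0x12345678
converted = socket.ntohl(value)  # network (big-endian) order to host order

if sys.byteorder == "big":
    # Host order already equals network order, so nothing changes
    # and a forgotten ntohl() call goes unnoticed.
    expected = value
else:
    # Little-endian host: the four bytes are swapped.
    expected = 0x78563412

print(sys.byteorder, hex(converted), converted == expected)
```

    A missing call therefore changes nothing on a big-endian box and only shows up once data crosses between the two kinds of machine.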

    Also, little-endian makes perfect sense. A memory location can be accessed as an 8,16,32, etc bit quantity, and a significant subset (including the interesting area around zero) is reported as the same value. On a big-endian machine NO integers other than 0 and -1 are reported as the same value.

    I for one am very glad that Wintel has switched the standard to little-endian. Human brains are programmed backwards. Nothing about a number can be determined until the entire thing is parsed when we read big-endian. In little-endian a great deal about a number, in particular what its divisors are, can be determined without reading the whole thing. Admittedly useless for human use, but rather important for a computer.


  • C'mon wimps! Is that the best you can do? I'm laughing in your humorless, petrified, grits-covered, moderating faces.

    What is that!? A FreeBSD pin!!?? ON YOUR UNIFORM!!!???

    Just you wait.... I won't be the last. Even if you crush my karma with your dogma; even if you cancel my login, there will be others. Foogle has already started to turn. I am convinced that Signal 11 will someday turn. In fact, I believe that Signal 11 is already a troll who is just building up unstoppable karma for THE DAY OF RECKONING.

    As Mariah Carey sang so eloquently in "The Matrix", "My Heart Will Go On."

    Someday, perhaps even Bruce Perens will submit a down-moderated post? WHAT WILL YOU DO THEN? Will it be the end of everything you've believed in? Will it be the end of all you hold dear? Will you have to go back to actually WRITING CODE instead of sharing your feelings on what it means to be a geek?

    I know some of you long-time slashduh readers will be frightened by my tone. Fear not. I'm still the same warm, fuzzy, lovable DonkPunch. You can still order plush DonkPunch toys from the Copyleft website.

    But the humorless moderators have wronged me and today I must dwell in the land of the trolls. You know what? It's kind of nice here! These guys have cable and a VERY nice cappuccino machine. Best of all, they actually WRITE CODE instead of whining for big companies to do the work for them. If Trollmastah, GritsBoy, and NakedAndPetrifiedMan don't mind, I might stay awhile.
  • As I understand it from dB's note, it's not any one client that's broken. The problem is introduced by sharing buffers between different endian machines. Intels will read their buffers correctly; Sparcs will read their buffers correctly. HOWEVER, a sparc reading an intel buffer and vice versa will corrupt the counts (and possibly more, but no one has said anything about that.)
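
    To illustrate the kind of corruption described above, here is a sketch in Python. It is NOT the distributed.net buffer format, which is not public; the 32-bit count field and the helper names are assumptions made purely for illustration.

```python
import struct


def write_count(count, byte_order):
    """Serialize a 32-bit node count; byte_order is a struct prefix
    ("<" little-endian, ">" big-endian, "!" network order)."""
    return struct.pack(byte_order + "I", count)


def read_count(buffer, byte_order):
    """Deserialize a 32-bit node count using the given byte order."""
    return struct.unpack(byte_order + "I", buffer)[0]


count = 1000  # 0x000003E8

# Buggy pattern: each machine uses its own native byte order.
written_on_x86 = write_count(count, "<")
seen_on_sparc = read_count(written_on_x86, ">")
print(hex(seen_on_sparc))  # 0xe8030000, not 1000

# Fixed pattern: both sides agree on network (big-endian) order,
# which is what htonl()/ntohl() give you in C.
shared = write_count(count, "!")
print(read_count(shared, "!"))  # 1000
```

    The corrupted value is still a perfectly plausible-looking integer, which is why such a buffer gives no obvious sign of being wrong.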

    There's no way they can point a finger at any specific binary. And in fact, detecting the corruption is not easy -- just how many nodes qualifies as "messed up"?

    This is a very bad omen for Distributed.Net. It's taken two years to deploy OGR only to see it broken out of the starting gate. This is a very stupid mistake; one that would not have happened if people had paid attention to their work and tested a supported function (buffer sharing).
  • I programmed assembler on a SPARC, and never missed the reverse byte order.
    Perhaps you'd miss the performance by doing it the other way when your memory bus isn't as wide as your operands (and, if you're programming in an HLL, why are you fussing about byte-order anyway?).

    History: In the ancient past, before dinosaurs evolved and foot-long dragonflies sported above the cycads in the massive forests of the carboniferous, there was the 8-bit memory bus. Now, with an 8-bit memory bus, you have to fetch your operands 1 byte at a time. Suppose you are doing a 16-bit add-immediate with your carboniferous-era processor (which you may still be able to find fossilized somewhere, like at ham swaps). To do your add, you have to first add the least significant bytes, then add the most significant bytes with carry. If you store your operands big-endian, you have to complicate your processor in one of two ways:

    1. Use a temporary register to store the high byte until after the low byte is fetched and added. Then you can finally use the high byte.
    2. Increment the PC by two, fetch the low byte, decrement the PC, fetch the high byte, then increment the PC by two.
    Primitive processors of the carboniferous could not handle such complexity. They evolved the little-endian system to allow them to pipeline. Storing the operands little-endian allows the operand fetch to occur during the instruction decode cycle, then the least-significant byte can be added (in the 8-bit ALU) during the same cycle as the most-significant-byte fetch. This requires less hardware and takes only three cycles (assuming no microcode). The 6502 stores its offsets little-endian also, if I recall correctly.
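
    The data flow behind that argument can be sketched in Python: with operands stored low byte first, an 8-bit adder can consume bytes in the order they are fetched and just carry forward. This is a toy model of the idea only, not of the 6502 or any other real processor, and the function name is made up.

```python
def add_little_endian(a_bytes, b_bytes):
    """Add two equal-length little-endian byte strings one byte at a time,
    the way an 8-bit ALU with a carry flag would."""
    result = bytearray()
    carry = 0
    # Bytes arrive in memory order, least significant first, so each
    # one can be added as soon as it is fetched.
    for a, b in zip(a_bytes, b_bytes):
        total = a + b + carry
        result.append(total & 0xFF)  # keep the low 8 bits
        carry = total >> 8           # carry into the next byte
    return bytes(result), carry


a = (0x12FF).to_bytes(2, "little")
b = (0x0001).to_bytes(2, "little")
total, carry_out = add_little_endian(a, b)
print(hex(int.from_bytes(total, "little")), carry_out)  # 0x1300 0
```

    With big-endian storage the same loop would see the high byte first and would have to hold it until the low-byte carry was known, which is the extra register or extra program-counter juggling listed above.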

    Once you have a wider memory bus which can pull entire operands in one memory cycle there's still some pressure to remain little-endian and no reason for changing things; you've got all this design inventory and software tools and other things that are little-endian, and the only reason an HLL programmer would care is if she's type-punning or doing some other untoward thing. So that's why things haven't changed.

  • Um, no it is NOT Open Source.

    They release only parts of the code. They do not release the code for sending/receiving buffers (the very part that was broken in this case). If you had *followed* the link to the source and read the FAQ you would know that.

    But I guess it's easier to whine, eh? ;-)
  • Consider this. If he had been running the seti@home client, the resources would have been put to better use. Maybe that is what he meant by wasted resources... and not some silly status thing. In this case the resources would have been wasted. Perhaps he really wishes to put his extra cpu cycles to a good use and this frustrated him.

  • That your perspective matches his. However, this kind of comment has been all over the d.net mailing lists - for years, and I have yet to see that perspective spelled out.

    On the other hand, he still doesn't have to gripe to us about his frustrations. He can run both projects and thus risk less if either one fails. Remember, Linux is a multiuser, multitasking os and he can choose to risk only half of his resources if he bothers to think about it beforehand.

    So far, in my experience with others that have made the same post, they have simply not bothered to think, or read any mailing list archives, or put any effort into the project other than just installing/running the client. When something goes wrong, rather than think about the situation and possibly come up with a solution to the problem, they whine as if by reflex.
  • You've hit it square on the head. Big-Endian and Little-endian are essentially arbitrary. There may be technical reasons why one is more useful than the other for certain purposes, but considering little-endian "backwards" is just silly.
  • The alpha is little endian because its grandfather (the PDP-11) and its father (the vax) were little endian. It has nothing to do with the X86. See www.op.net/docs/RFCs/ien-137 about the holy wars between big and little endian (back when the X86 was a gleam in the 4004 and 8008's eyes).
  • Your info is correct but I think your reasoning is flawed.

    The endian switching on an alpha is a legacy DEC thing that has nothing to do with Intel. Alpha engineers needed to be data compatible with PDP-11 and VAX architectures... See page 6 of Alpha RISC Architecture for Programmers by James S. Evans and Richard Eckhouse. It shows that Alpha is Little endian just like the VAX, and PDP-11. Further on in the book it shows you how to put the Alpha into VAX floating and integer support modes.

    BTW=> Mips also has multiple endian support. I think that both PPC or at least Power[1,2,3] have endian switch ability in at least the spec. None of these platforms do this for Intel compatibility. They do it so that people will use their processors. Some tasks work better on certain endian systems. I couldn't find any reference to endian switching in my Sparc documentation but I only have documents for the latest revs of the sparc spec. I believe (due to the mips like nature of the sparc line) it originally had endian switching abilities but again I could be wrong.

    Remember many people using RISC processors are coming from 68XX series chips or Intel chips and need to have data compatibility. It has very little to do with pandering to Intel and much more to do with providing a robust flexible solution to the customer.


    "... That probably would have sounded more commanding if I wasn't wearing my yummy sushi pajamas..."
    -Buffy Summers
    Goodbye Iowa
  • by Anonymous Coward
    This is a very bad omen for Distributed.Net. It's taken two years to deploy OGR only to see it broken out of the starting gate.

    In the meanwhile COSM hasn't released anything but bloated headerfiles with only comments, no code.

    And you are going to release something you call a client-server developerkit because COSM isn't coming in the near future at all.

    And even on your website (here [mithral.com]), you admit that there are weeks that you do NOTHING.

    We'll see when (if?) you release that precious Cosm and have it bugfree from the beginning.

    You should be glad that the guys over at distributed.net found this bug within 1 week of the OGR start.

  • by Nugget94M ( 3631 ) on Wednesday February 23, 2000 @11:53AM (#1251385) Homepage
    Yes, you're correct. It's very enlightening that this error existed in one of the few pieces of code which is not present in the public source. It's unfortunate that we're required to obscure the buffer-handling and network protocol aspects of the client for project integrity, and we'd very much prefer it not be that way. It's unlikely that this sort of error would have survived the scrutiny of public source.

    For those who haven't read it, Jeff Lawson wrote a document [distributed.net] which explains why there are still portions of the client which are necessarily closed-source. The link is easy to miss, so I'm assuming those who are raising the issue here on slashdot have simply missed it.

  • Hey, how about they start a new distributed project that will allow me to boot Windows in under 30 minutes. I figure we could have it down to 15 minutes if we got like 30000 other computers involved. :-)
  • First of all, the only thing d.net and Cosm have in common is that I started both. The two projects are NOT related in any way, and aren't even in the same field.

    And it's obvious you've never even looked at the Cosm website, or you would know it wasn't "COSM".

    You'd also know there is already 467KB of code released, not even counting the headers. Which isn't bad since no one is getting paid.

  • So far as i know the intent is to find the shortest *known* ruler, not the absolute shortest.

    We don't need to "find" the "shortest *known* ruler" because it's already known (DUH). The project is to find the optimal ruler (i.e. the absolute shortest) for any given number of marks. Why else would you search the entire space of possibilities?


  • I joined the Distributed.net effort specifically because I wanted to help out working with the Golomb rulers, a very interesting mathematical project. I really have no interest in cracking any form of encryption. Yes, it can be done brute force. We know that already. We don't know what the 24 mark optimal Golomb ruler is. So finding it would be cool. Anyhow, nowhere on the distributed.net page does it say when the new clients will be available... does anyone have any insider information on whether the client upgrades they are considering are going to take awhile?

    Adding the conversion call is easy, sure, but the 'improved progress reporting' and such... any idea how long it'll take?

  • Since you say "we're required to obscure," I presume you are part of distributed.net. Please understand that I respectfully disagree with your policy. In other words, it's not the choice I'd make, but I don't consider you to be a bunch of blinkered philistine code despots either!

    I simply do not think hiding the code prevents a thing, and opening it might prevent embarrassing incidents like this one.

    I *do* understand that opening the code makes it easier to generate "fake" data, and that it requires person-hours to undo such shenanigans. If you had more bogus data, it might overwhelm your ability to remove it and block the generators of it.

    You might find, however, some creative remedies out in the world if you let your peers review it.

    In any case, I did read the document you cite, I just disagree with it. That disagreement is tempered by respect for your point of view and your accomplishments. I certainly haven't built anything that matches the achievements of distributed.net.

    Good luck on the fix, and meanwhile, back to RC5-64!
  • by linuxci ( 3530 ) on Wednesday February 23, 2000 @01:54PM (#1251395)
    So it appears that we'll all have to go and download new clients when they're released to get round this bug. AFAIK distributed.net will discard any blocks submitted by the old clients but the old clients will still attempt to fetch blocks off the keyserver. It'd be a good idea to change the project ID slightly (e.g. to OGR-a) so that the old clients will not try to fetch these blocks in the first place. Because if these old clients are just downloading OGR blocks that just get discarded it's a waste of the CPU time where they could be doing RC5 instead.

    Basically there has to be some system to stop the buggy clients downloading blocks and wasting their time.

    Make use of your spare CPU time!
  • by Nugget94M ( 3631 ) on Wednesday February 23, 2000 @02:02PM (#1251396) Homepage
    This is slashdot... After the horse is dead, we skin it and learn to play drums.

    The fear is not bogus data that we have to remove. Rather, the true damage to the integrity of a project comes from bogus data which is indistinguishable from legitimate data. Infinite man-hours of effort cannot correct the damage done by a false-negative in the case of a crypto contest.

    It's also a bit optimistic to assume that we'd be able to isolate a committed vandal to the degree required to successfully filter their bogus submissions. An attacker could simply instruct their malicious client to submit work using participant emails randomly taken from stats, easily blending their work in with legitimate work. We can't assume that every attacker will send in their work with a consistent IP or email address.

    I'm not making the argument that there's not room for improvement in the current scheme, but it's difficult for us to become too enamored with solutions that only offer a marginal improvement over the current model.

    We welcome suggestions and creative remedies from out in the world. If someone has a solution to this quandary, we'd love to implement it. This client trust issue is the holy grail of distributed computing projects, and we hope that it's solvable. I don't think that a lack of access to our buffer file formats is a stumbling block which would prevent a creative and insightful person from devising a solution, however. We don't need to open that source in order to allow someone to solve the issue.

    Thanks for your comments and support, and if you do have any proposal which would allow us to trust the work performed by an open source client, we'd love to put it to work.

  • by Anonymous Coward
    This comes as no surprise to me. In fact I almost expected it. The first sign came as far back as the fall of 1998, shortly after they quietly announced the OGR project. Silby updated his plan and stated that he was trying to work out how stats would work with OGR. Pretty innocent, right? No. Stats are very simple with OGR: just give each person credit for how many nodes they have checked. That's all. Of course it is different from the crypto projects, where the number of keys in each block is known before the block is checked, but the concept is simple to comprehend: credit people for how much work they have done, using the natural quantitative measure that comes with checking for Golomb rulers, the number of nodes checked. This failure to understand such a simple concept was a clear foreboding.

    Another thing that foretold distributed.net's imminent failure is that it took them almost 2 years to roll out. Compare this to the time it took to roll out for the first DES-II contest, less than a single month. Admittedly DES-II was easier to add to their clients than OGR, but it also tells of their inability to understand the fundamental parts of an OGR search.

    Another bad sign for distributed.net was when the original people who ran the OGR-20 and 21 searches started up again after distributed.net had contacted them and announced their plans of doing OGR. The original OGR people then went on to do OGR-22 and OGR-23.

    The most recent failing came with the CSC contest. They released a new client with both CSC and OGR support, and later they stated that they would start CSC. But no word of OGR, they obviously thought they had a working OGR client, but quickly found out otherwise. When I used the client it was buggy as hell. I submitted a bug report to distributed.net's bugzilla database that I could faithfully reproduce a bug with OGR reported by one of distributed.net's own. The bug was later marked as INVALID, even though I was still able to get the client to tell me that I had completed a full OGR stub in less than a second.

    Finally, distributed.net was not able to provide even a rough estimate of how much work, or which stubs would take longer. The original OGR project had easily created a mapping of the expected number of nodes in each stub before the project even started.

    In conclusion distributed.net doesn't seem to have a clue on what it is doing. It doesn't seem to really want to fulfil its mission statement, they only seem bent on relinquishing their control over so many users' computers.

    I am *NOT* saying that this is what distributed.net is trying to do. This is what I have come to think about distributed.net given all the input I can gather from distributed.net. I would really appreciate a reply from someone inside distributed.net to try and explain to me what is really going on behind the scenes that can rationally explain why a group of people with good intentions can come out of it with such bad results.

  • .... Why can't we find Silby's .plan pages anymore? Would Nugget or someone care to comment on that? I understand he was critical of what D.Net has become, but is that really reason to wipe his plan pages?

    Hey Rob, Thanks for that tarball!
  • by Anonymous Coward
    Why can't we find Silby's .plan pages anymore?

    I'm sure the official reason is that Silby is no longer a member of distributed.net and therefore no longer has a plan file with them. But that doesn't explain the more than swift action of removing his plan file within minutes of his posting his resignation from distributed.net, which contained a very critical view of distributed.net. I think what is going on here is that distributed.net didn't like what Silby had to say in that plan update and immediately deleted it so that no one could view it. This is the only incident where a former distributed.net person has publicly denounced distributed.net. There are of course a few other disgruntled distributed.net insiders that have left, but they haven't done something as public as what Silby did.

    Of course Silby outsmarted them by posting his plan update just minutes before midnight UTC, which is when the daily plan updates get mailed out on the distributed.net mailing list. I will paste here his last plan update for historical sake.

    [begin Silby plan update]

    silby :: 10-Jan-2000 23:53 (Monday) ::

    Although I'm saddened to say so, this will be my final plan update as a distributed.net staff member (though for historical reasons I do hope my plan archive will be maintained.)

    In my absence over the past few months, I've tried to observe distributed.net from the outside in order to determine if it is an organization I really wish to devote my time to. After much study, I determined that it is not.

    My observations indicate that distributed.net has fallen into the pattern set by Microsoft: Ensure that all important code is kept private so that competitors will not gain advantage, and viciously attack competitors. This strategy has unfortunately had the same effect it has on Microsoft; late to market products with sometimes disastrous bugs.

    While there are undoubtedly positive aspects of distributed.net I'm missing, the end result is that distributed.net has not become what I have dreamed. I had always thought that it would grow from a group of people who compete in encryption contests into a group who creates tools which allow anyone to create their own distributed networks. While becoming a standard tool like apache/bind/sendmail may have been too lofty of a dream, I'm still disappointed that no efforts have been made to push in that direction.

    Nonetheless, I'm sure distributed.net will have a long future, and I wish the remaining staff good luck. I simply don't see myself fitting into the organization any longer.

    [end Silby plan update]

  • One of the things that I find to be almost as interesting was the fact that this was NOT carried by Slashdot. It WAS submitted. I even submitted it. I asked why it was rejected, and was told by CmdrTaco himself that they were looking into it. I guess they never finished looking into it. :-) It was low of them to remove it, and very unprofessional IM(NS)HO. I hope they restore it, and apologize to Silby for deleting it.

    Hey Rob, Thanks for that tarball!
  • You might wish to consider fixing your computer.
  • Put the resources to better use?! SETI duplicates their data to hundreds of different clients! D.Net has never duplicated more than twice, and the vast majority of our work is new!
  • I had one of 1.3 teranodes, so don't worry :)
  • You miss the point. The point wasn't that Silby left. The point was that D.Net censored him. Badly. THAT was the story, and I think it still is one. YMMV.

    Hey Rob, Thanks for that tarball!
  • by Anonymous Coward
    You miss the point. The point wasn't that Silby left. The point was that D.Net censored him. Badly. THAT was the story, and I think it still is one.

    Even if that is your point, I still don't think it is worthy of slashdot. First of all, it is only a single person being censored. And second of all it is not true censorship, Silby can post that plan update to his website if he wants. Much worse censorship is happening, stuff that could change the Internet in the future, just look at the story today about the vote about filterware in Holland, MI. Another point is that distributed.net does lots of stuff like this. I have heard that they discovered a very serious bug in a Solaris client in the RC5 core, and they kept it completely under wraps, so that they wouldn't look bad. And who knows what else, I have a feeling that there is more hidden beneath the deeps.

    So in summary my point is that there are bigger censorship things going on in the world than just this one Silby thing, and that inside distributed.net this event is far from unique.

  • What about SETI@home? [berkeley.edu] Has it become too "Popular" to be the in thing to run anymore? It's a moot point for me anyway; both of SETI@home's Linux distributions for my box (i686-pc-linux-gnu-gnulibc2.1 and i686-pc-linux-gnulibc1-static) core dumped on me under RH 6.1. So, I guess I'll keep my piddly 133MHz Pentium busy doing distributed.net stuff, even if they do occasionally have bugs with their code; at least it runs.
  • You're not the only one-- I'm running this on a fairly beefy box and it still takes a LONG time (as in several days) to complete a single work unit. In order for the daily stats to be useful, it seems like one ought to be able to finish more than one work unit per day.

    It's my hope that this is what they mean by "we will have the opportunity to improve some other aspects of client operation. In particular, we plan to add more configurable checkpointing and a better display of progress" in their announcement [distributed.net].

    As to the speed of the whole search-- that would depend as much on the size of the search space as on the speed of the client. Clearly we are looking at a real small search space if it were 25% searched in only a few days.

    I know they never counted my seven days' work since it's all still sitting in my buff-out.ogr file. I'm using the dnetc v2.8007-458-CTR-00020606 for Linux (Linux 2.2.12-20) client-- perhaps it's client-specific?

  • Yes, I KNOW they're NOPs on big-endian, but they actually DO stuff on little-endian... the problem is that a lot of code out there assumes the endianness of the machine, so rather than storing it in network byte order it stores in host byte order. The reason I said to properly-encapsulate all data in ntohl and htonl is so that the same code would work on both types of platform, which is, of course, the whole POINT.
    "'Is not a quine' is not a quine" is a quine [nmsu.edu].
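
    A minimal sketch of what wrapping every stored field in htonl()/ntohl() buys you, written in Python rather than C. The record layout (a stub id and a node count) is invented for illustration and is not d.net's real buffer format; the "!" prefix in struct means network (big-endian) byte order and "=" means the host's native order.

```python
import struct
import sys


def pack_record(stub_id, nodes):
    """Serialise two unsigned 32-bit fields in network byte order (like htonl)."""
    return struct.pack("!II", stub_id, nodes)


def unpack_record(data):
    """Read the fields back from network byte order (like ntohl)."""
    return struct.unpack("!II", data)


def unpack_record_buggy(data):
    """Read the same bytes in host byte order: the missing-ntohl() case."""
    return struct.unpack("=II", data)


wire = pack_record(24, 1_000_000)
print(wire.hex())                     # identical bytes on every platform
print(unpack_record(wire))            # correct on every platform
print(sys.byteorder, unpack_record_buggy(wire))  # only correct on big-endian hosts
```

    The buggy reader is a no-op difference on a big-endian machine, which is exactly why this kind of mistake can go unnoticed until a buffer file moves between platforms.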
  • by Anonymous Coward
    I wish people would stop complaining!
    D.Net is a non-profit project, they do it on their spare time.
    If you don't like them, don't install the client.

    It's soooo easy to get annoyed when you don't know what's going on behind the scenes...
    Yes, I'm also annoyed, but I can live with it. I'm more annoyed with all the "D.Net is a failure"-posts here on Slashdot.

    Get a life
  • > Basically there has to be some system to stop the buggy clients
    downloading blocks and wasting their time.

    A tidier way might be to encode the minimum client version needed in
    each block. So, when OGR is restarted, the blocks given out are tagged
    as requiring build 460 or whatever, and older clients don't touch
    those blocks. If this were part of the protocol when clients talk to
    keyservers, clients could avoid downloading "newer" blocks entirely. A
    client could also tell the user when an upgrade is required to cope
    with the latest blocks.

    Of course, this doesn't get around the current problem but could work
    in future clients, and also as a way of making sure old beta-test
    clients don't do any "real" work.
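
    A rough sketch of that idea in Python, purely to show the shape of it. The Block type, the min_build field, and the build numbers are all hypothetical; nothing here reflects the actual keyserver protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    project: str
    block_id: int
    min_build: int  # hypothetical tag: oldest client build allowed to work on this


def fetchable(blocks, client_build):
    """Blocks this client is new enough to work on."""
    return [b for b in blocks if b.min_build <= client_build]


def needs_upgrade(blocks, client_build):
    """True if the server is offering work that this client is too old for."""
    return any(b.min_build > client_build for b in blocks)


queue = [Block("RC5-64", 1, 400), Block("OGR-24", 2, 460)]

old_client = 458
print([b.project for b in fetchable(queue, old_client)])
if needs_upgrade(queue, old_client):
    print("newer blocks are available; please upgrade your client")
```

    With the check done before download, an old client keeps doing RC5 work it can handle and never fetches OGR blocks that would only be discarded.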
  • http://cosm.mithral.com/
  • Easiest way to illustrate this, that I can think of, is the date format.

    Suppose you had your files with filenames corresponding with dates and alphabetical listing on your machine.

    Using the 'human' date format of dd/mm/ccyy or mm/dd/ccyy would produce dates like:
    06151979 (15th June 1979)
    26122000 (26th December 2000)
    which you couldn't easily look at and figure out the latest file date, but a 'computer friendly format' of ccyymmdd (in 'most significant' format) produces:
    19790615
    20001226
    and the dates would be in the correct order.

    It's just the way things are - things that are easy for humans to understand aren't always the easiest/quickest way for computers so the method needs changing.

    Just my 0.02c
    RIchy C.
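
    The filename point is easy to check with a few lines of Python. This is only an illustration; the three dates are arbitrary.

```python
from datetime import date

dates = [date(2000, 12, 26), date(1979, 6, 15), date(1999, 1, 31)]

# Plain string sort, as an alphabetical file listing would do it.
ddmmccyy = sorted(d.strftime("%d%m%Y") for d in dates)
ccyymmdd = sorted(d.strftime("%Y%m%d") for d in dates)

print(ddmmccyy)  # ordered by day of month, not chronologically
print(ccyymmdd)  # most significant field first, so chronological
```

    Putting the most significant field first is what makes a plain character-by-character comparison agree with the real ordering.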
  • It's my hope that this is what they mean by: "we will have the opportunity to improve some other aspects of client operation. In particular, we plan to add more configurable checkpointing and a better display of progress" in their announcement.

    Your hope is correct. During our beta testing (using OGR-23), we found -4 stubs to be perfectly sized, but they are obviously a bit large for OGR-24, so we will be switching to -5 stubs. The estimates I've heard are that the -5 stubs will be ~1/10th the size of -4 stubs, but this is only an estimation.

    As mentioned, we will also be changing the status bar so that it updates more frequently.

  • I've thought of dozens! Unfortunately, all of them are just as easily compromised as the original. It is a tough nut to crack. I've thought about an MD5 hash that includes the result and the client code memory image, but since a programmer can just write a routine that calculates an MD5 sum over his bogus data set and his real client image, it isn't much of a solution, is it?

    The same problem seems to exist with networked games. How do you prevent cheating?

    I'm not sure you can prevent cheating, but can't you at least use public key cryptography (specifically digital signatures) to definitively identify sources? You then double check a random packet that comes in with that signature. If it is good, you can be reasonably sure that everything that comes in signed with that key is good? (You don't want to validate a fixed packet, say, the first packet -- an attacker would send back a real first packet and then send fake ones). From then on you retest random packets from random users. If you retain the cryptographically verified identity of the origin of each result, you can quickly isolate all results from a source that shows up with false negatives in a random check.

    If everyone participating knows that they will definitely be checked at least once for validity, and may be checked additional times at any time, then I think the incentive to cheat will be brought down several notches.

    Sure, in this scheme someone can implement the public key crypto algorithm solely to legitimately send fake data, but since they have to send you the public key and must sign each result set with the private key, you WILL be able to identify and remove the bogus source when you detect it.

    I realize this is a lot more server side work! I also realize it may be impossible because of crypto export regulations (ding dang it!), but I still think a scheme along these lines could be implemented without too much difficulty.

    This idea may be full of holes (I worked it out as I typed, so I haven't exactly "bench audited" it!), but I think the premise is sound. It doesn't prevent anything, but it is likely to detect abuse and any abuse can easily be isolated and removed...

    Thoughts, criticisms, abusive epithets?
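
    To make the proposal a bit more concrete, here is a toy sketch in Python. Big caveat: the standard library has no public-key signatures, so HMAC with a per-source key stands in for them here. That is a swapped-in technique, not what I described above, since the server also holds the key. The Server class, the recompute function, and the sample rate are all assumptions for illustration.

```python
import hashlib
import hmac
import random


def sign(key, payload):
    """Stand-in for a digital signature: an HMAC tag over the payload."""
    return hmac.new(key, payload, hashlib.sha256).digest()


class Server:
    def __init__(self, keys, recompute, sample_rate=0.1, rng=None):
        self.keys = keys              # source id -> key registered by that source
        self.recompute = recompute    # trusted function: work unit -> true result
        self.sample_rate = sample_rate
        self.rng = rng or random.Random()
        self.accepted = {}            # source id -> list of (unit, result)
        self.banned = set()

    def submit(self, source, unit, result, tag):
        key = self.keys.get(source)
        if key is None or source in self.banned:
            return False
        payload = f"{unit}:{result}".encode()
        if not hmac.compare_digest(tag, sign(key, payload)):
            return False              # cannot be attributed to this source
        self.accepted.setdefault(source, []).append((unit, result))
        # Randomly re-verify; one bad result discredits the whole source.
        if self.rng.random() < self.sample_rate and self.recompute(unit) != result:
            self.banned.add(source)
            self.accepted.pop(source, None)
            return False
        return True


server = Server({"alice": b"k1"}, recompute=lambda u: u * u, sample_rate=1.0)
print(server.submit("alice", 3, 9, sign(b"k1", b"3:9")))
```

    The weakness Nugget points out still applies: a false result that is never sampled stays in the data, so this only lowers the incentive to cheat and does not prove correctness.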
  • Do this sum in your head......

    + 4,417,091

    Did you do it Left to Right, or "Backwards?"

    Nipok Nek

    Or, call me by my new Indian Name... Little Bigendian :)
  • OGRs have application to data communications, cryptography and lithography.

    Could you tell us more about that? I know a bit about the data communications part, but where are the applications in cryptography and lithography?

    Or do you have any links to sites describing that?


    - Stephan.
    Carpe diem!
  • First, _WE_ have jobs. That means we have more important things to do than write free software -- no matter how much we may like writing code for Cosm. Second, there are very few core people writing code for Cosm -- and we are actually writing code, not using someone else's stuff (most of DCTI's cores originated outside DCTI. DCTI eventually generated their own 'core' code or optimized the acquired code.)

    As for "releas[ing] something", the Cosm development has been open to the public for a year. There's an IRC channel on EFNet, several mailing lists, a web site full of information, and a public cvs tree of the code. I invite you to go look. (the irc channel is #cosm, btw)

    Yes, there are weeks that go by when "nothing" happens. We have lives to live and jobs that pay our bills to attend to. Work may be slow at times, but it's never at a full stand-still. There's a lot of stuff going on that isn't in the CVS tree (yet) pending some legal wording to prevent people from stealing our work as their own. (Plus there's no point in making it available yet. And before you ask, I'm referring to stats processing code -- extremely fast and efficient. (Blindingly fast compared to the existing DCTI stats.))

    My philosophy is that there is no such thing as "bug free". However, there is a distinction to be made where things are supposed to work. This is called "testing" and "verification" in the software industry. In this specific case, DCTI failed to verify proper functionality of sharing buffers between machines -- a published feature, or it used to be.
  • Sigh. You're right. I think it's part of a larger puzzle too. That's why it's important to report. D.Net (which I DO support) should be above such petty behaviour.

    Hey Rob, Thanks for that tarball!
  • This problem seems like it would be a prime candidate for a DNA computer to solve. It would have made a nice race to see which technology could have arrived at the answer first.
