Comment Re:Seriously? (Score 1) 366

This post has just the right amount of contempt to put a smile on my face. I like you.

Thankyouverymuch! I'll be here all week.

Don't forget to tip your bartenders and waitresses...

But seriously, it does make me feel a little bit like we're living in the "Idiocracy" universe to think that errors this obvious actually made it into the original design, let alone off the launch pad.

I read up a little on this, and supposedly, there IS a hardware watchdog timer; but the timeout appears to have been set to either 30 or 45 DAYS (WTF?!?); so half the mission will be over before they even get a CHANCE at a hardware Reset. But, since the frickin' log file may have already written over critical parts of the OS, it may be a very moot point.
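
To put that timeout in perspective, here is a toy Python model of a watchdog. The post contains no code; the `Watchdog` class and the `seconds_hung_before_reset` helper are hypothetical, and a real hardware watchdog is configured through the MCU's own registers. A healthy main loop "kicks" the dog; once the firmware hangs, the dog fires only after the full timeout has elapsed.

```python
from dataclasses import dataclass


@dataclass
class Watchdog:
    """Toy model of a hardware watchdog timer (hypothetical, for illustration)."""
    timeout_s: float
    last_kick_s: float = 0.0
    resets: int = 0

    def kick(self, now_s: float) -> None:
        """Called by a healthy main loop to restart the countdown."""
        self.last_kick_s = now_s

    def tick(self, now_s: float) -> bool:
        """Return True (and count a reset) once the countdown has expired."""
        if now_s - self.last_kick_s >= self.timeout_s:
            self.resets += 1
            self.last_kick_s = now_s  # the reset restarts the countdown
            return True
        return False


def seconds_hung_before_reset(timeout_s: float, hang_at_s: float = 3600.0,
                              step_s: float = 60.0) -> float:
    """Simulate firmware that kicks the dog until it hangs at hang_at_s,
    then report how long it stays hung before the watchdog resets it."""
    dog = Watchdog(timeout_s)
    now = 0.0
    while True:
        if now < hang_at_s:
            dog.kick(now)      # main loop still running
        elif dog.tick(now):    # main loop hung: only the dog is counting
            return now - hang_at_s
        now += step_s
```

With the 30-day figure from the report, the simulated craft sits hung for roughly 30 days; with a hypothetical 5-minute timeout, it would be reset within 5 minutes.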

As for the log file debacle, they explain that they had a fix ready to upload "on the next pass"; but that is when the bird fell silent. Ok, whatever. NASA has to upload code in-flight, too. But I can't understand why there even IS a log file in the first place. Who is going to read it? If they are transmitting "beacons" every 15 seconds (WAY too often IMHO), then they should have simply transmitted the last "n" records of the log file at that time, and then wiped that buffer clean, rather than keeping a log FILE, FFS!!!
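
A minimal sketch of that alternative, assuming a design where every beacon carries the most recent log records: a fixed-size ring buffer in RAM (Python's `collections.deque` with `maxlen`) that is drained into each beacon and then cleared. `BeaconLog` and the beacon dict layout are hypothetical.

```python
from collections import deque


class BeaconLog:
    """Fixed-size in-RAM log that is drained into each beacon (hypothetical design).

    Memory use is bounded by max_records no matter how long the mission runs:
    once the buffer is full, the oldest record is silently dropped.
    """

    def __init__(self, max_records: int):
        self._buf = deque(maxlen=max_records)

    def log(self, record: str) -> None:
        self._buf.append(record)

    def build_beacon(self, telemetry: dict) -> dict:
        """Attach the buffered records to the beacon, then wipe the buffer."""
        beacon = dict(telemetry, log=list(self._buf))
        self._buf.clear()
        return beacon
```

Nothing ever touches persistent storage, so there is no file to outgrow the flash.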

But unfortunately for "Dr" Bill Nye and friends, that's all hindsight now.

Comment Re:Seriously? (Score 1) 366

Yes, I know, I was simply providing information about what he ACTUALLY said.

(Since I was/am too lazy to look the rest of it up for sure, the full line was something close to "You played it for her, so play it for me. Play it, Sam.")

I think you are correct; but this is where I have to sheepishly hand in my movie-geek card and admit that in all my 59 years, I have never once seen Casablanca from start to finish. In fact, I think I've only ever seen about two scenes from that movie! (But that happens to be one of the two scenes I have seen)...

Comment Re:UAT (Score 1) 366

But that's not really the point I was making. There's a difference between writing an app for a phone or a desktop and writing software for a safety-critical embedded system. The whole approach you take towards developing the software is different.

Hence my diatribe above about the flywheel-balancer and handicap-van projects I have worked on. IOW, you don't have to go to space to find yourself involved in "mission-critical" applications. It isn't as if the only choices are "Spaceship OS 1.2" and "Candy Crush Saga," with nothing in between.

It's the old adage. "Good judgment comes from experience. Experience comes from bad judgment." Having more people with good judgment would have helped them enormously.

Or, as I always say: "Experience is what you get, when you don't get what you want."

Comment Re:UAT (Score 1) 366

I've been doing this kind of work for decades

So have I. My specialty is R&D of industrial control systems. Although I have never sent anything into space, I have been designing controls that, if they crash (or even crash and then recover), must do so gracefully to avoid damaging equipment, or even causing injury or death.

For instance, one of my first embedded projects was a controller for a dynamic balancing machine. This particular balancer spun up flywheels for Caterpillar earth-movers. Each flywheel was about 4 ft. in diameter and weighed about a ton (literally). We spun it up to 1800 RPM and figured out where the imbalance(s) were.

I figured out REAL early on (and without a "team") that if I "watchdogged" (or otherwise found myself back at the start of the code), I couldn't just ASSUME I could re-initialize Ports, Data-Direction Registers, etc.; rather, I had to "look around" at various inputs to see WTF the REAL state of the machine was, THEN try to do an ORDERLY shutdown and restart. Never once caused a flywheel to act like a Frisbee...
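
The "look around first" rule can be sketched as a boot-time decision function. This is an illustration only: the sensor names (`spindle_rpm`, `drive_contactor_closed`) and the 1 RPM threshold are hypothetical, not taken from the actual balancer.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    COLD_START = "stopped and de-energized: safe to initialize normally"
    ORDERLY_SPIN_DOWN = "still spinning: ramp down under control, then restart"
    HOLD_FOR_OPERATOR = "inputs disagree: keep outputs safe and wait"


@dataclass
class MachineInputs:
    """Raw readings sampled right after a reset (hypothetical sensor set)."""
    spindle_rpm: float
    drive_contactor_closed: bool


def boot_action(inputs: MachineInputs) -> Action:
    """Decide what to do after a reset WITHOUT assuming the machine is idle."""
    if inputs.spindle_rpm >= 1.0:
        # Reset happened mid-run: never re-initialize outputs blindly.
        return Action.ORDERLY_SPIN_DOWN
    if not inputs.drive_contactor_closed:
        return Action.COLD_START
    # Stopped, yet the drive reports energized: readings conflict, so wait.
    return Action.HOLD_FOR_OPERATOR
```

The point is the ordering: read the real state, pick a safe path from it, and only then touch the outputs.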

BTW, at that time, I was 20 years old, and completely self-taught.

So sorry; being an "engineer" doesn't automagically make you a better developer. Stupid is as Stupid Does.

Oh, and then there was the project where I was contracted to develop a "failover" system for handicapped vans. Worked a treat. It never failed to detect an input/output mismatch and switch over to the backup systems, and it did so in far less time than a human driver could detect the failure, let alone reach for the "switch to the backup" switch while trying to keep their out-of-control van from flying into the ditch...
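
A sketch of the core of such a failover check, assuming the monitor compares each commanded output with its measured feedback and latches over to the backup once a mismatch persists for a few consecutive samples. The tolerance, the sample count, and the `FailoverMonitor` name are hypothetical; the post does not describe the van system's actual logic.

```python
class FailoverMonitor:
    """Switch to a backup channel when command and feedback disagree (hypothetical).

    A mismatch must persist for confirm_samples consecutive samples, so a
    single noisy reading does not trigger a switchover.
    """

    def __init__(self, tolerance: float, confirm_samples: int):
        self.tolerance = tolerance
        self.confirm_samples = confirm_samples
        self._bad_count = 0
        self.active = "primary"

    def sample(self, commanded: float, measured: float) -> str:
        """Feed one command/feedback pair; return the channel now in use."""
        if self.active == "primary":
            if abs(commanded - measured) > self.tolerance:
                self._bad_count += 1
                if self._bad_count >= self.confirm_samples:
                    self.active = "backup"  # latch: never auto-switch back
            else:
                self._bad_count = 0  # a good sample clears the streak
        return self.active
```

Requiring consecutive bad samples trades a few samples of latency for immunity to single glitches; at a hypothetical 1 kHz sample rate, three samples is 3 ms.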

This is rocket science we're talking about. It's hard.

So are a LOT of embedded industrial control tasks. And MOST of them don't really allow for a simple "Reset" in the middle of a Run condition without "Very Bad Things"(tm) happening.

Moral of the story: You don't need a degree; you need an IQ. And experience.

Comment Re:Seriously? (Score 1) 366

indeed!

(of course, my comment stems from the more-likely-than-not scenario that any intelligence visiting from outside the solar system will be noncorporeal in nature: a radio signal or, less likely but still more likely than an organic being, a computer program, maybe encased in a robot probe).

I AM NOMAD!

Sterilize! Ster - I - LIZE!!!

Comment Re:Seriously? (Score 1) 366

I was going by the original "quote"

And I say "quote" because it is not clear he ever said it.

Yeah, now that you mention it; I seem to remember something about that. Just like the Greta Garbo "I want to be alone" or the Humphrey Bogart "Play it again, Sam" quotes-that-were-never-actually-quotes.

Comment Re:UAT (Score 1) 366

This kind of failure is caused by amateurs making amateur mistakes. It was caused by application programmers who don't understand the consequence of failure in a constrained environment where you can't just click a mouse to restart the program. It was caused by poor planning and a lack of understanding of the environment in which they were designing. This was caused by hiring coders instead of experienced engineers. It was caused by trying to do it cheap rather than spending the money to do it right. They got what they paid for.

I agree wholeheartedly with everything you said, except the last 3 sentences. I submit that they could have found PLENTY of "coders" (a/k/a "hobbyists") who would not have made these kinds of easily foreseeable design errors.

These were errors that anyone with a few embedded designs under their belt, "engineer" or no, would have caught with ease. I know, because I have, repeatedly.

Comment Re:UAT (Score 1) 366

One way to test that is to simulate time. A simulation wouldn't need to wait 15 actual seconds, it could speed up time such that transmissions run immediately after the last, until the test has surpassed the expected lifetime of the mission.

If each transmission could be simulated once every millisecond instead of once every 15 seconds, they would have run across the bug within 14 minutes.
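
A sketch of that simulated-time test, assuming the flight loop takes its clock as a parameter so a test can swap in a fake one that advances instantly instead of sleeping. `FakeClock`, `run_beacon_loop`, and the storage and record sizes are hypothetical stand-ins, not the spacecraft's real figures.

```python
class FakeClock:
    """Test clock: sleep() advances simulated time instantly."""

    def __init__(self):
        self.now_s = 0.0

    def sleep(self, seconds: float) -> None:
        self.now_s += seconds


def run_beacon_loop(clock, storage_bytes, bytes_per_beacon, mission_s, period_s=15.0):
    """Run a (hypothetical) beacon loop until mission end or storage exhaustion.

    Returns the simulated time at which storage overflowed, or None if the
    whole mission fits.
    """
    used = 0
    while clock.now_s < mission_s:
        used += bytes_per_beacon  # the unbounded log append under test
        if used > storage_bytes:
            return clock.now_s
        clock.sleep(period_s)
    return None
```

With 1,000,000 bytes of storage and 100-byte records, the fake clock reports exhaustion at simulated second 150,000 (about 1.7 days), after roughly ten thousand loop iterations that run in well under a second.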

But even before the testing comes the DESIGN. No one with more than two active neurons should have designed a system that cannot reboot itself, or one that grows a file without bound (especially one that shares storage space with the OS!!!).

Bad Developer. Bad! Bonk Bonk on the head!

Comment Re:Systems Administration 101 (Score 1) 366

A thousand times, this. Any competent sysadmin would have pointed out this flaw in the design process (as well as the one about not being able to remotely reboot your server), but I'm willing to bet good money that they didn't involve a competent sysadmin, because who needs those any longer...

SysAdmin, hell! Any 12-year-old HOBBYIST would have pointed this out; or would have known better in the first fucking place!

Comment Re:Seriously? (Score 0) 366

One report I read made it sound like they were aware of the bug for a while. It's possible that they had to launch with an old version of the software because the patch wasn't ready yet; as a secondary payload on a launch, you have no say whatsoever in the launch date. They probably expected to upload the patch after launch, but the log filled up faster than expected.

That being said, it is shoddy programming to blindly write to a log on a resource-constrained embedded platform (or on any platform, really, but especially on something like this), so somebody definitely goofed. All I am saying is that it probably was caught by testing but couldn't be fixed in time due to various constraints. It was a dumb move on the developer's part not to do enough due diligence and to rely too heavily on QA in the first place.

Quit making excuses for them. They DESERVE to lose their spacecraft!

They could have trawled the customer list for Sparkfun Electronics or HackADay and in 5 minutes found a better developer than whoever designed THIS shit-pile of code. Seriously.

Comment Re:Seriously? (Score 4, Insightful) 366

Testing might have found it, but I'd say that regardless of testing, they should assume something bad will happen with the software and have a mechanism in place to force a reboot and update on a locked-up system. Maybe they thought they did. It's a shame if they can't get it fixed.

Speaking as an embedded developer, this is completely inexcusable.

Not having a Watchdog, PLUS not making the size-limited log file "roll over," is clearly Amateur-Hour stuff. Who wrote this code, anyway? An eight-year-old???
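
On a system with a real filesystem, the roll-over part is nearly free. A minimal Python sketch using the standard library's `logging.handlers.RotatingFileHandler` (flight software would more likely be C on an RTOS; the file name, the 4 KiB limit, and the two backups are arbitrary choices): the handler starts a new file before a write would reach `maxBytes` and keeps only `backupCount` old files, so total log storage stays bounded. `make_bounded_logger` and `demo` are hypothetical helpers.

```python
import logging
import logging.handlers
import os
import tempfile


def make_bounded_logger(path: str, max_bytes: int, backups: int) -> logging.Logger:
    """Logger whose files total at most about max_bytes * (backups + 1)."""
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter("%(created).0f %(message)s"))
    logger = logging.getLogger(f"bounded.{path}")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # keep records out of the root logger
    return logger


def demo(n_messages: int, max_bytes: int = 4096, backups: int = 2) -> list[int]:
    """Write n_messages into a scratch directory; return each log file's size."""
    with tempfile.TemporaryDirectory() as d:
        logger = make_bounded_logger(os.path.join(d, "events.log"), max_bytes, backups)
        for i in range(n_messages):
            logger.info("beacon %d ok", i)
        for h in list(logger.handlers):  # release the files before cleanup
            h.close()
            logger.removeHandler(h)
        return [os.path.getsize(os.path.join(d, f)) for f in os.listdir(d)]
```

However many messages are written, at most three files of under 4 KiB each survive; the oldest records are what get sacrificed.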

Next we're going to hear that they bricked it with a software update, because they didn't think they needed to checksum the uploads, or provide enough RAM to hold the updated code before they re-flashed the OS, or something similar.
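
The checksum half of that is cheap to get right. A sketch, assuming the ground sends a CRC-32 alongside the image and the craft has enough RAM to stage the whole image: nothing gets flashed unless the staged copy verifies. `FlashUpdater` is hypothetical, and the `self.flash = ...` assignment stands in for the real erase/program step. Note that CRC-32 catches accidental corruption in transit; it does not protect against a deliberately forged image.

```python
import zlib


class FlashUpdater:
    """Stage an uploaded image in RAM; flash it only if its CRC-32 matches."""

    def __init__(self, current_image: bytes):
        self.flash = current_image  # stand-in for the running OS image

    def apply_update(self, staged: bytes, expected_crc32: int) -> bool:
        """Return True if the image verified and was 'flashed', else False."""
        if zlib.crc32(staged) & 0xFFFFFFFF != expected_crc32:
            return False            # corrupted upload: keep the old image
        self.flash = bytes(staged)  # stand-in for the real erase/program step
        return True
```

Because verification happens on the staged copy, a bad upload leaves the old image running and the craft still reachable for another try.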

Pathetic. They deserve to lose their spacecraft.

Fortunately, if extraterrestrials discover the floating hulk of this abomination, they will (rightly) conclude that there is no intelligent life worth exploiting on this planet, and will decide not to enslave us...
