History's Worst Software Bugs

cy

Flashaholic
Joined
Dec 20, 2003
Messages
8,186
Location
USA

"Last month automaker Toyota announced a recall of 160,000 of its Prius hybrid vehicles following reports of vehicle warning lights illuminating for no reason, and cars' gasoline engines stalling unexpectedly. But unlike the large-scale auto recalls of years past, the root of the Prius issue wasn't a hardware problem -- it was a programming error in the smart car's embedded code. The Prius had a software bug."

http://wired.com/news/technology/bugs/0,2924,69355,00.html?tw=wn_tophead_1
 

The_LED_Museum

*Retired*
Joined
Aug 12, 2000
Messages
19,414
Location
Federal Way WA. USA
In the early Commodore 64 computers, if you were at the bottom of the screen, typed 81 characters, and then hit backspace, the computer would lock up tighter than Fort Knox, requiring a cold start to become functional again. That, of course, wiped out everything that had not previously been saved.
 

greenLED

Flashaholic
Joined
Mar 26, 2004
Messages
13,263
Location
La Tiquicia
Commodore 64! I remember those! We had a Tandy and saved programs (BASIC) on regular audio tapes. [/hijack]

As software becomes more and more sophisticated, bugs seem to proliferate. It must be really hard to think of ALL the potential user actions.
 

gadget_lover

Flashaholic
Joined
Oct 7, 2003
Messages
7,148
Location
Near Silicon Valley (too near)
The Prius bug has caused no deaths or injuries, so it was not the worst. It's simply quite visible.

You could have several categories: most people killed, most injured, most horrific, most people impacted, most publicized, most exotic.

Most impacted was probably the one in 1992 where a programmer left out an "=" sign and knocked out the long-distance phone network nationwide. The phone network itself was fine, but the data network that lets you place the phone calls (the SS7 network) was disabled by the bad code.

Most exotic was probably the Mars probe that lithobraked upon attempting to land on Mars. That's litho as in "rock". Crashed right into the surface at full speed. The programmer used one unit of measure, the data entry person used another. Feet and meters IIRC.
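A rough C sketch of how that kind of unit mix-up slips through, and one way to make the compiler catch it. It goes by the recollection above (feet vs. meters); the type and function names are hypothetical and have nothing to do with any real flight software.

```c
/* Hypothetical unit-tagged types: wrapping a double in a distinct struct
   makes the compiler reject a feet value where meters are expected. */
typedef struct { double value; } meters;
typedef struct { double value; } feet;

/* One international foot is exactly 0.3048 m. */
meters feet_to_meters(feet f) {
    meters m = { f.value * 0.3048 };
    return m;
}

/* Hypothetical consumer that only accepts meters. Passing a `feet`
   value here is a compile-time type error. */
double seconds_to_surface(meters altitude, double descent_rate_m_per_s) {
    return altitude.value / descent_rate_m_per_s;
}

/* The untyped version accepts any double, so a number that was
   entered in feet is silently treated as meters. */
double seconds_to_surface_untyped(double altitude, double descent_rate_m_per_s) {
    return altitude / descent_rate_m_per_s;
}
```

With the untyped function, 1000 entered in feet is read as 1000 meters, so the result is more than three times too large and nothing complains. With the tagged types, the caller has to go through feet_to_meters() first.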

Most horrific (IMHO) is the bug that caused a radiation therapy machine (for cancer treatment) to deliver a terrible overdose resulting in eventual death. The code multiplied where it should have divided in certain cases.

Most killed? I know that at least one plane crash has been attributed to the Airbus fly-by-wire system. It uses "modes" to determine how to fly the plane (cruise, ascent, descent, etc.) and there has been at least one case where the pilot had it set to the wrong mode. The pilot tried to force it to rise while the plane thought it should be landing, and the conflict ended in a stall and crash. That was attributed to pilot error. That's what I remember of the report, anyway.

Most injuries due to programming? That would probably be the NBC line-up this year. Absolutely terrible.

:)
 

drizzle

Enlightened
Joined
Oct 23, 2003
Messages
840
Location
Seattle, WA
Nice list gadget_lover!

The only thing that came to mind for me was not technically a software bug, but it is such a classic engineering screw-up I had to share...

Most people are aware that when the Hubble Space Telescope (HST) first started sending back pictures there was a big problem. They eventually had to use another Shuttle mission just to go up and repair it. I happened to attend a talk given shortly after the discovery of the problem by an astronomer close to the project who had the story behind it.

NASA contracted with two different companies to make the main mirrored lens for the HST. The reasoning was that if one were to break or be flawed they could use the other. They also reasoned that the two teams should have no communication so that if one team used a flawed approach it wouldn't affect the other.

The two teams made their lenses and both were tested extensively and both were judged to have passed the rigorous standards set by NASA. The problem was that, on the lens that was used, the same wrong calculation was used by both the designers and the testers to determine the curvature of the lens. So when the testers ran their tests on the finished lens it tested out perfect even though it was the wrong curve. It's a bit like having a very high quality pair of eyeglasses that are perfectly polished and curved but for the wrong prescription.

The real irony was that the restrictions that NASA put on keeping the entire process from each company separate actually prevented the discovery of the flaw. Had the other team been allowed to test the flawed lens, the flaw would have been found and they would have simply used the other lens, which as far as I know is perfect and is probably sitting in a warehouse somewhere.

As you might have guessed, NASA changed their policy after this.

I hope I haven't completely bored you all. This was really interesting to me, especially having worked on some government contracts. It was not at all surprising that this could have happened.
 

gadget_lover

Flashaholic
Joined
Oct 7, 2003
Messages
7,148
Location
Near Silicon Valley (too near)
Thanks Drizzle.

There is a software engineering maxim that says the programmer should never be the only one to test the software. The programmer is likely to make the same mistakes testing as he did when coding. By the same token, there are usually two types of testing: white box and black box.

White box testing (IIRC) is accomplished when the testers have the full source code and design documents. It's good for spotting logic problems and typos as well as making sure every branch of the program gets tested.

Black box testing is driven by the design (or was it the requirements?) documents. The testers don't know the program logic except as it is revealed by the documents and their tests.

A good professional testing team will do white and black box testing.
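To make the distinction concrete, here is a minimal C sketch. The function clamp_percent() and its one-line spec are made up for illustration; the point is only where each set of test cases comes from.

```c
#include <assert.h>

/* Hypothetical function under test.
   Spec: return x limited to the range 0..100. */
int clamp_percent(int x) {
    if (x < 0) return 0;
    if (x > 100) return 100;
    return x;
}

/* Black box: written from the spec alone, without reading the code. */
void black_box_tests(void) {
    assert(clamp_percent(50) == 50);    /* typical in-range value */
    assert(clamp_percent(-5) == 0);     /* below the range */
    assert(clamp_percent(250) == 100);  /* above the range */
}

/* White box: written while reading the source, aiming to hit every
   branch and the values right at each comparison. */
void white_box_tests(void) {
    assert(clamp_percent(-1) == 0);     /* takes the x < 0 branch */
    assert(clamp_percent(0) == 0);      /* just misses the x < 0 branch */
    assert(clamp_percent(100) == 100);  /* just misses the x > 100 branch */
    assert(clamp_percent(101) == 100);  /* takes the x > 100 branch */
}
```

If the same person wrote both the function and the white box tests, a misreading of the spec would show up in both and the tests would still pass, which is the maxim above in miniature.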

It sounds like the Hubble team used white box testing. I thought their protocol would have had each team test the other's product.
 

KC2IXE

Flashaholic*
Joined
Apr 21, 2001
Messages
2,237
Location
New York City
I remember the SS7 crash - Ugly - I've alternately heard it was:

= instead of == (as you said)

and

a switch statement without a break, so it fell through

Both of which would be valid syntax, but would NOT give you what you want
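For anyone who hasn't been bitten by these, here is a small C sketch of both mistakes. It is not the actual switch code, which isn't shown anywhere in this thread; the enum and function names are hypothetical. Both buggy versions compile, usually with nothing more than a warning.

```c
#include <stdbool.h>

/* Hypothetical message types, for illustration only. */
enum msg_type { MSG_DATA = 0, MSG_STATUS = 1 };

/* Bug 1: '=' assigns MSG_STATUS (1) to type instead of comparing,
   so the condition is always true. */
bool is_status_buggy(int type) {
    if (type = MSG_STATUS) {
        return true;
    }
    return false;
}

/* Intended version: '==' compares. */
bool is_status_fixed(int type) {
    return type == MSG_STATUS;
}

/* Bug 2: the MSG_DATA case has no break, so control falls through
   and the MSG_STATUS handling runs as well. */
int handlers_run_buggy(int type) {
    int handlers_run = 0;
    switch (type) {
    case MSG_DATA:
        handlers_run++;     /* data handling */
        /* missing break */
    case MSG_STATUS:
        handlers_run++;     /* status handling */
        break;
    default:
        break;
    }
    return handlers_run;
}

/* Intended version: each case ends with its own break. */
int handlers_run_fixed(int type) {
    int handlers_run = 0;
    switch (type) {
    case MSG_DATA:
        handlers_run++;
        break;
    case MSG_STATUS:
        handlers_run++;
        break;
    default:
        break;
    }
    return handlers_run;
}
```

Calling is_status_buggy(MSG_DATA) returns true, and handlers_run_buggy(MSG_DATA) returns 2 where 1 was intended. Compiler warnings will flag both patterns if they are turned on.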
 

elgarak

Flashlight Enthusiast
Joined
Jul 30, 2004
Messages
1,045
Location
Florida
Most killed? I know that at least one plane crash has been attributed to the Airbus fly-by-wire system. It uses "modes" to determine how to fly the plane (cruise, ascent, descent, etc.) and there has been at least one case where the pilot had it set to the wrong mode. The pilot tried to force it to rise while the plane thought it should be landing, and the conflict ended in a stall and crash. That was attributed to pilot error. That's what I remember of the report, anyway.
If this is the one I remember, then this is neither in the "most killed" category nor a software bug. The one I remember was during an airshow near Paris, where the Airbus flew over the runway into the woods and crashed. The plane was nearly empty, with only two or three people aboard, and they were the only ones killed. The software acted as designed; it was a true pilot error, in which the pilot tried to force the plane into a flying mode neither useful nor safe for a passenger plane, and therefore 'forbidden' in the software, though the maneuver was possible, albeit risky and not recommended, on older Boeing planes. If I remember the details correctly, if the pilot had let go of the controls, the plane would have corrected itself into a stable mode, without a crash.
 

gadget_lover

Flashaholic
Joined
Oct 7, 2003
Messages
7,148
Location
Near Silicon Valley (too near)
The one I remember was during an airshow near Paris, where the Airbus flew over the runway into the woods and crashed.

The one I remember was in southern Asia, possibly Indonesia. Maybe I can find a reference....


On the SS7 phone outage, I found a reference to the 1991 outage in RFC 3439. ( http://rfc.net/rfc3439.html )

The PSTN's SS7 control network provides an interesting example of
what can go wrong with a tightly coupled complex system. Outages
such as the well publicized 1991 outage of AT&T's SS7 demonstrates
the phenomenon: the outage was caused by software bugs in the
switches' crash recovery code. In this case, one switch crashed due
to a hardware glitch. When this switch came back up, it (plus a
reasonably probable timing event) caused its neighbors to crash. When
the neighboring switches came back up, they caused their neighbors to
crash, and so on.

The problem is that I looked at the reference for this paragraph and traced it to a Jan 1990 report of a failure. ( http://catless.ncl.ac.uk/Risks/9.62.html#subj2 ) I'm not sure that was the same outage.

It's been such a long time, but I do remember the extra testing of the SS7 software after those failures. The local phone company built a whole captive network of SS7 components including switching systems just for testing.

Good times.
 