An Analysis of the Patriot Missile System
The Patriot Missile System is an example of how a
software project can change over time. It was proposed as a way to protect
against Soviet planes, then Soviet missiles, and twenty years later was used
against Iraqi missiles. The system was meant to provide temporary protection,
but was used at permanent installations. The Patriot was, in effect, an attempt
to prove that a system can be patched to perfection.
The Patriot Missile project began in the late 1960s as a portable system meant to protect local airspace against intrusion by enemy aircraft. One of the selling points of the project was that although it would not be programmed to shoot down missiles, the system could later be altered to serve as an anti-ballistic missile system. The system was first tested against a drone plane in 1974, and later, in 1986, the system’s developers at Raytheon Labs modified it to be used as a portable, short-term defense against Soviet missiles and aircraft. [1]
The system was first deployed in 1990, and shot down its
first ballistic missile in January of 1991. The ballistic missile was an
Al-Hussein, more commonly known as a SCUD missile. The Patriot missile was
never designed to handle SCUDs, which have an estimated maximum speed nearly
twice that of the Soviet missiles for which the system was designed. This,
along with several other inconsistencies between the system requirements and
the way the system was actually used, should lead a responsible programmer to
ask whether his or her software will be used for purposes not stated in the
requirements and, if so, what the unwritten requirements for the system are.
In mid-February of 1991, Israeli troops discovered a
defect in the Patriot missile system: if the system ran for long periods of
time, it became inaccurate. They estimated that after twenty hours of
operation, the system would be too inaccurate to successfully target, track,
and hit a ballistic missile. The U.S. military denied the significance of the
discovery, stating that the system was meant to be portable and to provide a
short-term defense against missiles, and that nobody would ever run the system
for more than twenty hours at a time.
On February 16th, a bug fix was released, but it could not
immediately reach all units because of wartime difficulties in transportation.
So, on February 21st, the Army released a memo stating that the system was not
to be run for “very long times.” The military did not specify how long a “very
long run time” would be.
On February 25th, 1991, a Patriot missile system that had
been running for over 100 hours at Dhahran, Saudi Arabia failed to
intercept a SCUD missile. The SCUD hit an Army barracks, killing 28 Americans.
The next day, the bug fix for the system arrived at Dhahran [2].
This bug was caused by the way time was stored in a 24-bit
register. Time was counted in tenths of a second, but computers do not store
numbers as standard decimals. Instead, they use binary, and in binary 1/10 is
an endlessly repeating fraction that cannot be stored exactly. A 24-bit
register does not have enough precision to hold 0.1, so the value is cut off
and a small fraction of each tenth of a second is lost. The result is that the
register used to keep track of time is off by about 0.0001% of the amount of
time that the system has been in operation.
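The size of this loss can be estimated with a short calculation. The sketch below is illustrative only: it assumes that 23 binary digits are kept after the radix point, a detail the text above does not state, and the names chop, STORED_TENTH, and drift_seconds are hypothetical rather than taken from the Patriot software.

```python
import math
from fractions import Fraction

# Assumption: 23 binary digits are kept after the radix point. The text says
# only that the register is 24 bits wide.
FRACTION_BITS = 23


def chop(value: Fraction, bits: int = FRACTION_BITS) -> Fraction:
    """Truncate a non-negative value to `bits` binary fractional digits."""
    scale = 2 ** bits
    return Fraction(math.floor(value * scale), scale)


TENTH = Fraction(1, 10)
STORED_TENTH = chop(TENTH)             # what the register can actually hold
ERROR_PER_TICK = TENTH - STORED_TENTH  # time lost on every 1/10-second tick


def drift_seconds(hours_running: float) -> float:
    """Total time lost after counting ticks for the given number of hours."""
    ticks = round(hours_running * 3600 * 10)
    return float(ticks * ERROR_PER_TICK)


if __name__ == "__main__":
    print(f"lost per tick:  {float(ERROR_PER_TICK):.3e} s")
    print(f"relative error: {float(ERROR_PER_TICK / TENTH) * 100:.6f} %")
    for hours in (8, 20, 100):
        print(f"after {hours:3d} hours: {drift_seconds(hours):.4f} s")
```

Under these assumptions the relative error is 2^-20, a little under 0.0001%, and the accumulated loss after 100 hours of operation is about 0.34 seconds.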
Figure 1*

Step 1
nTimer represents the amount of time the missile has been in operation.
Time represents the current time.
Step 2
1/16th of a second is added to nTimer as each 1/10th of a second passes.
Step 3
When an enemy missile is spotted, the current time is converted to the format of nTimer.
Step 4
The Converted Time from Step 3 is used to calculate the upcoming position of the enemy missile.
Step 5
The Patriot Missile is aimed.
Step 6
When nTimer and the Converted Time are equal, the Patriot is fired.
But the real problem was not inaccuracy but
inconsistency. During one of the updates, Raytheon Labs, the developer of the
Patriot missile, had addressed the inaccuracy described above by writing code
that used a pair of 24-bit registers to make the time calculations accurately.
The problem was that most, but not all, of the time calculations made by the
system were replaced by calls to the newer, more accurate function. So the
system kept track of the current time using a function that lost time in much
the same way that a clock with a weak battery gradually loses time, while it
tracked missiles, aimed itself, and decided exactly when to launch its own
missiles using the internal clock, which was accurate. In effect, the system
used an accurate timepiece to decide where the missile was located, how fast it
was moving, and when to fire the defensive missiles, but while waiting to fire
it used the less accurate clock to determine whether that moment had arrived.
It was estimated that after the system had run for 100 hours, the calculations
made using the old algorithm and those made using the new algorithm differed by
as much as 1/3 of a second [3].
A SCUD missile can travel more than one mile per second.
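What a timing discrepancy means at that speed is simple arithmetic. The sketch below multiplies a clock discrepancy by a target speed. The function name is hypothetical, and the default of one mile per second is only the lower bound stated above, not an actual SCUD specification.

```python
METERS_PER_MILE = 1609.344


def position_error_miles(clock_error_s: float,
                         speed_miles_per_s: float = 1.0) -> float:
    """Distance a target covers during a timing discrepancy."""
    return clock_error_s * speed_miles_per_s


if __name__ == "__main__":
    error_miles = position_error_miles(1 / 3)
    error_meters = error_miles * METERS_PER_MILE
    print(f"{error_miles:.2f} miles, about {error_meters:.0f} meters")
```

A discrepancy of 1/3 of a second therefore corresponds to a target that is at least a third of a mile away from where the system expects it to be.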
Had the same piece of code been used for all time
conversions, the inaccuracy of the Patriot missile would not have
grown over time the way it did in this case. Instead, every time
calculation would have been off by approximately 0.000001 seconds, and the
system would have been much more likely to defend against any missiles launched
at it. This is a good argument for reuse of code whenever possible. Although
the developers at Raytheon Labs had tried to replace all time conversions with
calls to the new function, they missed a few, and the result was a system that
was less reliable than it would have been if they had chosen to ignore the
conversion error.
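The reason consistency matters more than accuracy here is that the system works with differences between timestamps. If both timestamps are converted the same way, the accumulated error cancels; if they are converted differently, it does not. The following sketch illustrates this with hypothetical names and the same assumed 23 fractional bits as a truncated tick value. It is not the Patriot code.

```python
import math
from fractions import Fraction

FRACTION_BITS = 23  # assumption: fractional bits kept by the old conversion
TICK_EXACT = Fraction(1, 10)
TICK_LOSSY = Fraction(math.floor(TICK_EXACT * 2 ** FRACTION_BITS),
                      2 ** FRACTION_BITS)


def interval_error(start_ticks: int, end_ticks: int,
                   start_tick_value: Fraction,
                   end_tick_value: Fraction) -> float:
    """Error in a measured interval, in seconds, when each timestamp is
    converted from ticks to seconds with its own tick value."""
    measured = end_ticks * end_tick_value - start_ticks * start_tick_value
    true_interval = (end_ticks - start_ticks) * TICK_EXACT
    return float(abs(measured - true_interval))


# A one-second interval measured after 100 hours of continuous operation.
UPTIME_TICKS = 100 * 3600 * 10
START, END = UPTIME_TICKS, UPTIME_TICKS + 10

# Both timestamps use the old, truncated conversion: the errors cancel.
consistent = interval_error(START, END, TICK_LOSSY, TICK_LOSSY)

# One timestamp uses the old conversion and the other the new one.
mixed = interval_error(START, END, TICK_LOSSY, TICK_EXACT)

if __name__ == "__main__":
    print(f"same conversion for both:  {consistent:.9f} s")
    print(f"mixed conversions:         {mixed:.9f} s")
```

With the same conversion on both sides, the error in the one-second interval is on the order of a millionth of a second no matter how long the system has been running. With mixed conversions, the full accumulated drift, about a third of a second after 100 hours, appears in the result.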
Part of the reason this error was not found sooner was
that the program had been written in assembly language fifteen years earlier.
Over time, it was patched and new features were added. Because the system was
written in assembly code, it was difficult to understand and maintain. And
because the system was fifteen years old and had been patched several times,
the very people who had written the code were not as familiar with it as
they would have been if it had been written more recently. Then, during the
Gulf War, the system had to be modified to handle the SCUD missiles, and time
was a critical factor. The developers could have been influenced by the fact
that prolonged testing could have caused a disaster by keeping a necessary
system out of the hands of soldiers in a time of war.
The Software Engineering Code of Ethics and Professional
Practice states that a responsible software engineer should "Approve
software only if they have a well-founded belief that it is safe, meets
specifications, passes appropriate tests..." (sub-principle 1.03) and
"Ensure adequate testing, debugging, and review of software...on which
they work" (sub-principle 3.10). Unfortunately, defects did make their
way into the system.
Perhaps one of the lessons to be learned from this case
is to write code to be easily maintainable, and to acknowledge the difficulties
that may be inherent in the maintenance of the code. For example, the Patriot
Missile system was altered in 1986 to be capable of tracking missiles as well
as aircraft. During that project, the developers had time to re-code the system
in a high-level language. If they had done so, the patches that were
required during the Gulf War would have been easier and less prone to defects.
More importantly, by spending extra time to improve the system during
times of peace, the designers would have decreased the amount of time needed to
update the system during times of war, when it would be needed most.
The Software Engineering Code of Ethics also states that
a responsible software engineer should "Treat all forms of software maintenance
with the same professionalism as new development." The Patriot Missile
System is a good example of how a small change can break an existing program.
Raytheon Labs should not have been patching and re-patching this code. For a
safety-critical project, the developer must be familiar with the code with
which he or she is working.
But ethically speaking, the people at Raytheon Labs had
some tough decisions to make. How much testing do you perform when the tests
require the destruction of functioning missiles and aircraft? It would be easy
for most of us to say that you perform as much as the system requires. You do
not stop until you are one hundred percent sure. But if that means re-enacting
a twenty-hour battle, using real aircraft and missiles, then it becomes a more
difficult decision. And this was the situation faced by the staff of Raytheon
Labs. During the initial testing, a very long and expensive battle was
re-enacted. Had they re-run this test every time the system was patched, the
problem would not have occurred, but the decision to do so would have been
a very difficult one to make, and an even more difficult one to justify.
One cannot say that Raytheon was blameworthy, because it cannot
be said that Raytheon was guilty of negligence or malpractice. They were
responsible in a causal sense because they introduced the bug into the system,
but the details show that the problem with the system was not necessarily the
developers, but that the system was modified often and in inconsistent ways.
Copyright 2002 Tom Morgan and Jason Roberts.
This case may be published without permission and at no cost as long as it carries the copyright notice.
[1] Team Redstone Patriot Missile System Chronology
http://www.redstone.army.mil/history/systems/PATRIOT.html
[2] General Accounting Office Report Number B-247094
http://www.fas.org/spp/starwars/gao/im92026.htm
* The numbers used in this table are not meant to represent the exact computations used in the Patriot missile system. They are intended to demonstrate the concept of why the Patriot system failed.
[3] Robert Skeel, “Roundoff Error and the Patriot Missile”
http://www.siam.org/siamnews/general/patriot.htm