Thursday, November 21, 2013

The Gravy Plane, costly failures for air-traffic control


On October 1, 2013, the federal government's Web site, healthcare.gov, crashed and burned at launch--probably attracting more attention than any other government software failure. It provides public access to the federal health-care insurance exchange, a key element of Pres. Obama's health-care reform program. The government said the site attracted an average of two million visitors a day during its first four days, while only six managed to enroll in insurance plans the first day. [1] [2]

Yet that was just a Punch-and-Judy show compared with a horrible reign of errors lasting 32 years to date: repeated failures by the U.S. government to produce well-integrated and reliable automation in support of air-traffic control. From a perspective of public service, those efforts have been a bipartisan disaster: the offspring of the Reagan, Herbert Bush, Clinton, Walker Bush and Obama administrations. For two major government contractors, however, an air-traffic automation disaster became "The Gravy Plane"--a lush source of high-flying government spending--now around $6 billion and climbing. [3] [4] [5]

Tragedy of mismanagement
So far, this has been a tragedy in two acts. The first attempt by the Federal Aviation Administration (FAA) at improving automation for air-traffic control--the Advanced Automation System (AAS)--collapsed after 13 years of work and $3.7 billion spent, with little useful delivered. It began as a pie-in-the-sky concept ordered up during the first Reagan administration, in the wake of a 1981 strike by federal air-traffic controllers. At 7 am EDT on August 3 of that year, nearly 13,000 of the 17,500 members of the former Professional Air Traffic Controllers Organization walked off the job. Over 11,000 failed to return and were fired two days later. [6] [7]

Former Pres. Reagan, a technology buff who bought into "Star Wars," expected that computers could supplant air-traffic controllers. After a long siege of planning and prototyping--at government expense--IBM's former Federal Systems Division won a large continuing contract over its chief competitor, the former Hughes Aircraft. However, some promises made for AAS proved beyond the capabilities of the era's main technologies, and project management was flagrantly bungled by both FAA and Federal Systems. Eventually, during the first Clinton administration, the Federal Systems contract was ended and the 13-year AAS project cashiered. [8] [9]

The second Clinton administration released so-called "NextGen" air-traffic control plans, a long-term program for automation to be carried out in a sequence of stages. NextGen was to start with an effort less ambitious than AAS. In 1998, FAA began planning the En Route Automation Modernization (ERAM) project. During its first year, the first Walker Bush administration channeled the ERAM project to Lockheed Martin--a company with key contacts in that administration--under a sole-source, no-bid contract. [10]

During more than 12 years of implementation through 2013, ERAM has never worked well enough to be placed in full and regular service. [5] Although books have been published on problems of AAS, so far no comparable account has been published on problems of ERAM. Unlike AAS, but like healthcare.gov, ERAM may emerge from a morass of problems and may in time be marshalled into a full-service system. However, FAA had initially advertised full and regular ERAM service by 2007. In 2012 its inspector general found achieving the goal to be unlikely before 2016--making that project 18 years in planning plus implementation. [4] [11] [12]

Top dogs and watchdogs
The U.S. government was able to land men on the Moon in eight years, using technologies of the 1960s that were much less capable than those available 20 and 50 years later. Why would it take more than three decades to improve earth-based air-traffic control automation? Part of the answer looks to be manipulation of contracts, in a program far less visible than a Moon landing, by large companies that had little to lose when failing. In neglecting to plan and manage programs intelligently, monitor work rigorously and prepare alternatives, a rigid and foolish agency allowed those companies to function as monopolies--leaving the government with no other ready sources of services.

The government has several watchdogs expected to uncover mismanagement. They include the Government Accountability Office, the Office of Management and Budget, the Congressional Budget Office and, since 1973, the inspectors general assigned to cabinet departments and later to agencies. The armed services have had inspectors general since the Revolutionary War, but the Department of Agriculture was the first civilian agency to use one. However, while all the watchdogs exhibit financial skills, few have management skills and none have technical design skills or competence with automation technologies. They live up to their reputations as "bean counters" and often fail to uncover mismanagement, if only because they just don't understand what's going on.

Software and turkeys
Computer software has sometimes been considered a field of engineering. However, unlike development of designs in other engineering fields, with software the final design is the product. There is no physical product to inspect and measure. Moreover, ultimate software designs--"code"--are often thousands of times more complex than other final engineering designs. It took about 30 years for the critical lessons about how to cope with these challenges to emerge within the software professions. So far those lessons have rarely penetrated the heads of government watchdogs. They visit software shops as foreigners, literally not speaking the languages. Thus, back in the shop, bean counters have long been called "turkeys"--not the brightest birds.

The inspector general responsible for FAA in 2005 displayed an exemplary turkey profile. The bean counter was foolishly preoccupied with "lines of code." Yet the report revealed that Lockheed Martin was reusing code from legacy systems. That meant the company must somehow be interfacing new Ada code with ancient Jovial code, and it evidently had never performed a first-principles analysis of communication design. Both are major barriers to success, yet the bean counter was busy counting "lines of code." The bean counter did worry about the cost of finding software expertise in Ada, although not in Jovial. Both languages emerged from military environments, and neither attracted much other use. Strong skill with Jovial today would be comparable to fluency in ancient Phoenician, while Ada might be compared with Old High German. [11] [12]

The bean counter's main sally was to recommend a "value-engineering analysis" to see whether fewer new computers could be bought than one per long-range ("en route") control center. Even a bean counter might be able to compare at least $2 billion going for software with perhaps $20 million for computers, about 1 percent of the software spending, and see that potential savings on computers were not going to be much of a benefit. That was where the turkey profile sharpened, but the bean counter had already given the game away with the title of the report: "FAA's En Route Modernization program is on schedule." Later events showed the project was effectively years behind schedule by that point. The bean counter had either been enlisted or been hoodwinked.

Air-traffic control automation
U.S. air-traffic control is a distributed system, currently with 22 long-range control centers, 164 regional-range centers and a local-range center for each large and medium-size airport. These are known, respectively, as "air route traffic control" or "en route" centers, "terminal radar approach control" or "Tracon" centers, and "airport traffic control towers" or just "towers." FAA oversees all the facilities, and it operates the en route centers, the Tracon centers and some of the towers, known as the "federal control towers." [13] [14] [15]

Starting in the 1960s, air-traffic control automation was developed by FAA personnel to assist the air-traffic controllers. It has undergone upgrades and computer replacements, but there has been no change to its basic structure since the 1960s. Like the military organizations from which many FAA personnel came, and like the computer systems typical of the era in which it originated, the structure is hierarchical. A central coordinating computer communicates with "en route" center computers, which communicate with Tracon center computers, which communicate with radars serving their regions. Air-traffic controllers coordinate handovers of flights between Tracons and towers by voice. Officially called the National Airspace System, the software is informally known as Host--an emblem of its origins.

The Host and its keepers
In the beginning was the Host. The first model of distributed computing, emerging from the regimented 1950s, was God-fearing and hierarchical. At the center was a "host computer" that coordinated activities. It took many years for software developers to recognize and overcome liabilities of the model: its susceptibility to overloading communications channels and its vulnerability to failure of host computers. If the Internet were to depend on a hierarchical organization with host computers, it would frequently fail. Instead, it was designed in the late 1980s using a robust model with distributed, independent coordination. FAA, however, continues to operate U.S. air-traffic control on the back of first-generation automation, scrambling to keep a brittle, creaky system working.
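The liability of a single coordinating computer can be illustrated with a toy connectivity check. This is a hypothetical sketch, not a model of Host or ERAM: the node names, the links and the notion of "survival" (every remaining computer can still reach every other one) are assumptions made for illustration only. It compares a strict tree, loosely following the central, en route and Tracon layers described above, with the same tree plus a couple of direct peer links.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Turn a list of (a, b) communication links into an undirected adjacency map."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def survives(edges, failed):
    """Return True if every node except `failed` can still reach every other node.

    Uses a breadth-first search that refuses to route through the failed node.
    """
    graph = build_graph(edges)
    alive = [node for node in graph if node != failed]
    if not alive:
        return True
    seen = {alive[0]}
    queue = deque([alive[0]])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor != failed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(alive)


# Hypothetical tree: one central computer, two en route centers, four Tracons.
HIERARCHICAL = [
    ("central", "enroute_east"), ("central", "enroute_west"),
    ("enroute_east", "tracon_e1"), ("enroute_east", "tracon_e2"),
    ("enroute_west", "tracon_w1"), ("enroute_west", "tracon_w2"),
]

# Same nodes plus hypothetical direct peer links between neighboring centers.
MESHED = HIERARCHICAL + [
    ("enroute_east", "enroute_west"),
    ("tracon_e2", "tracon_w1"),
]

for label, edges in (("hierarchical", HIERARCHICAL), ("meshed", MESHED)):
    # prints "hierarchical survives loss of central: False"
    # then   "meshed survives loss of central: True"
    print(label, "survives loss of central:", survives(edges, "central"))
```

In this toy, losing the central computer splits the tree into two isolated halves, while one peer link between the en route centers keeps them in contact. Real networks add redundancy far more systematically; the point is only that a hierarchy with one coordinator has one place whose failure severs communication between its branches.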

Between 1969 and 1977, Host was mounted on IBM model 9020 computers--a specialized version of the System 360, model 65, introduced in 1966. It was programmed by FAA personnel in Jovial--an early, high-level language derived from IAL, later called Algol-58--and in IBM 360 machine code. In 1982, amid fears that equipment would become unmaintainable, FAA began plans to renovate the system. A key problem was software, written partly in a little-known, obsolete high-level language and partly in machine code for an obsolete computer. Because of a sense of urgency, that effort became independent of AAS and was treated as a short-term measure to sustain operations. In 1985, IBM was awarded a contract. IBM replaced the original computers with model 3083, rewrote machine code for the new computers and adapted Jovial language support to run on them. [16]

After termination of the AAS project, IBM sold its Federal Systems division to Loral, which later sold it to Lockheed Martin. As an arm of Lockheed Martin, the former Federal Systems division continued to do business with the federal government, maintaining air-traffic control under Host for FAA. Meanwhile, both Clinton administrations tried to sweep cobwebs out of FAA and to develop a sustainable architecture for air-traffic control automation. Yet a 1997 report from the (then) General Accounting Office still complained, "FAA lacks a system architecture." Yes, indeed. That was more honking from bean counters and turkeys. Few of the people who wrote the legacy Host software, starting in the 1960s, would have recognized the term "architecture" as meaningful in their work. Like the builders of Roman roads, they became skillful at using the resources available to them to do the jobs they were assigned. By 1997, however, FAA had produced an architecture to guide future air-traffic control--NextGen--and was about to plan its first stage: ERAM. [17]

Unfortunately, time ran out. No contract for ERAM was awarded before the second Clinton administration expired. When the opportunity to let a contract for ERAM fell into the lap of the new Walker Bush administration, Lockheed Martin turned up in first post position. Norman Mineta, the new secretary of Transportation, had been senior vice president and managing director at Lockheed Martin. His deputy, Michael Jackson, had been vice president and general manager at Lockheed Martin IMS, Transportation Systems and Services. Lynne Cheney, wife of Vice President Dick Cheney, had been a Lockheed Martin director until shortly before Mr. Cheney took office. The fix was in; the new air-traffic control architecture was mostly out. [10]

Chiseling a contract
Lockheed Martin had little practical interest in NextGen architecture except as an advertisement. On the contrary, its interests lay in preserving cranky and poorly documented Host software, for which the government had paid the company to develop unique expertise. Other organizations would be unable to compete with Lockheed Martin at projects based on Host, because they would lack the gradually acquired, hands-on expertise. The Walker Bush administration negotiated a sole-source, no-bid ERAM contract with Lockheed Martin in 2001. While it appeared even-handed, with fixed-price deliverables and performance incentives, it had an escape clause for Lockheed Martin: get the government to "accept" ERAM, and subsequent work is billed at cost-plus. [4] [18] [19]

During the Walker Bush administrations, FAA failed to require, and of course Lockheed Martin did not perform, a first-principles analysis of communication design--to find an optimum approach taking best advantage of decades of progress since Host was produced. Instead, Lockheed Martin grafted ERAM onto the legacy Host software, committing the government to indefinite support of a marginally reliable antique, housed inside a new shell. The company pushed the envelope to deliver ERAM early, promising to deploy it to the FAA "en route" control centers starting in 2005. [20]

With Lockheed Martin at work, ERAM began reproducing vulnerabilities of legacy air-traffic control. Although the company had accumulated a substantial number of problem reports, in October, 2007, FAA signed off on government acceptance. The acceptance was based on bench testing only, at the FAA Technical Center, without using ERAM in a field setting, much less using it for air-traffic control. Qualifying for "early delivery" got the company out of jail financially and justified a large "performance incentive" fee. Through 2011, the ERAM project paid Lockheed Martin over $150 million in incentive fees. [21] [22]

Truth or consequences
With the start of the first Obama administration, Lockheed Martin lost its key allies at FAA, and FAA got a new administrator in J. Randolph "Randy" Babbitt, a former commercial airline pilot and former head of the Air Line Pilots Association. Mr. Babbitt's background prepared him to take on chronic morale problems at the agency but left him unequipped to deal with decades of mismanaged automation technology. He made the mistake of assuming that, just because the FAA Technical Center said ERAM was OK, it must be. In the spring of 2009, he authorized the "en route" center in Salt Lake City to test ERAM with live air traffic, without ever having exercised the software in a backup-only role. [23]

Once again, a government factotum ignored a well-known rule taken to heart by professional software developers for decades: "What you haven't tested doesn't work." It didn't. Luck was with Mr. Babbitt, the crews and the passengers: the first three system crashes, in the wee hours of weekend mornings, attracted mainly protests from air-traffic controllers who could see the resulting dangers firsthand. They, in turn, got the ear of Utah senators, who forced a delay in further testing. It didn't help much; the bugs in ERAM were too deeply embedded to be readily detected or corrected. The next live test got more attention. At 5 am ET on November 19, FAA systems crashed so badly that air-traffic controllers had to send information between centers by fax and hold up flights that were not already airborne. Delays rippled nationwide for most of the day. [24] [25]

The highly public consequences of the November, 2009, ERAM crash led to a stop-and-go pattern of further tests and repairs persisting through 2013, four years later, and probably for at least a few more years. The most congested regional centers--New York, Washington, Atlanta and Miami--still can't use ERAM, and seven other centers are able to use it only in low-traffic conditions, mostly at night and on weekends. Most deployment for the NextGen program remains on hold until ERAM becomes a full-service system. When and if it does, it will still lack the architectural integrity that was intended for NextGen in the mid-1990s and will therefore lack the long-term reliability and extensibility that should have accompanied a sound program. [5] [26]



[1] Tim Mullaney, Obama adviser says demand overwhelmed healthcare.gov, USA Today, October 6, 2013, at http://www.usatoday.com/story/news/nation/2013/10/05/health-care-website-repairs/2927597/

[2] Susan Cornwell and David Morgan, Documents show enrollment in Obamacare very small in first days, Reuters, October 31, 2013, at http://www.reuters.com/article/2013/11/01/us-usa-healthcare-surge-idUSBRE99U16R20131101

[3] Robert L. Glass, ed., Software Runaways: Monumental Software Disasters, Prentice-Hall, 1998, p. 71

[4] Jeffrey B. Guzzetti, Weaknesses in program and contract management contribute to ERAM delays and put other NextGen initiatives at risk, USDOT Report No. AV-2012-179, September 13, 2012, at http://www.oig.dot.gov/sites/dot/files/ERAM%20Final%20Report%5E9-13-12.pdf

[5] Jeffrey B. Guzzetti, FAA has made progress fielding ERAM, but critical work on complex sites and key capabilities remains, USDOT Report No. AV-2013-119, August 15, 2013, at http://www.oig.dot.gov/sites/dot/files/DOT%20OIG%20ERAM%20Report%20508.pdf

[6] Rebecca Pels, The pressures of PATCO: Strikes and stress in the 1980s, Essays in History 37 (University of Virginia), 1995, at http://www.essaysinhistory.com/articles/2012/121

[7] Willis J. Nordlund, Silent Skies: The Air Traffic Controllers Strike, Praeger, 1998

[8] Mark Lewyn, Flying in place: The FAA's air control fiasco, Business Week, April 25, 1993, at http://www.businessweek.com/stories/1993-04-25/flying-in-place-the-faas-air-control-fiasco

[9] Robert Britcher, The Limits of Software: People, Projects and Perspective, Addison-Wesley, 1999

[10] Matthew L. Wald, FAA to skip bids on air traffic system, New York Times, March 7, 2001, at http://www.nytimes.com/2001/03/06/business/faa-to-skip-bids-on-traffic-system.html

[11] David A. Dobbs, FAA's En Route Modernization program is on schedule, USDOT Report No. AV-2005-066, June 29, 2005, at http://www.oig.dot.gov/sites/dot/files/pdfdocs/av2005066.pdf

[12] Mike Paglione, Metrics-based approach for evaluating air-traffic control automation of the future, Federal Aviation Administration, 2006, at http://acgsc.org/Meetings/Meeting_99/SubcommitteC/8.1subc.ppt

[13] Federal Aviation Administration, Air Route Traffic Control Centers, 2013, at http://www.faa.gov/about/office_org/headquarters_offices/ato/artcc/

[14] Federal Aviation Administration, Terminal Radar Approach Control Facilities, 2013, at http://www.faa.gov/about/office_org/headquarters_offices/ato/tracon/

[15] Federal Aviation Administration, Airport Traffic Control Towers, 2013, at http://www.faa.gov/about/office_org/headquarters_offices/ato/atct/

[16] John Andelin, Review of FAA's 1982 National Airspace System plan, Office of Technology Assessment, 1982, available at http://www.princeton.edu/~ota/disk3/1982/8222/8222.PDF

[17] Randolph C. Hite, Complete and enforced architecture needed for FAA, General Accounting Office, February 3, 1997, at http://www.gpo.gov/fdsys/pkg/GAOREPORTS-AIMD-97-30/html/GAOREPORTS-AIMD-97-30.htm

[18] Calvin L. Scovel, III, Challenges in meeting FAA's long-term goals for the Next Generation air transportation system, April 21, 2010, Testimony to Subcommittee on Aviation, House Committee on Transportation and Infrastructure, at http://www.oig.dot.gov/sites/dot/files/WEB%20FILE_NextGen%20Testimony.pdf

[19] Anthony N. Palladino, Protest of Raytheon Company, FAA docket no. 01-ODRA-00180, June 15, 2001, available at http://www.pubklaw.com/rd/other/01-ODRA-00180.pdf

[20] Unattributed, Lockheed reports good early progress on en route projects, World Aviation (Beijing, China), August 16, 2004, at http://www.huisi02.25u.com/ac/ac0400/ac0403_content.asp?id=618

[21] James K. Reagan, Independent assessment of the ERAM program, MITRE Lincoln Laboratory, October, 2010, available at http://assets.fiercemarkets.com/public/sites/govit/mitreindependentassessment_eram.pdf

[22] John Sheridan, FAA remains quiet on ERAM budget overruns and delays, Aviation International News, December, 2011, at http://www.ainonline.com/aviation-news/aviation-international-news/2011-12-02/faa-remains-quiet-eram-budget-overruns-delays

[23] Sholnn Freeman, FAA asked to do more to fix morale, Washington Post, December 1, 2009, at http://www.washingtonpost.com/wp-dyn/content/article/2009/11/30/AR2009113004066.html

[24] Joan Lowy, Associated Press, Utah lawmakers say air traffic computer not ready, Salt Lake Tribune, June 13, 2009, at http://www.sltrib.com/news/ci_12584509

[25] Matthew L. Wald, Backlog of flight delays after computer problems, New York Times, November 20, 2009, at http://www.nytimes.com/2009/11/20/us/20air.html

[26] John Sheridan, ERAM development is reminiscent of failed AAS program, Aviation International News, October, 2013, at http://www.ainonline.com/aviation-news/aviation-international-news/2013-10-02/eram-development-reminiscent-failed-aas-program
