LAX Meltdown Caused By A Single Network Interface Card

According to the LA Times, the LAX computer meltdown that stranded 20,000 international passengers was the work of a single malfunctioning network interface card on a single desktop computer in the LAX international terminal. From the LA Times:

The card, which allows computers to connect to a local area network, experienced a partial failure that started about 12:50 p.m. Saturday, slowing down the system, said Jennifer Connors, a chief in the office of field operations for the Customs and Border Protection agency.

As data overloaded the system, a domino effect occurred with other computer network cards, eventually causing a total system failure a little after 2 p.m., Connors said.

“All indications are there was no hacking, no tampering, no terrorist link, nothing like that,” she said. “It was an internal problem” contained to the Los Angeles International Airport system.

LAX outage is blamed on a single computer [LA Times]
(Photo: Kenny Miller)

PREVIOUSLY: 20,000+ International Passengers Stranded At LAX

Comments

  1. ChrisC1234 says:

    “As data overloaded the system, a domino effect occurred with other computer network cards, eventually causing a total system failure a little after 2 p.m.”

    WOW… what a load of CRAP.

    Also, anything stranding 20,000 passengers is NOT a “partial” failure.

  2. TheBigLewinski says:

    They must have a “Third Grader” doing their local area networking. Sorry for the disparaging remark against a nine-year-old; they are smarter than that….

  3. rmz says:

    Single points of failure are pretty great huh guys

  4. C2D says:

    I find it hard to believe an airport as large and busy as LAX has/had NO redundancy in their network whatsoever. Not to mention a single uplink card like this failing without any IT/IS staff noticing before it failed completely.

  5. misterfancypants says:

    Is this the technological version of finding a fall guy?

  6. topgun says:

    I smell a Geek Squad story coming up.

  7. TWinter says:

    Strange. You would think an important system like this would have some sort of redundancy or backup capacity built in.

    Of course, this is America, where we don’t invest in infrastructure, give contracts to the cheapest bidder, and don’t worry about fixing things until they are very very broken. So I guess I shouldn’t be too surprised.

  8. CumaeanSibyl says:

    Isn’t this what error messages are for?

    I find it hard to believe that they somehow managed to set up a system with no notifications for hardware failure. Christ, even the crappiest versions of Windows will throw up a dialog box when something’s not working right.

    (Linux, in my experience, will go into kernel panic and shit itself. Maybe they were running Debian.)

  9. Nemesis_Enforcer says:

    Ahh, you see, this is the highest quality the lowest bidder could provide. Think about that next time you use anything mildly dangerous. Plus, well, LAX Sucketh the big one. I will always try to fly out of Burbank, even if it costs more than going through LAX.

  10. homerjay says:

    It was probably a Linksys… Stupid Linksys….

  11. dieselbug says:

    Seems like LAX is still using Token-Ring for their network . . . . . I’ve seen TR cards fail and generate Ring Purges that *kill* traffic on the segment for 3-5 minutes at a time.
    I doubt this could have happened with an Ethernet setup.

  12. FLConsumer says:

    I’m with the others who are shocked (and horrified) that a single bad NIC can bring down an airport. That sort of thing may be fine for a small business, but anything close to critical should have some redundancy.

    Cumaeansibyl: I’ll give them SOME (maybe 0.00001 inch) of credit on this one. Windows only pops up error messages for a total lack of connectivity and when an IP can’t be resolved. If you have a truly malfunctioning NIC, Windows might see it as working properly. Living in lightning country here in Florida, I’ve seen countless NIC & networking failures, and I’ve seen some odd partial failures where everything appears right but packets aren’t being passed properly.

    Fortunately, this wasn’t malicious, just incompetence, but while just sitting here, I’ve thought up a few ways in which someone could attack such a network for their own gain or mayhem. Worse, you could cause an airport to “ground” flights or run everyone back through security, then let a bomb go in the screening area queue. Not good.

    Kinda sad that this happened with US Customs. I expected better from them. If this said TSA, sadly, I don’t think anyone would have been surprised.

  13. FLConsumer says:

    @dieselbug: Come play with some Florida lightning or a serious case of static electricity. Then you’ll see it happen with Ethernet.

    Not “corporate” LAN related, but if you really want to cause some major outages, hook up a cheap 27MHz transmitter to your CATV line. That’s the reverse channel for many cable modems. Let’s just say you’ll end up pissing off a lot of people.

  14. dieselbug says:

    @FLConsumer:
    My point was that it’s a known issue with TR that a failed NIC on ANY device in the segment can pull the segment down for extended periods of time. This is one of the reasons (apart from the ridiculous price) that T-R has been replaced by Ethernet as a “standard”. The industry has many legacy technologies in use (don’t ask about ATCs: you’d never fly if you knew how old some of their technology is. . . )

  15. pinkbunnyslippers says:

    This is bullsh*t. There’s no way that a major airport hub like LAX allows for one single point of failure somewhere within its network, and has no redundancy built in. I think this is a PR scumbag trying to make it sound a lot less bad than it is.

  16. InThrees says:

    What I got from the article was that the NIC experienced more of a malfunction-fail than a pure fail, flooding the network or something.

    Sort of as if some worker had “CLICKED HERE FOR THE SCREENSAVER!!!111” and gotten a nasty infection.

  17. Greasy Thumb Guzik says:

    If true, they could have bought a replacement NIC at a local Fry’s for 99¢ after rebate!

  18. Peeved Guy says:

    I didn’t RTFA, but I could see this happening if the “computer” they were referring to was part of the infrastructure (i.e. a router or a switch). If a blade of a router “partially failed” and started a broadcast storm, that could easily bring a network to its knees. It might be unlikely, but it’s not impossible.

  19. Malethos says:

    It seems most likely that this was a Token Ring network; they are semi-infamous for this. OTOH, even normal Ethernet can have similar issues, but that should be mitigated if not eliminated by a switch. A NIC can flood a hub (if its cross talk detection fails, for example) and easily keep it flooded, but a properly configured switch should confine this issue to only that network segment.

    The bad NIC may have been what caused it, but why it happened was inadequate/improper setup on the network (or possibly a well-designed but already damaged/degraded network, with old hw or hw needing work, being driven over the edge).
    Additionally, any even semi-competent network person should be able to fix this fairly quickly.

    My bet is that it’s an older Token Ring network; failures on those can be much more of a pain to fix. (This of course leads us to the question of why do it that way?)

  20. Malethos says:

    @Peeved Guy:

    However, even if a blade goes bad, it’s easy to isolate and replace once you notice the storm (assuming you have spares on site, but I’d assume that this is a mission critical system so they’d better have them).

  21. bonzombiekitty says:

    My first thought was also “Token ring network”. Followed shortly by “They’re on a token ring network?!!!!”

    If the network was actually Ethernet, whoever set it up needs to be sued. An Ethernet network should not fail like that. Hubs, switches, etc. should be keeping flooding from occurring (at least to the point that a single network card can’t bring down an entire airport network). It was a NIC on a desktop computer, which indicates that it was a leaf on the network. If it had the ability to bring down the network, that’s a design so crappy I can’t even comprehend it.

    I’m not sure what’s better. The possibility the network is token ring, or that an ethernet network was set up so poorly.

  22. Jaysyn was banned for: http://consumerist.com/5032912/the-subprime-meltdown-will-be-nothing-compared-to-the-prime-meltdown#c7042646 says:

    @dieselbug:

    @bonzombiekitty:

    I was thinking the same thing. I was thinking, “How did they get Ethernet to fail so spectacularly?” I do know some municipal networks around here are still on Token Ring (i.e. if it ain’t broke, don’t fix it), so I guess it’s still in use. God knows why.

  23. Peeved Guy says:

    @Malethos: Aha! You know what happens when you ass|u|me…

    My little off-the-cuff hypothetical diagnosis also ignored the fact that, as others had mentioned, any network engineer worth their salt would have taken router failure into account and had redundant circuits out the wazoo. If they were too shortsighted to build a highly available network, they were probably too shortsighted to have hardware stockpiled too.

    Just my 2 cents. I’m probably WAY wrong.

  24. pestie says:

    @CumaeanSibyl: (Linux, in my experience, will go into kernel panic and shit itself. Maybe they were running Debian.)

    Nice troll!

  25. gibsonic says:

    it sounds like a chatty NIC making broadcast storms on a flat ethernet network with no VLAN’ing to segment the end user desktops from the core switches and servers.

    The wifi networks at the airports are completely isolated from the airport/airline functions. They probably have 1000 times better hardware and configuration for just the casual internet surfer waiting on their plane.

    this is not good news. hopefully/luckily this incident was an accident and unintentional.

    However any 9th grader (or terrorist) these days could use a flood script on an exposed network jack in the airport to do the same thing…

    the airport has a large data network but it probably isn’t as large and complex as one might think. I’m guessing about 1000 endpoints total for just the airport/airline internal systems.

  26. Peeved Guy says:

    @gibsonic: One would still assume, however (I know, see my previous ass|u|me post) that the network that the travelers use would be completely segregated from the mission critical networks that the airport requires to operate (ticketing, flight data, etc).

  27. Erik_the_Awful says:

    Even if the network was token ring, you just don’t put your desktops on the same ring as your servers and router(s) to the outside world. That would be really really stupid.

    Whatever network technology they are using, there’s just no excuse for this.

  28. gibsonic says:

    @Peeved Guy:

    the reason i say they are separate is b/c i know they are based on being in the industry.

    the airports/airlines are using ancient systems that have been around for decades.

    3rd party companies such as t-mobile and their partners ran all new infrastructure cabling in the airports they service with their own firewalls, routers, servers, etc. connected to their own circuits from the CO.

    parallel systems in the same building.

  29. mkrigsman says:

    The failure itself was a relatively common incident. What’s really interesting is the set of management failures that led to this failure. I’ve blogged about this over at ZDNet:

    [blogs.zdnet.com]

    Michael Krigsman
    [projectfailures.com]

  30. skapunk84 says:

    I hope their IT/Network guy or gal is scanning the Help Wanted section.

  31. drjayphd says:

    @topgun: Only if they ordered the NIC from the Chinese Poison Train (sm).

  32. CumaeanSibyl says:

    @pestie: Yeah, but nobody went for it! What gives?

    Actually, I speak from personal experience. I have a laptop with a faulty hard-drive connector that occasionally causes errors in Windows. When I ran Debian, the damn thing crashed at least once a day. Windows just seems to be more forgiving of that kind of thing.

    I do wonder what software they’re using. It might be some godawful decades-old proprietary thing that they paid the programmer for in beer and Cheetos.

  33. Chairman-Meow says:

    @bonzombiekitty:

    It’s called a broadcast storm. Sounds like LAX still has some old hubs in their network.

    This time it’s not TSA’s fault, it’s the dumb-asses at LAX who manage their IT infrastructure. Geeze, hubs went out 10 years ago with the introduction of switch-based networking.

    It amazes me when stupid people running multi-billion-dollar corporations have ONE-SINGLE-PC take down an entire system!!!

  34. FLConsumer says:

    It’s US Customs, not FAA/TSA/NTSB. They’re running Ethernet. The problem here is how their network was designed with no redundancy.

    I wonder just how dated the ATC equipment is compared to other gov’t agencies. The last time I checked, NOAA was looking for 8088’s to handle their upper air soundings. The WSR88D’s (doppler radars) still run on Fortran. Nothing wrong with the old stuff as long as you’ve designed it well.
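
The failure mode several commenters describe, a chatty NIC broadcasting onto a flat network, can be illustrated with a toy model. This is only a sketch: the host names and VLAN numbers are made up, nothing here reflects the actual LAX or Customs network, and it assumes an idealized switch that forwards a broadcast frame to every other port in the sender's VLAN and nowhere else.

```python
def hosts_hit_by_broadcast(hosts, faulty_host):
    """Return the hosts that receive a broadcast sent by faulty_host.

    hosts maps a host name to its VLAN id (hypothetical layout).
    Assumption: broadcasts stay inside the sender's VLAN.
    """
    vlan = hosts[faulty_host]
    return sorted(
        name for name, host_vlan in hosts.items()
        if host_vlan == vlan and name != faulty_host
    )


# Flat network: every device shares one broadcast domain.
flat = {"desktop-17": 1, "kiosk-3": 1, "customs-db": 1, "core-router": 1}

# Segmented network: desktops, servers, and routers in separate VLANs.
segmented = {"desktop-17": 10, "kiosk-3": 10, "customs-db": 20, "core-router": 30}

flat_hit = hosts_hit_by_broadcast(flat, "desktop-17")
seg_hit = hosts_hit_by_broadcast(segmented, "desktop-17")

print("flat:", flat_hit)
print("segmented:", seg_hit)
```

In the flat layout the faulty desktop's broadcasts reach the database and the router as well as the other desktop; in the segmented layout they reach only the other desktop in the same VLAN. Real storms also involve link saturation and switching loops, which this model ignores.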