The Real Reason Behind The $23 Quadrillion Errors
The secret of the $23 quadrillion VISA debit errors looks like a specific and not uncommon programming error. Take the insanely large number: expressed in cents it is 2314885530818450000, and if you convert that to hexadecimal, you end up with 20 20 20 20 20 20 12 50. In programming, hex 20 is the ASCII code for a space. Where binary zeros should have been, there were spaces instead. What made this instance special is that it wasn't caught in time. A Slashdot commenter identifying himself as working in the industry explains more about what very likely happened:
The only novelty here is that the error got into production, and was not caught and corrected before it went that far.
Submitters send files to processors which are supposed to be formatted according to specifications.
Note I wrote 'supposed to be'.
Some submitters do, from time to time, change their code, and sometimes they get it wrong. For instance padding a field with spaces instead of zeros. Woopsie...!
Seems that's what happened here. Sounds like a hex or dec field got padded with hex 20, and boom.
This is annoying, especially when the processor gets to help correct the overwhelming number of errors, and then tries to explain that it wasn't their fault. Plenty of blame to go around with this one.
And then explains why they don't both validate/sanitize input, and test for at least some reasonable maximum value in the transaction amount. A max amount of $10,000,000 would have fixed this. That and an obvious lapse in testing. This is what keeps my bosses awake sometimes, fearing they will end up on the front page of the fishwrap looking stupid 'cause their overworked minions screwed something up, or didn't check, or didn't test very well. I love one of the guys we have testing. He's insufferable, and he catches genuine show-stoppers on a regular basis. They can't pay him what he's been worth, literally $millions, just in avoiding downtime and re-working code that went too far down the wrong path.
Believe me, this is in some ways preferable to getting files with one byte wrong that doesn't show up for a month, or sending the wrong data format (hex instead of packed binary or EBCDIC, for instance) and crashing the process completely. Please, I know data should never IPL a system. Tell it to the architects, please. As if they don't know now, after the one crash...
If you knew what I know, you'd chuckle and share this story with some of your buddies in development and certification.
And pray a little.
At least it didn't overbill the cardholders by $.08/transaction. That would suck. This is easy by comparison. Just fix the report data. Piece of cake. Evening's worth of coding and slam it out in off-peak time. Hahahahaha!
Nothing to see here, keep moving along please... [Slashdot] (Photo: Ballistik Coffee Boy) (Thanks to Toland!)
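The hex conversion described in the post can be checked in a few lines. This is a minimal sketch: the amount is the one quoted by a commenter below, and treating it as an 8-byte big-endian field of cents is an assumption made for illustration, not a detail confirmed by Visa.

```python
# Erroneous charge as quoted in the comments, in dollars and cents.
amount_dollars_text = "23,148,855,308,184,500.00"

# Strip the punctuation to get the amount as a whole number of cents.
amount_cents = int(amount_dollars_text.replace(",", "").replace(".", ""))

# Lay the value out as 8 big-endian bytes (assumed field layout).
raw = amount_cents.to_bytes(8, "big")
hex_view = raw.hex(" ")

# Count the bytes equal to 0x20, the ASCII space character.
space_bytes = raw.count(b" ")

print(amount_cents)  # 2314885530818450000
print(hex_view)      # 20 20 20 20 20 20 12 50
print(space_bytes)   # 6
```

Six of the eight bytes are spaces, which is what points to a space-padded field rather than a random corruption.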
PREVIOUSLY:
The $23 Quadrillion Meal
The $23 Quadrillion Pack Of Cigarettes
Unruly Teen Charges $23 Quadrillion At Drugstore
Comments:
@umbriago: It's only written in a second language for common people. Sadly I couldn't pass off being able to use that as my second language in high school or college.
fing Spanish costing me dean's list for a semester -_-.
Yes, I actually understood that. And it's true, at least this error was large enough to make the parties involved pay attention and fix the error. If it was something small like the $0.08 example they probably would have ignored it for a while then finally changed it, but probably wouldn't have refunded any money.
I call shenanigans. No-one that knows what they're talking about posts on slashdot. It's the ultimate 'cult of the techno-moronic'. Hell, I'd trust an answer from a Best Buy salesman before the /. crowd.
(I say this as someone who has dealt with quite a few of the stories on /. before they made it on there, and read the comments posted)
@ktetch: Well I'm not sure what part you're disputing, hex20 really is a space. You ever seen %20 in a URL? That's just a fancy space.
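For anyone who wants to check the %20 point, here is a small sketch using Python's standard library. The example string is made up.

```python
from urllib.parse import unquote

# Code point 0x20 is the ASCII space character.
space = chr(0x20)

# Percent-decoding turns %20 back into that same character.
decoded = unquote("hello%20world")

print(repr(space))  # ' '
print(decoded)      # hello world
```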
@ktetch: You have an excellent point. As a matter of fact, I caution against believing anything you read on the Consumerist; it's the ultimate 'cult of the clueless freeloading consumer'. I'd trust a BoA press release before the Consumerist crowd.
(I say this as someone who has dealt with quite a few of the stories on the Consumerist, and read the comments posted.)
"Please, I know data should never IPL a system."
Wow... That takes me back a few years, Rick. Been a long time since I've been on a system where a program crash required a full restart of the box.
The only problem is that eventually the data has to be validated SOMEWHERE. This kind of approach only means you're going to pass the buck, and let someone else risk not catching an 0c7 abend, or worse, introduce the error in your own code and not catch it.
But I know what you're talking about, though. I worked in the telecom industry for several years. And let me tell you, when I pick up a phone, I get a dial-tone. Every time! It's absolutely amazing given some of the practices I've seen there.
Of course, if you took just a fraction of a cent off each transaction and put it in your own account, ... you could eventually end up on a tropical island threatening the wait staff with burning the place down.
@Anathema777: Though it's quite possible they don't/didn't have to call at all--that the fees would be taken off the next day or so when Visa sorted out the error. This might be a situation where fortune favors the less responsive.
Probably a lazy programmer. I see it when people do 'MOVE SPACES TO [01-level-work-field]'. If they are using COBOL the proper way is 'INITIALIZE [01-level-work-field] REPLACING ALPHABETIC DATA BY SPACES NUMERIC DATA BY ZEROES'. I am amazed that the system didn't abend on an 0C7-type error. When we do testing we know the value of every field and we try boundary errors for large/small values along with text/spaces in numeric fields to verify that the system catches those errors.
@ktetch: But, his post makes sense. Just because you don't like /. doesn't mean he can't be correct. As someone who studied programs, stupid little errors like that can do crazy things. At least this one was such a blatant error it was easy to catch.
@chrylis: Haha! That was my first thought, too. Someone could easily say disparaging things about Consumerist.
@floraposte: Yes, Ms. Poste, but please consider this: VISA removed the $23,148,855,308,184,500.00 from my account within a few hours, but I had to call, and argue (politely) with a supervisor to get them to remove the $20 "negative balance fee."
Perhaps they would have eventually removed the $20 without prompting. Given the stories that we hear here and on slashdot, I wonder.
As someone who used to QA and support credit card transaction software, he is dead on.
Got to be very cautious about checking that input and setting bounds checking.
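A bounds check of that kind might look roughly like the sketch below. The $10,000,000 cap is the figure suggested in the quoted Slashdot comment. The names `MAX_AMOUNT_CENTS`, `AmountError` and `read_amount_cents`, and the 8-byte big-endian field layout, are assumptions for illustration and not any processor's actual format.

```python
# Cap suggested in the quoted comment, expressed in cents.
MAX_AMOUNT_CENTS = 10_000_000 * 100


class AmountError(ValueError):
    """Raised when an amount field fails validation (hypothetical name)."""


def read_amount_cents(field: bytes) -> int:
    """Decode an assumed 8-byte big-endian amount field and bounds-check it."""
    if len(field) != 8:
        raise AmountError(f"expected 8 bytes, got {len(field)}")
    cents = int.from_bytes(field, "big")
    if cents > MAX_AMOUNT_CENTS:
        raise AmountError(f"amount {cents} exceeds cap {MAX_AMOUNT_CENTS}")
    return cents


# A sane field passes; a space-padded one is rejected instead of billed.
good = read_amount_cents((1250).to_bytes(8, "big"))
try:
    read_amount_cents(b" " * 8)
    rejected = False
except AmountError:
    rejected = True

print(good)      # 1250
print(rejected)  # True
```

The check cannot tell why the field is wrong, only that the value is implausible, which is enough to stop the transaction from posting.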
@Pixelantes Anonymous: To be fair, that was more of a programming "backdoor" or unused feature than a bug.
Or at the very least, an intentionally coded joke.
@Thomas Traynor: Probably an overworked programmer in a shop where all they do is fix production issues all day and don't have a standard process to ensure quality...such as performing code reviews and properly QA'ing changes.
@Anathema777: I think what he's trying to say is that it SHOULD be an easy fix for Visa, and no one should have to call about an overage fee. I could be wrong, but I feel like his mockery is more directed at Visa, which is having to be prompted to fix the issue, especially as related to the overages.
At least, that's where MY mockery is directed!
@Anathema777: Welcome to the world of programmers. We could not care less about the customers. In fact, our job would be sooo much easier if it weren't for all the users.
@Anathema777: That tone is typical Slashdot snarkiness. It sounds wrong reproduced here on The Consumerist, but it fits right in there.
On the other hand, having now seen an explanation of what went wrong, I have to say that I would be similarly irritated if such a freshman mistake had come to my attention. It smacks of multiple failures, and I think I catch a faint whiff of something being rushed through by management, hence the shortage of QA.
@Thomas Traynor: It's a binary field, not a packed decimal, so an 0C7 shouldn't occur. Otherwise, yeah. I always loved that nearly any 80 character sentence in COBOL was compiled to a single IBM 360/370 instruction. COBOL is good for improving your typing skills, though.
@Pixelantes Anonymous: BigPapaCherry is right, Hot Coffee was not a bug.. a bug in a game would be something not colliding with something else when it's supposed to, or something that causes the game to not be playable.
@Mr.Gawn: Because newspapers are making soooooo much money nowadays, right? Absolutely no one is talking about the slow demise of print media at all. /sarcasm
@Skeetz: Here's the scary thing: any programmer worth his or her salt will tell you that 100% error free code does not exist. In thousands upon thousands of lines of code, it simply isn't possible to spend the time or money to track down all the errors in a program. This is why programmers have to pray that things work sometimes :)
@Mr.Gawn: I like your sarcasm.
Game Informer (GI) is an American-based monthly magazine featuring articles, news, strategy, and reviews of popular video games and associated consoles. Formed in August 1991,[1] the magazine has nearly 3 million subscribers according to Andrew Reiner, making it the highest circulated video game magazine,[2] and as of the first quarter of 2009, it is listed as the 12th largest overall magazine.[3] Game Informer is now ranked among the top four magazines for reaching males 18 to 34.[4]
[en.wikipedia.org]
@Anathema777: You make a good point. There are stages in managing these types of situations, some of which are:
1. Prevent as many errors as possible
2. Find and fix the errors that do come up
3. Make things right for the people affected by the error
If VISA went wrong, I think that stages 1 and 3 are to blame. There should have been more error checking before the software made it out into the real world, and VISA should have done a better job at correcting the problem for the customers affected (so that they didn't have to call to have the overdraft fee waived).
That's not quite accurate. If a CHARACTER field gets padded with the wrong character, funny things can happen, but not THIS. What happened in this case is that the data (being the ASCII space character) was INTERPRETED as raw binary ... the bit representation machines use internally to do arithmetic in CPU registers. It appears the code skipped a critical number conversion, or the conversion flagged an error that the calling program ignored.
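The difference between the two paths can be sketched as follows. The function names `reinterpret_raw` and `convert_checked` are hypothetical, and the 8-byte big-endian layout is an assumption. The point is only that the same bytes give very different results depending on whether a conversion step runs.

```python
def reinterpret_raw(field: bytes) -> int:
    """Treat the bytes as a binary integer, with no conversion at all."""
    return int.from_bytes(field, "big")


def convert_checked(field: bytes) -> int:
    """Convert an ASCII digit field to a number, refusing anything else."""
    text = field.decode("ascii")
    if not text.isdigit():
        raise ValueError(f"non-numeric field: {text!r}")
    return int(text)


zero_padded = b"00001250"
space_padded = b"        "

ok_value = convert_checked(zero_padded)
raw_value = reinterpret_raw(space_padded)
try:
    convert_checked(space_padded)
    caught = False
except ValueError:
    caught = True

print(ok_value)        # 1250
print(hex(raw_value))  # 0x2020202020202020
print(caught)          # True
```

If the caller swallows the `ValueError`, or never calls the conversion, the spaces go through as a huge number.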
@umbriago: It is written in geek, and it's not even correct. See my post in this thread further down.
@Josh_G: If I had a penny for the number of times a customer service rep told me that computers never make mistakes ... well I might not be rich, but I could at least afford to buy a new computer.
@thnkwhatyouthnk: And video game programs don't use legacy financial transaction data formats that were designed on 1960s COBOL-based computers, either.
@Skaperen: What doesn't make sense is that, if we assume there was some conversion gone awry in converting a character array into a number, wherein it got converted into a 64 bit integer (presumably a long int on the specific system), then where did the 0x12 and 0x50 come from? 0x50 is "P"; 0x12 isn't even a printable ASCII char.
@Cheapskate Brill: Apparently the problem happened on just a few cards. It's not happening to every Visa transaction. Some kind of different condition is causing it. It may well be that the tests set up never triggered this. The tests need to be amended, obviously.
@SacraBos: Yes, I do get a dial-tone every time. But I don't always get connected to the phone at the number I dialed. If I get a busy or reorder, that's one thing. I have gotten connected to the wrong phone a few times. Once, I was connected into an already existing phone connection and it became a 3-way, and neither of the other two were the party I was calling.
@Megalomania: That's a very good question, particularly because that number is the same for everyone's account or transaction. So it's not sourced from the account number or the transaction code or the amount. Maybe a field conversion with the wrong width? But what is it converting? Or maybe it's a memory overwrite (wrong array index, pointer in error, freed memory being re-used). I'm inclined to believe the latter, which can easily happen in languages like C or C++ (harder, but not impossible, with languages like COBOL and Java). I've been programming in C for over 2 decades so I know about this.
@Megalomania: One more possibility is that a conversion that produces a floating point result (example: strtod) was used when it should have been one that produces an integer result (example: strtoll). In any case, this code apparently only does this in very few cases ... enough to make it into the fishwrap, but not enough to shut down the entire card processing system.
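One piece of arithmetic is consistent with the floating point idea, offered as an observation and not a confirmed cause. Eight space bytes read as a 64-bit integer, then rounded to 15 significant decimal digits (roughly what a double can be relied on to carry), give exactly the reported amount, trailing 0x12 0x50 included. The sketch below uses Python's `decimal` module for the rounding. Where such rounding would have happened in the real system is unknown.

```python
from decimal import Context

# Eight ASCII spaces read as one big-endian 64-bit integer.
all_spaces = int.from_bytes(b" " * 8, "big")

# Keep only 15 significant decimal digits, as a double-precision
# print or parse step might (assumed, not confirmed).
rounded = int(Context(prec=15).create_decimal(all_spaces))

# The same value shown the way a 15-digit float format would print it.
as_float_text = format(float(all_spaces), ".15g")

print(all_spaces)     # 2314885530818453536
print(rounded)        # 2314885530818450000
print(hex(rounded))   # 0x2020202020201250
print(as_float_text)  # 2.31488553081845e+18
```

Under that assumption the last two bytes are not data from anywhere. They are what remains after the low digits are zeroed out.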
Thanks for the simple explanation. The Slashdot explanation reads like it is written in another language (like "Fortran Geek").