Sun, Oracle save Microsoft's Pink after Danger data disaster

Posted:
in iPhone edited May 2015
Microsoft has announced the restoration of Sidekick users' contacts as the first milestone in recovering data it lost in the cloud computing disaster affecting its Danger subsidiary, while a new source explains why the restoration was possible without a backup and why it is taking so long.



A source familiar with Sun SAN hardware used in the Danger datacenter has provided AppleInsider with additional insight explaining why it is taking Microsoft weeks to recover its users' lost data after initial reports stated that the data was completely lost and that no suitable backup existed.



Microsoft's problems began at the beginning of the month, when the cloud servers it operates under contract to T-Mobile began falling offline. It was initially announced that large amounts of T-Mobile's Sidekick subscribers' data had been lost and that no backup existed for the user data, which was stored entirely on Microsoft's servers. (Sidekick devices are not designed to be backed up locally in the same way the iPhone backs itself up to iTunes on the user's desktop computer.)



On October 6, T-Mobile issued a statement saying, "Regrettably, based on Microsoft/Danger's latest recovery assessment of their systems, we must now inform you that personal information stored on your device - such as contacts, calendar entries, to-do lists or photos - that is no longer on your Sidekick almost certainly has been lost as a result of a server failure at Microsoft/Danger."



On October 15, two weeks after the problems began, Roz Ho, Microsoft's vice president of Premium Mobile Experiences, issued an apology for the outage. She announced that, contrary to initial reports that all the data was permanently lost, the company now expected to recover most of it, though the recovery effort would take some time.



This week, on October 20, T-Mobile announced the availability of users' restored contact data dating back to October 1. It provided subscribers with instructions on how to restore their data, merge the restored contacts with the information currently on their devices, or simply ignore the restored data and keep using their current contact information.



In addition to contact data, the announcement also said, "We're making solid progress on the next phase in this restoration process, including your photographs, notes, to-do lists, marketplace data and high scores. We appreciate your ongoing patience."



Why the data recovery takes so long



Speaking as a "Sun technical guru who has been pulled into similar scenarios in the past," a source has explained that in circumstances like this one, where Microsoft and its IT services contractor Hitachi were facing lost data and did not have a conventional backup to restore, "[SAN storage vendor] Sun and [database vendor] Oracle have sent in their best people, and they are stitching the database back together for Microsoft, and they have a good estimate of what data is recoverable."



"It will take several days to actually get the database back up," the source noted, echoing earlier reports that indicated that it took 6 days just to create a normal full backup of the data. The time and storage resources involved in backing up the tremendous amount of data were cited as the reason why Microsoft's Roz Ho reportedly instructed Danger employees to proceed with work without the full backup in place over their objections, after sources say she was assured by Hitachi that a full backup was not necessary.



Salvaging the damaged data storage without a real backup in place takes even longer, the Sun storage expert explained. "The first thing to do is wheel in a big pile of new disk space, and copy the individual disks so there is a raw backup. This is like making a copy of a jigsaw puzzle one piece at a time. Then they would assemble the puzzle using the copied pieces, in case any pieces need to be re-made from the original.



"This is very hard, requires detailed inside knowledge of how SAN addresses and volume manager layouts fit together with Oracle tables. Finally, they need to start up the database on top of the assembled puzzle, and Oracle will do its own clean up to get into a consistent state.



"The next thing you do is a fresh backup (several days), before you allow any users access to it. So it's not surprising that this would take over a week, even after it was possible to say that the data is recoverable."
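
The raw-copy step the source describes can be sketched in a few lines. This is only an illustration of the idea of copying each disk byte for byte and confirming the copy before working on it, not the procedure Sun or Oracle actually used. The function names, the chunk size, and the use of SHA-256 for verification are all assumptions made for the example.

```python
import hashlib
from pathlib import Path

CHUNK = 1024 * 1024  # read 1 MiB at a time (arbitrary choice for the sketch)


def raw_copy(source: Path, dest: Path) -> str:
    """Copy source to dest byte for byte; return the SHA-256 of the bytes read."""
    digest = hashlib.sha256()
    with source.open("rb") as src, dest.open("wb") as dst:
        while chunk := src.read(CHUNK):
            dst.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()


def file_sha256(path: Path) -> str:
    """Return the SHA-256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(CHUNK):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, dest: Path) -> bool:
    """Make the raw copy, then re-read it to confirm it matches the original."""
    return raw_copy(source, dest) == file_sha256(dest)
```

Calling `copy_and_verify(Path("disk0.img"), Path("disk0.copy"))` (hypothetical file names) would return `True` only if the copy matches what was read from the original. Reassembly would then be attempted on the copies, leaving the originals untouched in case a piece has to be copied again.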



Cause of the datacenter problems still secret



While the recovery effort is being delayed and complicated by the lack of an external backup, Microsoft remains quiet about the cause of the incident. After AppleInsider first reported that insiders were blaming the failure on either an aggressive and poorly orchestrated upgrade or possibly even deliberate sabotage by a disgruntled employee, CNET "Eye on Microsoft" reporter Mary Jo Foley stated, "I've also heard that foul play has not been ruled out because the failure was so catastrophic and seemingly deliberate. Microsoft is supposedly continuing to do a full investigation."



Whether the incident was the result of an accident or a malicious attack, Microsoft has learned an important lesson that eventually hits everyone in the world of computing: never work without a backup. In last week's apology to Sidekick users, Ho wrote, "we have made changes to improve the overall stability of the Sidekick service and initiated a more resilient backup process to ensure that the integrity of our database backups is maintained."



The company has also (understandably) worked to distance itself from the high-profile datacenter disaster by describing its Danger operations as running non-Microsoft technology, specifically associating Sun and Oracle with the incident. Somewhat ironically, Microsoft's capacity to recover most of its Sidekick users' data is entirely due to the availability of Sun and Oracle experts and the inherent resilience of those companies' products to disasters of any kind, even in cases where customers do not maintain proper backups of their data.



Recovery begins amid lawsuits



While at least some of the lost data has now been recovered, T-Mobile continues to list all Sidekick products as "temporarily out of stock" on its website, and multiple lawsuits have already been filed by users. A report by CNET "Beyond Binary" columnist Ina Fried cited one attorney's complaint as stating, "T-Mobile and its service providers ought to have been more careful [in] the use of backup technology and policies to prevent such data loss."



A second attorney pursuing a case against both parties wrote, "T-Mobile and Microsoft promised to safeguard the most important data their customers possess and then apparently failed to follow even the most basic data protection principles. What they did is unthinkable in this day and age."



The suit added, "Further complicating the data loss is the fact that Sidekicks, unlike iPhones, BlackBerrys and other smartphones, are not designed to sync locally with a user's personal computer without additional software and hardware. This means that most users were not able to backup their data locally, but were encouraged and required to rely on Microsoft/Danger."



Last week, T-Mobile volunteered a peace offering to its affected users in the form of a $100 gift card and a month of free data service. However, given the deep pockets behind the event, the company's million Sidekick subscribers are likely to be looking for a bigger settlement. And while subscribers don't have an iron-clad contract stipulating specific damages in compensation for any service outages on T-Mobile's part, the mobile provider does have an explicit contract with Microsoft's Danger subsidiary.



Writing for MobileCrunch, Greg Kumparak reported that T-Mobile's SLA (service level agreement) with Danger is believed to specify penalties that amount to around 87 cents per day, per user, any time availability dips below 99.5% (that's less than two days per year of unscheduled downtime, 3.6 hours of downtime within a month, or 50 minutes of downtime in a week).
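
Those downtime figures follow directly from the 99.5% threshold. A minimal sketch of the arithmetic, assuming a 365-day year and a 30-day month (the period lengths and the function name are choices made for this example, not terms of the reported SLA):

```python
HOURS_PER_DAY = 24


def allowed_downtime_hours(availability: float, period_days: float) -> float:
    """Hours of downtime permitted in a period at the given availability."""
    return (1.0 - availability) * period_days * HOURS_PER_DAY


SLA = 0.995  # the 99.5% threshold reported for the T-Mobile/Danger agreement

per_year_days = allowed_downtime_hours(SLA, 365) / HOURS_PER_DAY  # about 1.8 days
per_month_hours = allowed_downtime_hours(SLA, 30)                 # about 3.6 hours
per_week_minutes = allowed_downtime_hours(SLA, 7) * 60            # about 50 minutes
```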



For T-Mobile's million Sidekick users, that could add up to $870,000 per day over weeks of service outage, even without including any substantial additional penalties for dropping down through multiple availability thresholds in the extended problems Microsoft faced, the lost business T-Mobile experienced after suspending its Sidekick sales, and the damage it suffered to its reputation as a service provider.
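
As a rough illustration of how that penalty scales, the sketch below multiplies the reported per-user rate by the subscriber count and the number of days. The 87-cent rate and the one-million subscriber figure come from the reports cited above and are not confirmed contract terms. The 14-day outage length is purely hypothetical.

```python
from decimal import Decimal

PENALTY_PER_USER_PER_DAY = Decimal("0.87")  # reported figure, not a confirmed term
SUBSCRIBERS = 1_000_000                     # round number used in the article


def sla_penalty(days_below_threshold: int) -> Decimal:
    """Total penalty in dollars for the given number of days below the threshold."""
    return PENALTY_PER_USER_PER_DAY * SUBSCRIBERS * days_below_threshold


daily = sla_penalty(1)       # 870,000 dollars per day
two_weeks = sla_penalty(14)  # hypothetical 14-day outage: 12,180,000 dollars
```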



Microsoft has advertised "five nines" availability for its own servers, which means 99.999% uptime, a standard that only allows for 5.26 minutes of unscheduled downtime within a year. Providing such "high availability" requires multiple redundant servers and highly resilient shared storage systems, and of course, appropriate backups.
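
The "five nines" figure, and the reason redundancy is needed to reach it, can be checked with a short calculation. This is a simplified sketch: it assumes a 365.25-day year and, for the redundancy estimate, that replicas fail independently of one another, which real systems sharing storage or software rarely do. The function names are made up for the example.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Minutes of downtime per year permitted at the given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def replicas_needed(single: float, target: float) -> int:
    """Smallest number of independent replicas whose combined availability
    meets the target; the service is up as long as any one replica is up."""
    if not (0.0 < single < 1.0 and 0.0 < target < 1.0):
        raise ValueError("availabilities must be strictly between 0 and 1")
    n = 1
    while (1.0 - single) ** n > (1.0 - target):
        n += 1
    return n


five_nines = downtime_minutes_per_year(0.99999)  # about 5.26 minutes
replicas = replicas_needed(0.995, 0.99999)       # 3 under the independence assumption
```

Under these assumptions, two servers that are each 99.5% available would still fall short of five nines, while three would clear it. None of this substitutes for a backup, since redundant servers reading the same damaged storage fail together.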

Comments

  • Reply 1 of 55
    this might sound like a cheap shot but how can you have a company representative going by the name of Ho? They are usually picked up from the streets with this name they don't represent companies. And why on earth doesn't she just modify her name a bit, lots of people who had moronic surnames changed them. But maybe with a non idiotic surname she'd lose her gravitas in ms. Again, I know it's a cheap shot but losing thousands of people's personal data isn't much better either.
  • Reply 2 of 55
    Quote:
    Originally Posted by myapplelove View Post


    this might sound like a cheap shot but how can you have a company representative going by the name of Ho?



    She's asian. That's her name. Last I checked, asking an asian girl to change her name wins her a few million in court.



    The bigger issue here is that she's clearly incompetent, and has no business managing anything. Honestly, though... Is anyone really surprised by the way Microsoft has handled this?
  • Reply 3 of 55
    Roz Ho is clearly incompetent, as seen by her management of this and the MacBU, and it's amazing she still has a job.
  • Reply 4 of 55
    I am wondering why MS makes their hiring choices nowadays so crapfully:



    Bribing people from Apple stores gets very unloyal employees.
  • Reply 5 of 55
    Wow, lame digs on the woman's name? I thought that only happened on sports oriented sites?

    Anyway, the question I have is, if there is a VP of Premium Mobile Experiences, does it follow that there is also a VP of Inferior Mobile Experiences?

    Stupid padded titles are usually the sign of bloated management. If there ain't no Junior Blah Blah, then don't give some blowhard the title of Senior Blah Blah. But if M$ titles were anything like accurate Ho would be VP of *!ed Up Mobile Experiences, and Ballmer would be IIC, Idiot In Charge.



    G'Day.



    And myapplelove, be careful. Your name sounds like maybe you hump pomaceous fruit.
  • Reply 6 of 55
    Not so sure that the word "premium" should be in her title. Who, with ANY computer experience, doesn't understand the value of having a backup of important data? Either Roz was simply incompetent or didn't care about the end user. Another reason to ditch the $idekick and buy an iPhone.

    Here's her Micro$oft bio: http://www.microsoft.com/presspass/exec/rozho/
  • Reply 7 of 55
    You can also bet that MS will never say thank you publicly to Sun or Oracle for their help in rescuing their butts.
  • Reply 8 of 55
    Quote:
    Originally Posted by FormerARSgm View Post


    Not so sure that the word "premium" should be in her title. Who, with ANY computer experience, doesn't understand the value of having a backup of important data? Either Roz was simply incompetent or didn't care about the end user. Another reason to ditch the $idekick and buy an iPhone.

    Here's her Micro$oft bio: http://www.microsoft.com/presspass/exec/rozho/



    microsoft is releasing windows 7, do you think they care about end users?
  • Reply 9 of 55
    sheff Posts: 1,407 member
    So how much does microsoft owe Sun now? Each one of those "gurus" should be sent to Cancun for a year for saving microsoft's butt.
  • Reply 10 of 55
    Quote:
    Originally Posted by myapplelove View Post


    this might sound like a cheap shot but how can you have a company representative going by the name of Ho? They are usually picked up from the streets with this name they don't represent companies. And why on earth doesn't she just modify her name a bit, lots of people who had moronic surnames changed them. But maybe with a non idiotic surname she'd lose her gravitas in ms. Again, I know it's a cheap shot but losing thousands of people's personal data isn't much better either.



    Always impressive when, through the sheer power of idiocy, someone manages to single-handedly change the general condemnation of someone into a rally in her defense. And you gotta love that rhetorical move at the end: "I know racism is a cheap shot, but losing people's data isn't much better." WTF.
  • Reply 11 of 55
    Quote:
    Originally Posted by sheff View Post


    So how much does microsoft owe Sun now? Each one of those "gurus" should be sent to Cancun for a year for saving microsoft's butt.



    Owe??



    I wouldn't put it past MS to refuse the bill, and/or sue Sun & Oracle for the downtime. I.e. the American legal premise of "If I'm an idiot and hurt myself, it's your fault".



    You heard it here first.
  • Reply 12 of 55
    john.b Posts: 2,742 member
    Quote:
    Originally Posted by AppleInsider View Post


    "It will take several days to actually get the database back up," the source noted, echoing earlier reports that indicated that it took 6 days just to create a normal full backup of the data. The time and storage resources involved in backing up the tremendous amount of data were cited as the reason why Microsoft's Roz Ho reportedly instructed Danger employees to proceed with work [b]without the full backup in place over their objections, after sources say she was assured by Hitachi that a full backup was not necessary.[/b]



    1) VLDB platforms require a VLDB recoverability strategy. (Note: I didn't say backup strategy, I said recoverability strategy.) No database is ever "too big" to backup.



    2) Our SAN vendor is EMC, and they won't as much as put in a new bin file (SAN configuration file) without getting a director or IT manager to sign off on a document that states all data may be lost, and that current backups are required and are in place. No bravado from those guys.



    3) If this report is true, Roz Ho needs to be moved to a far less technical part of the business. Even in this crap economy I would've resigned on the spot before I'd let this fiasco occur on my watch. After all, it's far easier to find a new gig if your reputation is intact.



    John.B (who knows a thing or two about RDBMSs and SANs)
  • Reply 13 of 55
    elroth Posts: 1,201 member
    Quote:
    Originally Posted by AppleInsider View Post


    In addition to contact data, the announcement also said, "We're making solid progress on the next phase in this restoration process, including your photographs, notes, to-do lists, marketplace data and high scores. We appreciate your ongoing patience."



    "Honey, I just can't do my household chores until Microsoft restores my to-do list."
  • Reply 14 of 55
    Seriously, I don't see why this is big news here on an Apple oriented site. Maybe I'm missing something because of my poor English?



    Is it really a surprise that Sun and Oracle experts helped to recover when MS has nothing to do with the setup? It's like calling the plumber isn't it?



    And why does Appleinsider not report on the guest account issue in such detail?
  • Reply 15 of 55
    It seems unlikely that a single person could have maliciously killed the system. After all, if the most recent backup is dead, isn't there another backup? And isn't that backup held far, far away from the primary data center, with different people holding the keys? And wouldn't that backup be held "off-line" in order to minimize risk?



    It seems much more likely to me that the backup/recovery strategy was flawed, and was never properly audited or tested. And they can't blame it on Danger, because this software system should have been examined and understood and PROVEN periodically ... and especially before a SAN upgrade (duh!)



    For instance: in my shop, we test backup and recovery processes annually. For highest-criticality systems, we test them monthly. Yes, it costs money to do this, but it sure as hell beats losing tons of valuable data.



    In short: There is absolutely NO excuse for this failure, and it is quite likely that senior management heads at Microsoft will roll (if they haven't already).
  • Reply 16 of 55
    What I don't understand is why the top gurus from Sun and Oracle were called in. Isn't Microsoft running their businesses on their own OS and database software? Did they really set up a major IT project that they control using Sun hardware and an Oracle database? Surely I've got this wrong?
  • Reply 17 of 55
    -ag- Posts: 123 member
    Quote:
    Originally Posted by goodcow View Post


    Roz Ho is clearly incompetent, as seen by her management of this and the MacBU, and it's amazing she still has a job.



    Um, it's Microsoft. They are run by Captain Bobo the Wonder Clown now. How does her qualification make her any less competent than a poker playing college drop out?



    Quote:
    Originally Posted by JSmith View Post


    You can also bet that MS will never say thank you publicly to Sun or Oracle for their help in rescuing their butts.



    It will be Sun/Oracle's fault for not running Access or some other MS owned DB solution. Because nothing ever goes wrong with MS databases. /sarcasm font off
  • Reply 18 of 55
    ibill Posts: 400 member
    How ironic that Snoracle pulled microsoft's bacon from the fire.



    I was quite enjoying watching them roast.
  • Reply 19 of 55
    Quote:
    Originally Posted by -AG- View Post


    Um, it's Microsoft. They are run by Captain Bobo the Wonder Clown now. How does her qualification make her any less competent than a poker playing college drop out?







    It will be Sun/Oracle's fault for not running Access or some other MS owned DB solution. Because nothing ever goes wrong with MS databases. /sarcasm font off



    haha, exactly! I like how Microsoft is trying to back away and ready to point its fingers at Sun/Oracle. Yep, this could never possibly have gone wrong with a Microsoft platform....shhhh, don't mention the hotmail thing.



    As for Ho, She's a 'tard and should go. That was a bad management decision whether it was a sabotage after that point or not.



    Anyway, if I were Microsoft, I would be extremely grateful for Sun and Oracle's help in this.
  • Reply 20 of 55
    al_bundy Posts: 1,525 member
    Quote:
    Originally Posted by iBill View Post


    How ironic that Snoracle pulled microsoft's bacon from the fire.



    I was quite enjoying watching them roast.



    there was probably a support contract in place



    believe it or not MS, Oracle, Sun, Apple and other competitors do a lot of development work together that benefits each other and probably give each other a lot of business