zimmie

About

Username: zimmie
Visits: 172
Roles: member
Points: 2,737
Badges: 1
Posts: 651
  • Facebook says 'faulty configuration change' to blame for 6-hour outage

    dewme said:
    The concept behind Facebook is glorious if all people were nice. But all people aren’t nice. Some people are evil. Facebook has found a way to monetize evil and reap incalculable financial rewards from doing so. But as many others have said, if Facebook wasn’t doing it someone else would. The social media genie can never be put back in the bottle.
    Is it, though? The concept behind Facebook was to rate female Harvard students' hotness.

    dewme said:
    For six hours we were able to have meaningful in-person conversations. Maybe we need to start a GoFundMe for the poor IT guy that blew it today? It was probably his last day.
    I think that IT guy will never make that mistake again, so he would be a great employee now.

    The one forbidden word for surgeons and network engineers is:   "Ooops!"

    I can only imagine the "conversations" going on today between Facebook engineers, managers, etc.
    The blame game will have reached new, unheard-of levels by now...

    It does seem odd that they would not employ some sort of checkpointing scheme on their configuration database to allow them to roll back to the last known good state. This is a very common technique for high availability systems and even some individual products, e.g., take a snapshot of the configuration settings before performing a software or firmware update. While I have zero love for Facebook, I'm sure that its stakeholders don't appreciate the financial losses incurred during the protracted downtime.
    They do have the ability to undo changes, but only at the level of the management platform. When the management platform can no longer reach the device to manage it, someone has to intervene manually to restore that communication. Only then can the configuration be restored to what it was before the change. They also lost external access to the management platform, but that's simple enough to fix (if somebody doesn't confirm the change, roll it back automatically). Restoring access from the management platform to the managed devices is more difficult.
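A minimal sketch of the "roll it back automatically unless somebody confirms" idea, similar in spirit to Junos `commit confirmed`. It uses a plain Python dict as a stand-in for a device configuration; the `ConfirmedChange` class, its methods, and the timeout values are hypothetical, not any vendor's API. When a timer like this runs on the device being changed, it still fires even if the change cuts off the operator's access.

```python
import copy
import threading
import time


class ConfirmedChange:
    """Hypothetical sketch: apply a change, roll back unless confirmed in time.

    Assumes one pending change at a time and a dict-shaped config.
    """

    def __init__(self, config, timeout_s):
        self.config = config        # the live configuration (a dict here)
        self.timeout_s = timeout_s  # how long to wait for a confirmation
        self.rolled_back = False
        self._snapshot = None
        self._timer = None
        self._lock = threading.Lock()

    def apply(self, changes):
        with self._lock:
            # Checkpoint the last known good state before touching anything.
            self._snapshot = copy.deepcopy(self.config)
            self.config.update(changes)
            # Arm the automatic rollback.
            self._timer = threading.Timer(self.timeout_s, self._rollback)
            self._timer.start()

    def confirm(self):
        with self._lock:
            # Operator can still reach us: keep the change, disarm the rollback.
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
            self._snapshot = None

    def _rollback(self):
        with self._lock:
            # No confirmation arrived: restore the checkpoint in place.
            if self._snapshot is not None:
                self.config.clear()
                self.config.update(self._snapshot)
                self._snapshot = None
                self.rolled_back = True


if __name__ == "__main__":
    cfg = {"mgmt_acl": "permit 10.0.0.0/8"}
    change = ConfirmedChange(cfg, timeout_s=0.5)
    change.apply({"mgmt_acl": "deny any"})  # a change that locks us out
    time.sleep(1.0)                          # nobody is able to confirm
    print(cfg)  # {'mgmt_acl': 'permit 10.0.0.0/8'}
```

The lock makes confirmation and rollback mutually exclusive: whichever runs second sees the snapshot already cleared and does nothing.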
  • Facebook, Instagram, and other services seeing widespread outage [u]

    Xed said:
    Ugh ... DNS problems.

    I've experienced DNS problems - heck I caused DNS problems.

    Taught me a lot about default TTLs, inserting manual TTLs, and changing them well before initiating changes.

    DNS for infrastructure is like Social Security for politicians or the third rail for transit trains.

    Once burned, you avoid making changes as much as possible 😥.
    Wait until you have to deal with a BGP table error. Oopsie.
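The TTL advice quoted above comes down to simple arithmetic. A hedged Python sketch, assuming resolvers honor the TTL they were handed (some clamp or override it): after you lower a record's TTL, caches may keep the old copy for up to the old TTL, so the real change should wait at least that long. `plan_dns_change` is a hypothetical helper, not part of any DNS tool.

```python
from datetime import datetime, timedelta


def plan_dns_change(ttl_lowered_at, old_ttl_s, new_ttl_s):
    """Return (earliest_safe_change, worst_case_staleness).

    Simple model: a resolver that cached the record just before the TTL was
    lowered may keep serving it for old_ttl_s seconds. Once that window has
    passed, every cached copy carries the short TTL, so after the real
    change, stale answers last at most new_ttl_s seconds.
    """
    earliest_change = ttl_lowered_at + timedelta(seconds=old_ttl_s)
    worst_case_stale = timedelta(seconds=new_ttl_s)
    return earliest_change, worst_case_stale


if __name__ == "__main__":
    # One-day TTL lowered to five minutes at 09:00.
    change_at, stale_for = plan_dns_change(datetime(2021, 1, 4, 9, 0), 86400, 300)
    print(change_at)  # 2021-01-05 09:00:00
    print(stale_for)  # 0:05:00
```

In other words, with a one-day TTL you lower it a full day ahead, and after that a bad change can be corrected in minutes instead of a day.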

    I can't wait to read about what happened. It's looking like a deliberate hack, not an accident.
    Almost definitely not an attack. This has all the hallmarks of a company overconfident in its intelligence designing itself into a pit. "We have a reliable system, so let's use it for internal discussion. Oh, and let's use it for badge access. And automation is the future, so let's execute all the changes without human intervention."

    When I was more involved in business IT, I had to constantly fight against people creating dependency cycles. That's the kind of situation where A has to be working for B to work, B has to be working for C to work, and C has to be working for A to work. For example, your VM environment has to be working for your password vault server to work, and your password vault server stores your passwords for fixing the VM environment if it dies. "Oh, but we never have problems with the VM environment!" Until somebody removes a LUN from your SAN, every one of your VM hosts kernel-panics, and now the outage is 6 hours instead of 30 minutes.
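Dependency cycles like the VM/password-vault example can be found mechanically by walking a dependency map with depth-first search. A minimal Python sketch; the dictionary format and system names are hypothetical and only mirror the example above. Each key maps to the systems it needs in order to work (or to be repaired).

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of nodes, or None.

    Depth-first search with three states: unvisited, on the current path,
    and finished. Reaching a node that is still on the current path means
    we have walked around a loop.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    state = {}
    path = []

    def visit(node):
        state[node] = GREY
        path.append(node)
        for dep in deps.get(node, ()):
            s = state.get(dep, WHITE)
            if s == GREY:
                # Cycle found: the path from dep's position back to dep.
                return path[path.index(dep):] + [dep]
            if s == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[node] = BLACK
        return None

    for node in list(deps):
        if state.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    deps = {
        "password_vault": ["vm_environment"],         # the vault runs as a VM
        "vm_environment": ["san", "password_vault"],  # repairs need vault passwords
        "san": [],
    }
    print(find_cycle(deps))  # ['password_vault', 'vm_environment', 'password_vault']
```

Running a check like this over an inventory before an outage is far cheaper than discovering the loop during one.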
  • Facebook, Instagram, and other services seeing widespread outage [u]

    It's actually a BGP issue. They advertise around 5,000 prefixes into public BGP. A few hours ago, they stopped advertising around a thousand of those prefixes. It seems the prefixes they are no longer advertising contain most (possibly all) of their authoritative DNS servers. Even though a lot of their systems are still up, nothing can get to the DNS servers to find them.
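The failure mode above can be illustrated with longest-prefix matching: a server that is up but no longer covered by any advertised prefix is unreachable from outside. A sketch using Python's standard `ipaddress` module; the prefixes and the DNS server address are RFC 5737 documentation ranges standing in for Facebook's real ones, which this post does not list, and `covering_prefix` is a hypothetical helper.

```python
import ipaddress


def covering_prefix(addr, advertised):
    """Return the most specific advertised prefix containing addr, or None.

    Outside networks can only route to addr if some prefix covering it is
    still advertised; longest-prefix match picks the most specific one.
    """
    ip = ipaddress.ip_address(addr)
    matches = [p for p in advertised if ip in p]
    return max(matches, key=lambda p: p.prefixlen, default=None)


if __name__ == "__main__":
    advertised = {
        ipaddress.ip_network("192.0.2.0/24"),     # hypothetical: DNS servers
        ipaddress.ip_network("198.51.100.0/24"),  # hypothetical: web front ends
    }
    dns_server = "192.0.2.53"
    print(covering_prefix(dns_server, advertised))  # 192.0.2.0/24
    # The withdrawal: the server itself is still up, but no route covers it.
    advertised.discard(ipaddress.ip_network("192.0.2.0/24"))
    print(covering_prefix(dns_server, advertised))  # None
```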

    For extra fun, their internal communication platform is a separate instance of Facebook ... which is also inaccessible. People are speculating it's an attack, but if you're making a change remotely and it takes out both your remote access and your ability to coordinate with others on the team, four hours or more is commonly the best-case recovery time.

    Edit: Just heard (and second-sourced) that their badge access to buildings also isn't working. They take physical security pretty seriously. You can't just call a locksmith to pick the lock. It's looking likely this will last 12+ hours. I really feel for their IT operators.
  • 'Foundation' is beautiful, lavish, and boring say reviews

    ZooMigo said:
    Aside from falling asleep halfway through (not unusual), I had 1 big complaint. Why is it every “epic story” is full of people with British accents? Did the UK conquer the universe and leave everyone else behind? Star Wars, Dune, so many shows, so many English accents. I know, it’s a small quibble, but it bothers me enough to pull me out of immersion. 
    I think part of it is that certain UK accents sound "sophisticated". Cockney? Not so much. Welsh? Nah. But in a lot of English-speaking places, that's where we get our concept of aristocracy, and what aristocrats should sound like.

    I suspect a fair chunk of it also has to do with the British being seen as imperialistic. The kind of people who would develop a galactic empire, because they certainly built one here on Earth.
  • Compared: iPhone 13 Pro and iPhone 13 Pro Max vs iPhone 12 Pro and iPhone 12 Pro Max

    Any word on the dimensions of the camera bump? From the video and press shots, the 13 Pro's looks a LOT bigger than the 12 Pro's, both in footprint and thickness.

    I could not possibly care less about the thickness of the rest of the phone. All I care about is the thickness of the whole phone, bump included.