Don’t Be Stupid

You remember that Google problem that flagged the whole Internet as malicious?

Shall we count the myriad failures inherent in it?

  1. They trusted administrators to be infallible. The process or tool they use to perform the updates didn’t question whether the user really meant to flag the whole Internet. It did not validate the input against even obvious problems. If it’s like anyplace I’ve ever worked, the administrative tool is a pasted-together bit of interface and workarounds, rife with problems and untested or undertested because only administrators would use it.
  2. They didn’t test it immediately. The “on-call site reliability team” took care of it. That sounds to me like either an automated process dialed some pagers or a help desk person got a phone call. They probably ought to have someone check a change like that right away, hey? Nah, they’re Google. Everything they touch turns to good.
  3. Several sources report that Google initially blamed its data provider. That is a sign of hubris of the commonest order. Maybe they shouldn’t have been so hasty to lay blame before understanding and fixing the problem. On the other hand, Google did fix it quickly. Some organizations spend more time on the casting-aspersions part of the program, sometimes forgoing fixing the problem entirely.
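
To make the first point concrete, here is a minimal sketch of the kind of sanity check an administrative tool could run before accepting an update. It is not Google’s tool or process. The function name, the catch-all patterns, the substring matching, and the size limit are all assumptions made up for illustration.

```python
# Hypothetical sanity check for a blocklist update. Nothing here reflects a
# real tool; the names, matching rule, and threshold are assumptions.

CATCH_ALL_PATTERNS = ("", "/", "*")  # assumed entries that would match every URL


def validate_blocklist_update(patterns, known_good_urls, max_entries=10000):
    """Return a list of reasons to reject the update; an empty list means it looks sane."""
    problems = []

    # An unusually large update deserves a second look from a human.
    if len(patterns) > max_entries:
        problems.append(f"update has {len(patterns)} entries, limit is {max_entries}")

    for pattern in patterns:
        stripped = pattern.strip()

        # Did the administrator really mean to flag the whole Internet?
        if stripped in CATCH_ALL_PATTERNS:
            problems.append(f"pattern {pattern!r} would match every URL")
            continue

        # Simple substring matching stands in for whatever the real matcher does.
        hits = [url for url in known_good_urls if stripped in url]
        if hits:
            problems.append(f"pattern {pattern!r} matches known-good URL {hits[0]}")

    return problems


if __name__ == "__main__":
    known_good = ["https://www.google.com/", "https://www.wikipedia.org/"]
    for reason in validate_blocklist_update(["/", "evil.example/malware"], known_good):
        print("REJECTED:", reason)
```

The point is not the particular checks. It is that the tool asks the question at all instead of assuming the administrator got it right.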

If Google can make these mistakes, so can your company. The key is not to use that as an excuse when your organization blows it (“Hey, it’s okay, even Google does it!”). You have to learn from these mistakes: make sure your company doesn’t blindly trust its administrators, and make sure it tests things the minute changes are made.
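
Testing “the minute changes are made” can be as simple as checking a handful of canary URLs right after the change goes live and backing the change out if any of them look wrong. The sketch below assumes hypothetical `apply`, `rollback`, and `is_flagged` callables supplied by your own system. None of these names come from a real product.

```python
# Hypothetical post-change smoke test. The callables passed in are assumed to
# be provided by your own deployment tooling.

def wrongly_flagged(is_flagged, canary_urls):
    """Return the canary URLs (sites that must never be flagged) that are flagged now."""
    return [url for url in canary_urls if is_flagged(url)]


def apply_change_with_smoke_test(apply, rollback, is_flagged, canary_urls):
    """Apply a change, test immediately, and roll back if any canary is flagged.

    Returns True if the change was kept, False if it was rolled back.
    """
    apply()
    if wrongly_flagged(is_flagged, canary_urls):
        rollback()
        return False
    return True
```

A quick usage example with an in-memory blocklist standing in for the live system:

```python
blocklist = set()
canaries = ["https://www.google.com/"]

kept = apply_change_with_smoke_test(
    apply=lambda: blocklist.add("/"),
    rollback=lambda: blocklist.discard("/"),
    is_flagged=lambda url: any(p in url for p in blocklist),
    canary_urls=canaries,
)
print("change kept" if kept else "change rolled back")
```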
