“Help me, Supportability. You are my only hope.”

Supportability? Wait, don’t you mean “Serviceability?”

No, I don’t. In my mind, Serviceability is a subset of Supportability.

In order for normal humans to support your product, it definitely has to be “serviceable.” To me, Serviceability encompasses things like:

  • Managing the product
  • Configuring it
  • Deploying it successfully without intimate knowledge of its guts
  • Viewing system health/status

If I have to call you every time I want to change a feature’s behavior because its settings are strewn across 10 different properties files, that’s not a supportable product because it is not serviceable.
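To make the properties-file complaint concrete, here is a minimal sketch of one way to give an operator a single "effective configuration" view: load every source in order and record which file each final value came from. The class name, file names, and keys are all made up for illustration, and the sources are inline strings rather than real files to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper, not part of any real product.
class EffectiveConfig {

    /**
     * Merges properties sources in insertion order (later sources win) and
     * returns key -> "value (from source name)", sorted by key.
     */
    public static Map<String, String> merge(LinkedHashMap<String, String> sources) {
        Map<String, String> effective = new TreeMap<>();
        for (Map.Entry<String, String> source : sources.entrySet()) {
            Properties props = new Properties();
            try {
                props.load(new StringReader(source.getValue()));
            } catch (IOException e) {
                throw new UncheckedIOException("Could not parse " + source.getKey(), e);
            }
            for (String key : props.stringPropertyNames()) {
                effective.put(key, props.getProperty(key) + " (from " + source.getKey() + ")");
            }
        }
        return effective;
    }

    public static void main(String[] args) {
        // Assumed example content; a real tool would read the files from disk.
        LinkedHashMap<String, String> sources = new LinkedHashMap<>();
        sources.put("defaults.properties", "feature.enabled=false\nfeature.timeout=30\n");
        sources.put("site.properties", "feature.enabled=true\n");
        merge(sources).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

The point is not this particular helper. It is that someone who did not write the code can answer "what is this setting, and where do I change it?" without calling you.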

Jiffy Lube != Ferrari’s F1 Pit Crew

To make a [crappy] analogy, Jiffy Lube can definitely service my car by changing the oil. But I would not take my car to them if I had issues with the engine stalling on cold mornings. That’s a support issue that is best handled by mechanics who intimately understand how an engine works and have the tools and knowledge to dig in at a low level to remedy the issue.

When people lump fixing Day2 problems/outages [AKA: “the unexpected”] with normal design/deploy/operate issues [AKA: “the expected”], they are missing a harsh reality of our industry: we have way more Jiffy Lube-types than Formula 1 mechanics.

[ Note: That was not intended to be a slam of the Jiffy Lube folks. My mechanic can’t change my oil in 5 minutes.

I’m not nearly as good a basketball player as Dwyane Wade, but I suspect I’m a touch better than he is at deciphering Tomcat logs.

We all have to specialize. But, I digress… ]

Plenty of people can configure and maintain systems. However, an overwhelming majority of people are simply not the tech equivalent of a mechanic.

They can change the oil, bolt on new wheels, maybe even install a new stereo. But, they are unable to dig into each of the “black boxes” of your solution [the engine or transmission, to continue the horrible analogy] to find out why things aren’t working. When issues arise, they poke the boxes from various angles hoping to get a desired outcome, like some kind of twisted digital pachinko machine.

DevOps as a stop-gap

Luckily, we DevOps types have emerged to help bridge the gap between coding large-scale products/platforms and keeping them afloat. But, that’s not a model that scales (though it is short- to mid-term job security ;) ).

As the person in my BU who thinks the most about holistic Supportability, my job is to make the platform so easy to support that I don’t have a job anymore. Unfortunately, “easy to support” is a (falsely) assumed feature that should just be there.

OK, the number of DevOps-savvy folks is limited. I get that. Why can’t “normal” humans do the same? You guys aren’t that damned smart.

Because you haven’t built the tools to help people support the product.

True ability to support a given platform requires not just intimate knowledge of the target platform, but proper visibility into its operation. If I call my mechanic and have him listen to the noise my engine makes, he can take a guess at the problem. But, if I get the car into his hands, he can truly see what’s going on. He can hook the car up to his computers and see valve timing, air/fuel ratios, any sensor alarms, etc.

Oh, is that it? We’re all set, dude. We have very verbose logging.

Suuuuuuuure you do. As Merlin Mann puts it, everyone also thinks they’re great at French kissing. Maybe you should call some of your exes just to make sure your perception matches reality.

I’d be willing to guess a fair amount of your “verbose” logging comes from code like this:

try {
    someMethod(someArgument);
} catch (Exception e) {
    // Catches everything and logs a message with no context about what failed
    log.error("Error occurred", e);
}

… which will get you something as wonderful as this:


May 26 19:10:01 appserver-1.company.com myapp[]: DEBUG [Post] - [Post] - [] - []:Error occurred
[ ... followed by multiple KB of stacktrace ... ]

If by “verbose” you mean “voluminous,” I’ll give you that.

Pop quiz, hot shot: You’ll know what to do with the resulting stacktrace when you’re bleary-eyed and troubleshooting an issue at 3AM, right? Right? Your entire Tech Support staff are Java “Ninjas,” so it’ll never get to you anyway, right?

The awful truth is that not only is the logging pattern above exceedingly common, it ensures that almost no one outside of your Developers can tell you what in the hell is going on.
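For contrast, here is a minimal sketch of what a more supportable version of that catch block might look like: the message names the operation, the input that triggered the failure, and a suggested next step, so the stacktrace becomes supporting evidence rather than the only clue. It uses java.util.logging only to stay dependency-free; the class, the describeFailure helper, the stand-in someMethod, and the wording of the message are all assumptions for illustration, not code from any real product.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical example class; names and messages are illustrative only.
class SupportableLogging {
    private static final Logger LOG = Logger.getLogger(SupportableLogging.class.getName());

    /** Builds a message saying what failed, for which input, and what to try next. */
    public static String describeFailure(String operation, String input,
                                         String nextStep, Exception e) {
        return String.format("%s failed for input '%s': %s. Suggested next step: %s",
                operation, input, e.getMessage(), nextStep);
    }

    // Stand-in for the someMethod(someArgument) call in the snippet above.
    static void someMethod(String someArgument) {
        if (someArgument.isEmpty()) {
            throw new IllegalArgumentException("argument was empty");
        }
    }

    public static void main(String[] args) {
        String someArgument = "";
        try {
            someMethod(someArgument);
        } catch (IllegalArgumentException e) {
            // Catch the narrowest exception you can, and keep the stacktrace
            // attached for the Developers who do want it.
            LOG.log(Level.SEVERE,
                    describeFailure("someMethod", someArgument,
                            "check the request payload sent by the caller", e),
                    e);
        }
    }
}
```

A Tech Support person reading that first line at 3AM at least knows which operation broke, on what input, and where to look next, without reading a single stack frame.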

As some of us here in Texas would say, “That dog won’t hunt.”

We will put in Supportability features in the next release…

I know. You’d love to add this kind of stuff, but you have code to write, deadlines to meet. I get it. You’re totally swamped. No worries. We’ll get around to making the product supportable by people other than the Developers in the next release. I promise.

If you have the fortune of making a product that is successful, you will wish you had invested in making that product easier to support. Once you ship – and customers ask for more, better, and faster – you won’t have time to right the SS Supportability. She will be horribly off course and headed for a watery grave.

Either do it properly from the start or be willing to suffer the lovely consequences, which include:

  • Pulling your rockstar Devs away from new features to work on issues that can’t be supported by anyone else
  • Causing schedule slips
  • Pissing off your customers
  • Ruining your “brand”

Few companies get Supportability right because it is not a feature; nor does it fit into a User Story. Supportability has to be embedded in the culture of your team. It is not something ScrumA can do and ScrumB can put off until the next release (which is French for “It’ll never happen”). It has to permeate and affect everything you do. 

Supportability: It’s not just for logging anymore!

While I’ve harped on logging here, Supportability applies to anything that makes things run smoother. The reason I’m beating up on logging is that I have supported multiple products and I have yet to see a product that’s really gotten this right. Thus, I felt it deserved some air time.

Note, when I say products, I am talking not just about ones I directly support – though I (perhaps incorrectly) like to think we’ve done a decent job of being better than most. I am also talking about the infrastructure components (DBs, etc).

Case in point: Have fun hunting down a suspected corrupted key/value pair your app is storing in memcached – especially if you’re using some driver/abstraction layer to manage said key/value pairs (I’m looking at you, Hibernate). If you don’t have the key handy, there’s no really feasible way to find it – especially if you shard across multiple memcached servers.

Self-healing/auto-correcting code, proactive alerts [ more on this in a later post ], simpler integrations, more rapid deployment [e.g.: Continuous Integration], less downtime. This all comes together to create the overall experience of supporting the platform, not just servicing it.

This keeps it running, making it do more, better, faster – hopefully without the need of help from folks like me.
