Driving Service Discovery with Sensu: Two Great Tastes that Taste Great Together…

Prologue (AKA: WTF are you doing this?)

In my new gig, I’ve been given some new and interesting challenges to solve. One of them is Service Discovery (SD).

Aside: If you are unfamiliar with Service Discovery, check out this post by Julia Evans over at Stripe. Not only does it provide a succinct explanation and real-world application, but in typical Julia fashion, it has a cute comic that explains it all.

Another thing I am tasked with is to modernize our monitoring, enabling a more DevOps-y way of building monitors (read: “Build your own damned monitors… erm, I mean, let’s help enable you to create your own monitoring”).

For me, that means bringing Sensu to the party. I am a huge fan of using standalone checks, rather than driving monitors from the server down. Not only does this allow for that DevOps-y workflow, but it means I can focus on improving my monitoring platform and enabling teams to use it. But, I digress…

After an internal bakeoff, we settled on Consul as our SD platform of choice.

OK, We Picked Consul… What Now?

We loved Consul, but there was a problem: YAN (Yet Another Agent).

Being both a large Enterprise and a telco, there is a fair amount of hoop-jumping (InfoSec, customer data protection, etc) that needs to be done to add anything on a box – especially when it’s something you want on all boxes.

That said, even if we did install the Consul agent, it doesn’t pay us any benefit without the service owners either registering services themselves or using config management to create the required JSON files on each host.

While we have some teams who are living in Microserviceville, an overwhelming majority of our services are aging J2EE stacks built with proprietary frameworks; “Java middleware and sadness,” as Bridget Kromhout would call it. This means many of them have very little automation (which is something my team was brought in to fix!).

Rather than pack up the SD effort, I realized I could come at this from another direction. Consul has an HTTP API, so why not use the Sensu checks we have for each service endpoint to drive the registration – and state – of those services?

Go on.jpg

That way, we get a “twofer” – monitoring and service discovery. Two great tastes… wait, I did that one already.

Making Some Reese’s

Now we get to the fun part. It turns out that having Sensu drive the SD updates is pretty easy. You just need a couple of moving parts:

  1. Metadata about the service
  2. Handler

Note: Being relatively new, I’m working to find out what permissions I need to get to put my docker-compose based PoC on GitHub. I’ll update this post with the URL when/if that happens. That said, most of the important bits are detailed below.

Service Metadata

This part just involves adding a bit of metadata to whatever sensu-client check is monitoring the service in question. You are monitoring the services you care about, right?

Say you have a web server you are monitoring like so:

{
    "checks": {
        "check-http": {
            "command": "/opt/sensu/embedded/bin/check-http.rb -u http://localhost",
            "occurrences": 2,
            "handler": "consul",
            "subscribers": ["web"],
            "standalone": true,
            "interval": 60,
            "ttl": 120,
            "service_registry": {
                "service_name": "web",
                "port": 80,
                "tags": ["green"]
            }
        }
    }
}

It’s that “service_registry” section which drives the interaction with Consul. While it’s pretty self-explanatory, let’s cover the pieces just to be sure:

  • service_name: How you want the service to appear in Consul
  • port: The port on which the service can be found
  • tags: This is optional, but tags allow you to find hosts based on things other than the service(s) they host and their hostname

Consul Handler

This metadata is used by the Consul handler to drive updates to the status of the service in Consul. Luckily, if we update the state of a service in Consul – and that service does not yet exist – then Consul will add it. This makes the logic of the handler pretty simple.

When an event is triggered, the handler sends JSON like this to Consul’s /v1/catalog/register endpoint:

{
    "Datacenter": "qlab01",
    "Node": "web-node-01",
    "Address": "172.16.15.15",
    "Service": {
        "Service": "web",
        "Tags": ["green"],
        "Address": "172.16.15.15",
        "Port": 80
    },
    "Check": {
        "Node": "web-node-01",
        "CheckID": "web:web-node-01",
        "Name": "sensu-driven check",
        "Status": "critical",
        "ServiceID": "web",
        "Output": "CheckHTTP CRITICAL: Request error: Failed to open TCP connection to localhost:80 (Connection refused - connect(2) for \"localhost\" port 80)"
    }
}

This causes the service to be updated in Consul, like so:

[Screenshot: the updated service as shown in the Consul UI]

Likewise, when the check clears, it will post back the same JSON, except setting the status to “passing”.
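
To make the mapping concrete, here is a minimal sketch of the translation step such a handler performs. It is not the PoC handler itself. It assumes the event arrives as Sensu event JSON with a "client" hash (name, address) and a "check" hash that carries the custom service_registry attribute. The function names, the default Consul URL, and the decision to treat any unrecognized exit code as critical are all assumptions made for illustration. The status strings are the ones Consul's health checks are documented to accept (passing, warning, critical).

```python
import json
import urllib.request

# Sensu exit codes mapped to Consul check states. Treating anything else
# (e.g. 3/unknown) as "critical" is an assumption, not part of the post.
SENSU_TO_CONSUL_STATUS = {0: "passing", 1: "warning", 2: "critical"}


def build_registration(event, datacenter):
    """Translate a Sensu event (dict) into a /v1/catalog/register payload."""
    client = event["client"]
    check = event["check"]
    registry = check["service_registry"]
    service_name = registry["service_name"]
    return {
        "Datacenter": datacenter,
        "Node": client["name"],
        "Address": client["address"],
        "Service": {
            "Service": service_name,
            "Tags": registry.get("tags", []),  # tags are optional
            "Address": client["address"],
            "Port": registry["port"],
        },
        "Check": {
            "Node": client["name"],
            "CheckID": f"{service_name}:{client['name']}",
            "Name": "sensu-driven check",
            "Status": SENSU_TO_CONSUL_STATUS.get(check["status"], "critical"),
            "ServiceID": service_name,
            "Output": check.get("output", ""),
        },
    }


def register(payload, consul_url="http://localhost:8500"):
    """PUT the payload to Consul. The default URL is an assumed local agent."""
    request = urllib.request.Request(
        consul_url + "/v1/catalog/register",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status == 200


# Example event, trimmed to the fields the sketch reads.
example_event = {
    "client": {"name": "web-node-01", "address": "172.16.15.15"},
    "check": {
        "status": 2,
        "output": "CheckHTTP CRITICAL: Request error",
        "service_registry": {"service_name": "web", "port": 80, "tags": ["green"]},
    },
}
print(json.dumps(build_registration(example_event, "qlab01"), indent=4))
```

A real pipe handler would read the event from stdin and call register() with the result; that wiring is left out here so the sketch runs without a Consul server.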

Getting back to my penchant for “pushing monitoring to the left,” this means my Developers can add a standalone check on their boxes to get both a monitor and service registry, two great… you get the point.

But Wait! There’s More!

It turns out there is a maturity model you can enable with this setup, allowing you to get some of the benefits of SD as your teams gain the maturity to do SD the right way.

But, that’s a post for another day.

Some Shortcomings

There is one glaring shortcoming of this approach: The handler only triggers on a state transition. I am hoping it is just a matter of education on my part to find a way to get Sensu to auto-push the data to Consul. Worst case, I can have a check on the Sensu server that iterates the list of checks for the service_registry metadata, but that feels a little dirty.
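
For what it's worth, the "worst case" sweep could look roughly like the sketch below. It assumes the check results are available as a list of objects, each with a "client" name and a "check" hash, and that the custom service_registry metadata survives in those results. Both are assumptions to verify against your Sensu version, and services_to_sync is a hypothetical helper name.

```python
def services_to_sync(results):
    """Pick out the check results that carry service_registry metadata."""
    to_sync = []
    for result in results:
        check = result.get("check", {})
        registry = check.get("service_registry")
        if not registry:
            continue  # ordinary check, nothing to register
        to_sync.append({
            "node": result["client"],
            "service": registry["service_name"],
            "port": registry["port"],
            "tags": registry.get("tags", []),
            "healthy": check.get("status") == 0,  # exit code 0 means OK
        })
    return to_sync


# Hypothetical results: one service check and one plain check.
sample_results = [
    {"client": "web-node-01",
     "check": {"name": "check-http", "status": 0,
               "service_registry": {"service_name": "web", "port": 80,
                                    "tags": ["green"]}}},
    {"client": "web-node-01",
     "check": {"name": "check-disk", "status": 1}},
]
for entry in services_to_sync(sample_results):
    print(entry)
```

Each entry could then be pushed to Consul on every sweep, which would cover services whose checks never change state.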

Again, I hope to be able to post my working prototype once I get permission. But, hopefully this helps someone either directly or just by helping people think outside the box and optimize for the constraints that surround your work.

Bloglet: Tales from WTFLand – nginx-auth-ldap misconfig causes crash with no error logged

… other than the crash itself, of course :)

Will keep this one short and sweet (not my norm, I know). Just wanted to post it out there, as I didn’t find a hit despite some GoogleFu.

I was refactoring some Ansible automation, which involved adding conditionals to my templates based on whether LDAP was enabled. I missed a change and had this section in one of my .conf files:

 auth_ldap "Closed content";
 auth_ldap_servers {{ nginx_ldap.server }};

… but the other conditionals were working properly, so the config block that defines the LDAP server referenced above was not present.

Turns out nginx-auth-ldap doesn’t handle this kind of screwup gracefully. Instead, this is all you get in /var/log/messages:

Mar 20 09:27:59 logging-01 systemd: Starting The nginx HTTP and reverse proxy server...
Mar 20 09:27:59 logging-01 kernel: nginx[427]: segfault at 8 ip 00007f68ff51ca3a sp 00007fffd67e5a30 error 4 in nginx[7f68ff457000+f8000]
Mar 20 09:27:59 logging-01 systemd: nginx.service: control process exited, code=killed status=11
Mar 20 09:27:59 logging-01 systemd: Failed to start The nginx HTTP and reverse proxy server.
Mar 20 09:27:59 logging-01 systemd: Unit nginx.service entered failed state.

Nothing is logged to /var/log/nginx/error.log (or any other nginx logs).
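
One way to catch this before nginx ever starts is a small pre-flight check on the rendered config. The sketch below is an illustration, not something from my playbooks. It assumes servers are defined with "ldap_server NAME {" blocks and referenced with "auth_ldap_servers NAME;", as in the nginx-auth-ldap README. It does not follow include directives or skip commented-out lines, and undefined_ldap_servers is a hypothetical helper name.

```python
import re

# "ldap_server NAME {" defines a server; "auth_ldap_servers NAME;" uses one.
SERVER_DEF = re.compile(r"^\s*ldap_server\s+(\S+)\s*\{", re.MULTILINE)
SERVER_REF = re.compile(r"^\s*auth_ldap_servers\s+([^;]+);", re.MULTILINE)


def undefined_ldap_servers(config_text):
    """Return the referenced LDAP server names that are never defined."""
    defined = set(SERVER_DEF.findall(config_text))
    referenced = set()
    for names in SERVER_REF.findall(config_text):
        # Split on whitespace in case several names share one directive.
        referenced.update(names.split())
    return sorted(referenced - defined)


# Rendered config with the reference but no matching definition.
broken = '''
location / {
    auth_ldap "Closed content";
    auth_ldap_servers ldap1;
}
'''
print(undefined_ldap_servers(broken))
```

Failing the Ansible run when that list is non-empty would turn a silent segfault into a readable error.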

Anyway, hope that ends up saving someone from the couple of hours of head scratching I did last night trying to grok WTF happened here.

Lego + Testors Model Cement = Bliss

Image

You might read the title and think, “Gluing Lego?!? Sacrilege!”

Normally I’d have my pitchfork and torch, standing alongside you. But, hear me out.

My (recently turned) 4 year-old, Toby, loves everything his (8 year-old) brother does. This includes playing with Lego. For his birthday, Toby got a couple of Lego sets: a firetruck and a cement mixer.

The firetruck is solidly designed and has withstood all the abuse Toby has thrown at it. The cement truck, however, has some serious structural issues that cause parts to fall off all the time – parts that see a fair amount of action.

Example: Check out the chute where the “cement” pours out.

Image

The gray part I’ve circled in red takes the load of the chute, but more importantly, it takes the load of any movement of the chute. Any significant downward force, and the whole chute comes off. Which is to say, roughly every 45 seconds Toby would run to me with the truck in one hand and the chute in the other crying, “Daddy, fix it!”

Typically, Lego does a good job of anticipating these types of scenarios and designs the kits to ensure they don’t happen. Again, the firetruck is an excellent example. In the case of the cement truck, they totally missed the boat.

After suffering through a couple of weeks of “Daddy, fix it!”, I surfed the net a bit to confirm what I suspected: The ABS that Lego bricks are made of can be “welded” by a solvent – like my handy Testors Liquid Model Cement.

I tested this out on a couple of 1×2 bricks, hoping that it might hold a bit better than using friction alone, thus buying me an hour or two of “fix it” free time. I was floored to find that the bond was super strong. So strong, in fact, that to separate the bricks would likely require breaking them.

My test fruitful, I shoved aside my Lego morality and (selectively) glued the problem parts together. All of the mobility and functionality of the kit is still intact!

True, these parts will forever be fused, but given the buckets of bricks lying around, I can live with that. I’ll have to atone for my sins later. I just hope the great Lego St. Peter in the sky will understand.

Obligatory “WTF is this guy?” Post

Obligatory Bio Post

Ok, so here goes the obligatory “WTF is this guy?” post.

Even more obligatory, “Under Construction” warning

Fortunately, Gruber didn’t put in support for the blink or marquee tags – and I’ll spare you the most common animated GIF in the mid 90’s: the “Under Construction” sign.

Like any self-respecting geek, a lot of this will likely be left undone for a while.

Even more, more obligatory “Husband, Father, etc…” Section

So, yeah. That’s me, too.

Career

Like most geeks, I am defined – especially on the Internet – by my job. Here is a smattering of what I have done to help fund the coffers of Social Security and Medicare.

Currently: Technical Leader – WebEx Social Escalation Team Lead (and self-proclaimed “Product Quality Advocate”)

I head up the Escalation Team for WebEx Social. “WTF is WebEx Social?” you ask. The one sentence summary for normal humans: It’s Facebook, except for getting work done.

For the technically inclined, WebEx Social is Enterprise Social Software that combines Posts, Wikis, Blogs, Document Management, Communities, Discussion Forums, etc – all with built-in integrations to Cisco’s Unified Communications stack (Communications Manager, Unity Connection, Cisco Unified Presence Server, WebEx Meetings, WebEx Connect) and Show and Share (think: private, corporate YouTube), as well as Exchange and Domino Calendaring, OCS IM/Presence, SharePoint, and CMIS-compliant document repositories.

To get even more technical, think of WebEx Social as a private cloud. We run on a hardened version of CentOS, we have only ever deployed as a Virtual Machine, we use Puppet for configuration management and deployment, Monit/collectd/Nagios for monitoring, RabbitMQ for queuing, both MongoDB and Oracle for data stores, memcached to speed access to Oracle, and Solr for search. In short, it’s a relatively common DevOps technology stack.

Some of those pieces will evolve to more “modern” replacements, but that info I have to keep under wraps.

My team helps ensure that our customers are happy and their WebEx Social systems are running like a top.

Another key part of my job is to act as the product’s “Quality Advocate.” This is not just improving the customer experience and ensuring we ship with a low number of bugs. It means improving the installation and upgrade process, making the product easier to support, enabling Partners to deploy and customize the platform, etc.

As an example of improving serviceability, I am currently working to design and test a logging search framework that uses the following to help find point issues – and perform trending and proactive logging analysis:

Hobbies

With the 5 minutes of free time I have each week, here are the things I like to do:

Electronic Music Production

While still early days, at least in terms of my time spent doing it, I dabble in creating some EDM.

On the Mac, I use Propellerheads’ Reason. While they take it a bit too far – see: swaying patch cables – I like the approach they take. While I am generally not a fan of the whole skeuomorphism thing, I think Props use it to great effect in Reason. They are “realistic” where it makes sense (e.g., knobs, buttons, displays), but they also take advantage of the fact that this is all digital. A perfect example of the latter is their Combinator. It allows you to bundle up synths, effects, and anything else into a single rack “item,” which is great for reuse and sharing piece parts of your creations.

On iOS, which I’ve recently started taking seriously, I am a big fan of:

  • Audiobus: Allows audio from one app to be streamed to another. It addresses the issue of having a great synth, groovebox, drum machine, etc with no ability to make an honest song with it. With Audiobus, you can route each of these to a multi-track DAW. This is a total game changer to help make iDevices viable music production/performance platforms.
  • DM–1: Like Reason, it’s a great implementation of using what works in the real world (knobs, drum pads, faders) and sprinkling in what you can take advantage of when it comes to simplifying the workflow of creating a song/groove.
  • LoopyHD: From the same guy that created Audiobus. LoopyHD is a great app in its ability to have anyone jump in and play without having to know music, programming a sequencer, etc. My 8 year old loves to beatbox into Loopy to make ad-hoc songs.
  • NodeBeat HD: This is another app that allows you to make interesting music without any musical experience (noticing a trend here?). One of the latest updates allows you to shoot its output via MIDI to whatever synth you want.
  • Figure: Another one from the Propellerheads team. Where Reason went for “realistic” racks and gear, Figure allows you to play around to make cool grooves. They keep adding more and more features over time. Hard to go wrong for a buck.
  • NanoStudio: Just got this one, so I haven’t had a chance to put it through its paces, but it’s a full-fledged Reason-/Ableton Live-like app that offers fully customizable synths, a serviceable drum machine, and is a multitrack DAW too! If/when this picks up Audiobus support it will be the killer app for iOS music production.

Gaming

“What, a geek that games? Get out!” Cliché, I know, but what do you expect from someone who had an Atari 2600 a month after it released? I can’t think of a time in my life where I did not have a console of some sort in my house.

Due to my frugal (read: cheap ass) nature, I never got too much into PC gaming. I spent more time downloading drivers, tweaking configs, overclocking, etc than I did actually playing. A notable exception: Portal. Played both of them through multiple times and “Halls of Science” has been my ringtone for years now.

RC Racing

Just now getting back into Remote Control on-road racing. Dusted off my ~10 year old NTC3… only to find out that the nitro on-road scene is dead. So, I’m just bashing around in a local parking lot for now. But, I am looking into getting into 1/10 scale F1 RC cars, as there’s a nice track “nearby.”