Driving Service Discovery with Sensu: Two Great Tastes that Taste Great Together…

Prologue (AKA: WTF are you doing this?)

In my new gig, I’ve been given some new and interesting challenges to solve. One of them is Service Discovery (SD).

Aside: If you are unfamiliar with Service Discovery, check out this post by Julia Evans over at Stripe. Not only does it provide a succinct explanation and real-world application, but in typical Julia fashion, it has a cute comic that explains it all.

Another thing I am tasked with is to modernize our monitoring, enabling a more DevOps-y way of building monitors (read: “Build your own damned monitors… erm, I mean, let’s help enable you to create your own monitoring”).

For me, that means bringing Sensu to the party. I am a huge fan of using standalone checks, rather than driving monitors from the server down. Not only does this allow for that DevOps-y workflow, but it means I can focus on improving my monitoring platform and enabling teams to use it. But, I digress…

After an internal bakeoff, we settled on Consul as our SD platform of choice.

OK, We Picked Consul… What Now?

We loved Consul, but there was a problem: YAN (Yet Another Agent).

Being both a large enterprise and a telco, we have a fair amount of hoop-jumping (InfoSec, customer data protection, etc.) to do before adding anything to a box, especially when it’s something you want on all boxes.

That said, even if we did install the Consul agent, it wouldn’t buy us anything unless the service owners either registered their services themselves or used config management to create the required JSON files on each host.

While we have some teams who are living in Microserviceville, an overwhelming majority of our services are aging J2EE stacks built with proprietary frameworks; “Java middleware and sadness,” as Bridget Kromhout would call it. This means many of them have very little automation (which is something my team was brought in to fix!).

Rather than pack up the SD effort, I realized I could come at this from another direction. Consul has an HTTP API, so why not use the Sensu checks we have for each service endpoint to drive the registration – and state – of those services?


That way, we get a “twofer” – monitoring and service discovery. Two great tastes… wait, I did that one already.

Making Some Reese’s

Now we get to the fun part. It turns out that having Sensu drive the SD updates is pretty easy. You just need a couple of moving parts:

  1. Metadata about the service
  2. Handler

Note: Being relatively new here, I’m still finding out what permissions I need to publish my docker-compose-based PoC on GitHub. I’ll update this post with the URL when/if that happens. That said, most of the important bits are detailed below.

Service Metadata

This part just involves adding a bit of metadata to whatever sensu-client check is monitoring the service in question. You are monitoring the services you care about, right?

Say you have a web server you are monitoring like so:

    {
        "checks": {
            "check-http": {
                "command": "/opt/sensu/embedded/bin/check-http.rb -u http://localhost",
                "occurrences": 2,
                "handler": "consul",
                "subscribers": ["web"],
                "standalone": true,
                "interval": 60,
                "ttl": 120,
                "service_registry": {
                    "service_name": "web",
                    "port": 80,
                    "tags": ["green"]
                }
            }
        }
    }

It’s that “service_registry” section which drives the interaction with Consul. While it’s pretty self-explanatory, let’s cover the pieces just to be sure:

  • service_name: How you want the service to appear in Consul
  • port: The port on which the service can be found
  • tags: This is optional, but tags allow you to find hosts based on things other than the service(s) they host and their hostname
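Since the handler depends on these three fields, it is worth validating them before talking to Consul. The sketch below is a minimal illustration, assuming Sensu 1.x copies custom check attributes (like service_registry) into the event’s check section unchanged; the function name service_registry_from_event is hypothetical, not part of Sensu.

```python
from typing import Any, Dict, List, Optional


def service_registry_from_event(event: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Pull the service_registry block out of a Sensu event and validate it.

    Assumes Sensu copies custom check attributes into event["check"], so the
    block from the check definition appears there as-is.
    """
    registry = event.get("check", {}).get("service_registry")
    if registry is None:
        # Not every check maps to a service; such events are simply skipped.
        return None

    name = registry.get("service_name")
    if not isinstance(name, str) or not name:
        raise ValueError("service_registry.service_name must be a non-empty string")

    port = registry.get("port")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(port, bool) or not isinstance(port, int) or not 0 < port < 65536:
        raise ValueError("service_registry.port must be an integer from 1 to 65535")

    # Tags are optional; default to an empty list.
    tags: List[str] = list(registry.get("tags", []))
    if not all(isinstance(tag, str) for tag in tags):
        raise ValueError("service_registry.tags must be a list of strings")

    return {"service_name": name, "port": port, "tags": tags}
```

Returning None for checks without the block lets a single handler serve every check, registering only the ones that opt in.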

Consul Handler

This metadata is used by the Consul handler to drive updates to the status of the service in Consul. Luckily, if we update the state of a service in Consul – and that service does not yet exist – then Consul will add it. This makes the logic of the handler pretty simple.
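Because registering a service that doesn’t exist yet simply creates it, the handler never has to ask Consul whether the service is already there; the main decision left is which status to report. Here is a minimal sketch of that mapping. The function name is hypothetical, and sending Sensu WARNING as Consul “warning” and UNKNOWN as “critical” are assumptions on my part (this post only exercises the critical and passing cases).

```python
def consul_status(sensu_status: int) -> str:
    """Translate a Sensu check exit status into a Consul health-check status."""
    # Sensu checks use Nagios-style exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3+ UNKNOWN.
    # Consul health checks accept "passing", "warning" and "critical".
    if sensu_status == 0:
        return "passing"
    if sensu_status == 1:
        return "warning"
    # Assumption: CRITICAL and UNKNOWN both become "critical", so a service
    # whose state can't be determined is never reported as healthy.
    return "critical"
```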

When an event is triggered, the handler sends JSON like this to Consul’s /v1/catalog/register endpoint with an HTTP PUT:

    {
        "Datacenter": "qlab01",
        "Node": "web-node-01",
        "Address": "",
        "Service": {
            "Service": "web",
            "Tags": ["green"],
            "Address": "",
            "Port": 80
        },
        "Check": {
            "Node": "web-node-01",
            "CheckID": "web:web-node-01",
            "Name": "sensu-driven check",
            "Status": "critical",
            "ServiceID": "web",
            "Output": "CheckHTTP CRITICAL: Request error: Failed to open TCP connection to localhost:80 (Connection refused - connect(2) for \"localhost\" port 80)"
        }
    }
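Building that payload from a Sensu event is mostly copying fields around. The sketch below assumes the Sensu 1.x event layout (client.name, client.address, check.status, check.output) plus the service_registry block from the check definition; the function name build_registration is hypothetical, and the datacenter is passed in because the event doesn’t carry it.

```python
from typing import Any, Dict


def build_registration(event: Dict[str, Any], datacenter: str) -> Dict[str, Any]:
    """Build a /v1/catalog/register payload from a Sensu 1.x event (a sketch)."""
    client = event["client"]
    check = event["check"]
    registry = check["service_registry"]
    node = client["name"]
    service = registry["service_name"]

    # Sensu exit codes: 0 OK, 1 WARNING, anything else treated as critical (assumption).
    status = {0: "passing", 1: "warning"}.get(check["status"], "critical")

    return {
        "Datacenter": datacenter,
        "Node": node,
        "Address": client.get("address", ""),
        "Service": {
            "Service": service,
            "Tags": list(registry.get("tags", [])),  # Consul expects a list of strings
            "Address": "",  # empty: Consul falls back to the node's Address
            "Port": registry["port"],
        },
        "Check": {
            "Node": node,
            "CheckID": f"{service}:{node}",  # one check per service per node
            "Name": "sensu-driven check",
            "Status": status,
            "ServiceID": service,
            "Output": check.get("output", ""),
        },
    }
```

Deriving CheckID from the service and node names keeps it stable, so each new event overwrites the same Consul check instead of piling up new ones.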

Sending that payload updates the service, along with its Sensu-driven health check, in Consul.

Likewise, when the check clears, the handler sends the same JSON again, except with the status set to “passing”.
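Since the failing and clearing cases send the same shape of request, the transport step is identical for both. Here is a minimal sketch using only the Python standard library, assuming a Consul agent on its default HTTP port (8500) on the same host; the function names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumption: a local Consul agent listening on its default HTTP port.
DEFAULT_CONSUL_URL = "http://localhost:8500"


def make_register_request(payload: Dict[str, Any],
                          consul_url: str = DEFAULT_CONSUL_URL) -> urllib.request.Request:
    """Wrap a catalog registration payload in an HTTP PUT request."""
    return urllib.request.Request(
        consul_url.rstrip("/") + "/v1/catalog/register",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def register(payload: Dict[str, Any], consul_url: str = DEFAULT_CONSUL_URL,
             timeout: float = 5.0) -> bool:
    """Send the registration; urlopen raises HTTPError on a non-2xx reply."""
    request = make_register_request(payload, consul_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Consul answers a successful catalog registration with a JSON boolean.
        return json.loads(response.read().decode("utf-8")) is True
```

Sensu 1.x pipe handlers receive the event as JSON on STDIN, so a handler script would read it with json.load(sys.stdin), build the payload, and pass it to register().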

Getting back to my penchant for “pushing monitoring to the left,” this means my developers can add a standalone check on their boxes and get both a monitor and a service registration: two great… you get the point.

But Wait! There’s More!

It turns out there is a maturity model you can enable with this setup, allowing you to get some of the benefits of SD while your teams gain the maturity to do SD the right way.

But, that’s a post for another day.


