Deploy An Elastic High-Availability SQL Cluster with Crate and Weave

Sep 24, 2015

Recently, I sat down with one of our customers to see if we could get a Crate cluster running in their dedicated Giant Swarm cluster. As we built Giant Swarm for exactly these kinds of use cases, it was easier than our customer had expected, so I thought I’d share some details here with you.

TL;DR: Basically we just took the standard Crate container and created a single, simple component with it. That component, when scaled, automagically turns into a Crate cluster that you can scale up and down to your liking. If you need several clusters, you can create duplicates of this component, each with a different cluster name. And with our new placement feature you can even ensure that your cluster is distributed over different machines. For the quick setup, check out the GitHub repository.

What is Crate and Why Should I Care?

When thinking about highly available, scalable databases, one is often tempted to think only of the typical NoSQL candidates that support clustering, like Cassandra, CouchDB, etc. However, not all applications need or are built for NoSQL DBs, and NoSQL is not always the best choice. Still, most SQL DBs are not made for clustering or are hard to get running in a clustered setup. An exception to this is Crate, which describes itself as a “scalable SQL database with the NoSQL goodies”.

Our friends at Crate have designed their DB from the ground up for a distributed setup, where not only data but also queries are distributed across a cluster. Further, it is especially optimized for ephemeral distributed environments like Docker and offers a very good official image on the Docker Hub.

What Do I Need?

Crate itself only needs a Docker host (it can also be installed directly on many clouds or Linux servers). However, to run it distributed you should have a cluster of Docker hosts. Crate supports two different ways to form clusters:

Automatic cluster setup using multicast (default)
Manual cluster setup using unicast

The unicast setup is mainly for environments that don’t support multicast, like Amazon EC2, Google Compute Engine, or standard Docker hosts. As we want an elastic, flexible setup without having to distribute a fixed list of nodes, we go for the multicast setup. So what we need is a cluster that supports multicast between Docker containers.

Luckily, our customer already had a dedicated Giant Swarm cluster, which we had set up with Weave Net, a networking solution for Docker containers. Our friends from Weave offer both multicast support and DNS in their solution, which is great for supporting different clustered services (e.g. MongoDB). For Crate we only need multicast, though.

The Setup

As mentioned above, Crate offers a well-functioning official image on the Docker Hub. The standard Docker command to run a Crate container with persistent storage is:

$ docker run -d -p 4200:4200 -p 4300:4300 -v <data-dir>:/data crate

This we can translate into a simple swarm.json:

{
  "name": "crate-cluster",
  "components": {
    "crate": {
      "image": "crate",
      "ports": [
        4200,
        4300
      ],
      "volumes": [
        {
          "path": "/data",
          "size": "10 GB"
        }
      ],
      "domains": {
        "4200": "crate-admin.gigantic.io"
      }
    }
  }
}

We expose two ports on our Crate container(s). One is 4300, Crate’s transport port, which the nodes use to talk to each other. The other is 4200, the HTTP port that serves the Crate admin UI, which is why we attach it to a domain, so we can access it easily in a browser from the outside. We also attach a persistent storage volume to the container, so data doesn’t get lost.

And that’s actually it. There’s nothing to add to get Crate running in an elastic setup on Giant Swarm. Once you have the swarm.json ready you run swarm up and up goes your Crate container. Now the only thing you need to do to scale it into a Crate cluster is:

$ swarm scaleup <number of additional instances>

Once you have a Crate cluster you can use swarm scaleup and swarm scaledown to elastically scale it horizontally. Because of Weave’s multicast network the Crate nodes will find each other and replicate data automatically.
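To confirm that the scaled instances really joined one cluster, you could ask Crate itself. The following is a minimal Python sketch, assuming the admin domain from the swarm.json above is reachable over plain HTTP and that your Crate version exposes the _sql HTTP endpoint and the sys.nodes table on port 4200. The function names are made up for this illustration.

```python
import json
import urllib.request


def build_node_count_request(base_url):
    """Build a POST request asking Crate how many nodes are in the cluster."""
    body = json.dumps({"stmt": "select count(*) from sys.nodes"}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/_sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_node_count(response_body):
    """Extract the single count value from a Crate _sql JSON response."""
    return json.loads(response_body)["rows"][0][0]


def node_count(base_url, timeout=5):
    """Query a running Crate node and return the number of cluster nodes."""
    request = build_node_count_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_node_count(response.read().decode("utf-8"))
```

Calling node_count("http://crate-admin.gigantic.io") after a swarm scaleup should then return the number of running instances, once the new nodes have joined.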

Multiple Crate Clusters In A Multicast Setup

As explained above, the Crate nodes find each other automatically over the network. But what if you don’t want them to find each other, i.e. what if you want several Crate clusters? It’s actually very easy: you just add an argument with a cluster name to your component in the swarm.json:

"args": [
  "crate",
  "-Des.cluster.name=crate-cluster-one"
]

With this argument added to your component, you can create another component, similar to the first one, where you just change the cluster name and the name of the component (and also the domain on which you expose the admin backend). You can do this for as many clusters as you want. For each cluster you just need a single component. Everything else is handled by your Giant Swarm cluster.
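If you run more than a couple of clusters, writing the near-identical components by hand gets tedious. As a rough sketch, the swarm.json could be generated with a few lines of Python. The component names and admin domains below are hypothetical placeholders, and the helper functions are not part of any Giant Swarm tooling; only the keys already shown in this post are used.

```python
import json


def crate_component(cluster_name, admin_domain):
    """Return one Crate component that only joins the given cluster."""
    return {
        "image": "crate",
        # The cluster name keeps nodes of different clusters apart
        "args": ["crate", "-Des.cluster.name=" + cluster_name],
        "ports": [4200, 4300],
        "volumes": [{"path": "/data", "size": "10 GB"}],
        "domains": {"4200": admin_domain},
    }


def build_service(service_name, clusters):
    """Build a service definition with one component per Crate cluster."""
    return {
        "name": service_name,
        "components": {
            cluster_name: crate_component(cluster_name, admin_domain)
            for cluster_name, admin_domain in clusters.items()
        },
    }


# Hypothetical cluster names and admin domains, used here as component names too
clusters = {
    "crate-cluster-one": "crate-one-admin.gigantic.io",
    "crate-cluster-two": "crate-two-admin.gigantic.io",
}

service = build_service("crate-cluster", clusters)
print(json.dumps(service, indent=2))
```

Redirecting the output to swarm.json gives you one component per cluster, each with its own cluster name and admin domain.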

Towards High Availability

That was easy, so let’s take it to the next level. What if I want not only an elastic cluster setup, but one that is guaranteed to be distributed over different machines to be highly available? Again the answer is: with Giant Swarm it’s really easy. With the recently introduced placement feature we can tell our Giant Swarm cluster to place the instances of our Crate component on different machines:

"scale": {"min": 3, "placement": "one-per-machine"}

In the sample above I set a minimum of 3 instances, so the Crate cluster starts right away with a decent amount of distribution.
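The scale fragment presumably sits inside the component definition, next to image and ports. That placement is an assumption based on how the fragment is used here, so check it against the swarm.json reference. A small Python sketch that adds it to an existing component, with a hypothetical helper name:

```python
import copy
import json


def with_placement(component, min_instances=3):
    """Return a copy of a component that asks for one instance per machine."""
    if min_instances < 1:
        raise ValueError("min_instances must be at least 1")
    result = copy.deepcopy(component)
    result["scale"] = {"min": min_instances, "placement": "one-per-machine"}
    return result


# Shortened component; the full one also carries volumes and domains
crate = {"image": "crate", "ports": [4200, 4300]}
ha_crate = with_placement(crate)
print(json.dumps({"crate": ha_crate}, indent=2))
```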

Summary

As you have seen above, getting elastic high-availability SQL clusters running on Giant Swarm with Weave and Crate is really easy. For more details on configuration options and usage of Crate, see the Docker Hub repo of the image or the Crate documentation.

For your convenience we pushed the full swarm.json of this setup to a GitHub repository. If you want to set up your own, you can just clone the component to your own service and connect it to your application components. However, currently you need a dedicated cluster for this, as multicast networking with Weave is only available in dedicated Giant Swarm setups (or on-premise). We are working on bringing extended networking to the shared environment, too. Until then, if you’re interested in a dedicated or on-premise cluster, get in touch with us!