Why Kubernetes or How Giant Swarm Builds Infrastructure
Giant Swarm’s goal is to build the simplest platform to professionally host your distributed applications. In pursuing this goal for the last few years we have worked with a lot of technologies in production and tested even more. From this experience we found that the best way to build infrastructure is to be open-minded, stay flexible, and make decisions guided by principles that we believe in.
These principles are at the core of every product decision we make.
User Focus: Get users to validate what we build in every phase. This also means that we are deeply involved in a multitude of communities, trying to understand what problems users have and helping out, whether the questions are about our own software or about software we have experience working with.
Open Source by default: We use a lot of open source software and try to release our own tools as open source as well, which for us goes hand in hand with the user focus mentioned above. Wherever possible we try to merge changes we make to third-party open source software upstream so that the whole community benefits from them. This also implies that we choose open source software with an open community that welcomes contributions.
Primitives over frameworks: Some of you might know this from Werner Vogels’ 10 lessons from 10 years of AWS (point 3). For us this means that we don’t want to hide away functionality (i.e. primitives) even if we might offer layers on top that make the user’s life easier. It also means that we strive for native solutions and open standards as well as flexibility to support changes over time.
Lovable User Experience: A great UX does not only mean that what we build is beautiful and easy to use. It also means that we offer a product and service that helps users get their jobs done, gives them a high level of feedback on interactions, and delights them instead of locking them in. It further means great documentation and support, including being able to talk to any team member, even our core developers, if you have deeper questions or problems.
Metrics driven: Metrics are the key to offering a reliable and performant service and decisions should be at least partly based on data. If you don’t measure your current performance, how will you be able to evaluate new technologies you might want to test out as additions or substitutions to your stack?
These principles guide our decisions in building a best-of-breed platform that gives users the utmost freedom and power to build software the way they like. As technologies evolve and market dynamics change, this platform cannot be a fixed set of integrated software but needs to stay flexible and evolve according to the needs of our users.
One of the first decisions we made very early on was to go with Docker as our main container engine and it is still at the core of our product. Docker containers are the main building blocks for developers to package and deploy their code. However, other container engines like rkt are also developing fast and we want to avoid locking users (and ourselves) into a single container engine.
We went with CoreOS as our main underlying OS, as it aligns very well with our principles and our way of thinking about infrastructure. Having a lightweight OS that was designed from the beginning with clustered deployments and containers in mind gave us a head start in building a full microservice infrastructure on top of it.
The most recent decision, and maybe one of the biggest we have made, was to rebase our whole orchestration platform on Kubernetes and use it as the main building block of our infrastructure. As this was a pretty drastic move that implied (and still implies) a huge amount of work, the decision was not an easy one and was not reached without a significant amount of research.
Comparing and evaluating orchestration platforms like Kubernetes, Mesos, and Docker’s recent platform endeavors is not trivial. Feature lists might help you compare a bit of the status quo, but functionality is changing by the day. If a feature present in one system is commonly needed by a lot of users, there is a high chance that it will be available in the others sometime soon. Currently, the best comparison out there might be this slightly biased blog post by Work-Bench (who are investors in CoreOS).
Still, the current (and near-future) feature set is important, as the future is uncertain and the release of a new feature does not necessarily mean that it is production-ready yet. From a feature perspective Kubernetes had quite the upper hand in our view. The maturity and complexity of the features that help with orchestrating containers, together with the fast release cycles driven by a vibrant community, currently make it the leading platform in terms of both functionality and robustness.
Said community is also a big differentiator. Sure, most platforms nowadays are open source, but there are differences in the communities they attract. To really evaluate the benefits of a community you need to look deeper than just its size and heterogeneity. What was most important for us was that the project we chose is not only used by a big community but actually developed by that community. You could come up with numbers of external contributors now, but this is less important. It is much more important that issues and the suggestions made in them are openly discussed, that proposals and the future roadmap are collaborative and do not favor any specific stakeholder(s), and that contributions are not only welcomed but also prioritized equally. These are things that cannot be objectively measured and compared. For these factors you need to go deeper into issues, PRs, chats, and (video) meetings to actually get a feeling for how well these matters are handled in the project. For us the Kubernetes community was the one that seemed the best fit. In our experience it is very open and inclusive, which makes us happy to be a part of it.
Furthermore, Kubernetes is very much in line with our principle of primitives over frameworks: resources and features in Kubernetes are always built up from primitives, and being able to use those primitives in any way you like is usually a given. There are many ways to extend and adapt Kubernetes concepts to your own needs and liking. If you need something that is not there yet, you can usually build it yourself without having to jump through too many hoops, and if what you built is good and might help the rest of the community you can even try merging it upstream. You can see this, for example, in the recent developments around third party resources, custom controllers, and what CoreOS calls operators.
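To give a feeling for what a custom controller does, the underlying idea is a control loop that compares a declared desired state with the observed state and acts on the difference. The following is only a toy sketch in plain Python under our own assumptions, not Kubernetes code: the `Cluster` class, the `reconcile` function, and the resource names are all hypothetical, and a real controller would watch the Kubernetes API instead of a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for observed state: resource name -> replica count."""
    running: dict = field(default_factory=dict)


def reconcile(desired: dict, cluster: Cluster) -> list:
    """Move the observed state toward the desired state and return the actions taken."""
    actions = []
    # Create or resize everything that is declared.
    for name, want in sorted(desired.items()):
        have = cluster.running.get(name, 0)
        if have < want:
            actions.append(f"scale up {name}: {have} -> {want}")
        elif have > want:
            actions.append(f"scale down {name}: {have} -> {want}")
        cluster.running[name] = want
    # Remove everything that is running but no longer declared.
    for name in sorted(set(cluster.running) - set(desired)):
        actions.append(f"delete {name}")
        del cluster.running[name]
    return actions


if __name__ == "__main__":
    cluster = Cluster(running={"web": 1, "old-job": 2})
    desired = {"web": 3, "db": 1}
    print(reconcile(desired, cluster))  # first pass acts on the differences
    print(reconcile(desired, cluster))  # second pass finds nothing left to do
```

The point of the sketch is that running the loop again once the states match produces no further actions, which is what makes it safe to run a controller continuously.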
Sure, there’s no project without faults, and with Kubernetes the main challenges are twofold. One is the lack of a good UI/UX, which, however, has recently been getting much more attention, as you can see in the latest releases of the Kubernetes Dashboard project. The other is the state of the documentation. For the latter you need to consider that Kubernetes by now is a really large project with many individuals and companies contributing to each release. Further, it is moving at a dramatically fast pace and documentation sometimes takes a while to catch up. Blog posts and external tutorials especially might get outdated quickly. However, as mentioned above, the community is very open and problems get handled well in both issue threads and Slack channels. Furthermore, this is one of the places where we see ourselves able to contribute and give back to the community, helping not only our own but all Kubernetes users to solve their problems.
To be clear, the platform includes many more technologies than the ones mentioned above. We also use flannel and Calico for networking, Quobyte for distributed storage, and a multitude of monitoring, logging, and alerting tools including Prometheus, Grafana, and the Elastic Stack. However, the technologies mentioned above are at the core and influence the rest of the stack a lot.
Kubernetes’ feature set and maturity help us build a production infrastructure faster, but what really drove the decision was the prospect of being part of and contributing to such a vibrant community, and thus being able to keep the utmost flexibility to build the best platform for our users.