Interning - Building a Status Page for Microservice Clusters
Like Marian and Puja before me, I want to introduce myself. I was born 25 years ago in east Germany and moved to a small city near Cologne a few years later. After three years of training and one year as a full-time media designer, I realized that I had to find a different profession that would fit my newfound interest in computer science, so I decided to study it.
Currently I’m in my third semester at Hochschule Bonn-Rhein-Sieg. While working on side projects, I realized that deploying your software can be a stressful task. That’s when I learned what the people at Giant Swarm are working on. After hearing how they want to let developers deploy and scale their applications as simply as possible, and about their approach to company culture, I instantly wanted to be part of it. I got in touch with a friend of mine who works at Giant Swarm. After I gave a presentation and met the team, both Giant Swarm and I thought it would be a good fit, and they hired me as their first intern.
As my first project, I’m working on the Giant Swarm status page. So why do we need a status page? Since transparency is very important to us, we want to give our customers an easy way to see the status of our services. We build our service for developers, and since developers expect APIs, providing an API for our status is mandatory (and a first step towards having a status page).
After researching status page providers that match our requirements, we decided to go with statuspage.io because they provide a user API, built-in support for Pingdom, and convincing reference customers. To supply statuspage.io with metric data and the status of our services, we decided to go with Pingdom because of their stability and seamless integration with statuspage.io.
Currently our status page is still in alpha, because getting reliable status information on a distributed service architecture like ours is a challenge in and of itself. But we are tackling it right now. In a classic one-server infrastructure (or multiple virtual machines running on one server), we could ping the server and see if it’s up. But our infrastructure consists of clusters, each of which provides a service, like the Application Cluster. Each cluster consists of multiple machines running various containers. To determine whether a service like our API is running, we have to check that all the needed containers are running and provide this information to Pingdom, which is not a trivial task.
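To make the idea more concrete, here is a minimal sketch of how such an aggregated check could look. It is not our actual implementation: the container names, the `ContainerState` type, and both functions are hypothetical, and it assumes that an external monitor polls an HTTP endpoint and treats a 200 response as "up" and a 503 as "down".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerState:
    name: str       # hypothetical container name, e.g. "api"
    running: bool   # True if the container is currently running


def missing_containers(required, observed):
    """Return the sorted names of required containers that are not running."""
    running = {c.name for c in observed if c.running}
    return sorted(set(required) - running)


def health_response(required, observed):
    """Map the aggregated state to an (HTTP status code, body) pair.

    200 means every required container is running; 503 means at least
    one required container is stopped or was not observed at all.
    """
    missing = missing_containers(required, observed)
    if missing:
        return 503, "DOWN: missing " + ", ".join(missing)
    return 200, "OK"
```

The point of the sketch is that the service only counts as up when every required container is running, no matter which machine in the cluster it runs on. How the list of observed containers is collected across machines is left out here, and that is where most of the real difficulty lies.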
In the near future, our status page will provide an easy and reliable way for our customers to know whether our (and their) services are up. Personally, I’m excited to work with tremendously experienced developers on new technologies (microservices, containers, orchestration, etc.), which I believe will change the way developers think about their applications and the way Ops think about their systems.
Disclosure: An earlier version of this post contained the URL to the status page and API endpoint. Since the information provided via that page and API is not yet reliable, we removed it to avoid spreading false information. For now, please consult our Gitter chat to get timely updates on the status of our systems.