How It All Started
When The New York Times first decided to move out of its data centers, it laid the foundation for ultimately going cloud-native. To start with, the deployments on the public cloud were smaller, less critical applications managed on virtual machines. At that time, Deep Kapadia was tapped to lead a Delivery Engineering Team that would build abstractions on top of what the cloud providers offered The New York Times. As Kapadia and his team built more tools, they realized that treating Amazon as just another data center was doing them a disservice.
Choosing GKE & Its Benefits
The team then decided to use Google Cloud Platform (GCP) and its Kubernetes-as-a-service offering, GKE. This immediately increased the speed of delivery: some of the legacy VM-based deployments took 45 minutes, while with Kubernetes that time was “just a few seconds to a couple of minutes,” explained Engineering Manager Brian Balser.
Transitioning to GKE meant that teams that used to deploy on a weekly basis, or had to coordinate schedules with the infrastructure team, could now deploy their updates independently, daily when necessary.
Embracing Cloud Native Computing Foundation technology allowed a more integrated approach to deployment across the engineering staff and improved the company’s portability across cloud providers.
According to Deep Kapadia, Executive Director of Engineering at The New York Times, once you get over the initial hump, things get a lot easier and actually a lot faster — a scenario that proved absolutely true for teams at The New York Times.
How Kubernetes Shaped the Future Outlook of The New York Times
Founded in 1851, The New York Times, the esteemed daily also known as the newspaper of record, is a digital pioneer: its first website launched in 1996, well before Google came into existence. A few years ago, the company decided to move out of its private data centers, including the one located in pricey Manhattan real estate. It recently took another step into the future by going cloud-native.
To start with, the company’s infrastructure team managed virtual machines in the Amazon cloud, deploying more critical applications in its own data centers and less critical ones on Amazon Web Services (AWS) as an experiment. Eventually, as they built more tools, they realized that treating Amazon as just another data center was holding them back. In mid-2016, The New York Times’ team began looking at Google Cloud Platform and its Kubernetes-as-a-service offering, GKE.
The New York Times team had internal tooling that did for VMs what Kubernetes does for containers, which raised a critical question: why were they building and maintaining those tools themselves?
At the beginning of 2017, the first production application for The New York Times, the nytimes.com mobile homepage, began running on Kubernetes, serving just 1% of traffic.
Today, almost 100% of end-user-facing applications on the nytimes.com website run on GCP, with the majority on Kubernetes.
The team found that the speed of delivery was immediately impacted when they shifted to GKE. Deploying Docker images was far faster than spinning up VMs: some of the legacy VM-based deployments took 45 minutes, while with Kubernetes that time was reduced to just a few seconds to a couple of minutes.
The Road Ahead: Moving Beyond Just the Website Running on Kubernetes
With GKE, The New York Times is moving towards serverless deployments, going beyond running just the website on Kubernetes to moving as much as possible onto managed platforms. For instance, The New York Times crossword app was built on Google App Engine, which has been the main platform for the company’s experimentation with serverless.
“The toughest part was getting the engineers over the hurdle of how little they had to do,” explained Chief Technology Officer Nick Rockwell in a conversation with The CTO Advisor. He added, “Our experience has been very, very good. We have invested a lot of work into deploying apps on container services, and I’m really excited about experimenting with deploying those on App Engine Flex and AWS Fargate and seeing how that feels because that’s a great migration path.”
There are some exceptions to the move to cloud-native, Rockwell explains. The New York Times also has a print publishing business, much of which will not go down the cloud-native path because it relies on vendor software and the specialized machinery needed to print the physical paper. Even those teams, however, are looking at things like App Engine and Kubernetes where they can.
Kapadia said, “Right now every team is running a small Kubernetes cluster, but it would be nice if we could all live in a larger ecosystem. Then we can harness the power of things like service mesh proxies that can actually do a lot of instrumentation between microservices, or service-to-service orchestration. Those are the new things that we want to experiment with as we go forward.”
One basic practice that worked in the teams’ favor at The New York Times is knowledge sharing. As the teams began sharing their best practices with each other, they realized they didn’t need to manage everything on their own: much of the infrastructure and many systems could be handled by centralized functions, and both Google and Amazon offered tools that let them do that. Teams were given complete ownership of their Google Cloud Platform projects along with a set of sensible defaults and standards, and they fixed small problems along the way by talking and collaborating with the engineering teams.
This allowed the teams to accomplish their goals faster. With GKE, each team could get its own compute cluster, reducing the number of individual instances they needed to care about; developers could treat the cluster as a whole. With ticket-based workflows for requesting resources and connections removed, developers could simply call an API to get what they wanted. Teams that had to deploy on weekly schedules, or coordinate schedules with the infrastructure team, could now deploy their updates independently, as often as daily.
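As an illustrative sketch of this self-service model, a team might describe its service declaratively in a Kubernetes manifest and submit it to the cluster’s API, rather than filing a ticket for VMs. The names and image below are hypothetical, not The New York Times’s actual configuration:

```
# Hypothetical Deployment manifest: declares the desired state,
# and the cluster's API converges the running system toward it.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-homepage          # hypothetical service name
spec:
  replicas: 3                     # run three identical pods
  selector:
    matchLabels:
      app: example-homepage
  template:
    metadata:
      labels:
        app: example-homepage
    spec:
      containers:
      - name: web
        image: registry.example.com/homepage:1.0.0   # hypothetical image
        ports:
        - containerPort: 8080
```

A single command such as `kubectl apply -f deployment.yaml` then creates or updates the service, with no coordination with an infrastructure team required.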
Yet another benefit of adopting Kubernetes was a more unified approach to deployment across the engineering staff. Before Kubernetes, many teams were busy building their own deployment tools; with Kubernetes, The New York Times could benefit from the advances the technology offered instead of trying to catch up. The New York Times also uses other CNCF projects, including Fluentd to collect logs from all of its AWS servers, gRPC for its Publishing Pipeline, Prometheus, and Envoy.
Deploying these open-source technologies has proved beneficial for The New York Times. By building on CNCF projects, the teams follow an industry standard and can decide for themselves whether to move away from their current service providers. Most of The New York Times’ applications are connected to Fluentd, which allows teams to switch their logging provider from provider A to provider B whenever they wish. The teams currently run Kubernetes on GCP, but if they ever want to run it on Amazon or Azure, they could potentially do so as well.
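To illustrate how Fluentd enables that kind of provider switch, here is a minimal, hypothetical configuration sketch: applications keep emitting logs the same way, and changing the backend is only a matter of swapping the output plugin in the `<match>` block (the tag pattern and hostname below are made up):

```
# Hypothetical Fluentd config: route all application logs to one backend.
<match app.**>
  @type elasticsearch           # switching providers means changing this plugin
  host logs.example.internal    # hypothetical backend host
  port 9200
</match>
```

Because the applications only talk to Fluentd, replacing `@type elasticsearch` with a different output plugin redirects all logs to a new provider without touching application code.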
To Tony Li, a Site Reliability Engineer at The New York Times, the Cloud Native Computing Foundation’s projects are like “a northern star that we can all look at and follow.” The teams at The New York Times are looking for ways to extract even more value out of the technology. Instead of running small individual Kubernetes clusters, they plan to explore living in a larger shared ecosystem. There are many new things the teams want to experiment with as they move forward, including service mesh proxies that can do a lot of instrumentation between microservices, and service-to-service orchestration.