State of the service mesh 2020
December 10, 2020
by Jean Atelsek
In a year like no other, the promise of application-level service networking remains great, and the fundamental trade-off of extensibility vs. stability is being tested as deployments scale. With development largely in the open source domain, Envoy remains the standard-bearer for the data plane, and Istio continues to gain adherents as a control plane despite governance concerns. But maintainers and adopters agree there is plenty of room in the IT universe for multiple implementations of the technology.
The 451 Take
The promise of service mesh – application-level load balancing, traffic routing, authentication and observability – is great in theory but proving difficult at scale. Successful implementations controlling mammoth microservices-based applications such as those at Lyft and Pinterest are the envy of development and operations teams everywhere, but few real-life use cases can be so neatly bound and automated. As a result, focus has been splintered by new and evolving offerings aiming to capture interest among customers eager to get started but hesitant to proceed down a path that requires significant investment in non-standard components. Maintainers and practitioners are reaching for ways to ease the configuration burden for new deployments while still enabling purpose-built extensions for a variety of use cases, all while continually accommodating new services and changes in the underlying code – a tall order that some providers are hoping to fill with managed services.
What is a service mesh anyway?
A service mesh is a way to control how different parts of an application share data with each other. Unlike other systems for managing this communication, a service mesh is a dedicated infrastructure layer built right into an app. This visible infrastructure layer can document how well (or poorly) different parts of an app interact, so it becomes easier to optimize communication and avoid downtime as an app grows. Broadly speaking, a mesh controls the communication between microservices that compose an application, whereas API gateways dictate traffic flows between applications.
Sidecar proxies are widely seen as the best way to negotiate networking between microservices. A sidecar proxy is a container that sits alongside each microservice as a contact point. Each service sends and receives requests through its own local proxy, and the proxies, rather than the application code, handle the routing of data among what may be hundreds of microservices in a large application.
Microservices have a natural affinity for containers – both are lightweight, purpose-built, quick to deploy, portable and platform-agnostic. As such, any discussion of service mesh is likely to include a discussion of containers, particularly how the mesh handles information coming into the application container via a Kubernetes ingress controller. Two common misconceptions are that service meshes can't extend from containers to VMs or bare-metal servers (they can), and that a Kubernetes cluster needs to be created before a service mesh can be deployed (it doesn't).
An important distinction to make is between the control plane and the data plane, a concept borrowed from the networking domain. A service mesh detaches the control plane, which applies policy and configuration instructions to traffic entering and traversing the application, from the data plane, which is a programmable network of proxies that observes, authenticates and routes service traffic based on the control plane instructions. Envoy, the data plane developed at Lyft and open sourced at the end of 2016, is now the straw that stirs the drink in a vibrant ecosystem of service mesh control planes and managed services; it is the closest thing to a standard in this universe.
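The split between the two planes can be sketched in a few lines of Python. This is a toy model for illustration only, not how Envoy or Istio are implemented: the `ControlPlane` and `Proxy` classes, the service names and the address are all hypothetical. The control plane holds the desired routing policy and pushes it out, while each proxy simply applies whatever it was last given.

```python
from dataclasses import dataclass, field


@dataclass
class Proxy:
    """Data plane: one proxy per service; it only applies the config it is given."""
    service: str
    routes: dict = field(default_factory=dict)  # destination service -> address

    def apply_config(self, routes):
        # Replace the local routing table with a copy of the pushed policy.
        self.routes = dict(routes)

    def route(self, destination):
        # Resolve the upstream address using the last pushed config.
        if destination not in self.routes:
            raise LookupError(f"{self.service}: no route to {destination}")
        return self.routes[destination]


class ControlPlane:
    """Control plane: holds routing policy and pushes it to every registered proxy."""

    def __init__(self):
        self.routes = {}
        self.proxies = []

    def register(self, proxy):
        self.proxies.append(proxy)
        proxy.apply_config(self.routes)

    def set_route(self, destination, address):
        # A policy change is distributed to all proxies, not to the applications.
        self.routes[destination] = address
        for proxy in self.proxies:
            proxy.apply_config(self.routes)


control = ControlPlane()
cart = Proxy("cart")
control.register(cart)
control.set_route("payments", "10.0.0.7:8443")
print(cart.route("payments"))  # 10.0.0.7:8443
```

In this sketch the application behind the `cart` proxy never learns where `payments` lives. Changing the route is a control plane operation that takes effect in every proxy without touching service code.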
When to adopt a service mesh? Some say it's wise to start as soon as you're running two or more microservices; others put the threshold at 50 or more. Envoy creator Matt Klein points out that microservices themselves are a solution to a human scaling problem: when an organization has 80 or 100 engineers working on an application, maintaining a monolith becomes impractical, and this is when it makes sense to shift to microservices. This shift brings its own set of challenges, including how to implement networking and observability. Enter the service mesh.
Service mesh promises and perils
A service mesh can solve a host of problems by encoding instructions for encrypting, routing and authenticating traffic at the application level. A lot of the hype around the technology is due to its promise of Swiss Army knife-like versatility. Beyond simple connectivity, a mesh can enable service discovery, availability functions (load balancing, source health monitoring, circuit breaking), security (authentication, access control, encryption) and observability (monitoring, metrics).
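To make one of these availability functions concrete, here is a minimal sketch of the circuit breaker pattern in Python. It assumes a simple consecutive-failure threshold and a fixed cool-down period. The `CircuitBreaker` class and its parameters are hypothetical, and production proxies use richer criteria than this.

```python
import time


class CircuitBreaker:
    """Reject calls to a failing upstream until a cool-down period has passed."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to stay open
        self.clock = clock                # injectable so the timing can be tested
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: request rejected")
            # Cool-down elapsed: allow a trial call; one more failure reopens.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success resets the failure count
        return result
```

In a mesh, this logic sits in the proxy in front of each upstream, so a struggling service stops receiving traffic without every calling application having to implement the pattern itself.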
Yet while it takes complexity out of the application layer, a service mesh consolidates that complexity into a service fabric that can be fragile and difficult to manage. Compounding this is the difficulty of maintaining a mesh that spans multiple clusters as well as VMs – a use case that will become increasingly likely as deployments scale, and one that may be a reason for adopting service mesh in the first place.
Nevertheless, organizations with large deployments are finding it easier to manage inter-service communication with a dynamic software-defined networking layer than with inflexible and even harder-to-maintain libraries or agents. This has the added advantage of putting responsibility for connectivity, security, resiliency and routing in the hands of platform architects rather than 'shifting left' and leaving it to developers who would rather build applications without making any assumptions as to where the component services are running.
451 Research's recent survey found that 64% of respondents have either adopted service mesh or are running proofs of concept for the technology. Of those, the majority are deploying it either as part of an opinionated commercial stack or as a managed service, while fewer are venturing to configure open source control planes or build their own (see Figure 1 below).
Figure 1: How Organizations Are Deploying Service Mesh Control Planes
Source: 451 Research, DevOps, Organizational Dynamics 2020
What's up with Istio
Envoy has emerged as the service proxy of choice and serves as the basis for a range of do-it-yourself, open source and commercial control planes. Istio, which relies heavily on Envoy, has the pole position as a control plane leader. It is backed by big names (including Google, IBM and Red Hat) and plays well with Kubernetes, the standard for container orchestration. In July, Google launched a new open source organization, Open Usage Commons, and transferred Istio's trademarks there, drawing consternation from IBM and the Cloud Native Computing Foundation (CNCF), where it had been expected to land.
But apparently, Open Usage Commons is good enough for the US Department of Defense, which requires an open source pedigree for adoption and has standardized on Kubernetes for compute and Istio for application networking. The DoD cited the variety of apps that need to be covered by the service mesh umbrella, the need for all traffic to be automatically encrypted in transit (via consistent TLS), and Istio's built-in key rotation as reasons behind the choice.
Istio is famously difficult to configure, but its flexibility, breadth of features and extensibility make it popular among enterprises. Project maintainers have recently implemented several changes to make adoption a little less tricky, with a focus on maintaining a core set of features and making it easy for developers to add functionality on top. To this end, the project recently merged a number of components into a single Istio daemon, making it easier to configure communication with remote clusters; maintainers moved to a monolithic model when it became too complex to have a set of microservices managing a microservices fabric. Prometheus is no longer an installation default since most applications already have tools in place for monitoring. And Istio developers have been working toward enabling multiple installs on a cluster to ease the upgrade process.
Spoiled for choice?
Given the range of use cases and opportunities for service mesh, it's only natural that the market is spawning alternatives, both open source and commercial. The spectrum of options goes from tightly integrated products that are easy to adopt but tough to customize (Buoyant's Linkerd, AWS's App Mesh), to Microsoft's simple but extensible Open Service Mesh, to the expansive and feature-rich Istio (see Figure 2 below).
Figure 2: Selected Service Mesh Control Planes
Acknowledging that one control plane will not rule them all, and eager to establish open standards for interoperation across meshes, Microsoft in May 2019 created the Service Mesh Interface (SMI) project to establish a common denominator of functionality for service traffic policy, telemetry and management across different mesh technologies, including Istio, Linkerd, Consul Connect and now its own Open Service Mesh. The project was launched in partnership with Buoyant, HashiCorp, Solo.io, Kinvolk and Weaveworks, with support from Aspen Mesh, Canonical, Docker, Pivotal, Rancher, Red Hat and VMware. SMI became a CNCF sandbox project in March 2020.
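One example of the service traffic policy that different meshes express in their own ways is a weighted split between two versions of a service. The Python sketch below shows the underlying idea only. It is not the SMI API, and the `pick_backend` function, the backend names and the weights are hypothetical.

```python
import random


def pick_backend(weights, rng=random):
    """Choose one backend name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Send roughly 10% of requests to the new version (a canary rollout).
split = {"checkout-v1": 90, "checkout-v2": 10}
rng = random.Random(42)  # seeded so repeated runs pick the same sequence
sample = [pick_backend(split, rng) for _ in range(10_000)]
print(sample.count("checkout-v2") / len(sample))  # close to 0.10
```

A common interface lets a policy like this be declared once and interpreted by whichever mesh is installed, rather than rewritten for each vendor's configuration format.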
We expect managed services to play an important role in the adoption of service mesh given the complexity of these deployments. These may be made available as part of an integrated commercial bundle (such as Red Hat's OpenShift Service Mesh, based on Istio, and VMware's Tanzu Service Mesh, based on NSX), as an option on public clouds (as HashiCorp is doing) or as a stand-alone service (e.g., Aspen Mesh or Tetrate, both based on Istio).
Throughout 2020, service mesh vendors and service providers have been jockeying for position:
HashiCorp, maintainer of the well-liked Consul service networking platform and Consul Connect service mesh, in March joined the CNCF and in July launched the fully managed HashiCorp Consul Service on Azure; in July the company also introduced the fully managed HashiCorp Cloud Platform (HCP), with HCP Consul on AWS as the first service out of the gate.
Solo.io – which doesn't have its own service mesh but is trying to establish a 'Switzerland' where various options can find common ground – in April 2020 open sourced its Service Mesh Hub, a dashboard for installing, discovering, operating and extending service mesh. In June the company created the Istio Developer Portal to catalog, manage and securely publish APIs to a custom-branded portal that can be accessed by developers both inside and outside an organization.
Kong, which launched its Kuma service mesh as a supplement to its popular API gateway, donated the code to the CNCF as a sandbox project in June. In October it introduced its fully managed Konnect service networking platform, including a supported version of Kuma called Kong Mesh.
Buoyant in June released a new version of its Linkerd 2 service mesh, adding support for secure service networking across multiple clusters running in heterogeneous environments.
F5's Aspen Mesh, a supported service mesh for companies that want to use Istio as a control plane in production deployments, in May released version 1.5 to take advantage of new Istio capabilities, including security APIs and WebAssembly support.