The role of service mesh as a cloud-native enabler is building fast
April 11 2019
by Jean Atelsek, William Fellows
In a multi-cloud, hybrid IT world where applications are deployed as microservices, service meshes are becoming an important (although not mandatory) component of cloud-native architecture. Early deployments of the technology – which promises network routing, security and configuration control for microservices-based applications – are largely based on open source code, with Envoy emerging as a de facto standard data plane.
There's a lot of excitement around service mesh, and with good reason. As microservices push software development and execution to become more granular and distributed, new ways of authenticating and controlling service-to-service communications are needed. Rigid and lengthy release pipelines for traditional software are naturally giving way to more nimble, lightweight and flexible routines. But this creates complexity and risk, with more attack surfaces and governance challenges, and these difficulties are compounded as applications grow to invoke hundreds or even thousands of services. Service meshes show promise for bringing observability, traffic management and policy control to modern-day runtime workflows, but this is an emerging opportunity with major decisions still to be made.
The 451 Take
Why all the excitement about service mesh? Because it has the potential to become a Swiss Army Knife of modern-day software, solving for the most vexing challenges of distributed microservices-based applications. The technology, which defines and controls networking at the application layer, is not suitable for every use case, but the ecosystem explosion around Envoy is creating proof points and surfacing problems that the open source community – the ultimate self-healing network – is striving to solve. It's still early, though, and the counteractive forces of innovation versus integration have yet to play out. The opportunity for innovation here is significant – see Figures 1 and 2 below.
What is a service mesh, anyway?
A service mesh is an evolution of the traditional API gateway, which offers a single point of entry for traffic into an application. With traditional software, the job of the API gateway is to intercept the data coming into the system at the edge and apply checks to authenticate and configure it so that it can be processed. Once inside the application, communication is handled by function calls – lines of code that deliver the information to functions where the software does its work – for example, creating an invoice, writing to a database or triggering another function.
Mobile platforms, DevOps and distributed computing have changed the nature of software development. Traditional monolithic applications are giving way to modern architectures that stitch together separate pieces of code for each function – i.e., microservices – and then use APIs to pass data between them. With these more granular services, processes now have to flow over the network via API calls and are, thus, subject to network limitations such as latency, timeouts and security risks – problems that compound as the number of microservices grows.
A service mesh addresses the fragmentation of modern-day software with a configurable layer that intermediates activity between an application's microservices. It typically works by attaching a sidecar proxy to each microservice as a contact point. Each service talks only to its local proxy, and the proxies (acting as forward and reverse proxies) handle the network hops between services, which allows fast, policy-aware routing of data among what may be hundreds of microservices in a large application.
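The sidecar pattern described above can be illustrated with a minimal in-process sketch. It is not taken from Envoy or any other mesh; the class name SidecarProxy, the shared registry dictionary and the log format are all hypothetical, and a real proxy would forward requests over the network rather than call its peer directly.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical model of the sidecar pattern; not the API of any real mesh.
Handler = Callable[[str], str]


@dataclass
class SidecarProxy:
    """Sits next to one service and mediates its inbound and outbound calls."""
    service_name: str
    handler: Handler                      # the local service's business logic
    registry: Dict[str, "SidecarProxy"]   # shared name -> proxy lookup (assumed)
    log: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Registering here stands in for service discovery.
        self.registry[self.service_name] = self

    def call(self, target: str, payload: str) -> str:
        """Outbound path: the local service asks its own proxy to reach target."""
        self.log.append(f"out {self.service_name}->{target}")
        peer = self.registry[target]      # the proxy, not the service, finds the peer
        return peer.receive(self.service_name, payload)

    def receive(self, caller: str, payload: str) -> str:
        """Inbound path: the proxy records the caller, then invokes the service."""
        self.log.append(f"in {caller}->{self.service_name}")
        return self.handler(payload)


registry: Dict[str, SidecarProxy] = {}
billing = SidecarProxy("billing", lambda p: f"invoice for {p}", registry)
orders = SidecarProxy("orders", lambda p: p.upper(), registry)
print(orders.call("billing", "order-42"))  # prints "invoice for order-42"
```

The point of the design is that neither service contains discovery or logging code; both concerns live in the proxy, which is why they can be changed without redeploying the service.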
Microservices have a natural affinity for containers – both are lightweight, purpose-built, quick to deploy, portable and platform-agnostic. As such, any discussion of service mesh is likely to include a discussion of containers, particularly how the mesh handles information coming into the application container via a Kubernetes ingress controller. Two common misconceptions are that service meshes can't extend from containers to VMs or bare-metal servers (they can), and that a Kubernetes cluster needs to be created before a service mesh can be deployed (it doesn't).
An important distinction to make is between the control plane and the data plane. A service mesh detaches the control plane, which applies policy and configuration instructions to traffic entering and traversing the application, from the data plane, which is a programmable network of proxies that observes, authenticates and routes service traffic based on the control plane instructions. Envoy, the data plane developed at Lyft, has achieved impressive take-up since it was open-sourced at the end of 2016 – it is the default data plane for Istio, AWS's App Mesh and the Google Traffic Director managed control plane to be launched at Google Next in mid-April. Other data planes include Linkerd, NGINX, HAProxy and Traefik.
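The split between control plane and data plane can be sketched in a few lines. This is a toy model under stated assumptions, not how Istio or Envoy exchange configuration: the ControlPlane and Proxy classes, the version counter and the address string are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Proxy:
    """Data plane element: routes traffic using the last config it received."""
    name: str
    routes: Dict[str, str] = field(default_factory=dict)  # service -> upstream
    version: int = 0

    def apply(self, routes: Dict[str, str], version: int) -> None:
        # Copy so that later control plane changes only arrive via a push.
        self.routes = dict(routes)
        self.version = version

    def route(self, service: str) -> str:
        if service not in self.routes:
            raise LookupError(f"{self.name}: no route for {service}")
        return self.routes[service]


@dataclass
class ControlPlane:
    """Holds the desired state and pushes it to every registered proxy."""
    proxies: List[Proxy] = field(default_factory=list)
    routes: Dict[str, str] = field(default_factory=dict)
    version: int = 0

    def register(self, proxy: Proxy) -> None:
        self.proxies.append(proxy)
        proxy.apply(self.routes, self.version)

    def set_route(self, service: str, upstream: str) -> None:
        self.routes[service] = upstream
        self.version += 1
        for proxy in self.proxies:
            proxy.apply(self.routes, self.version)
```

In this sketch the control plane never touches a request; it only decides what the proxies should do, while each proxy answers routing questions from its local copy of the configuration.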
Service Mesh Day highlights progress and gaps
On March 29, Tetrate hosted Service Mesh Day as part of what it called 'the first-ever technology conference related to service mesh.' Sponsors included Google Cloud, Juniper Networks, AWS, the Cloud Native Computing Foundation and the Open Networking Foundation, and sessions delved into the future of service mesh as a next-generation networking model, adoption patterns (including in brownfield applications), and production readiness of Envoy and Istio.
Tetrate used the opportunity to highlight its products designed to help manage microservices at scale: GetEnvoy (a source for prebuilt, certified Envoy builds – basically a way for enterprises to offload the Envoy component of their CI/CD pipelines), Apache SkyWalking (an open source platform for analyzing traces and metrics in distributed applications) and Tetrate Q (a just-launched access control framework for distributed systems).
Throughout the day, speakers and end users addressed potential benefits and pitfalls of service mesh. Key takeaways:
Service mesh is becoming a platform in its own right. The proxies in a service mesh can be configured to automate a variety of tasks that are inherently difficult in a distributed system, including service discovery, health checking, routing, load balancing, authentication and observability. Because microservices-based applications are disaggregated and dynamic, one important function of the control plane is to safely release updates into production, and this is where circuit breakers and phased rollouts come into play. Envoy has taken off in part because of its extensibility as a universal data plane, which makes it possible to build differentiated services on top – effectively, it has evolved from a generic proxy into a platform.
It's early (and this market is still being made up). Service meshes have become necessary because of problems created by microservices. For all their flexibility and innovation, microservices roll up into fragmented environments that are subject to difficulties as they scale, and they necessarily make networking an application-layer issue. Among the problems to be solved are how to accommodate multiple clouds, including edge and mobile clients; how to centralize authority to avoid balkanized control; and how to manage identity in applications that may have hundreds of app developers working on them. Still to be determined are which elements can be structurally encoded in hardware (e.g., network interface controllers) and how to enable federation (i.e., interoperability between a variety of meshes).
Service mesh is not for everyone. The enthusiasm for service mesh combined with its lack of maturity raises the danger of technical debt: early adopters that implement an early version of a component may later have to either refactor when the underlying control plane changes or live with an outdated version. Service meshes are fundamentally complicated, and installation and scaling can be difficult. Some enterprises we spoke with are building their own control planes, in addition to testing 'opinionated' alternatives very carefully before committing. Although Service Mesh Day speakers cited use cases where Envoy and Istio were being deployed in advance of Kubernetes and in ways that encompass VMs and containers in brownfield environments, service mesh implementations will likely remain highly varied per application, and enterprises considering the technology must be satisfied with the performance and usability of the alternative(s) they choose.
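As an illustration of the circuit breaking mentioned in the first takeaway, the sketch below shows the general idea in a few lines. It is a simplified, assumed implementation and not Envoy's: the class name, the consecutive-failure threshold and the cool-down parameter are hypothetical choices made for the example.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected without contacting the upstream."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds (assumed value)
        self.clock = clock                # injectable so tests need not sleep
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], str]) -> str:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("upstream temporarily disabled")
            # Cool-down elapsed: let one trial request through (half-open).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open the circuit
            raise
        self.failures = 0                      # a success closes the circuit
        return result
```

Failing fast while the circuit is open keeps a struggling service from being flooded with retries, which is the behavior a mesh proxy can apply on behalf of every caller without changes to application code.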
AWS App Mesh
AWS App Mesh was previewed at re:Invent 2018, and is now generally available. The fully managed service mesh offering provides application-level networking, enabling customers to run and monitor microservices at scale. Services can be built and run using compute infrastructure such as Amazon EC2, AWS Fargate, Amazon Elastic Container Service and Amazon Elastic Container Service for Kubernetes. AWS App Mesh routes and monitors traffic, and provides insight and the ability to reroute traffic after failures or code changes. Previously, this required users to build monitoring and control logic directly into code and redeploy services every time there were changes. Service meshes resolve this problem. AWS App Mesh uses the open source Envoy proxy server software (data plane) developed by Lyft (and now part of CNCF), but it is not an implementation of the Istio control plane developed by Google, IBM and Lyft. AWS believes Istio is too 'opinionated' for the vast majority of customers, which are not Kubernetes-only shops and will therefore require communication across services running in different compute environments. AWS App Mesh works with AWS Cloud Map service discovery.
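Rerouting traffic after a code change is commonly done by splitting requests between versions according to weights. The sketch below shows that general technique only; it is not the AWS App Mesh API, and the function name pick_version and the version labels are hypothetical.

```python
import random
from typing import Dict, Optional


def pick_version(weights: Dict[str, int],
                 rng: Optional[random.Random] = None) -> str:
    """Choose a backend version in proportion to its weight."""
    rng = rng or random.Random()
    # Versions with zero weight receive no traffic at all.
    candidates = [(v, w) for v, w in weights.items() if w > 0]
    if not candidates:
        raise ValueError("no version has a positive weight")
    versions, ws = zip(*candidates)
    return rng.choices(versions, weights=ws, k=1)[0]


# Phased rollout: send roughly 10% of requests to the new version first.
rollout = {"v1": 90, "v2": 10}
print(pick_version(rollout))
```

Shifting the weights in steps (for example from 90/10 to 50/50 to 0/100) moves traffic to the new version gradually, and setting the new version's weight back to zero acts as a rollback.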
Figure 1: Beyond infrastructure: Cloud-native feature adoption plans are strong
Figure 2: Migration of application stack or portfolio to microservices by industry