Service mesh update: Maintainers add features while practitioners push federation
December 4, 2019
by Jean Atelsek, William Fellows
Cloud adopters are enthusiastic about the promise of service mesh to consistently apply routing, policy and encryption across microservices-based applications, but implementation has been difficult due to fiddly configuration and management demands. Add to this competing control-plane options – Istio, Consul, Kuma, Linkerd, NSX and AWS's proprietary App Mesh – at various stages of adoption and maturity, and you get a perfect storm of confusion; dare we say a bit of a 'service mess.' This is to be expected at the current stage of market development: the market is being made up as it goes – thrashing, crowded and complex – and the clean, simple stories will be the successful ones. At KubeCon 2019 in San Diego, maintainers introduced tools to make their offerings easier to love, while practitioners cited the need for an open standard that can federate various preferences across environments.
The 451 Take
Service mesh was a prominent topic at this year's KubeCon North America, complete with its own Day Zero event (ServiceMeshCon), a CNCF roundtable and a raft of announcements from project maintainers. In a show of hands, about 10% of attendees at the sold-out ServiceMeshCon said they had experience with service mesh in production; about 50% had tried it out. The landscape seems to be branching out in several directions, with open source projects adding tools to ease adoption of their control planes, vendors hoping to capitalize on service mesh difficulty by offering to run it as a service on behalf of enterprises, and other participants promising to reconcile the various offerings with the help of an overarching specification. While few dispute the need for a way to route, monitor and authenticate traffic for service-to-service communications, the way forward for most organizations is far from clear, indicating opportunity as well as risk, although the industry has converged on sidecar proxies (primarily Envoy) as the best available choice for the data plane.
With service mesh important to successful microservices implementations, data from 451 Research's DevOps, 2H 2019 survey finds that 13.9% of enterprises are now running service mesh in production, 18.6% have some adoption and about 44% are in planning.
[Figure: Please indicate your organization's adoption status for service mesh. Source: 451 Research's DevOps, 2H 2019]
It's telling that most of the service mesh practitioners speaking at KubeCon were from large, technically sophisticated cloud-native organizations such as Lyft, Uber and Pinterest. Although many vendors are pursuing the opportunity to bridge the world of highly scalable cloud-native environments with on-premises data and legacy applications – a mesh is, after all, only as strong as its weakest link – advice gathered from organizations that have implemented service meshes at scale is instructive.
Collaborate with stakeholders starting early in the process. Tech talks, prototyping and encouraging opt-in by service owners who have the most to gain (e.g., supporting a new language or functionality) will help get the implementation off on the right foot. Proactively identify those most likely to be affected.
Start with an ingress solution. Establish a consistent way for external applications to call into the mesh. Vendors that can deliver an 'easy button' onboarding run book will find ingress a useful first step for customers seeking to get started with service mesh.
Prioritize security for services and for the service mesh itself. A primary use case for service mesh is to ensure mTLS encryption of service-to-service traffic; application and sidecar communications need to be rock solid. With so many layers of software-defined interaction, bugs can arise from many sources. Have a systematic way of testing for and finding the source of problems.
Be careful with migration. Service mesh involves a big change in how services communicate with each other. Planning ahead requires service discovery, service registration and security infrastructure to be in place.
Disable unused components. Some service mesh features can cause problems during implementation even if they're not active; use the simplest set of tools that can address the problem you're trying to solve.
Never stop investing in performance improvements. The main downsides of service mesh are latency (multiplied by the number of hops in an application) and resource consumption (multiplied by the number of sidecars); batch chatty connections when possible.
Roll out slowly, start small, scale up. Begin with a use case that's not in the critical path. As problems are ironed out of initial deployments, iterate quickly and scale to other applications/teams. Implementing service mesh for just one application or team risks ending up with a pet, not cattle.
Plan an update process. Roll out updates slowly, qualifying new releases with critical users first. Allow users to do self-service rollbacks to a specific supported version, and keep track of how many users rolled back a given version to point to widespread difficulties. Fix user issues as soon as possible and ensure that rollbacks are temporary.
Be especially careful with newly opened connections. This is where errors are most likely to be introduced.
Keep the faith. Despite the difficulties, service mesh adopters say the benefits are worth the effort.
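The performance advice above reflects simple multiplication: latency overhead scales with the number of hops (each traversing two sidecar proxies), and resource consumption scales with the number of sidecars. A back-of-envelope sketch, using assumed per-hop and per-sidecar figures rather than measured benchmarks:

```python
# Illustrative sizing math for service mesh overhead.
# The per-hop overhead and per-sidecar memory figures below are
# assumptions for illustration, not vendor benchmarks.

def mesh_latency_overhead_ms(hops: int, per_proxy_ms: float) -> float:
    """Added request latency: each service-to-service hop traverses
    two sidecar proxies (client-side egress, server-side ingress)."""
    return hops * 2 * per_proxy_ms

def sidecar_memory_mb(services: int, replicas: int, mb_per_sidecar: float) -> float:
    """Extra memory: one sidecar proxy injected per pod."""
    return services * replicas * mb_per_sidecar

# A request fanning out across 5 hops, assuming 1.5 ms per proxy traversal:
print(mesh_latency_overhead_ms(5, 1.5))   # 15.0 ms of added latency
# 20 services x 10 replicas each, assuming 50 MB per sidecar:
print(sidecar_memory_mb(20, 10, 50.0))    # 10000.0 MB of extra memory
```

The multipliers explain why practitioners recommend batching chatty connections: collapsing two hops into one removes two proxy traversals per request, not one.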
Some vendors expect the industry to settle on a single standard, as it did with Kubernetes for container orchestration, and Istio has the pole position as a Google-driven project that plays well with Kubernetes. Google's decision to keep Istio under its own control for now (rather than donating it to the CNCF under an open governance model) worries some potential customers and makes it a nonstarter for others, but many players (including Tetrate, Aspen Mesh, VMware with NSX Service Mesh and IBM with App Connect) are investing in Istio as a foundation for enterprise-grade managed services to support heterogeneous environments.
Others expect a variety of service meshes to remain, addressing a variety of use cases. Given that businesses are already using a variety of control planes in production, Microsoft introduced the Service Mesh Interface (SMI) project in May, a specification for interoperability across different mesh technologies, including Istio, Linkerd and Consul Connect. The project was launched in partnership with Buoyant, HashiCorp, Solo.io, Kinvolk and Weaveworks, with support from Aspen Mesh, Canonical, Docker, Pivotal, Rancher, Red Hat and VMware. The goal of SMI is to provide developer-friendly APIs that lower the barrier to entry and the risk of using a service mesh, to collaborate with the service mesh community on customer requests, and to create a consistent experience across a new ecosystem with an interoperable, extensible framework. Microsoft provided a demo at ServiceMeshCon, but SMI won't see the light of day until 2020.
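The interoperability idea behind SMI is that a single declarative resource is portable across conformant meshes. A sketch of one such resource – a weighted traffic split – expressed as a Python dict before serialization to a Kubernetes manifest; the `checkout` service names are hypothetical, and field layout follows the early SMI drafts, whose details have varied across spec versions:

```python
# Sketch of an SMI TrafficSplit resource built as a Python dict, as one
# might construct it before serializing to YAML/JSON for kubectl.
# Service names are hypothetical; field layout follows early SMI drafts
# and has varied across spec versions, so treat this as illustrative.

traffic_split = {
    "apiVersion": "split.smi-spec.io/v1alpha1",
    "kind": "TrafficSplit",
    "metadata": {"name": "checkout-rollout"},
    "spec": {
        # The root service that clients address.
        "service": "checkout",
        # Weighted backends implementing the split (e.g., a canary).
        "backends": [
            {"service": "checkout-v1", "weight": 90},
            {"service": "checkout-v2", "weight": 10},
        ],
    },
}

# The point of the spec: any conformant mesh can act on the same
# resource, so the split definition isn't tied to one control plane.
weights = {b["service"]: b["weight"] for b in traffic_split["spec"]["backends"]}
print(weights)  # {'checkout-v1': 90, 'checkout-v2': 10}
```

In principle, the same resource drives a 90/10 canary whether the underlying mesh is Linkerd, Consul Connect or Istio behind an SMI adapter, which is the 'consistent experience' the project is aiming for.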
Among the new projects and features introduced by service mesh maintainers at KubeCon:
Solo.io, which does not have its own mesh but offers a Service Mesh Hub dashboard that installs, discovers, manages and groups diverse meshes (including AWS's App Mesh) into one big mesh, announced AutoPilot, an operator framework for building workflows on top of service mesh. AutoPilot will help Kubernetes operators enable mesh metrics and APIs, automated mesh configuration, the ability to expose and invoke webhooks, and out-of-the-box GitOps workflows. The plan is to use telemetry within Kubernetes clusters to drive the behavior of the service mesh – what Solo.io calls 'adaptive service mesh.'
Buoyant, maker of Linkerd (one of the few service meshes that doesn't use the Envoy proxy as a data plane) introduced Dive, a team collaboration tool that captures microservice deployments as events and compiles ownership information and dependencies into a service catalog – 'like a Facebook for microservices.' Dive is free and in private beta; there is currently a waitlist for the beta.
Network Service Mesh, a CNCF sandbox project announced in 2018, has attracted 40 contributors and is reportedly receiving interest from financial companies, enterprises and service providers. The project is designed to manage complicated layer 2 and layer 3 use cases in Kubernetes so app service meshes can focus on layer 7 connectivity.
VMware's NSX Service Mesh is a SaaS offering that runs in public clouds. Based on Istio and Envoy, NSX Service Mesh extends observability and policy to users, data and services, and adds federation between service mesh clusters. It enables SecOps and DevOps integrations through policies and tools for setting up application SLOs, access control, encryption and context-based security policies. NSX Service Mesh is built on a global control plane, with agents running on any Kubernetes cluster on any cloud. VMware sees key use cases including application mobility and migration, service mesh HA, end-to-end encryption for compliance, and visibility for Dev/SecOps.