451 Perspective: Open source, data platforms and the cloud - trouble in paradise

July 16 2019
by Matt Aslett, James Curtis


Introduction


Open source licensing and cloud computing have been two of the most influential trends in the software sector over the past 20 years. The two trends have largely been complementary: open source software such as the Linux operating system and MySQL database helped provide the foundations for the development of cloud computing platforms, while the cloud giants have contributed to the corpus of open source software both directly (through projects such as Kubernetes) and indirectly (with Amazon's Dynamo providing the inspiration for Apache Cassandra, for example).

Recent licensing changes by a number of commercial open source vendors have illustrated that the relationship is not without friction, however, with some cloud providers being accused of exploiting open source software without contributing back. In order to tip the balance back in their favor, some data platform vendors have gone so far as to change the licensing of their software, and to introduce new products and add-on tooling under 'source available' licenses designed to place restrictions on the use of the code by cloud providers. Here we look at how the situation has escalated in the last 12 months and the likely implications.

The 451 Take

It has become almost assumed knowledge that cloud providers are exploiting open source software at the expense of commercial open source vendors. However, with at least two of the apparent victims (MongoDB and Elastic) faring well in terms of revenue growth, finding clear examples of specific damage done by cloud vendors is not easy. That hasn't stopped a variety of vendors, mostly data platform providers, from adopting 'source available' licenses that enable them to share their code with customers while restricting the use of the code to create competing cloud services. We see no sign of this trend abating, particularly with the increased importance of as-a-service revenue to data platform specialists. As a result, the proliferation of these source available licenses can be expected to continue, raising the potential for confusion among enterprise development, purchasing and legal teams.

Closing the cloud loophole (again)


Friction between open source licensing and SaaS provision is nothing new. As long ago as 2008, 451 Research covered the creation of the Affero GPL (AGPL) v3 license, the primary goal of which was to close what was quaintly known at the time as the 'ASP loophole.' While the General Public License (GPL) requires any derivative of licensed code that is distributed externally to also be licensed using the GPL, the definition of distribution does not cover the software being delivered as a service by cloud providers or application service providers (ASPs).

The AGPLv3 was specifically designed to address this loophole by requiring cloud and SaaS providers to share modifications to code licensed with AGPLv3. While its usage pales in comparison to the more widely adopted GPLv3 and Apache Software License v2, the AGPLv3 was adopted by a number of high-profile commercial open source vendors, such as MongoDB, and was (and remains) banned by Google for use within the company.

While the AGPLv3 delivered on its aim to require the contribution of modifications made to licensed software delivered as a service, it did not close the ASP loophole entirely. There remained the potential for cloud/SaaS providers to take software licensed using the AGPLv3 and deliver it as a service without modification, without the need to contribute anything to the vendor or community that originally developed it.

This partly explains why MongoDB made the decision in October 2018 to relicense its database software from the AGPLv3 to the Server Side Public License (SSPL), which requires not only modifications to the MongoDB code to also be made available using the SSPL but any complementary software used to deliver MongoDB as a service (such as management and orchestration software and APIs) as well.

Source available – with limitations


MongoDB is by no means the only commercial open source software vendor to have changed its licensing in the last 12 months in response to the commercial pressures posed by the potential for cloud providers to deliver competing services.

Redis Labs adopted the Redis Source Available License (RSAL) in March for its Redis Modules add-ons (having previously switched, briefly, from the AGPL to the more controversial Apache 2 with Commons Clause), while in December 2018, Confluent relicensed some components of its Confluent Platform – specifically the REST Proxy, Schema Registry, KSQL and Confluent Connectors – to the Confluent Community License, which specifically restricts delivery as a cloud service.

December 2018 also saw time series database provider Timescale adopt the Timescale License (TSL) for its community and enterprise features that complement the Apache-licensed core project, while more recently Cockroach Labs announced that it was switching from the Apache License to a version of the Business Source License (BSL) that specifically prevents the use of CockroachDB as a commercial database service.

In all these cases the new licenses provide access to the source code but are not open source, in the sense of being approved by the Open Source Initiative (OSI). While MongoDB did attempt to have the SSPL approved by the OSI, it was declined on the basis of being discriminatory. Others have not even bothered to seek the OSI's approval.

Therefore, these licenses are not to be considered open source, even though the code is shared and modifiable. Collective terms used to describe these licenses include 'source available' and 'shared source.' Whatever you choose to call them, they are becoming more common. Other vendors that have adopted source available licenses in recent years include Dgraph, MariaDB (which created the original BSL) and Elastic.

The latter is particularly notable, since while the other examples saw companies placing restrictions on open source software, Elastic actually made previously proprietary code available with Elasticsearch using the Elastic License. It is also noteworthy because the move prompted Amazon Web Services (along with Expedia and Netflix) to fork the project with the creation of the Open Distro for Elasticsearch, citing challenges related to the intermingling of open source and proprietary code in Elasticsearch.

Motivations


The common thread in the reasons these vendors have given for changing their licenses is that the cloud giants are exploiting open source licensing by providing cloud services based on open source software without contributing back. It is not clear, however, to what extent these vendors are reacting to what they see as bad behavior from the cloud giants, rather than proactively preempting it.

It is certainly true that multiple cloud vendors offer services that take advantage of open source software. Redis is a prime example, with Microsoft offering Azure Cache for Redis, Google offering Cloud Memorystore for Redis, Amazon Web Services providing Amazon ElastiCache for Redis and Alibaba Cloud providing ApsaraDB for Redis.

Unlike the AGPLv3, however, the license used for the Redis project (the BSD license) makes it perfectly legitimate for any third party to take the code and offer its own service, even if it were to choose not to contribute anything back to the project. The same is true of the Apache License used for Apache Kafka (upon which Confluent has built a business), as well as Timescale and (originally) CockroachDB.

That also explains, however, why Redis Labs (whose employees are the primary developers of Redis) is keen to protect the differentiating features it has developed (in the form of Redis Modules) from competitors, even if it is happy for customers and partners to have access to the source code. This is especially true as Redis Labs tries to grow its Redis Enterprise Cloud revenue.

While MongoDB was previously using the AGPLv3, there is also some doubt as to whether its licensing change was actually made in response to the actions of cloud providers. Both Microsoft and AWS offer database services that are compatible with the MongoDB API (Azure Cosmos DB and Amazon DocumentDB respectively) but we do not believe either uses MongoDB's code.

Meanwhile, MongoDB has built healthy database-as-a-service revenue of its own. Its Atlas service surpassed $100m in annualized revenue run rate in the company's fiscal 2019, less than three years after its launch, and currently contributes at least 33% of the company's quarterly revenue.

Indeed, in a recent interview with CBRonline.com, MongoDB CEO Dev Ittycheria made it clear that the company wasn't particularly interested in contributions from cloud providers anyway. The switch to the SSPL was less about keeping the cloud providers 'honest' than it was about protecting a commercial opportunity for MongoDB.

It's not hard to see why the database vendors should be so concerned about protecting their potential cloud revenue. Data from 451 Research's Data & Analytics report indicates that enterprises are reducing their use of data platforms deployed on non-cloud, on-premises infrastructure, while increasing their use of data platforms delivered via SaaS, PaaS and IaaS.

Figure 1: Data platform deployment locations, today and in two years
Source: 451 Research, Data & Analytics, 1H 2019

Data from 451 Research's Total Data: Data Platforms & Analytics Market Monitor illustrates the impact this trend will have on database revenue in the coming years, with as-a-service consumption expected to account for 35% of operational database revenue in 2023, compared with 16% in 2018, as well as 30% of analytic database revenue in 2023, compared with 14% in 2018.

Figure 2: As-a-service as a % of total database revenue
Source: 451 Research, Total Data Market Monitor

Little wonder then that data platform vendors are adjusting their licensing to protect their potential to generate as-a-service revenue. An interesting question, however, is why this trend is primarily occurring with data platform vendors rather than application providers. One answer is that the data platforms are, as the name suggests, platforms, which are used to develop and deploy applications and can therefore drive the consumption of associated cloud services. This is less true of SaaS applications, which tend to be consumed on a stand-alone basis.

Three-tier licensing


Whatever the reason, we do not believe that the companies listed above will be the last to shift toward source available licensing. This is in part due to the increasing importance of as-a-service revenue, but also because the trend is part of the related shift toward three-tier licensing approaches.

Redis Labs, Confluent and Timescale are all typical of this approach. Each contributes to a core open source community project that is freely available under an OSI-approved open source license. Each also offers a second tier of functionality under source available licensing to drive adoption with enterprise developers, and a third tier of proprietary-licensed functionality that is the focus of production enterprise deployment.

As such, three-tier licensing approaches are designed to meet the needs of multiple constituencies of open source users. Locking cloud providers out of using the second and third tiers of functionality (at least without compensation) is an additional bonus but is not necessarily the primary justification for this approach.

Figure 3: Three-tier licensing

Tier | Licensing | Target audience
Enterprise functionality (e.g., advanced security, replication, management/monitoring) | Proprietary license | Enterprise operations
Advanced functionality (e.g., multi-model compatibility, language support, connectors) | Source available license | Enterprise developers
Core functionality | OSI-approved open source license | Developers

Source: 451 Research

Implications


There is still value for these companies from contributing to true open source community projects (even if they might be the dominant contributor) in terms of driving developer adoption. However, the trend toward source available licensing for differentiating capabilities does have some interesting potential implications for open source and the significance of OSI-approved licenses.

Whereas source available licensing was previously viewed by many as an attempt to circumvent the Open Source Definition, we have seen calls from a few quarters for the OSI to relax that definition in order to approve licenses that discriminate against the cloud service providers.

While we don't expect that to happen any time soon, it is interesting to note that when it comes to differentiating functionality (and for MongoDB, the entire product), the vendors discussed above are making a clear decision that the freemium model and source available licensing are more important than a stamp of approval from the OSI.

What is less clear is whether these license changes will result in changes of behavior by the cloud providers. The only indication we've seen of that so far is the announcement by Google at its Cloud Next customer event of strategic partnerships with the likes of Confluent, DataStax, Elastic, InfluxData, MongoDB, Neo4j and Redis Labs.

While the announcement had to be seen in the context of Google's desire to paint its cloud competitors in a bad light, it was a move that was also clearly designed to illustrate Google's willingness to work with, rather than against, these vendors via shared billing, management and support.

There has also been a good deal of finger-pointing between open source vendors and cloud services providers. The release of AWS's Amazon DocumentDB service was certainly a trigger point and was perhaps one strong reason why MongoDB changed its licensing model. But it's important to point out that Amazon DocumentDB does not use MongoDB code but instead is considered 'MongoDB compatible' based on the MongoDB 3.6 API. The same is also true of Microsoft's Cosmos DB, which likewise has API compatibility with MongoDB, as well as with Apache Cassandra and Gremlin (for graph).

From the perspective of a commercial open source vendor, it is perfectly understandable that this might cause some angst, with concern over potential loss of revenue but also dilution of branding and possible market confusion. It could also be argued that these cloud services do much to promote the profile and adoption of the open source software. As such, what will be noteworthy is whether and how cloud services based on open source projects might begin to diverge from the formerly open source projects, given the licensing changes.

While the licensing changes are ostensibly designed to encourage the cloud providers to share their modifications and as-a-service enabling software in return for access to code, the true aim is perhaps instead to encourage the cloud providers to share their cash, by entering into proprietary licensing relationships instead.

In the meantime, there is also the potential for these licensing changes to backfire on the data platform vendors. While the standard open source licenses are now generally well understood by enterprise buyers and lawyers, this is the result of many years of education combined with efforts to curb license proliferation and counter fear, uncertainty and doubt. By adopting their own non-standard source available licenses, these vendors are also opening the doors to confusion among enterprise development, purchasing and legal teams.