Editor’s Note: As of January 2022, iland is now 11:11 Systems, a managed infrastructure solutions provider at the forefront of cloud, connectivity, and security. As a legacy iland.com blog post, this article likely contains information that is no longer relevant. For the most up-to-date product information and resources, or if you have further questions, please refer to the 11:11 Systems Success Center or contact us directly.
The recent outage with Amazon’s S3 cloud storage facilities has highlighted the risk of relying on a single cloud provider.
To Amazon’s credit, they were quick to acknowledge the problem and get everything working again. However, the outage was significant, and it demonstrated just how many well-known applications rely on these kinds of cloud services. While S3 has historically had excellent availability statistics, this outage has made organizations revisit the way they consume public cloud services.
As with any IT implementation, the availability of services is a trade-off between cost and risk. Traditionally, storage arrays would use RAIDed disks to ensure that data would still be available in the event of one or more physical disk failures. With object storage technologies such as S3, data is spread across multiple disks within an availability zone to protect against disk failure (though not, by itself, against zone failure). The option is there (at extra cost) to take advantage of replication and sharding across multiple AZs within the same region, or even three-way replication across multiple regions (geo-replication). It depends on how important that data is to you. If it is data for long-term retention and infrequent access, perhaps you push it out to S3 IA or even Glacier.
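As a concrete illustration of that tiering decision, it can be expressed as an S3 lifecycle policy. The sketch below (Python, standard library only) builds the policy document in the shape AWS expects; the bucket name and prefix are hypothetical, and in practice the dictionary would be applied with boto3’s `put_bucket_lifecycle_configuration` call rather than simply printed.

```python
import json

# Hypothetical bucket holding long-term retention data.
BUCKET = "example-archive-bucket"

# Lifecycle policy: transition objects to S3 IA (infrequent access)
# after 30 days, then to Glacier after 90 days -- matching the
# "long-term retention, infrequent access" strategy described above.
lifecycle_policy = {
    "Rules": [
        {
            "ID": "archive-cold-data",
            "Filter": {"Prefix": "archive/"},  # hypothetical prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
        }
    ]
}

# With boto3 installed and credentials configured, this would be
# applied as:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket=BUCKET, LifecycleConfiguration=lifecycle_policy)
print(json.dumps(lifecycle_policy, indent=2))
```

The day thresholds are illustrative; the right values depend on your access patterns and retrieval-cost tolerance.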
So, IT architects are faced with some interesting decisions: Do they take advantage of these higher-value and higher-availability options? Or, should they think about using multiple clouds from different vendors to spread the risk?
Most people would agree that it is unlikely that Amazon, Microsoft, or Google will go bust anytime soon. However, in the ever-changing political landscape around data sovereignty, government access to data, security concerns, etc., there are many considerations. The Big Three are not the only cloud service providers out there. VMware provides a large ecosystem of vCloud Air Network (vCAN) service providers around the world, 11:11 Systems being one of them. For many organizations, there is comfort in the fact that 11:11 runs the same technologies that they are familiar with in their on-premises facilities.
Over the past year or so, a new term has gained currency in cloud computing, especially in highly regulated industries like financial services: vendor diversity. Prior to the advent and mass adoption of the public cloud, there were a number of instances where customers were faced with big decisions about their IT infrastructure. In some cases, the IT service provider did go bust, and customers had a very short window to get all their equipment out of co-location facilities; in the UK this is sometimes referred to as the 2e2 effect. In other cases, IT service providers decided to exit the co-location or managed service business for whatever reason, and again, customers had to move much faster and sooner than they had planned.
So, the guidance coming out from industry regulators is quite simple: don’t put all of your eggs in one basket. See, for example, what the FCA has to say on the matter. If you run production services with one provider, we suggest you run DR with another, and maybe your backup with a third.
The consultants behind the Clover Index, the Finance Sector Private Cloud Vendor benchmarking tool, have been highlighting this concern to the market for some years. The Clover Index includes an assessment of a CSP’s ability to provide vendor diversity options to clients in their analysis.
Prior to joining 11:11, one of our employees spent some time as a Cloud Solutions Architect with Microsoft Azure. On one of the projects they worked on, the customer had decided to split their workloads between Azure, AWS, and a couple of VMware vCAN clouds, as well as keeping some VMware workloads running in their own data centers. They decided from the outset to put in an MPLS cloud that would connect their various office locations with the cloud providers, so high-speed connectivity would not be an issue. The initial requirements were all traditional virtual machine-based implementations using Windows and Linux, and they were building clustered, “design for failure” applications that spanned most of the clouds at any given time. Some of the applications even used back-end object storage, keeping data in either Azure Blob storage or Amazon S3.
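One way to keep an application portable across object stores, as that customer did, is to code against a thin storage interface and plug in a backend per cloud. The sketch below is a hypothetical, standard-library-only illustration of the pattern: the interface and class names are invented here, and a real deployment would add implementations backed by the Azure Blob and S3 SDKs behind the same interface.

```python
from abc import ABC, abstractmethod
from pathlib import Path
import tempfile

class ObjectStore(ABC):
    """Minimal object-store interface the application codes against."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

class LocalStore(ObjectStore):
    """Filesystem-backed stand-in for a cloud backend. A real
    deployment would add AzureBlobStore and S3Store classes
    implementing the same two methods."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def put(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()

# Application code depends only on ObjectStore, so moving between
# clouds means swapping one constructor call, not rewriting logic.
store: ObjectStore = LocalStore(tempfile.mkdtemp())
store.put("reports/q1.txt", b"multi-cloud demo")
```

The point of the abstraction is exactly the vendor-diversity argument made above: the application never hard-codes a single provider’s SDK into its business logic.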
At 11:11, we are now working closely with managed service providers who either have their own VMware-based facilities and are using 11:11 for DRaaS and/or backup or who might be working with another VMware vCAN partner and, again, use 11:11 for DRaaS and/or backup. In this way, the MSP is not tightly coupled to either the production or the DR/backup facility. If the customer chooses to part ways with the MSP, all their “stuff” does not necessarily need to be moved to another location.
In the near future, we foresee situations where an MSP might be managing traditional VM-based facilities in 11:11 or another vCAN partner while simultaneously having some of the new “cloud native” applications running in AWS, Azure, or GCE. As 11:11 runs in carrier-neutral data centers, often the exact same facilities that are used by the Big Three, it is very easy to provide a high-speed cross-connect between 11:11 and AWS Direct Connect or Azure ExpressRoute.
So, in summary, when deploying services to the cloud, explore all the options available. Are you prepared to “double up” or even “triple up” to get the availability you need? Or, are you prepared to take the risk? It’s “just IT” under the covers, and things will go wrong, no matter how shiny the marketing looks.
Take advantage of multiple cloud vendors for the reasons discussed but not to the extent of making things unmanageable.