Contributing to open source projects such as OpenStack traditionally involves individuals and companies providing code contributions that add new features and fix bugs. For nearly two years, I’ve been running one-off OpenStack clouds for demonstrations and labs at user group meetings across the US, using hardware donated by bare-metal service provider Packet. Six months ago, Packet asked how it could make a larger donation to the community, which set us on the path to building a community cloud to support OpenStack.
Each day, hundreds of code commits to the OpenStack code base need to be tested as part of the continuous integration (CI) system managed by Zuul, "a program that drives continuous integration, delivery, and deployment systems with a focus on project gating and interrelated projects." Each commit runs through a series of tests (or gates) before human review, and the gates run again before the code is merged. All of these gates run across a pool of virtual machine instances (more than 900 at peak times) donated by a number of public cloud providers; the entire OpenStack CI depends on donated computing resources. The OpenStack Infra team coordinates all of these cloud providers and served as our point of contact for donating these resources.
We set out to build a cloud whose computing resources would be dedicated entirely to the OpenStack Infra program. To do so, we had to meet the minimum requirements set by the OpenStack Infra team: support for 100 concurrent VM instances, each with 8GB RAM, 8 vCPUs, and 80GB storage. Packet allocated us 11 bare-metal servers and an IPv4 /29 subnet to be used for floating IPs. With the computing and network resources in place, we moved ahead with the OpenStack architecture and implementation.
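To get a feel for what those minimums add up to, here is a small back-of-the-envelope sketch in Python. It only multiplies out the per-instance figures above; the even split across hosts and the assumption that one of the 11 servers is reserved for the control plane are hypothetical, and the numbers ignore CPU overcommit and memory reserved for the host OS.

```python
import math

# Minimum requirements quoted by the OpenStack Infra team.
INSTANCES = 100
RAM_GB_PER_INSTANCE = 8
VCPUS_PER_INSTANCE = 8
DISK_GB_PER_INSTANCE = 80


def aggregate_requirements(instances: int) -> dict:
    """Total RAM, vCPUs, and ephemeral disk for `instances` concurrent VMs."""
    return {
        "ram_gb": instances * RAM_GB_PER_INSTANCE,
        "vcpus": instances * VCPUS_PER_INSTANCE,
        "disk_gb": instances * DISK_GB_PER_INSTANCE,
    }


def per_host_share(totals: dict, compute_hosts: int) -> dict:
    """Spread the totals evenly over `compute_hosts`, rounding up per host."""
    return {k: math.ceil(v / compute_hosts) for k, v in totals.items()}


totals = aggregate_requirements(INSTANCES)
print(totals)  # {'ram_gb': 800, 'vcpus': 800, 'disk_gb': 8000}

# Hypothetical split: assume 1 of the 11 servers runs the control plane.
print(per_host_share(totals, compute_hosts=10))  # {'ram_gb': 80, 'vcpus': 80, 'disk_gb': 800}
```

Even before overcommit, that is roughly 800GB of RAM, 800 vCPUs, and 8TB of local disk across the compute nodes, which is why freeing hardware from other services matters.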
Because all the test instances, as well as the mirror instances, use ephemeral storage, the cloud was set up without any persistent storage. Although enterprise workloads typically require persistent storage, a cloud dedicated to running CI job instances does not. The CI job logs are pulled off the cloud to a central server, and the rest of each CI job is disposed of at the end of the test. This allows hardware resources that would otherwise have been allocated to persistent storage services (such as Cinder and Ceph) to be put toward compute services (Nova).
Working with the OpenStack Infra team has opened my eyes to the capabilities of Zuul and the frameworks the team has put together. I had the opportunity to catch up with the OpenStack Infra team at the most recent Project Teams Gathering (PTG). They realize that Zuul can put a strain on any cloud and are happy to work through issues that arise. Better still, they run a great set of tools that provide metrics such as failed launch attempts and time to ready, allowing me to identify issues as soon as possible.
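As a rough illustration of the two metrics mentioned above, the sketch below computes a launch failure rate and a mean time to ready from a list of launch records. It does not use the Infra team's tooling: the `LaunchAttempt` record and the sample data are hypothetical, made up only to show the calculation.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class LaunchAttempt:
    """Hypothetical record of one node launch: did it boot, and how fast?"""
    succeeded: bool
    seconds_to_ready: Optional[float] = None  # None when the launch failed


def launch_summary(attempts: List[LaunchAttempt]) -> dict:
    """Failure rate and mean time-to-ready over a window of launch attempts."""
    failures = sum(1 for a in attempts if not a.succeeded)
    ready_times = [a.seconds_to_ready for a in attempts
                   if a.succeeded and a.seconds_to_ready is not None]
    return {
        "attempts": len(attempts),
        "failure_rate": failures / len(attempts) if attempts else 0.0,
        "mean_time_to_ready_s": mean(ready_times) if ready_times else None,
    }


# Made-up sample window: three successful boots and one failure.
window = [
    LaunchAttempt(True, 95.0),
    LaunchAttempt(True, 105.0),
    LaunchAttempt(False),
    LaunchAttempt(True, 400.0),
]
print(launch_summary(window))
# {'attempts': 4, 'failure_rate': 0.25, 'mean_time_to_ready_s': 200.0}
```

A rising failure rate or a creeping time to ready is the kind of early signal that points at an overloaded control plane or slow image handling.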
A CI system such as Zuul puts an extreme load on a cloud environment because it continuously spins virtual instances up and down. While typical instances might be up for weeks or months, a CI instance launched through Zuul lives, on average, just a few hours. This means the control plane is constantly busy stopping and starting services. Using the tools provided by the OpenStack Infra team, we were able to identify performance issues. In the first few months of operation, we quickly realized we had to upsize the control plane to handle the workload and reconfigure the image storage space to hold the disk images Zuul creates daily.
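One way to see why short lifetimes stress the control plane is Little's law: in steady state, the launch rate equals the average number of running instances divided by their average lifetime. The sketch below assumes the pool is kept full at 100 instances and uses hypothetical lifetimes (2 hours for a CI node, 30 days for a long-lived VM); these are illustrative values, not measurements from this cloud.

```python
def launches_per_day(concurrent_instances: float, avg_lifetime_hours: float) -> float:
    """Little's law: arrival rate = average population / average time in system."""
    if avg_lifetime_hours <= 0:
        raise ValueError("average lifetime must be positive")
    return concurrent_instances / avg_lifetime_hours * 24


# Hypothetical lifetimes: a 2-hour CI node versus a 30-day long-lived VM.
ci = launches_per_day(100, avg_lifetime_hours=2)
long_lived = launches_per_day(100, avg_lifetime_hours=30 * 24)
print(ci)                    # 1200.0
print(round(long_lived, 2))  # 3.33
```

Under these assumptions, the same 100-instance pool generates about 1,200 boots (and as many deletions) per day instead of a handful, and every one of them goes through the scheduler, image service, and networking.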
One of the limiting factors of this cloud is the availability of IPv4 addresses. Each test instance requires a floating IP address to communicate back to Zuul. Because we have the compute resources (RAM and CPU) to grow the cloud, we intend to start provisioning test instances with IPv6 addresses. Zuul and the OpenStack Infra project both already support IPv6.
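Python's standard `ipaddress` module makes the gap easy to quantify. The prefixes below are documentation ranges standing in for the real allocation (198.51.100.0/29 and 2001:db8::/64 are placeholders, not this cloud's addresses), and `usable_ipv4_hosts` is a hypothetical helper written for this example.

```python
import ipaddress


def usable_ipv4_hosts(cidr: str) -> int:
    """Count host addresses, excluding the network and broadcast addresses."""
    return sum(1 for _ in ipaddress.ip_network(cidr).hosts())


# Documentation prefixes stand in for the real allocations.
v4_pool = ipaddress.ip_network("198.51.100.0/29")
v6_subnet = ipaddress.ip_network("2001:db8::/64")

print(v4_pool.num_addresses)                    # 8
print(usable_ipv4_hosts("198.51.100.0/29"))     # 6
print(v6_subnet.num_addresses == 2**64)         # True
```

A /29 leaves at most six assignable IPv4 addresses, and in practice the upstream gateway and the Neutron router typically take some of those, against a target of 100 concurrent instances. A single IPv6 /64 removes that ceiling entirely.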
Although we’re continuing to improve this community-run cloud, we’re also looking forward to exploring what else we can provide with this donated hardware. Nodepool, which provisions the test nodes that Zuul uses, has drivers for managing resources outside of OpenStack, and we’re interested in using them for automated bare-metal support. We’re also hoping to extend CI resources to other open source projects through the same Zuul and Nodepool framework.
Setting up and running this cloud has been a rewarding experience, especially working with the OpenStack Infra team and seeing everything they’re doing with Zuul. The knowledge I’ve gained running a cloud to support the OpenStack Infra team has far exceeded my experiences running one-off clouds for user group demonstrations.
If you’re an OpenStack cloud provider (public or private) and have an interest in donating resources to OpenStack, I encourage you to reach out to me or the OpenStack Infra team for more information.
John Studarus will present "What we learned building a Zuul CICD Cloud" at the OpenStack Summit, November 13-15 in Berlin.