Drone CIJan 15, 2021
A message currently appears (mid-January 2021) at the top of the travis-ci.org website.
“Please be aware travis-ci.org will be shutting down in several weeks, with all accounts migrating to travis-ci.com. Please stay tuned here for more information.”
The transition has not been a smooth one, with long, disruptive delays occurring on existing builds, and lack of clear communication from the company. Many were unaware of the impending change. Some informative posts about the topic are Travis CI’s new pricing plan threw a wrench in my open source works and Extremely poor official communication of the .org shutdown.
The C++ Alliance has decided to implement an in-house CI solution and make the new service available for Boost libraries also.
The first step was choosing which software to use. There are truly a surprising number of alternatives. An extensive review was conducted, including many Continuous Integration services from awesome-ci which lists more than 50. Coincidentally, Rene Rivera had recently done an analysis as well for ci_playground, and his example config files eventually became the basis for our new config files.
The top choices:
From this list, Appveyor and Drone seemed the most promising to start with. Both allow for 100% self-hosting.
The appeal of Appveyor is that the config files are basic yaml, and everything runs in a Docker container. It sounds perfect. However, after experimentation there were a few issues.
- Appveyor was originally designed for Microsoft Windows, with .NET and Powershell being key ingredients. While it can run on Linux, it’s not a native Linux application. Most of the CI testing done by Boost and CPPAlliance runs on Linux.
- Specifically, the Docker experience on Windows is not nearly as smooth as Linux. I encountered numerous complexities when setting up Appveyor Windows Docker containers.
- Bugs in their app.
Due to a combination of those reasons, Appveyor was not the best choice for this project.
Within the first day or two experimenting with Drone, it became clear that this was an excellent CI framework:
- Simple, usable UI
- Easy installation
- Linux and Docker native
- Small number of processes to run
- Integrates with PostgreSQL, MySQL, Amazon S3
- Autoscale the agents
- Badges on github repos
The main drawback with Drone is the absence of “matrix” builds. Their alternative for matrices is jsonnet or starlark, which are flexible scripting languages. I was apprehensive about this point, thinking that the end-user would prefer simple yaml files - exactly like Travis. And, in fact, that is probably the case. Basic yaml files are easier to understand. However, on balance, this was the only noticable problem with Drone, and everything else seemed to be in order. The resulting starlark files do have a matrix-like configuration where each job can be customized.
To discuss Starlark for a moment - it’s a subset of python, and therefore easy to learn. Why would I use Starlark instead of just Python? “Sandboxing. The primary reason this was written is for the “hermetic execution” feature of Starlark. Python is notoriously difficult to sandbox and there didn’t appear to be any sandboxing solutions that could run within Python to run Python or Python-like code. While Starlark isn’t exactly Python it is very very close to it. You can think of this as a secure way to run very simplistic Python functions. Note that this library itself doesn’t really provide any security guarantees and your program may crash while using it (PRs welcome). Starlark itself is providing the security guarantees.”
Running a drone server
Create a script startdrone.sh with these contents:
#!/bin/bash docker run \ --volume=/var/lib/drone:/data \ --env=DRONE_GITHUB_CLIENT_ID= \ --env=DRONE_GITHUB_CLIENT_SECRET= \ --env=DRONE_RPC_SECRET= \ --env=DRONE_TLS_AUTOCERT= \ --env=DRONE_SERVER_HOST= \ --env=DRONE_SERVER_PROTO= \ --env=DRONE_CONVERT_PLUGIN_ENDPOINT= \ --env=DRONE_CONVERT_PLUGIN_SECRET= \ --env=DRONE_HTTP_SSL_REDIRECT= \ --env=DRONE_HTTP_SSL_TEMPORARY_REDIRECT= \ --env=DRONE_S3_BUCKET= \ --env=DRONE_LOGS_PRETTY= \ --env=AWS_ACCESS_KEY_ID= \ --env=AWS_SECRET_ACCESS_KEY= \ --env=AWS_DEFAULT_REGION= \ --env=AWS_REGION= \ --env=DRONE_DATABASE_DRIVER= \ --env=DRONE_DATABASE_DATASOURCE= \ --env=DRONE_USER_CREATE= \ --env=DRONE_REPOSITORY_FILTER= \ --env=DRONE_GITHUB_SCOPE= \ --publish=80:80 \ --publish=443:443 \ --restart=always \ --detach=true \ --name=drone \ drone/drone:1
Fill in the variables. (Many of those are secure keys which shouldn’t be published on a public webpage.)
Then, run the script.
Drone is up and running.
Next, the starlark plugin. Edit startstarlark.sh:
#!/bin/bash docker run -d \ --volume=/var/lib/starlark:/data \ --env= \ --publish= \ --env=DRONE_DEBUG= \ --env=DRONE_SECRET= \ --restart=always \ --name=starlark drone/drone-convert-starlark
and run it:
Starlark is up and running. Finally, the autoscaler.
#!/bin/bash docker run -d \ -v /var/lib/autoscaler:/data \ -e DRONE_POOL_MIN= \ -e DRONE_POOL_MAX= \ -e DRONE_SERVER_PROTO= \ -e DRONE_SERVER_HOST= \ -e DRONE_SERVER_TOKEN= \ -e DRONE_AGENT_TOKEN= \ -e DRONE_AMAZON_REGION= \ -e DRONE_AMAZON_SUBNET_ID= \ -e DRONE_AMAZON_SECURITY_GROUP= \ -e AWS_ACCESS_KEY_ID= \ -e AWS_SECRET_ACCESS_KEY= \ -e DRONE_CAPACITY_BUFFER= \ -e DRONE_REAPER_INTERVAL= \ -e DRONE_REAPER_ENABLED= \ -e DRONE_ENABLE_REAPER= \ -e DRONE_AMAZON_INSTANCE= \ -e DRONE_AMAZON_VOLUME_TYPE= \ -e DRONE_AMAZON_VOLUME_IOPS= \ -p \ --restart=always \ --name=autoscaler \ drone/autoscaler
Start the autoscaler.
Windows autoscaler is still experimental. For now, both Windows and Mac servers have been installed manually and will be scaled individually. Because they are less common operating systems, with most boost builds running in Linux, the CPU load on these other machines is not as significant.
The real complexities appear when composing the config files for each repository. After manually porting .travis.yml for https://github.com/boostorg/beast and https://github.com/boostorg/json, the next step was creating a Python script which automates the entire process.
A copy of the script can be viewed at https://github.com/CPPAlliance/droneconverter-demo
The idea is to be able to go into any directory with a .travis.yml file, and migrate to Drone by executing a single command:
cd boostorg/accumulators droneconverter
The converter ingests a source file, parses it with PyYAML, and dumps the output in Jinja2 templates. The method to write the script was by beginning with any library, such as boostorg/array, and just get that one working. Then, move on to others, boostorg/assign, boostorg/bind, etc. Each library contains a mix of travis jobs which are both similar and different to the previously translated libraries. Thus, each new travis file presents a new puzzle to solve, but hopefully in a generalized way that will also work for all repositories.
Versions of clang ranging from 3.3 to 11 are targeted in the tests. While later releases such as clang-6 or clang-7 usually build right away without errors, the earlier versions in the 3.x series were failing to build for a variety of reasons. First of all, those versions are not available on Ubuntu 16.04 and later, which means being stuck on Ubuntu 14.04, preventing an across-the-board upgrade to a newer Ubuntu. Then, if a standard “base” clang is simultaneously installed, such as clang-7 or 9, this seems to cause other dependent packages and libraries to be installed, which conflict with clang-3. The solution was to figure out what travis does, and copy it. Travis images have downloaded and installed clang-7 into a completely separate directory, not using the ordinary system packages. Then the extra directory /usr/local/clang-7.0.0/bin has been added to the $PATH.
Some .travis.yml files have a “matrix” section. Others have “jobs”. Some .travis files place all the logic in one main “install” script. Others refer to a variety of script sections including “install”, “script”, “before_install”, “before_script”, “after_success”, which may or may not be shared between the 20 jobs contained in the file. Some job in travis were moving (with the mv command) their source directory, which is baffling and not usually permitted in Jenkins or Drone. This must be converted to a copy command instead. “travis_wait” and “travis_retry” must be deleted, they are travis-specific. Many travis variables such as TRAVIS_OS_NAME or TRAVIS_BRANCH need to be set and/or replaced in the resulting scripts. Environment variables in the travis file might be presented as a list, or a string, without quotes, or with single quotes, or double quotes, or single quotes embedded in double quotes, or double quotes embedded in single quotes, or missing entirely and included as a global env section at the top of the file. The CXX variable, which determines the compiler used by boost build, isn’t always apparent and could be derived from a variety of sources, including the COMPILER variable or the list of packages. The list of these types of conversion fixes goes on and on, you can see them in the droneconverter program.
Apple Macs Developer Tools depends on Xcode, with an array of possible Xcode version numbers from 6 to 12, and counting. Catalina only runs 10 and above. High Sierra will run 7-9. None of these will operate without accepting the license agreement, however the license agreement might reset if switching to a newer version of Xcode. The presence of “Command Line Tools” seems to break some builds during testing, however everything works if “Command Line Tools” is missing and the build directly accesses Xcode in the /Applications/Xcode-11.7.app/Contents/Developer directory. On the other hand, the package manager “brew” needs “Command Line Tools” (on High Sierra) or it can’t install packages. A solution which appears to be sufficiently effective is to “mv CommandLineTools CommandLineTools.bck”, or the reverse, when needed.
Drone will not start on a Mac during reboot unless the Drone user has a window console sessions running too. So, Apple is not an ideal command-line-only remote server environment.
Pull requests with the new drone configs were rolled out to all those boost repositories with 100% travis build success rate and 100% drone build success rate, which accounts for about half of boost libraries. The other half of boost libraries are either missing a .travis.yml file, or they have a couple of travis/drone jobs which are failing. These should be addressed individually, even if only to post details about it in the pull requests. Ideally, attention should be focused on these failing tests, one by one, until the jobs attain a 100% build success rate and the badge displays green instead of red. The long tail distribution of edge-cases require individual attention and rewritten tests.
The droneconverter script is being enhanced to also generate Github Actions config files. A first draft, tested on a few boost libraries, is building Linux jobs successfully. Ongoing.