I just spent the past 6 hours migrating some of my open source projects from Travis CI to GitHub Actions, and I thought I'd pause for a bit (12 hours into this project, probably 15-20 more to go) to jot down a few thoughts.
I am not one to look a gift horse in the mouth. For almost a decade, Travis CI made it possible for me to build—and maintain, for years—hundreds of open source projects.
I have built projects for Raspberry Pi, PHP, Python, Drupal, Ansible, Kubernetes, macOS, iOS, Android, Docker, Arduino, and more. And almost every single project I built got immediate integration with Travis CI.
Without that testing, and the ability to run tests on a schedule, I would have abandoned most of these projects. But with the testing, I'm able to keep up with build failures induced by bit rot over the years and review PRs more easily.
What went wrong with Travis CI?
From the outset, Travis CI was built to integrate with GitHub repositories and offer free open source CI. At one time it was showered with praise on Hacker News and elsewhere for its culture and ethos.
For many new Ansible-based projects, I build my tests in Molecule, so I can easily run them locally or in CI. I also started using GitHub Actions for many of my new Ansible projects, just because it's so easy to get started and integrate with GitHub repositories.
I'm actually going to talk about this strategy in my next Ansible 101 live stream, covering Testing Ansible playbooks with Molecule and GitHub Actions CI, but I also wanted to highlight one thing that helps me when reviewing or observing playbook and molecule output, and that's color.
By default, in an interactive terminal session, Ansible colorizes its output: failures are red, 'ok' results are green, and changes are yellow-ish. Warnings get a magenta color, which flags them well so you can go and fix them as soon as possible (that's one core principle I advocate to make your playbooks maintainable and scalable).
For the past few years, the number of issues and PRs across all my GitHub repositories has gone from a steady stream to an ongoing deluge. There are currently over 1,500 open issues across my 194 GitHub repositories, and there's no way I can keep up with all of them.
Initially, I went through each issue in each project's issue queue on a monthly basis (mind you, this was—and is still—done on nights and weekends in my spare time). That slipped to a quarterly task... and has now slipped to only happening for higher-profile projects once or twice a year.
October 2020 Update: This post still contains relevant information, but one update: the community.kubernetes collection is moving to kubernetes.core. Otherwise everything's the same, it's just changing names.
The Ansible community has long been a victim of its own success. Since I got started with Ansible in 2013, the growth in the number of Ansible modules and plugins has been astronomical. That's what happens when you build a very simple but powerful tool—easy enough for anyone to extend into any automation use case.
One thing that was not obvious when I was setting up GitHub Actions on the Ansible Kubernetes Collection repository was how to have a 'CI' workflow run both on pull requests and on a schedule. I like to have scheduled runs for most of my projects, so I can see if something starts failing because an underlying dependency changes and breaks my tests.
The documentation for on.schedule just has an example with the workflow running on a schedule. For example:
on:
  schedule:
    # * is a special character in YAML so you have to quote this string
    - cron: '*/15 * * * *'
Separately, there's documentation for triggering a workflow on events like a 'push' or a 'pull_request':
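Both kinds of trigger can sit under a single `on:` key. What follows is a minimal sketch rather than the workflow from the actual repository: the workflow name, the cron expression, the job name, and the placeholder test step are all assumptions chosen for illustration.

```yaml
name: CI
on:
  # Run on every pull request.
  pull_request:
  # Also run on a schedule; this cron expression is only an illustrative choice.
  schedule:
    - cron: '0 6 * * 0'

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Placeholder step: replace with the project's real test command.
      - run: echo "run tests here"
```

With this layout the same jobs run for both triggers, so a scheduled failure points at a changed dependency rather than at a new commit.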
There's been a ton of writing about OSS stewardship, sustainability, funding, etc. in the past year, along with story after story of burnout. In this time, I've become very strict in my open source maintainership:
Unless it's generating income, it's for me and I'm not going to spend more than a couple hours a month looking at it—if that.
There are a number of projects that I maintain, which I'm not actively using on money-generating projects. I don't normally touch or even look at the issue queues on these projects until a CI test fails, or unless someone who contributes to my Patreon or GitHub supporters—or who I know from previous contributions—pings me directly about them. Every now and then I'll run through the list of PRs and merge a bugfix or docs fix here and there, but that only happens maybe once per repository per year.
I recently needed to do a quick audit on all my Ansible roles, and the easiest way (since almost every one is on GitHub, and that's the main source of truth I use) was to grab a list of all my GitHub repositories. However, it can be a little tricky if you have hundreds of repos. I'm guessing most people don't have this problem, but whether you do or not, the easiest way to get all of any given user's repositories using the GitHub v3 API is to run the following command:
curl "https://api.github.com/users/geerlingguy/repos?per_page=100&page=1" | jq -r '.[] | .name'
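That command returns at most 100 repositories, so anyone with more has to walk through the pages. Here is a rough sketch of one way to do that, assuming `curl` and `jq` are installed. The function names `fetch_page` and `list_all_repos` are made up for this example, and unauthenticated requests are subject to GitHub's rate limits.

```shell
#!/usr/bin/env bash

# Hypothetical helper: fetch one page (up to 100 entries) of a user's repos.
fetch_page() {
  curl -fsS "https://api.github.com/users/$1/repos?per_page=100&page=$2"
}

# Hypothetical helper: print every repo name, one per line, page by page.
list_all_repos() {
  local user="$1" page=1 json names
  while :; do
    json="$(fetch_page "$user" "$page")" || return 1
    names="$(printf '%s' "$json" | jq -r '.[] | .name')" || return 1
    # An empty page means there is nothing left to fetch.
    [ -z "$names" ] && break
    printf '%s\n' "$names"
    page=$((page + 1))
  done
}
```

Calling `list_all_repos geerlingguy` would then print the names from every page instead of just the first one.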
I noticed on one of the CI servers I'm running that the .ssh/known_hosts file had ballooned up to over 1,000,000 lines!
Looking into the root cause (I tailed the file until I could track down a few jobs that ran every minute), I found that there was the following line in a setup script:
ssh-keyscan -t rsa github.com >> /var/lib/jenkins/.ssh/known_hosts
"This can't be good!" I told myself, and I decided to add a condition to make it idempotent (that is, able to be run once or one million times but only effecting change the first time it's run; basically, a way to change something only if the change is required):
if ! grep -q "^github.com" /var/lib/jenkins/.ssh/known_hosts; then
  ssh-keyscan -t rsa github.com >> /var/lib/jenkins/.ssh/known_hosts
fi
Now the host key for github.com is scanned only the first time that script runs, and it is stored in known_hosts just once for the host github.com... instead of millions of times!
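The same check can be wrapped in a small reusable function. This is only a sketch: `scan_host` and `ensure_known_host` are hypothetical names, and it assumes the known_hosts file stores plain (unhashed) hostnames, since hashed entries would never match.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper around the network call, so it can be swapped out.
scan_host() {
  ssh-keyscan -t rsa "$1" 2>/dev/null
}

# Append the host's key to the file only if no entry for that host exists yet.
ensure_known_host() {
  local host="$1" file="$2"
  touch "$file"
  # The first field of each line is a comma-separated list of host names;
  # awk exits 0 only when one of them equals the host exactly.
  if ! awk -v h="$host" '
    { n = split($1, a, ","); for (i = 1; i <= n; i++) if (a[i] == h) found = 1 }
    END { exit !found }
  ' "$file"; then
    scan_host "$host" >> "$file"
  fi
}
```

Running `ensure_known_host github.com /var/lib/jenkins/.ssh/known_hosts` any number of times should leave a single set of github.com entries in the file. Comparing the host name exactly also avoids the unescaped dot in the grep pattern matching other characters.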
Recently I received an email from an IT student asking the following: "I recently submitted a pull request to one of your open source projects on GitHub. What can I do to get this pull request merged?" The answer below may sound somewhat like a cop-out, or harsh (especially considering it was to a starry-eyed student trying to dip his or her toes into the waters of open source software contribution)... but I've found that honesty is the best policy, and the best way I can maintain good OSS software is to guard my (limited) time for OSS work vigilantly, and try not to allow sentiment to force the merge of any kind of code, no matter how simple or small the change. Here is my reply:
Thanks for the email! I maintain over 100 different open source projects on GitHub, all in my spare time (which can be hard to come by with 3 kids, a full time job at Acquia, and a few other hobbies!). I spend a few hours per quarter on any given project. Some of the more popular projects have dozens of issues, PRs, and new comments that need to be read through to figure out what I need to spend these few hours on.