YAML best practices for Ansible playbooks - tasks

This post is a follow-up to a recent discussion on the Ansible Project mailing list about YAML formatting for complex Ansible playbook tasks, and it will also appear as part of Appendix B: Ansible Best Practices and Conventions in my book Ansible for DevOps on Leanpub.

YAML, a simple configuration language

YAML's use for describing configuration has grown rapidly over the past few years, and with the introduction of SaltStack and Ansible, YAML finally made its way into the server configuration management realm as a first-class citizen.

YAML is a pretty simple language: a human-readable, machine-parsable syntax that allows for complex nested structures of mappings (objects) and lists (arrays), which makes it a great fit for a configuration management tool. Consider the following way of defining a list (or 'sequence', in YAML terms) of widgets:

widget:
  - foo
  - bar
  - fizz

This would translate into Python (using the PyYAML library employed by Ansible) as follows:

translated_yaml = {'widget': ['foo', 'bar', 'fizz']}

And what about a structured map (a set of key/value pairs) in YAML?

widget:
  foo: 12
  bar: 13

The Python that would result:

translated_yaml = {'widget': {'foo': 12, 'bar': 13}}

A few things to note with both of the above examples:

  • YAML will try to determine the type of an item automatically. So foo in the first example would be translated as a string, true or false would be a boolean, and 123 would be an integer. This post doesn't go further into type handling, but you may want to declare strings explicitly with quotes ('' or "") to minimize surprises.
  • Whitespace matters! YAML uses spaces (literal space characters, not tabs) to define structure (mappings, lists, etc.), so configure your editor to insert spaces when you press Tab. Depending on the parser, you may be able to use a tab instead of a space to separate parameters inside a key=value string (like apt: name=foo state=installed), but it's generally preferred to use spaces everywhere, to minimize errors and display irregularities across editors and platforms.
  • YAML syntax is robust and well-documented. Read through the official YAML Specification and/or the PyYAML Documentation to dig deeper.
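To make the type-guessing point concrete, here is a toy Python sketch that imitates a small part of the YAML 1.1 implicit typing PyYAML applies to unquoted values. `resolve_plain_scalar` is a hypothetical helper written for this example, not part of PyYAML, and it deliberately ignores floats, nulls, hex numbers, dates, and escape sequences.

```python
import re

# The boolean words PyYAML resolves under YAML 1.1 rules.
_TRUE = {"true", "True", "TRUE", "yes", "Yes", "YES", "on", "On", "ON"}
_FALSE = {"false", "False", "FALSE", "no", "No", "NO", "off", "Off", "OFF"}


def resolve_plain_scalar(text):
    """Guess a scalar's type, roughly the way YAML 1.1 does.

    Hypothetical helper for illustration only: it covers quoted strings,
    booleans, decimal integers, and leading-zero octals, nothing more.
    """
    # Quoted scalars are always strings (escape sequences are not handled here).
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "'\"":
        return text[1:-1]
    if text in _TRUE:
        return True
    if text in _FALSE:
        return False
    # YAML 1.1 reads a leading 0 followed by octal digits as an octal integer.
    if re.fullmatch(r"[-+]?0[0-7]+", text):
        return int(text, 8)
    if re.fullmatch(r"[-+]?(0|[1-9][0-9]*)", text):
        return int(text)
    # Anything else falls through as a plain string.
    return text


for raw in ["foo", "true", "123", "'123'", "0755", "yes"]:
    value = resolve_plain_scalar(raw)
    print(f"{raw!r:8} -> {value!r} ({type(value).__name__})")
```

Note the last two rows: an unquoted 0755 becomes the integer 493 (octal), and yes becomes a boolean, which is exactly the kind of surprise that quoting avoids.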

A basic Ansible playbook

To use Ansible, you only need to know minimal YAML structure. Consider the following simple playbook:

---
# My Ansible playbook.
- hosts: all

  tasks:
    - name: Install foo.
      apt: pkg=foo state=installed

  • The first line above (---) marks the beginning of a YAML document. In principle, anything above that line isn't part of the document, but I wouldn't rely on every YAML parser handling extra text gracefully, so I keep my .yml files pure YAML and add documentation as comments inline with the YAML.
  • The second line is a comment. Comments start with # and can use multiple lines (as long as each line starts with #).
  • The third line begins a list; in this case, a list of plays Ansible should run. Generally, playbooks only affect one set of hosts (in this case, all hosts defined in available/given inventories), but you can add additional plays by starting another new - hosts: [group] section or including another playbook with another play.
  • For the first (and in this playbook, only) grouping of hosts, we define a list of tasks. The first task uses the apt module to install the foo package on a Debian-based system.
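Once parsed, the playbook above is just nested data: a list of plays, each a dict with a tasks list. The Python sketch below writes that structure out by hand (rather than calling a YAML parser) and walks it. `list_tasks` is a hypothetical helper, and treating whichever task key isn't `name` as the module is a simplification that holds here but not for tasks with extra keys like `when` or `notify`.

```python
# The playbook above, as the list-of-dicts structure a YAML parser would return.
playbook = [
    {
        "hosts": "all",
        "tasks": [
            {"name": "Install foo.", "apt": "pkg=foo state=installed"},
        ],
    },
]


def list_tasks(plays):
    """Yield (hosts, task name, module, module args) for every task.

    Simplification: assumes each task has a 'name' key plus exactly one
    module key, which is true for this example only.
    """
    for play in plays:
        for task in play.get("tasks", []):
            module = next(key for key in task if key != "name")
            yield play["hosts"], task.get("name"), module, task[module]


for hosts, name, module, args in list_tasks(playbook):
    print(f"[{hosts}] {name} -> {module}: {args}")
```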

All well and good, right? Well, as you get deeper into Ansible and start defining more complex configuration, you might start seeing tasks like the following:

- name: Copy Phergie shell script into place.
  template: src=templates/phergie.sh.j2 dest=/home/{{ phergie_user }}/phergie.sh owner={{ phergie_user }} group={{ phergie_user }} mode=755

The one-line syntax (which uses Ansible-specific key=value shorthand for defining parameters) has some very positive attributes:

  • Typical tasks (like installations and copies) are compact and readable (apt: pkg=apache2 state=installed is just about as simple as apt-get install -y apache2; in this way, an Ansible playbook feels very much like a shell script).
  • Playbooks can be more compact, and more configuration can be displayed on one screen.
  • Ansible's official documentation follows this format, and many existing roles and playbooks use one line for all parameters.

However, as the example above highlights, this key=value syntax has a few drawbacks. To read and maintain tasks like this comfortably, you have to:

  • Have a pretty large/widescreen monitor (able to display at least 120 characters comfortably)
  • Use a source control UI that can display very long lines (i.e. not GitHub, GitLab, Gogs, etc.)
  • Read left-to-right
  • Have a diff viewer that highlights intra-line differences (changes within a single line)
  • Not worry about variable types being converted to strings in some situations
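The last point is easy to see if you split a shorthand argument string yourself. The Python sketch below uses `shlex` to break the string on whitespace (honoring quotes) and `str.partition` to split each pair. `split_key_value_args` is a hypothetical, simplified stand-in, not Ansible's actual argument parser; in particular it would mis-split Jinja2 expressions like `{{ phergie_user }}` on their inner spaces, so the example uses literal values.

```python
import shlex


def split_key_value_args(arg_string):
    """Split 'k1=v1 k2=v2' shorthand into a dict of strings.

    Simplified illustration only: Ansible's real parser also handles
    Jinja2 expressions, escapes, and free-form arguments.
    """
    params = {}
    for token in shlex.split(arg_string):
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"not a key=value pair: {token!r}")
        params[key] = value  # the value stays a str, whatever it looks like
    return params


args = split_key_value_args(
    "src=templates/phergie.sh.j2 dest=/home/phergie/phergie.sh "
    "owner=phergie mode=755 force=no"
)
for key, value in args.items():
    print(f"{key}={value!r} ({type(value).__name__})")
```

Every value, including mode=755 and force=no, comes back as a string; any conversion to a number or boolean has to happen later.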

I argue that the shorthand syntax falls apart for more complicated, shared playbooks (especially roles), and I have a few ideas to help you make tasks more readable and friendlier to version control and diffing.

Methods for formatting Ansible tasks in YAML

Following a discussion on the Ansible Project Google Group about YAML formatting best practices, and after maintaining dozens of roles and playbooks myself, I've finally settled on a few basic guidelines for my playbook tasks: for more complex tasks, I generally prefer a multiline syntax over the shorthand.

Simple, straightforward tasks - shorthand/one-line (=)

For simpler tasks, I usually stick to the shorthand syntax, using key=value parameters.

- name: Install Nginx.
  yum: pkg=nginx state=installed

For any situation where an equivalent shell command would roughly match what I'm writing in the YAML, I prefer this method, since it's immediately obvious what's happening, and it's highly unlikely any of the parameters (like state=installed) will change frequently during development.

Complex or 3+ parameter tasks - structured map (:)

For more complex tasks, like the longer template example above, I prefer the following format:

- name: Copy Phergie shell script into place.
  template:
    src: "templates/phergie.sh.j2"
    dest: "/home/{{ phergie_user }}/phergie.sh"
    owner: "{{ phergie_user }}"
    group: "{{ phergie_user }}"
    mode: 0755

A few notes on this syntax:

  • The structure is all valid YAML, using the structured map syntax described at the beginning of this post.
  • Strings, booleans, integers, octals, etc. are all preserved (instead of being converted to strings).
  • Each parameter must be on its own line, so you can't chain together mode: 0755, owner: root, group: root to save space.
  • YAML syntax highlighting works slightly better for this format than key=value, since each key will be highlighted, and values will be displayed as constants, strings, etc.
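The version-control benefit can be demonstrated with Python's standard `difflib`. In this sketch, the same hypothetical change (mode 755 to 644) is made to the one-line and the structured versions of the Phergie task from this post; `changed_lines` is a small helper written for this example.

```python
import difflib

one_line_before = [
    "- name: Copy Phergie shell script into place.",
    "  template: src=templates/phergie.sh.j2 dest=/home/{{ phergie_user }}/phergie.sh"
    " owner={{ phergie_user }} group={{ phergie_user }} mode=755",
]
one_line_after = [one_line_before[0], one_line_before[1].replace("mode=755", "mode=644")]

structured_before = [
    "- name: Copy Phergie shell script into place.",
    "  template:",
    '    src: "templates/phergie.sh.j2"',
    '    dest: "/home/{{ phergie_user }}/phergie.sh"',
    '    owner: "{{ phergie_user }}"',
    '    group: "{{ phergie_user }}"',
    "    mode: 0755",
]
structured_after = structured_before[:-1] + ["    mode: 0644"]


def changed_lines(before, after):
    """Return only the removed (-) and added (+) lines from a unified diff."""
    diff = difflib.unified_diff(before, after, lineterm="")
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("---", "+++"))
    ]


for label, before, after in [
    ("one-line", one_line_before, one_line_after),
    ("structured", structured_before, structured_after),
]:
    print(f"{label}:")
    for line in changed_lines(before, after):
        print("  " + line)
```

With the one-line syntax, the diff replaces the whole parameter line and you have to hunt for the changed value; with the structured map, only the mode line shows up.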

A passable hybrid approach - folded scalars (>)

Another approach that's often used in the wild (and in much of my own work) is to use the terse key=value parameter syntax from Ansible's documentation, but to split the parameters over multiple lines using YAML's folded scalar syntax:

- name: Copy Phergie shell script into place.
  template: >
    src=templates/phergie.sh.j2
    dest=/home/{{ phergie_user }}/phergie.sh
    owner={{ phergie_user }} group={{ phergie_user }} mode=755

In YAML, the > character denotes a folded scalar: every line that follows (as long as it's indented further than the line containing the >) becomes part of a single string, with the line breaks between those lines replaced by single spaces. So the above YAML and the original one-line template example will function exactly the same.
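The folding rule for this simple case fits in a few lines of Python. `fold_block` is a hypothetical helper that handles only what this post uses: every line at the same indentation, no blank lines, and YAML's default "clip" chomping, which keeps a single trailing newline. Real YAML folding has more rules, such as preserving line breaks around blank and more-indented lines.

```python
def fold_block(lines):
    """Fold equally indented lines the way YAML's '>' does in the simple case.

    Assumes no blank lines and no extra-indented lines; both get special
    treatment in real YAML folding.
    """
    # Remove the block's indentation from each line.
    content = [line.lstrip(" ") for line in lines]
    # Line breaks become single spaces; clip chomping keeps one final newline.
    return " ".join(content) + "\n"


block = [
    "    src=templates/phergie.sh.j2",
    "    dest=/home/{{ phergie_user }}/phergie.sh",
    "    owner={{ phergie_user }} group={{ phergie_user }} mode=755",
]
folded = fold_block(block)
print(repr(folded))
```

Apart from the trailing newline, the folded string matches the parameters of the one-line template task earlier in this post.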

This syntax lets you split parameters across lines however you like, but, like the one-line syntax, it doesn't preserve numeric, boolean, and other non-string types for values.

I have started to phase out this approach in my own work (and will be changing older examples in my book, Ansible for DevOps) in favor of the structured map style above. The only place where I can see the folded scalar approach being more helpful is for certain uses of the command and shell modules, where you need to pass in extra options:

- name: Install Drupal.
  command: >
    drush si -y
    --site-name="{{ drupal_site_name }}"
    --account-name=admin
    --account-pass={{ drupal_admin_pass }}
    --db-url=mysql://root@localhost/{{ domain }}
    chdir={{ drupal_core_path }}
    creates={{ drupal_core_path }}/sites/default/settings.php
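In a task like this, options such as chdir= and creates= ride along inside the free-form command string and are separated from the command itself before it runs. The Python sketch below imitates that split with `shlex`; `split_free_form`, its short list of recognized options, and the rendered variable values are all hypothetical simplifications for illustration, not Ansible's implementation.

```python
import shlex

# Options this sketch pulls out of the command string (hypothetical subset).
KNOWN_OPTIONS = ("chdir", "creates", "removes")


def split_free_form(command_string):
    """Separate known key=value options from the command's own arguments."""
    argv, options = [], {}
    for token in shlex.split(command_string):
        key, sep, value = token.partition("=")
        if sep and key in KNOWN_OPTIONS:
            options[key] = value
        else:
            argv.append(token)
    return argv, options


# The folded Drupal command, with hypothetical values in place of the variables.
drupal_command = (
    'drush si -y --site-name="My Site" --account-name=admin '
    "--account-pass=secret --db-url=mysql://root@localhost/example.com "
    "chdir=/var/www/drupal creates=/var/www/drupal/sites/default/settings.php\n"
)
argv, options = split_free_form(drupal_command)
print(argv)
print(options)
```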

Typically, if you can find a way to run a command without needing creates and chdir, or to avoid very long commands altogether (which are arguably unreadable in either single-line or multiline format!), it's better to do that instead of writing this monstrosity.

But sometimes the above is the best you can do to keep unwieldy tasks sane.

Summary

As I mentioned on the mailing list, one of Ansible's strengths is its flexibility; any of the above methods of task definition is equally valid and functional. I have chosen to use the shorthand syntax for simpler tasks and structured maps for more complex tasks, since they feel more maintainable and readable to my eye.

If you prefer to use a different syntax or conventions, that's not wrong. As with most programming and technical things, being consistent is more important than following a particular set of rules, especially if that set of rules isn't universally agreed upon.

I would encourage you to adopt one of these styles as a general rule, though, and to communicate that choice to the people who will be working with you on your playbooks. Nothing's worse than debugging a complicated role and having to visually parse through three different YAML styles!

Comments

It would be nice to get these programmed into ansible-lint.

Good article!
Have you written about any other Ansible best practices?

I wrote a whole book on it :) See: https://www.ansiblefordevops.com/

I'm also going to be presenting on Ansible roles and best practices, in particular, at AnsibleFest SF.

Hi, I have a case like this:

users:
  - { name: "bob", group: "sysadmins"}
  - { name: "jessie", group: "developers"}

Could you suggest how to write this as a more structured YAML list/map?
Thanks a lot.

That's pretty precise and correct. The only change I'd suggest: if you want it formatted slightly more legibly, you can use YAML's block (indented) mapping syntax:

users:
  - name: bob
    group: sysadmins
  - name: jessie
    group: developers

What if I have more than 100 users, with more settings per user? Is it possible to compose the users array by including separate .yml files? Each would consist of a name and group variable, just like your example.

That is possible. I would do some performance testing to make sure Ansible doesn't consume too much memory while reading in all the YAML files, but you would basically need a few tasks: one to read the directory listing (so you have an array of all the filenames to look up), then (in a with_items loop) another to read in each file with include, then a third (also in the loop) to set_fact and add that file's data to an array of all user data... something like that.
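The same three steps (list the directory, read each file, append its data to one list) look like this in plain Python. It's a sketch only: `load_users` and `parse_flat_mapping` are hypothetical helpers, the one-file-per-user layout is an assumption, and the parser understands nothing but flat key: value lines. Real files should go through a proper YAML parser, or through the Ansible tasks described above.

```python
import tempfile
from pathlib import Path


def parse_flat_mapping(text):
    """Parse simple 'key: value' lines into a dict of strings.

    Hypothetical helper: no nesting, lists, or type handling.
    """
    data = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and the document start marker.
        if not line or line.startswith("#") or line == "---":
            continue
        key, _, value = line.partition(":")
        data[key.strip()] = value.strip().strip("'\"")
    return data


def load_users(directory):
    """List the .yml files, read each one, and collect the results in a list."""
    users = []
    for path in sorted(Path(directory).glob("*.yml")):
        users.append(parse_flat_mapping(path.read_text()))
    return users


# Usage: two hypothetical per-user files in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "bob.yml").write_text("name: bob\ngroup: sysadmins\n")
    Path(tmp, "jessie.yml").write_text("---\nname: jessie\ngroup: developers\n")
    users = load_users(tmp)

print(users)
```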

There are probably a few other ways this can be architected, and it might be better to do it differently; I'm not sure.

Hi Jeff,

I want to start by thanking you for what you are doing. In these harsh times we are living through, you are doing a great thing by teaching Ansible and helping people gain a new skill. I recently started getting into Ansible because I lost my job and wanted to learn a new skill, or at least build the foundation, as learning Ansible is an ongoing process.

For people like me who do not have prior programming experience, it would be good if you could give a tip or two about how to set up a common text editor like VS Code (or whatever it is that you use), and which plugins are helpful when writing playbooks.

One of the most frequent problems I encounter after I write a playbook and execute it is a syntax error, mainly caused by indentation: I have either too many spaces or none, and then I spend a lot of time trying to figure out how to align everything with proper spacing. It's mostly a trial-and-error process: I modify something in the playbook (add or remove some spaces), then I run the playbook in the terminal again, and so on.

If you could give me a tip or two about this it would help a lot.

Thank you for what you are doing.

You're welcome! And I'm so glad you're finding my work helpful in learning your new skill.

Syntax issues commonly trip up beginners, and so there are a few basic recommendations I have:

  1. Make sure you configure your editor to use spaces (not tabs) for indentation, and maybe consider making whitespace characters visible (I have it set in my editor to show them on highlight).
  2. Make sure your YAML files are displayed using syntax highlighting (this helps you spot simple errors pretty frequently).
  3. Consider using an extension like TrailingSpaces (for Sublime Text; other editors have similar plugins) to highlight extra spaces in places they shouldn't be.
  4. Use a linter like yamllint and/or ansible-lint in your workflow. You could even integrate yamllint directly with your code editor if it has the right lint plugin (e.g. for Sublime Text there's SublimeLinter-contrib-yamllint).
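As a rough illustration of what tips 1 and 3 catch, here is a toy Python checker that flags tab characters and trailing whitespace line by line. `find_whitespace_problems` is hypothetical and covers a tiny fraction of what yamllint checks; it's only meant to show how mechanical these problems are to find.

```python
def find_whitespace_problems(text):
    """Return (line_number, message) pairs for tabs and trailing whitespace."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if "\t" in line:
            problems.append((number, "tab character"))
        if line != line.rstrip():
            problems.append((number, "trailing whitespace"))
    return problems


sample = (
    "- hosts: all\n"
    "  tasks:\n"
    "\t- name: Install foo.\n"  # indented with a tab
    "      apt: pkg=foo state=installed   \n"  # trailing spaces
)
for number, message in find_whitespace_problems(sample):
    print(f"line {number}: {message}")
```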

Over time YAML just becomes second nature, but you'll still bump into weird edge cases or 'brain farts' from time to time. I know I do!